ebc ++ 1 2 3execution of the script "++" should sum its three arguments (1+(2+3)), and return 6.Here's the "bytecode engine" (ebc: execute byte code), which retrieves the implementations of bytecodes from the global array cmd:
proc ebc {code argl} { set ::S $argl foreach opcode [split $code ""] { eval $::cmd($opcode) } set ::S }Let's now populate the bytecode collection. The set of all defined bytecodes will be the alphabet of this little RPN language. It may be interesting to note that this language has truly minimal syntax - the only rule is: each script ("word") composed of any number of bytecodes is well-formed. It just remains to check whether it does what we want.Binary expr operators can be treated generically:
foreach op {+ - * /} { set cmd($op) [string map "@ $op" {swap; push [expr {[pop] @ [pop]}]}] } #-- And here's some more hand-crafted bytecode implementations set cmd(d) {push [lindex $::S end]} ;# dup set cmd(q) {push [expr {sqrt([pop])}]} set cmd(^) {push [swap; expr {pow([pop],[pop])}]} set cmd(s) swap #-- The stack routines imply a global stack ::S, for simplicity interp alias {} push {} lappend ::S proc pop {} {K [lindex $::S end] [set ::S [lrange $::S 0 end-1]]} proc K {a b} {set a} proc swap {} {push [pop] [pop]}Instead of enumerating all possible bytecode combinations beforehand (which grows exponentially by alphabet and word length), I use this code from Mapping words to integers to step over their sequence, uniquely indexed by an increasing integer. This is something like the Goedel number of the corresponding code. Note that with this mapping, all valid programs (bytecode sequences) correspond to one unique non-negative integer, and longer programs have higher integers associated:
proc int2word {int alphabet} { set word "" set la [llength $alphabet] while {$int > 0} { incr int -1 set word [lindex $alphabet [expr {$int % $la}]]$word set int [expr {$int/$la}] } set word }Now out for discovery! The toplevel proc takes a paired list of inputs and expected output. It tries in brute force all programs up to the specified maximum Goedel number and returns the first one that complies with all tests:
proc discover0 args { set alphabet [lsort [array names ::cmd]] for {set i 1} {$i<10000} {incr i} { set code [int2word $i $alphabet] set failed 0 foreach {inputs output} $args { catch {ebc $code $inputs} res if {$res != $output} {incr failed; break} } if {!$failed} {return $code} } }But iterating over many words is still pretty slow, at least on my 200 MHz box, and many useless "programs" are tried. For instance, if the test has two inputs and wants one output, the stack balance is -1 (one less out than in). This is provided e.g. by one the binary operators +-*/. But the program "dd" (which just duplicates the top of stack twice) has a stack balance of +2, and hence can never pass the example test. So, on a morning dogwalk, I thought out this strategy:
- measure the stack balance for each bytecode
- iterate once over very many possible programs, computing their stack balance
- partition them (put into distinct subsets) by stack balance
- perform each 'discovery' call only on programs of matching stack balance
proc bc'stack'balance bc { set stack {1 2} ;# a bytecode will consume at most two elements expr {[llength [ebc $bc $stack]]-[llength $stack]} } proc stack'balance code { set res 0 foreach bc [split $code ""] {incr res $::balance($bc)} set res }The partitioning will run for some seconds (depending on nmax - I tried with several ten thousand), but it's needed only once. The size of partitions is further reduced by excluding programs which contain redundant code, that will have no effect, like swapping the stack twice, or swapping before an addition or multiplication. A program without such extravaganzas is shorter and yet does the same job, so it will have been tested earlier anyway.
proc partition'programs nmax { global cmd partitions balance #-- make a table of bytecode stack balances set alphabet [array names cmd] foreach bc $alphabet { set balance($bc) [bc'stack'balance $bc] } array unset partitions ;# for repeated sourcing for {set i 1} {$i<=$nmax} {incr i} { set program [int2word $i $alphabet] #-- "peephole optimizer" - suppress code with redundancies set ok 1 foreach sequence {ss s+ s*} { if {[string first $sequence $program]>=0} {set ok 0} } if {$ok} { lappend partitions([stack'balance $program]) $program } } set program ;# see how far we got }The discoverer, Second Edition, determines the stack balance of the first text, and tests only those programs of the same partition:
proc discover args { global partitions foreach {in out} $args break set balance [expr {[llength $out]-[llength $in]}] foreach code $partitions($balance) { set failed 0 foreach {input output} $args { catch {ebc $code $input} res if {$res != $output} {incr failed; break} } if {!$failed} {return $code} } }But now for the trying. The partitioning helps very much in reducing the number of candidates. For the 1000 programs with Goedel numbers 1..1000, it retains only a fraction for each stack balance:
- -2: 75
- -1: 155 (this and 0 will be the most frequently used)
- 0: 241
- 1: 274
- 2: 155
- 3: 100
% discover 5 6 7 8 dd/+Not bad: duplicate the number twice, divide by itself to get the constant 1, and add that to the original number. However, it fails to work if we add the successor of 0 as another test case:
% discover 5 6 7 8 0 1Nothing coming - because zero division made the last test fail. If we give only this test, another solution is found:
% discover 0 1 d^"Take x to the x-th" power" - pow(0,0) gives indeed 1, but that's not the generic successor function.More experiments to discover the hypot() function:
% discover {4 3} 5 d/+Hm - the 3 is duplicated, divided by itself (=1), which is added to 4. Try to swap the inputs:
% discover {3 4} 5 q+Another dirty trick: get square root of 4, add to 3 - presto, 5. The correct hypot() function would be
d*sd*+qbut my program set (nmax=30000) ends at 5-byte codes, so even by giving another test to force discovery of the real thing, it would never reach a 7-byte code. OK, I bite the bullet, set nmax to 500000, wait 5 minutes for the partitioning, and then:
% discover {3 4} 5 {11 60} 61 sd/+Hm.. cheap trick again - it was discovered that the solution is just the successor of the second argument. Like in real life, test cases have to be carefully chosen. So I tried with another a^2+b^2=c^2 set, and HEUREKA! (after 286 seconds):
% discover {3 4} 5 {8 15} 17 d*sd*+qAfter partitioning, 54005 programs had the -1 stack balance, and the correct result was on position 48393 in that list...And finally, with the half-million set of programs, here's a solution for the successor function too:
% discover 0 1 4711 4712 ddd-^+"d-" subtracts top of stack from itself, pushing 0; the second duplicate to the 0-th power gives 1, which is added to the original argument. After some head-scratching, I find it plausible, and possibly it is even the simplest possible solution, given the poorness of this RPN language.
aa suggests that the simplest solution is
dd/+RS: Yes, but as discussed above, it doesn't work for the successor of 0 (zero division)...
DKF: If you really want a challenge, add something that can operate like a while loop... j
Lessons learned:
- Brute force is simple, but may demand very much patience (or faster hardware)
- The sky, not the skull is the limit what all we can do with Tcl :)
RHS: I added the following operators:
- <, > left, right shift by 1
- &, | logical and, or
- ~ logical not (unary)
- Calculate if a number is odd: dd*&
- Calculate is a number is even: d><s/ (I would guess it would have found dd*&~ eventually too :)
Larry Smith 2005-03-05 - Now, if you were to add a genetic algorithm to replace the brute-force generator, you'd have a very interesting little tool to find algorithms. I've often wondered if such a program could find a reverse algorithm for crypto trapdoor schemes - e.g. could it use the input and output of a known crypto algorithm to come up with a code breaker?