Richard Suchenwirth 2013-11-30 - Another chapter in the
dis2asm saga: The Tcl compiler converts
proc bodies into
bytecode. With tcl::unsupported::
disassemble we can inspect the generated code in "dis"
assembler notation. With tcl::unsupported::
assemble we can convert a (somewhat different) assembler notation "
TAL" to bytecode again. The job of
dis2asm is to convert a string in "dis" to another string in TAL so the original proc works equally well.
Looking at the
dis2asm output, we sometimes notice that it produces code that could be better - in other words, "optimized": doing the same job in less bytecodes, and hence (at least marginally) less time. Examples:
push {}
pop
is an absolutely redundant piece of code: first something is pushed on the stack, and then immediately popped off again. No effect, but 3 bytes in bytecode, and just a little more time needed to run. Or, the second and third line of
jump Lxx
label Lyy;
jump Lzz
are also redundant: because the code above jumps away, Lyy can only be reached by code that explicitly jumps there - only to be redirected to Lzz again. It might as well have directly jumped to Lzz.
The code shown on this page deals with such issues. It is a postprocessor which converts
dis2asm output to another string in the same TAL language, but optimized where possible. To test it, I have extended the
aproc wrapper to accept an -o flag and if present, to run the optimizer on the TAL output:
proc aproc {name argl body args} {
proc $name $argl $body
set res [disasm proc $name]
if {"-x" in $args} {
set res [list proc $name $argl [list asm [dis2asm $res]]]
if {"-o" in $args} {set res [optimize $res]}
eval $res
}
return $res
}
The
optimize proc is a little longer than that, even though it currently just handles the few cases discussed above (and below). It splits the TAL input into a list of lines and iterates over them with
for, so it can also operate on other than the current line. Lines considered redundant are first marked with the prefix "#o", and removed after one pass, so that indexes don't get confused.
In testing conditions, the previous and current instructions are put together in a string - the "peephole". I think e.g. "push pop" is quite self-documenting for a peephole condition... :^)
proc optimize tal {
set last ""
set last2 ""
set lines [split $tal \n]
for {set i 0} {$i < [llength $lines]} {incr i} {
set instr [regexp -inline {[A-Za-z0-9_]+} [lindex $lines $i]]
if {"$last $instr" eq "push pop"} {
lset lines $i-1 #o[lindex $lines $i-1] ;# mark for deletion
lset lines $i #o[lindex $lines $i]
} elseif {"$last $instr" eq "jump jump"} { ;# unreachable jump
lset lines $i #o[lindex $lines $i]
}
set last $instr
}
while 1 {
set tmp {} ;# remove marked lines
foreach line $lines {if ![string match #o* $line] {lappend tmp $line}}
set lines $tmp
set found 0
for {set i 0} {$i < [llength $lines]} {incr i} {
set instr [regexp -inline {[A-Za-z0-9_]+} [lindex $lines $i]]
if {"$last2 $last $instr" eq "jump label jump"} {
set oldTrg [string trimright [lindex $lines $i-1 1] ";"]
set newTrg [lindex $lines $i 1]
lset lines $i-1 #o[lindex $lines $i-1] ;# mark for deletion
lset lines $i #o[lindex $lines $i]
set found 1
break
}
set last2 $last
set last $instr
}
if $found {
set tmp {} ;# remove marked lines
foreach line $lines {
if [regexp "jump.* $oldTrg " $line] {
set line [string map [list " $oldTrg " " $newTrg "] $line]
}
if ![string match #o* $line] {lappend tmp $line}
}
set lines $tmp
} else break
}
return [join $lines \n]
}
Testing: first unoptimized TAL...
% aproc f x {foreach i {a b} {foreach j $x {puts $i,$j}}} -x
proc f x {asm {
push {a b} ;# (0) push1 0 # "a b"
store 1 ;# (2) storeScalar1 %v1 # temp var 1
pop ;# (4) pop
push -1; store 2; pop ;# (5) foreach_start4 0
label L10;
incrImm 2 +1;load 1;load 2
listIndex;store i;pop
load 1;listLength;lt ;# (10) foreach_step4 0
jumpFalse L63 ;# (15) jumpFalse1 +48 # pc 63
;# (17) startCommand +43 1 # next cmd at pc 60
load x ;# (26) loadScalar1 %v0 # var "x"
store 4 ;# (28) storeScalar1 %v4 # temp var 4
pop ;# (30) pop
push -1; store 5; pop ;# (31) foreach_start4 1
label L36;
incrImm 5 +1;load 4;load 5
listIndex;store j;pop
load 4;listLength;lt ;# (36) foreach_step4 1
jumpFalse L58 ;# (41) jumpFalse1 +17 # pc 58
push puts ;# (43) push1 1 # "puts"
load i ;# (45) loadScalar1 %v3 # var "i"
push , ;# (47) push1 2 # ","
load j ;# (49) loadScalar1 %v6 # var "j"
concat 3 ;# (51) concat1 3
invokeStk 2 ;# (53) invokeStk1 2
pop ;# (55) pop
jump L36 ;# (56) jump1 -20 # pc 36
label L58;
push {} ;# (58) push1 3 # ""
pop ;# (60) pop
jump L10 ;# (61) jump1 -51 # pc 10
label L63;
push {} ;# (63) push1 3 # ""
;# (65) done
label Done;
}}
and now, with the -o switch added, the optimized version:
% aproc f x {foreach i {a b} {foreach j $x {puts $i,$j}}} -x -o
proc f x {asm {
push {a b} ;# (0) push1 0 # "a b"
store 1 ;# (2) storeScalar1 %v1 # temp var 1
pop ;# (4) pop
push -1; store 2; pop ;# (5) foreach_start4 0
label L10;
incrImm 2 +1;load 1;load 2
listIndex;store i;pop
load 1;listLength;lt ;# (10) foreach_step4 0
jumpFalse L63 ;# (15) jumpFalse1 +48 # pc 63
;# (17) startCommand +43 1 # next cmd at pc 60
load x ;# (26) loadScalar1 %v0 # var "x"
store 4 ;# (28) storeScalar1 %v4 # temp var 4
pop ;# (30) pop
push -1; store 5; pop ;# (31) foreach_start4 1
label L36;
incrImm 5 +1;load 4;load 5
listIndex;store j;pop
load 4;listLength;lt ;# (36) foreach_step4 1
jumpFalse L10 ;# (41) jumpFalse1 +17 # pc 58
push puts ;# (43) push1 1 # "puts"
load i ;# (45) loadScalar1 %v3 # var "i"
push , ;# (47) push1 2 # ","
load j ;# (49) loadScalar1 %v6 # var "j"
concat 3 ;# (51) concat1 3
invokeStk 2 ;# (53) invokeStk1 2
pop ;# (55) pop
jump L36 ;# (56) jump1 -20 # pc 36
label L63;
push {} ;# (63) push1 3 # ""
;# (65) done
label Done;
}}
Lines marked 58 to 61 are gone, line 41 now jumps directly to L10... but does it still work as before?
% f {0 1 2}
a,0
a,1
a,2
b,0
b,1
b,2
Another potential for optimization appears in the following test:
% aproc f x {foreach i $x {if {$i eq "b"} continue;puts $i}} -x
proc f x {asm {
load x ;# (0) loadScalar1 %v0 # var "x"
store 1 ;# (2) storeScalar1 %v1 # temp var 1
pop ;# (4) pop
push -1; store 2; pop ;# (5) foreach_start4 0
label L10;
incrImm 2 +1;load 1;load 2
listIndex;store i;pop
load 1;listLength;lt ;# (10) foreach_step4 0
jumpFalse L61 ;# (15) jumpFalse1 +46 # pc 61
;# (17) startCommand +34 1 # next cmd at pc 51
load i ;# (26) loadScalar1 %v3 # var "i"
push b ;# (28) push1 0 # "b"
streq ;# (30) streq
jumpFalse L49 ;# (31) jumpFalse1 +18 # pc 49
;# (33) startCommand +14 1 # next cmd at pc 47
jump L10 ;# (42) jump4 -32 # pc 10
jump L51 ;# (47) jump1 +4 # pc 51
label L49;
push {} ;# (49) push1 1 # ""
label L51;
pop ;# (51) pop
push puts ;# (52) push1 2 # "puts"
load i ;# (54) loadScalar1 %v3 # var "i"
invokeStk 2 ;# (56) invokeStk1 2
pop ;# (58) pop
jump L10 ;# (59) jump1 -49 # pc 10
label L61;
push {} ;# (61) push1 1 # ""
;# (63) done
label Done;
}}
Quite a mouthful, from a one-liner
proc... Notice the lines marked (42) and (47), the latter of which is dead code that cannot be reached. I have added code to handle that situation in the
optimize proc above. Another two bytes saved... and there is more. We removed the "jump L51" in (47). Since no other instruction jumps to L51, we can remove the "label L51;" line - and then we have another "push pop" silliness to get rid of as well.
In fact, the whole rats' nest of
jumpFalse L49 ;# (31) jumpFalse1 +18 # pc 49
;# (33) startCommand +14 1 # next cmd at pc 47
jump L10 ;# (42) jump4 -32 # pc 10
jump L51 ;# (47) jump1 +4 # pc 51
label L49;
push {} ;# (49) push1 1 # ""
label L51;
pop ;# (51) pop
can be simplified (tested to work, by manual editing) to just
jumpTrue L10
Yet another test:
% aproc f x {if {$x%2} {set res odd} else {set res even};set res} -x
proc f x {asm {
load x ;# (0) loadScalar1 %v0 # var "x"
push 2 ;# (2) push1 0 # "2"
mod ;# (4) mod
jumpFalse L13 ;# (5) jumpFalse1 +8 # pc 13
push odd ;# (7) push1 1 # "odd"
store res ;# (9) storeScalar1 %v1 # var "res"
jump L17 ;# (11) jump1 +6 # pc 17
label L13;
push even ;# (13) push1 2 # "even"
store res ;# (15) storeScalar1 %v1 # var "res"
label L17;
pop ;# (17) pop
load res ;# (18) loadScalar1 %v1 # var "res"
;# (20) done
}}
The sequence
jumpFalse L13 ;# (5) jumpFalse1 +8 # pc 13
push odd ;# (7) push1 1 # "odd"
store res ;# (9) storeScalar1 %v1 # var "res"
jump L17 ;# (11) jump1 +6 # pc 17
label L13;
push even ;# (13) push1 2 # "even"
label L15; #added by me
store res ;# (15) storeScalar1 %v1 # var "res"
label L17;
pop ;# (17) pop
load res ;# (18) loadScalar1 %v1 # var "res"
contains a few redundancies:
- instead of "store res;jump L17", jump a bit shorter to L15 and reuse the "store res" there;
- the same variable res, that we last stored to, has its value popped and freshly loaded in (17) and (18)
- as res is used nowhere else, it is redundant itself
So the following code snippet should be equivalent:
jumpFalse L13 ;# (5) jumpFalse1 +8 # pc 13
push odd ;# (7) push1 1 # "odd"
jump Done ;# (11) jump1 +6 # pc 17
label L13;
push even ;# (13) push1 2 # "even"
label Done;
Tcl itself compiles this way, when explicit
returns are used:
% aproc f x {if {$x%2} {return odd} else {return even}} -x
proc f x {asm {
load x ;# (0) loadScalar1 %v0 # var "x"
push 2 ;# (2) push1 0 # "2"
mod ;# (4) mod
jumpFalse L12 ;# (5) jumpFalse1 +7 # pc 12
push odd ;# (7) push1 1 # "odd"
jump Done ;# (9) done
;# (10) nop
;# (11) nop
label L12;
push even ;# (12) push1 2 # "even"
;# (14) done
;# (15) done
label Done;
}}
For comparison, another variation from
expr comes close to my Tcl's and my solution - except that it tries to convert the result to numeric, although static code analysis can prove before assembly that there are only two possible outputs, and neither of them is numeric:
% aproc f x {expr {$x%2 ? "odd" : "even"}} -x
proc f x {asm {
load x ;# (0) loadScalar1 %v0 # var "x"
push 2 ;# (2) push1 0 # "2"
mod ;# (4) mod
jumpFalse L11 ;# (5) jumpFalse1 +6 # pc 11
push odd ;# (7) push1 1 # "odd"
jump L13 ;# (9) jump1 +4 # pc 13
label L11;
push even ;# (11) push1 2 # "even"
label L13;
tryCvtToNumeric ;# (13) tryCvtToNumeric
;# (14) done
}}