From bytecodes to wordcodes (or (4-byte)codes ?)The idea is to have instructions coded in 4-byte (or machine words?) taking 4-byte operands; today, the instructions are coded in 1 byte, operands may be 1 or 4 bytes long.Why
- Bytecodes(BC) are for internal consumption: they are typically run by the same Tcl installation that compiled them. It makes sense to use the best internal representation for that installation. If import/export functionality (tclPro) is desired for the BC, it is that interface that should translate to/from a portable representation (and re-introduce space savings if the internal representation is bloated)
- The actual bytecodes have probably a very small memory footprint compared to everything else that is going on in Tcl (I'm guessing!), no measurable negative impact to be expected in this sense
- The BC engine is (slightly) more complex: there are many duplicate instructions, to deal with cases where an operand is represented in 1 or 4 bytes
- Wordcodes allow different (and faster) instruction scheduling mechanisms (direct jumps from instruction end to the start of the next instruction): with runtime table lookups (msvc), or even without (gcc)
- BC are run-time expensive: there is a lot of byte-to-word 0-padding going on at runtime
- Speed tradeoff? More memory traffic, alignment effects (?)
- First the easy one: make all operands wordsized. Slight retouches to the compiler, it is mainly data and parameters that need changes.
- Abstract away the internal representation: need to replace 'unsigned char *pc' declarations with a macro, the Bytecode structure has to be abstracted too. Changes in many many places ...
Inlined procs and TAL (Tcl assembly language, see Parsing, Bytecodes and Execution, The anatomy of a bytecoded command, and [1])These two are not the same, but are closely related.Here the idea is to be able to write portable code that can be optimised to run much faster than plain Tcl (tcllib could be an interesting field of application).These constructs would be used only in time-critical parts of a program. Inline code does not only save the function call overhead, but is an enabler for some TAL constructs that could otherwise not be used. As an example, consider (again ...) Donal's trick with the K combinator
set lst [lreplace [K $lst [set lst {}]] $idx $idx $newVal]One would like to be able to define a proc Kset such that
set lst [lreplace [Kset lst] $idx $idx $newval]produces the same effect of the above. But this cannot be done without recurring to uplevel/upvar (which are slow mechanisms)
proc Kset {lst} { upvar $lst lst0 set lst $lst0 set lst0 {} set lst }A possible TAL syntax for Kset I have in mind would be (pseudocode!)
proc Kset {lst} { upvar $lst lst0 TAL::NOOP $lst0 TAL::POP [set lst0 {}] }where TAL::NOOP does nothing (its arguments stay on the stack), and TAL::POP removes its arguments from the stack; so the original value of lst0 stays on the stacktop and is the return value. But upvar is still there ...So, the real winner would be to be able to insert those TAL codes directly into the bytecodes of the calling proc, but without requiring the programmer to do it by hand (or even know that it is happening) - hence the 'need' for inlined procs.Assorted remarks
- This looks like it can be implemented as an extension: it just requires the definition of new commands (TAL::inlineProc, TAL::NOOP, ...) that have a compile procedure - the tough one is the compile procedure for TAL::inlineProc, the others are probably simple
- TAL does not need to, and probably can't really, expose every opcode of the internal engine; actually, the TAL pseudo-engine doesn't even need to be the same as the internal engine
- In order for TAL to be really useful and fast, it may need some redesign of the engine (see other project below)
- The tough part of this: deciding what can be inlined, and what can't. Inlining makes a mess of scoping rules and stack traces ...
A lower level engine with a hybrid stackThe bytecode engine could use some refactorisation to eliminate duplicated code - same goes for some of the non-engine internal procedures (I have tclVar.c in my mind, but there may be others). The changes also involve defining new lower level operations that would be useful/necessary for TAL programming.In order to do this, the Tcl stack should be hybridised (polymorphised?). Currently it is a stack of (Tcl_Obj *) only (these are the values that Tcl variables can take); I am thinking in terms of also allowing other pointers on the Tcl stack, maybe even integers ...To illustrate the idea: I'm thinking of being able to decompose the different INST_STORE instructions into
(a) get a pointer to the Var structure (different addressing possibilities here) (b) do whatever you have to do to the variable and its value (generic code) (c) possibly different cleanups depending on the addressing modeThis would allow to hold on to the pointer to a variable (TAL code, or compiled code) and avoid repeated lookups.Some categories of primitive instructions I am thinking of (this does look like reprogramming Tcl in Forth, doesn't it?):
- Stack management - push, pop, dup, drop, swap, rot, peek, poke
- Reference counts - incrRefCt, decrRefCt, getRefCt, isShared, dupObject
- Scope management - (deal with the Tcl call stack, namespaces, ...)
- Tcl_Obj ops - get & store (with append/lappend/replace flags), math-ops, ...
- Var/Cmd ops - link, unlink, getVarPtr, getLinkVarPtr, getCmdPtr, getLinkCmdPtr, callTraces
- Flow control - jumps (as today), foreach, return, error, catch, endCatch, invoke, eval, gosub
- Integers (??) - integer basic arithmetic, integer-based flow control
- Higher level ops - essentially, assemblies of the above (inlined as macros)
Non-recursive engine, tail jumps, going stackless, light threads, and similar insanitiesThe way Tcl is implemented is highly recursive at the C level; this poses migration problems when the target only supports small stacks (Palm, for instance).There are actually (1 + 2n) stacks in Tcl, where n is the number of interpreters: the C stack, plus (for each interpreter) the engine's stack of Tcl_Obj and the interpreters callFrame stack. The idea is to get rid of the C stack wherever possible.Presently, the engine generates recursive calls to itself whenever
(i) it invokes a proc (ii) it does INST_EVAL or INST_EXPR (iii) it touches a traced variable (iv) it invokes a C function that invokes the engine, via ''eval'' for instance (v) asynchronous calls are madeIt is relatively easy to avoid these recursive calls in cases (ii) [tested] and (i), by incorporating some code from tclBasic.c and tclProc.c into the engine, creating local variables in an hybrid Tcl stack, and adding a return stack as described below. This technology is insufficient for (iii) and (iv), I haven't yet thought about (v).The tested modification (coded using gcc's labels as values, straight C requires some more work but can be done) goes essentially as follows:
- create a return stack RS in the engine
- detect the points where tou can switch to executing a different bytecode: INST_EVAL, INST_EXPR, INST_INVOKE (if it is a proc's invocation); incorporate there the code that needs to be run before the actual bytecodes are executed; define a returnTarget with the code that needs to be executed on return
- before switching to another bytecode, stash in the return stack the variables you will need on return: codePtr, pc, interp, returnTarget, ...
- modify the exit code of the engine to do an internal return if the RS is not empty, and a real C return otherwise. On internal returns, you restore the saved variables and jump to the returnTarget.
- some more info will need to go into the callFrame stack (maybe RS should also go in here, and not in the Tcl Stack?)
- a call to eval a script/bytecode is now just a scheduler: (a) pushes a callFrame, (b) pushes data into RS, (c) returns to the main executor
- the main executor just executes whatever it finds at the stacktop - asynchronous calls should get there too ...
RS (not Return Stack ;-): I'm amazed by these ideas. I have always missed the ability to define all Tcl commands in Tcl (so far, only for and eval in terms of uplevel) - TAL looks like it may provide the needed primitives. This can turn into a great thing! The TAL engine could grow to be the real core of Tcl, which would learn how to 'set', 'proc', 'foreach' by sourcing TAL code...Re portable representation: how about reusing UTF-8 functionality? Even 4-byte-wide opcodes could start with small numbers, so in "UTF" file storage format 00..7F just take up 1 byte, 80..FFF take two. And: no byte-order problems... (see UTF-8 bit by bit, Unicode and UTF-8)JCW This is great stuff, IMO, Miguel. You may want to dive into Lua [4] - if you haven't already. They moved from a stack machine to a register machine (from 4.0 to 4.1a, IIRC), and reported 15% performance gains. They also always have used 32-bit words (well, on the major platforms, Lua can also go to much smaller architectures). And finally, Lua has always included an assembly-level access (symbolic, of course), which I really liked. It's "hidden" in one of the debug or test libraries - probably to prevent everyone going overboard with this.(JCW again) As for (v), i.e. async calls: how about defining a new Tcl object which has no string representation. It represents a "future", i.e. a value which becomes real at some point in time. When accessing the value in a normal way, the evaluator blocks, waiting for completion. When accessed in some special way, one could peek inside and check whether it is a future, whether it is still in progress, cancel it, etc. One could even turn all of Tcl into a data flow system, or perhaps an ASM (Abstract State Machine), by always returning results from every call this way: "set a [foo $bar]" would start foo, set a to a future representing the result of foo, and then continue processing. The minute the value of a is used, the script would suspend at that particular spot, letting the rest progress, until the value of a is ready. The more practical implication of this, is that async I/O becomes far more transparent, because "set a [read $fd]" could be an async call, with processing continuing as usual.CMcC 20040726: future defines a Tcl_ObjType which has some of the properties suggested by JCW. Just a fun project for an afternoon.Lars H: I quite agree. To avoid using the C stack when it isn't necessary certainly seems like a good thing. As for (v) async calls: are these async as "completion calls from e.g. asyncronuous I/O operations" or more benign events (might happen in any order, but only at an update or similar)? It seems to me that if one avoids using the C stack then the latter kind of things would become more well-behaved, since vwaits wouldn't need to nest in the way that they do today. The futures of JCW is also a very exciting idea, possibly making the language more powerful (in some heavy CS sense). I'm not sure I would like to have all Tcl commands return futures by default, but one could always provide for each core command a futuristic counterpart that would return futures whenever possible, and maybe even use namespace import for choosing which set one wishes to have in the global namespace.Roy Terry: Please forgive a practical plug here. For anyone reading this who is concerned about the overhead of calling procs with upvar, uplevel, global, etc. There is Tmac - a Tcl macro processor package which lets you define proc-like code blocks that are expanded "in-line". You can even generate the code block dynamically using a filter proc. It's always nice to have a fast base language but macros can offer some relief even now.
26jul04 jcw - Not sure what the status of bytecode->wordcode experiments is, but here's another option: both. Put all bytecodes in one vec, all words in another. Advance two independent pointers when running. It uses up one more iterator of which the state must be saved, but both iterators now operate on their native datatype (i.e. *bp++ and *wp++). Jumps need to set them both. That need not be more elaborate than: "wp += *bp++; bp += *bp;". Not sure how effective it is, in terms of on-chip cache effects for example, just wanted to get the idea onto this page.