Updated 2014-12-20 16:48:42 by pooryorick

The Tcl virtual machine executes bytecode, whose assembly language is TAL.

See Also

bytecode
the instruction set targeted by TAL
TAL
the assembly language for the Tcl virtual machine
TEBC
TclExecuteByteCode, the core function of the bytecode interpreter
::tcl::unsupported::assemble
new in 8.6
::tcl::unsupported::disassemble
new in 8.5
Threaded Code Tcl VM
an experimental prototype for a Tcl VM that uses threaded code

Description

The Tcl bytecode compiler does lazy compilation. The compiled object has a component called the literal table, in which it stores string literals, including the bodies of commands like foreach and if. I'm not yet sure how, but it seems that when the foreach is evaluated, it can compile its body from the literal table entry and then substitute the resulting compiled object into the literal table of the compiled object in which it occurs. I consider this quite funky.

RHS: This isn't quite the way it works, if I understand things correctly. When you bytecompile a script, the arguments to commands like if/while/catch are stored in the literal table. When you bytecompile something as a proc, it actually bytecompiles the if/while/catch/etc. commands themselves. There are two different bytecode-compilation functions in C. I forget the name of the generic one (it's something like BytecodeFromAny), but the one that compiles a proc is TclProcCompileProc.
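Whichever account matches a given Tcl release, the caching idea both comments circle around can be sketched in C. This is a simplified model under stated assumptions, not Tcl's implementation: `Literal`, `Compiled`, `compile_script` and `literal_get_compiled` are hypothetical names, and in the real core the cached bytecode lives in a `Tcl_Obj`'s internal representation rather than in a separate field.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compiled script. */
typedef struct Compiled {
    size_t numBytes;      /* pretend size of the generated bytecode */
} Compiled;

/* Hypothetical literal-table entry: source text plus an optional cached
   compiled form. Real Tcl keeps the cache in the Tcl_Obj internal rep. */
typedef struct Literal {
    const char *text;     /* e.g. the body of a foreach */
    Compiled *compiled;   /* NULL until the literal is first evaluated */
} Literal;

static int compileCount = 0;  /* how many times the "compiler" has run */

/* Stand-in compiler: a real one would emit bytecode for text. */
static Compiled *compile_script(const char *text)
{
    Compiled *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->numBytes = strlen(text);
    compileCount++;
    return c;
}

/* Lazy compilation: compile on first use, then reuse the cached result. */
Compiled *literal_get_compiled(Literal *lit)
{
    if (lit->compiled == NULL)
        lit->compiled = compile_script(lit->text);
    return lit->compiled;
}
```

Calling `literal_get_compiled` twice on the same literal runs the stand-in compiler only once; the second call simply returns the cached pointer.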

There has recently been some discussion about using Parrot for Tcl, and it seems we should also start gathering notes on what we've already got before we worry too much about Parrot.

Someone has started writing a Tcl interpreter in Parrot. It still has a long way to go. http://svn.perl.org/parrot/trunk/languages/tcl/

To kick this off, I've put up some tools and toys I've been using to explore the Tcl virtual machine. They can be found here [1].

CMCc: I've made a new release incorporating AK's suggestions and mods (for the most part.)

I think that something like tclVM is inevitably version-dependent, since it has to use internal, undocumented interfaces.

AK: I should mention it, my platform is Linux.

AK: I got the tclVM tools and found one problem: my tcl8.3.so does not have a public variable 'tclInstructionTable', which prevents tclVM.so from loading. I made the following changes to get it to load:

  • renamed the variable (dropped the 'tcl' prefix).
  • added code to 'tcl_InstTable' to initialize the now-internal variable, using the private Tcl API 'TclGetInstructionTable()'.

Notice I said loading. It does crash somewhere during execution. ... Hm, got a NULL pointer. Hm, maybe this is 8.4-specific. Will have to play more. ...

AK: OK, on my system the variable in the Tcl library is 'instructionTable' ... And the moment the code tries to initialize its data for the second instruction, it dumps core.

AK: Running it against 8.4, using the original sources ... compile is ok, load is ok, the result of the command 'instTable' looks ok. However the result of
 disasm {set x [clock seconds]}

is empty. Ditto for 'literals'. And running 'tclVM.tcl' still dumps core, but now in a way which suggests memory problems.

So this extension is geared towards 8.4, and there seem to have been changes that prevent its use with 8.3, especially as it contains copies of internal Tcl headers like 'tclCompile.h'. ... Yes, the definition of the internal structure 'InstructionDesc' changed between 8.3 and 8.4.

AK: Crash - Yep, using tcl core with memory validation (CVS head), running 'tclVM.tcl', and seeing a high guard failure. No time to debug this, sorry.

... Removed the reference to 'tclByteCodeType', and changed the return value of disasm from 'string' to 'bytearray'. No more crashes after that. The crash was possibly due to the bytecode being misinterpreted as UTF-8.
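Raw bytecode is arbitrary binary data: operand bytes can be zero or can take values that never occur in well-formed UTF-8, so handing it back as a string invites exactly this kind of misinterpretation. The hypothetical checker below (`looks_like_utf8` is an illustrative name, not a Tcl API) shows that a short instruction stream containing an 0xFF operand byte fails a structural UTF-8 check. In a Tcl extension the usual way to return such data is a byte-array object, e.g. via `Tcl_NewByteArrayObj`.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is structurally well-formed UTF-8 (valid lead
   bytes followed by the right number of continuation bytes), else 0.
   This sketch does not reject overlong encodings or surrogates. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;                       /* continuation bytes expected */
        if (c < 0x80)                extra = 0;   /* ASCII */
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return 0;          /* stray continuation byte or invalid lead */
        if (extra > len - i - 1)
            return 0;           /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;       /* expected a 10xxxxxx continuation byte */
        i += extra + 1;
    }
    return 1;
}
```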

The tclVM extension consists of the following commands:
compile string
takes a string, representing a Tcl expression, and returns a Tcl_Obj of the compiled object type.
disasm compiled_obj
returns the bytecode component of a compiled object.
literals compiled_obj
returns the literal table of a compiled object
instTable
returns a list of the names of all opcodes and their operands, in opcode order.

Each opcode has an entry in the list as follows: {opcode name, total number of bytes in the instruction (opcode plus operands), stack effect of the opcode, number of operands, optype, ...}
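One way to model such an entry in C is a descriptor struct formatted in the order listed above. This is a sketch only: `InstDesc`, `OpndType`, `format_entry` and `MAX_OPERANDS` are hypothetical names (Tcl's private `tclCompile.h` has its own `InstructionDesc`, whose layout differs between releases as noted above), and the operand type names match the optype list that follows.

```c
#include <stdio.h>

/* Operand types, in the same order as the optype list on this page. */
typedef enum { OPND_NONE, OPND_INT1, OPND_INT4, OPND_UINT1, OPND_UINT4 } OpndType;

static const char *const opndNames[] = { "none", "int1", "int4", "uint1", "uint4" };

#define MAX_OPERANDS 2   /* assumption: enough for these illustrative entries */

/* Hypothetical descriptor mirroring one instTable entry. */
typedef struct {
    const char *name;
    int numBytes;        /* opcode byte plus operand bytes */
    int stackEffect;     /* net change in stack depth */
    int numOperands;
    OpndType opTypes[MAX_OPERANDS];
} InstDesc;

/* Write "name numBytes stackEffect numOperands optype ..." into out.
   Returns the length snprintf reports (may exceed size on truncation). */
int format_entry(const InstDesc *d, char *out, size_t size)
{
    int n = snprintf(out, size, "%s %d %d %d",
                     d->name, d->numBytes, d->stackEffect, d->numOperands);
    for (int i = 0; i < d->numOperands && n >= 0 && (size_t) n < size; i++)
        n += snprintf(out + n, size - (size_t) n, " %s", opndNames[d->opTypes[i]]);
    return n;
}
```

For an illustrative push1-style descriptor `{ "push1", 2, 1, 1, { OPND_UINT1 } }`, `format_entry` writes `push1 2 1 1 uint1`.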

Each optype is one of:

  • none
  • int1 - one byte signed integer
  • int4 - four byte signed integer
  • uint1 - one byte unsigned integer
  • uint4 - four byte unsigned integer
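Operands of these types follow the opcode byte directly. The sketch below decodes them; it assumes, following the `TclGetInt4AtPtr`/`TclGetUInt4AtPtr` macros in Tcl's private `tclCompile.h`, that four-byte operands are stored most-significant byte first. The function names and the `OpndType` enum are hypothetical.

```c
#include <stdint.h>

typedef enum { OPND_NONE, OPND_INT1, OPND_INT4, OPND_UINT1, OPND_UINT4 } OpndType;

/* int1: one signed byte, converted portably from two's complement. */
int32_t get_int1(const unsigned char *p)
{
    return p[0] < 128 ? (int32_t) p[0] : (int32_t) p[0] - 256;
}

/* uint1: one unsigned byte. */
uint32_t get_uint1(const unsigned char *p) { return p[0]; }

/* uint4: four bytes, most-significant byte first. */
uint32_t get_uint4(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* int4: the same four bytes reinterpreted as two's complement,
   without relying on implementation-defined unsigned-to-signed casts. */
int32_t get_int4(const unsigned char *p)
{
    uint32_t u = get_uint4(p);
    return u <= INT32_MAX ? (int32_t) u : -(int32_t) (UINT32_MAX - u) - 1;
}

/* Read one operand of type t at p and report its size in bytes. */
int64_t read_operand(OpndType t, const unsigned char *p, int *size)
{
    switch (t) {
    case OPND_INT1:  *size = 1; return get_int1(p);
    case OPND_UINT1: *size = 1; return get_uint1(p);
    case OPND_INT4:  *size = 4; return get_int4(p);
    case OPND_UINT4: *size = 4; return get_uint4(p);
    case OPND_NONE:
    default:         *size = 0; return 0;
    }
}
```

A disassembler would call `read_operand` once per optype in the instruction's entry, advancing by the reported size each time.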

escargo 2002-11-13:

Will there eventually be eight-byte signed and unsigned integers?

escargo 2003-03-11:

I can see where these might be necessary for dealing with the new large file systems.

AMG: Tcl versions 8.5 and beyond incorporate the bigint routines from LibTomMath for arbitrary magnitude integers.

JJM: Fixed/adjusted several things in the sources on SourceForge to allow it to work on Windows. Sent the changes to the author.

CMCc: I don't use Windows, and I don't know much about Windows (just enough to avoid it :) ... I'm not sure what to do with your mods; there seem to be a lot of files there. If you like, I'll put your mods up in the file section on SourceForge.

JJM: Yeah, putting the new files on SourceForge would be fine. The changes are actually universal and should work on any platform. Instead of relying on private internal Tcl variables, it now uses documented Tcl C API calls. I'll send you the new(er) version with the correction that AK mentions above as well. I tested it and not only is it more "correct", but a lot faster.