Updated 2013-11-25 17:57:47 by suchenwi

PURPOSE: Provide a working example for Tcl maintainers attempting to add bytecode compilation of a new Tcl command.

Background: Kevin Kenny has been working on a new lset command that makes surgical modifications to Tcl lists. The target is to get decent performance of operations such as Shuffle a list without needing to resort to mysterious usage such as the K combinator.

The lset command takes the syntax:
lset listVar index ?index...? value

In its simplest form,
lset myList $i $x

it says, "replace the $i'th element of variable myList with the content of variable x.

Thus, if myList contains:
    a b c d

after executing
    lset myList 1 e

the variable will contain
    a e c d

Additional index parameters index into sublists. For example, if myList contains:
    {a b c} {d e f} {g h i}

and we execute
    lset myList 1 1 j

then after the operation, myList will contain
    {a b c} {d j f} {g h i}

It is an error to address outside any list: each index argument must be at least zero and less than the length of the corresponding list.

The first cut at [lset] is to implement it as an object command. The general flow is:

  • A. Look up the variable. If it designates a shared object, duplicate the object. Put the object pointer into two temp storage locations, finalValuePtr and listPtr.
  • B. For each index argument except for the last, locate the corresponding element of the Tcl object designated by listPtr. If it is a shared object, duplicate it and replace the element with the new copy. Set listPtr to designate the element.
  • C. Locate the element in listPtr that the last index argument designates. Replace it with newValue.
  • D. Finally, replace the value of the variable with finalValuePtr from step A.

The C code for the object command is:
 int
 Tcl_LsetObjCmd( clientData, interp, objc, objv )
    ClientData clientData;        /* Not used. */
    Tcl_Interp *interp;                /* Current interpreter. */
    int objc;                        /* Number of arguments. */
    Tcl_Obj *CONST objv[];        /* Argument values. */
 {

    Tcl_Obj* listPtr;                /* Pointer to the list being altered. */
    Tcl_Obj* subListPtr;        /* Pointer to a sublist of the list */
    Tcl_Obj* finalValuePtr;        /* Value finally assigned to the variable */
    int index;                        /* Index of the element being replaced */
    int result;                        /* Result to return from this function */
    int listLen;                /* Length of a list being examined */
    Tcl_Obj** elemPtrs;                /* Pointers to the elements of a
                                   list being examined */
    int i;

    /* Check parameter count */

    if ( objc < 4 ) {
        Tcl_WrongNumArgs( interp, 1, objv, "listVar index ?index...? value" );
        return TCL_ERROR;
    }

    /* Look up the list variable */

    listPtr = Tcl_ObjGetVar2( interp, objv[ 1 ], (Tcl_Obj*) NULL,
                              TCL_LEAVE_ERR_MSG );
    if ( listPtr == NULL ) {
        return TCL_ERROR;
    }

    /* Make sure that the list value is unshared. */

    if ( Tcl_IsShared( listPtr ) ) {
        listPtr = Tcl_DuplicateObj( listPtr );
    }

    finalValuePtr = listPtr;

    /*
    ** If there are multiple 'index' args, handle each arg except the
    ** last by diving into a sublist.
     */

    for ( i = 2; i < objc-2; ++i ) {

        /*
        ** Take apart the list
         */

        result = Tcl_ListObjGetElements( interp, listPtr,
                                         &listLen, &elemPtrs );
        if ( result != TCL_OK ) {
            return result;
        }

        /*
        ** Derive the index of the requested sublist
         */

        result = TclGetIntForIndex( interp, objv[i], (listLen - 1), &index );
        if ( result != TCL_OK ) {
            return result;
        }

        if ( ( index < 0 ) || ( index >= listLen ) ) {

            Tcl_SetObjResult( interp,
                              Tcl_NewStringObj( "list index out of range",
                                                -1 ) );
            return TCL_ERROR;
        }

        /*
        ** Extract the appropriate sublist, and make sure that it
        ** is unshared.
         */

        subListPtr = elemPtrs[ index ];
        if ( Tcl_IsShared( subListPtr ) ) {
            subListPtr = Tcl_DuplicateObj( subListPtr );
            result = Tcl_ListObjSetElement( interp, listPtr, index,
                                            subListPtr );
            if ( result != TCL_OK ) {
                return TCL_ERROR;
            }
        } else {
            Tcl_InvalidateStringRep( listPtr );
        }

        listPtr = subListPtr;
    }

    /* Break apart the innermost sublist */

    result = Tcl_ListObjGetElements( interp, listPtr,
                                     &listLen, &elemPtrs );
    if ( result != TCL_OK ) {
        return result;
    }

    /* Convert the last index */

    result = TclGetIntForIndex( interp, objv[objc-2], (listLen - 1), &index );
    if ( result != TCL_OK ) {
        return result;
    }
    if ( ( index < 0 ) || ( index >= listLen ) ) {
        Tcl_SetObjResult( interp,
                          Tcl_NewStringObj( "list index out of range", -1 ) );
        return TCL_ERROR;
    }

    /* Store the result in the list element */

    result = Tcl_ListObjSetElement( interp, listPtr, index, objv[objc-1] );
    if ( result != TCL_OK ) {
        return result;
    }

    /* Finally, update the variable so that traces fire. */

    listPtr = Tcl_ObjSetVar2( interp, objv[1], NULL, finalValuePtr,
                              TCL_LEAVE_ERR_MSG );
    if ( listPtr == NULL ) {
        return TCL_ERROR;
    }

    return result;

 }

This code is much cheaper (30-50% reduction in CPU time on the Shuffle a list trials) than the corresponding logic using the K combinator, but we can do much better still using the bytecode compiler. The chief reason is that the calls to Tcl_ObjGetVar2 and Tcl_ObjSetVar2 are fairly expensive lookups in a hash table, while local variables in the stack frame can be compiled to lookups that operate in constant time. Let's sketch out what the bytecode will have to look like for
    lset varName ind1 ind2 value

To begin with, all of the parameters will have to go on the stack, except possibly for the variable. The variable name may or may not go on the stack, depending on whether it uses a frame slot.

In order for traces to fire in order, the parameters must be substituted from left to right. Thus far, then, the compilation procedure for lset has to:

  • call TclPushVarName to put the variable name on the stack if needed.
  • call TclCompileTokens for each of the index arguments in turn, so that they are pushed on the stack from left to right.
  • call TclCompileTokens for the value argument, too.
  • Now comes one tricky part. We have to get the value of the variable. We need a new bytecode command, like the index word in Forth, to pull the variable name that we pushed up above. Then we need to emit the appropriate load instruction, which is one of INST_LOAD_SCALAR1, INST_LOAD_SCALAR4, INST_LOAD_SCALAR_STK, INST_LOAD_ARRAY1, INST_LOAD_ARRAY4, INST_LOAD_ARRAY_STK, or INST_LOAD_STK. (Whew!)
  • Now we're able to emit a bytecode instruction for the lset operation itself. This instruction will have a parameter that is the number of index arguments, so that it can work with them off the stack. It will begin by popping the variable value off the stack (reducing its ref count to 1 again if the value is shared), and duplicating the value if necessary. The rest of the code will more or less follow the object command implementation. The stack after this instruction will contain the value that is to be put into varName, and the variable name if the first step pushed it.
  • Finally, we store the result back into the variable by emitting the appropriate store instruction, which is one of INST_STORE_*, where * is SCALAR1, SCALAR4, SCALAR_STK, ARRAY1, ARRAY4, ARRAY_STK or STK.

So it appears that we need a new compilation procedure for the lset command, plus two new bytecode instructions: INST_STKINDEX and INST_LSET.

Jeffrey Hobbs:

How to add a byte-compiled command:

Most examples are found in tclCompCmds.c, which is the best place to start for examples. Files that must be edited to add a byte-compiled command are (this applies for core commands):

  • tclCompile.h: add any instructions you need
  • tclCompile.c: add the description of the instructions (for compiler debugging)
  • tclCompCmds.c: add your command that will byte-compile
  • tclBasic.c: in the list of commands, make sure your byte-compiled command is used
  • tclInt.h: add the prototype of your command for byte-compiling
  • tclExecute.c: add the code for any new instructions.

At this time, there is no way to add a byte-compiled command outside of the core.

DKF - if what you're doing requires a new bytecode to compile, then it is probably never going to be practical to do as an extension since the list of core bytecodes is definitely growing. Mind you, if you can make a decent (and generic) case for it, getting a new bytecode into the core is not hard.

KBK - Donal is right, of course; I wrote this piece primarily as a guide for Core maintainers.

Nevertheless, I think there's an important nugget of truth on this page.

There are only a couple of reasons that a command sees a BIG improvement by being bytecode compiled. One is that it's a control structure, and the gains achieved by compiling its dependent scripts inline are significant. The other, and commoner, reason is that it performs some sort of read-modify-write function on a Tcl variable. Examples include [append], [incr], [lappend], and [lset].

I think there's a generic framework that could be exported for these.

Here's a brief explanation of my thoughts with illustrations in Tcl.

We begin with Donal's K combinator:
proc K { x y } { return $x }

Now consider a procedure, [read_modify_write] that accepts a variable name and a function (actually, an incomplete command). It appends the content of the variable to the command using K to do a destructive read. It evaluates the result, and replaces the variables value by the result.
proc read_modify_write { v f } {
    set cmd1 "K \$[list $v] [list [list set $v {}]]"
    set cmd2 "$f \[$cmd1\]"
    set cmd3 "set [list $v] \[$cmd2\]"
    # Print a trace message to show the command being executed.
    puts stderr "[info level 0] results in [list $cmd3]"
    return [uplevel 1 $cmd3]
}

What's this good for? Let's do [append] using this command:
proc catenate { y x } {
    return $x$y
}
 
proc Append { xVar args } {
    upvar 1 $xVar x
    foreach a $args {
        read_modify_write x [list catenate $a]
    }
    return $x
}

And we could do the same with [incr]:
proc y+ { y x } {
    return [expr { $x + $y }]
}
 
proc Incr { xVar { delta 1 } } {
    upvar 1 $xVar x
    read_modify_write x [list y+ $delta]
}

Or let's do an [Lpop] command that's something like a simplified [lvarpop]: it pulls an element off the front of a list.
proc delete_first { x } {
    return [lrange $x 1 end]
}
 
proc Lpop { listVar } {
    upvar 1 $listVar q
    set retval [lindex $q 0]
    read_modify_write q delete_first
    return $retval
}

Now presume an extension that wanted to bytecode [Lpop]. If we think of [Lpop] as being defined in terms of [delete_first], then the extension can actually bytecode it without introducing any new bytecodes -- assuming that we add Forth-like "exch" and "roll" to the engine. For example, if x is not a local variable, the code burst would look like:
    push1        "x"    # Get x's name
    over         0      # Duplicate the name
    loadScalarStk       # Get x's value
    dup
    push1        "0"
    lindexMulti  1      # Extract first element.
                        # at this point stack has retval, $x, "x"
    exch                # now it's $x, retval, "x"
    push1        "1"
    push1        "delete_first"
    invokeStk1   2      # Call 'delete_first' with objc==2
                        # Now the stack has new_x, retval, "x"
    roll         3      # "x", new_x, retval
    exch                # new_x, "x", retval
    storeScalarStk      # new_x, retval
    pop                 # retval

Now the [delete_first] command has to assume that nobody calls it from outside the [Lpop] procedure: it probably suffices to hide it inside a package. On entry, it knows that the refcount of objv[1] is at least 2: one reference for objv itself, and one reference for the variable. It panics if this condition does not hold, and otherwise decrements the refcount. On return, it increments the count again. In between, Tcl_IsShared gives the correct information.

There's no doubt plenty wrong with this idea, but I think that it could be made to work -- and provide a very generic framework for a certain flavor of bytecoded extension.

AK From discussion on the chat it becomes clear that the framework discussed here would actually be a public API to the compilation environment used by compilation procedures. This would allow extensions to emit bytecodes for their own commands, provided that do not try to declare bytecodes beyond the ones defined in the core. This API would essentially be a low-level bytecode assembler. This meshes well with MS's bytecode engine ideas where he proposes a Tcl Assembly Language (TAL) at the Tcl level. It would simply be a reflection of the lowe-level API up into the tcl level.

DKF 12Dec02 - Just a quick note to people writing bytecoded implementations of structural commands (i.e. those that contain pieces of code that you want to have compiled into the flow of bytecodes; for, if, switch, and while are all structural commands); when you hand off a piece of source for sub-compilation, you must make sure that it is a substring of the string that you yourself are part of (e.g. by handing in one of your parse tokens). Fail to do this (e.g. by passing in a copy instead) and you will get strange and very hard to debug crashes.

DKF 13Sep03 - A few extra notes about the extra non-obvious things to watch out for when writing a new bytecoded expr operator. All new bytecodes must be inserted at the end of the bytecode list to prevent breaking of tbcload. Because that means that expr operator bytecodes will not be contiguous, you must also update IllegalExprOperandType() in tclExecute.c to get the operator string right. Apart from that, adding a new operator should be fairly straight-forward if you grok expr's type-promotion rules.

DKF 16Nov07 - People wishing to experiment should get a copy of [Tcl 8.5] and use tcl::unsupported::disassemble.