Updated 2014-11-27 17:38:42 by pooryorick

CMcC Two fundamental types in the Tcl core are Tcl_Obj and Command.

Both maintain some state, are refcounted, can be named and define some intrinsic operations.

Purposes:

The first element of a string to be evaluated must be a Command. The primary carrier of value is a Tcl_Obj.

Tcl_Objs are the arguments to, and the results of, the evaluation of Commands.

In short: Commands transform Tcl_Objs into Tcl_Objs.
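
A minimal script-level illustration of that summary, using only built-in commands. The variable names are chosen here for illustration; each command word names a Command, and every argument and result is a value (a Tcl_Obj at the C level).

```tcl
# Each command receives values as arguments and returns a value as its result.
set words  [split "b,c,a" ","]   ;# string in, list out
set sorted [lsort $words]        ;# list in, list out
set joined [join $sorted "-"]    ;# list in, string out
puts $joined                     ;# prints a-b-c
```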

Anatomy:

Tcl_Obj is typed, and provides the following operations:

  • Deletion - state is freed
  • Duplication - copy self's state to target
  • Serialization - update self's string representation
  • Mutation - transform state and type of object to this, or error.

Tcl_Obj provides for a string representation and an internal representation comprising storage for up to two pointers.
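
A small sketch of how the two representations show up at the script level (variable names are illustrative). The internal representation is not directly visible from a script; what can be observed is that the string representation survives whatever interpretation a command applies to the value.

```tcl
set x "1 2 3"                  ;# created from a string
set n   [llength $x]           ;# list operation: the value is interpreted as a list
set len [string length $x]     ;# string operation: the string is still "1 2 3"
puts "$n $len"                 ;# prints 3 5

set y 0x10
set sum [expr {$y + 1}]        ;# numeric operation: the value is interpreted as an integer
puts "$y $sum"                 ;# prints 0x10 17
```

In Tcl 8.6 the unsupported debugging command ::tcl::unsupported::representation can be used to inspect which internal representation a value currently carries.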

Command provides the following operations:

  • Deletion - free state and references to this Command
  • Evaluation - given a processing context and some arguments, produce a result.
  • Compilation - transform this command into a compiled form.
  • Tracing - invocation of Command may be traced.

Command provides state comprising: a binding, a list of namespace import references to this command, and most importantly: client state associated with the Deletion and Evaluation operations.
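
Two of these operations can be exercised from a script. The following is a sketch only; greet, logCall and the global log variable are names made up for the illustration.

```tcl
proc greet {name} { return "hello $name" }

# Tracing: record every invocation of greet.
set log {}
proc logCall {cmdString op} { lappend ::log $cmdString }
trace add execution greet enter logCall

greet world
puts [lindex $log 0]                    ;# prints greet world

# Deletion: renaming a command to the empty string deletes it.
rename greet {}
puts [llength [info commands greet]]    ;# prints 0
```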

Similarities and Differences:

Both can be deleted.

Tcl_Obj can be duplicated, and can mutate some other value to itself, and itself to a string.

Command can be evaluated and compiled. Since a Command is inherently named, duplication makes no sense, because duplicating a (name,value) pair is always an identity. However, rename can be considered a similar kind of function, in that it creates identical state with a new binding.
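
A short sketch of the rename behaviour described above (double and twice are illustrative names): the same command continues to exist under a new binding, and the old name is gone.

```tcl
proc double {x} { expr {$x * 2} }

# rename rebinds the existing command; no second command is created.
rename double twice
puts [twice 21]                          ;# prints 42
puts [llength [info commands double]]    ;# prints 0
puts [llength [info commands twice]]     ;# prints 1
```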

Command doesn't have a string representation and doesn't provide a Serialization operation. This makes sense, to some extent, because its state (comprising C functions) is not meaningful as strings.

proc is the only way to transform a Tcl_Obj into a Command in vanilla Tcl. itcl and xotcl provide other ways to generate a Command from a Tcl_Obj.
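
As a sketch of that transformation (the names params, body, add and addCopy are illustrative): proc takes ordinary values and produces a named Command, and for procs, unlike C-coded commands, the values can be read back with info args and info body.

```tcl
set params {a b}
set body   {expr {$a + $b}}

# Values in, Command out.
proc add $params $body
puts [add 1 2]                ;# prints 3

# For a proc, the originating values can be recovered and reused.
# (This simple form ignores default argument values.)
proc addCopy [info args add] [info body add]
puts [addCopy 10 20]          ;# prints 30
```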

Unification:

Notwithstanding the differences between them: If a Command were represented as a Tcl_Obj, how would it look?

If we wanted to create a Tcl_ObjType which wrapped a Command, we would be trying to provide meaningful equivalence between the Tcl_Obj operations (Deletion, Duplication, Serialization, Mutation) and the Command operations (Deletion, Evaluation, Compilation).

  1. Deletion is directly analogous - a CommandObj would invoke Command's deletion function.
  2. Compilation is the transformation of internal state into ByteCode, essentially, so it is analogous to Tcl_Obj Serialization, except that not all Commands are able to be compiled, whereas all Tcl_Objs have a string representation.
  3. Duplication is similar to rename.
  4. Mutation is similar to proc.
  5. Evaluation has no direct equivalent, except that all Commands can be evaluated much as all Tcl_Obj can be serialized.

Evaluation and Serialization seem most closely analogous, in that all Commands can be evaluated and all Tcl_Objs can be serialized. It may be that a CommandObj would use Tcl_Obj Serialization to perform Evaluation.

Command consumes Tcl_Obj arguments and generates a Tcl_Obj result, remaining (itself) unchanged. Tcl_Obj can clone itself, and can absorb (almost digest or metabolise; see also shimmering) other Tcl_Objs into itself.

Most Tcl_Obj processing would be consumed in metabolising, just as most Command processing is consumed in evaluating. Both mutation and evaluation result in a transformed object (from the list of args to a result in Command, and from the object to its new form in Tcl_Obj.)

Much of the design of Tcl_Obj revolves around the need to refcount; much of the design of Command revolves around the need to bind.

RHS Fascinating information there. Let me start by saying thank you for so neatly summarizing all that information into an understandable format. There are some things, however, that I disagree with.

If we agree that we want to have unnamed commands (I use the term command to mean commands and procedures in this context), then I would define the following equivalences:

  1. Deletion is directly analogous
  2. Duplication - would be the same for commands, only (perhaps) the copy would not have a name. Perhaps it would be useful to make the distinction that a command name is to a command what a variable name is to a Tcl_Obj. It's a name that points to a command, rather than an inherent part of the command.
  3. Serialization - Convert the command to its string representation. Since actual commands don't have string reps, this would be an error for them.
  4. Mutation - this is what I would consider evaluation. However, a command needs a context in which to mutate (namespace, stack level, arguments, etc.), whereas a Tcl_Obj does not.

MS sees less meaning in the concept of Tcl_Obj than what is apparent here, and thinks that these explorations would better be based on the workings of tcl7.* where they were not present to muddy the issues.

Indeed, Tcl_Obj are in a deep sense just strings. The fact that they are able to cache an internal representation of the last meaning that Tcl assigned to the string is just a performance hack, and carries no deep signification in itself.

To be continued ...

NEM isn't sure where this is going. There are some statements I would disagree with. For instance, the first element of a string to be evaluated is required to be a command name, which can be used to look up a Command. Any Command as Tcl_Obj scheme is going to need a valid string rep so that if "mutation" occurs we can still get back the command at a later date. At present, Tcl requires that all commands are named, so that there is always a string rep. There are three TIPs proposing lambda in Tcl (187, 194 and 196 IIRC), which I presume is what this page is about. They differ mainly in how they approach this question of string rep. If I may summarize on behalf of the authors of these TIPs (please correct me if I get anything wrong), the stances are:
  • 187 - Create a special form ("lambda arglist body args", IIRC) which will be recognized by the interpreter before normal (name based) command lookup.
  • 194 - Create a (named) command ("apply") which takes a lambda (whose string rep is [list arglist body]) and evaluates it with any arguments.
  • 196 - No string rep as such. Check if first element has a commandObj internal rep and use it. Doesn't deal with mutation at all, IIRC.
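
For reference, the TIP 194 approach is the one that later shipped as the built-in apply command in Tcl 8.5. A minimal sketch, assuming Tcl 8.5 or later (the variable name add is illustrative):

```tcl
# The lambda is an ordinary two-element list: {arglist body}.
set add {{a b} {expr {$a + $b}}}

puts [apply $add 1 2]     ;# prints 3
puts [llength $add]       ;# prints 2: it is still just a list, with a string rep
```

Because the lambda is a plain value, it can always be regenerated from its string rep if the internal rep is lost.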

My personal favourite is 194, as it requires no changes to the evaluation strategy, although auto-expansion of leading word would make it more convenient in certain usages. 187 is a fair alternative, as it does provide a mechanism for dealing with "mutation". I do not consider 196 a good alternative, as it doesn't deal with this issue, and would IMHO lead to unpredictable results as it relies on the current state of the internal rep of the Tcl_Obj, which is merely a temporary cache, rather than a type (despite the naming of various structures like Tcl_ObjType). One issue raised by the author of 196 (and hinted at in this page, I think) is the idea of C-coded anonymous commands. As has been pointed out, the problem here is that there is no natural string rep of such a command. The current Tcl way around this is to assign a name to the command, and require explicit cleanup. An alternative (based on 194, and the ideas in TOOT), would be to use (say) the address of the command in memory, and have a command equivalent to the "apply" suggested. For instance:
 set cmd [get-c-command] ;# returns something like [list ::c-apply 0x03f456db]
 eval $cmd arg arg ...

The "c-apply" command could convert the hex value to a pointer and cast that to the appropriate type (Tcl_ObjCmdProc). Now, there is automatic cleanup of this anonymous command from the Tcl level. The C command doesn't ever disappear, but is that even possible in C? The key point is that it doesn't rely on a command name (or other name, such as a hashtable key) which would need to be cleaned up. It is, of course, also an utterly disgusting hack which is fragile, dangerous and non-portable. Which is why I prefer to not consider C-coded anonymous commands at all.

RHS 24August2004 The use of the apply command means that the command needs to be re-bytecompiled every time it's used as a -command callback, doesn't it? For example:
 lsort -command [lambda {a b} { .... }] $aHugeList

In the above context, does it mean the lambda would have to be bytecompiled each time it was called from the lsort command? Hm, perhaps not. Since the apply command takes, as its first argument, all the information needed to construct the unnamed command object, that Tcl_Obj would be converted to some type of UnnamedCommand Obj, bytecompiled, and then never change its internal rep. Does that sound right?

NEM Yes, that is correct. In the reference implementation for 194 (which Miguel did), what actually happens is that a normal (named) proc is created (with a unique name), and then its entry is removed from the command table. This Proc structure (with associated bytecode after first call) is associated with the internal rep of the first argument to apply. To illustrate:
 set foo [lambda {a b} {expr {$a + $b}}] ;# Assuming a [lambda] constructor
 {*}$foo 1 2 ;# Gives 3

Now, the rep of $foo is a list of two elements:
                Tcl_Obj ($foo as a whole)
                   |
      +------------+----------+
      |                       |
   Tcl_Obj                 Tcl_Obj (the lambda)
 { ::apply               {{a b} {expr {$a + $b}}}   }
      ^                       ^
      |                       |
  element 1                element 2

Element 1 has an internal rep of cmdName (or something like that) which caches the lookup of the ::apply command. Element 2 is a two element list, but instead of a list internal rep, it has a lambda internal rep, which caches the generated byte code.
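
The [lambda] constructor assumed above is not a built-in. One possible definition, given here only as a sketch for Tcl 8.5 or later, returns a command prefix built around ::apply, which matches the two-element structure in the diagram:

```tcl
# Hypothetical [lambda] constructor: returns a command prefix, not a command name.
proc lambda {params body} {
    return [list ::apply [list $params $body]]
}

set foo [lambda {a b} {expr {$a + $b}}]
puts [lindex $foo 0]      ;# prints ::apply
puts [lindex $foo 1]      ;# prints {a b} {expr {$a + $b}}
puts [{*}$foo 1 2]        ;# prints 3
```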

So long as neither the containing list nor the second element is mutated, the bytecode will stick around. I believe that lsort -command is Tcl_Obj safe, so you should be ok in this example. Other things like widget bindings which are string based (and do substitutions) are less good, and would probably result in excessive recompilation. It should be noted that this is a problem of any of the schemes proposed (rather, it's a problem of the string-based widget binding stuff, IMO, but we're stuck with it) and isn't unique to TIP 194. One possible problem related to TIP 194 was auto-expansion and trying to convert the list to a cmdName during lookup, which could destroy the internal rep and cached byte-code (if auto-expansion takes place after normal command lookup). I did have a brief look to see if this was a real problem, but I can't remember what I decided. :( I don't think it's an insurmountable problem, but worth bearing in mind. TIP 194 doesn't propose auto-expansion, and this implementation issue can be left until such a TIP were to be proposed.
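
The lsort case discussed above can be written directly with apply, as a sketch for Tcl 8.5 or later (byLength is an illustrative name). The comparison lambda is held in a variable and passed to lsort as a command prefix; whether the compiled body is actually reused between comparisons is an implementation detail that a script cannot observe directly.

```tcl
# Comparison lambda: order strings by length.
set byLength {{a b} {expr {[string length $a] - [string length $b]}}}

# lsort appends the two elements to be compared to the command prefix.
set sorted [lsort -command [list apply $byLength] {ccc a bb}]
puts $sorted      ;# prints a bb ccc
```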

DGP As you look at this, be aware that the Command struct has a public counterpart Tcl_Command that is passed through various parts of the C API.

CMcC This page came from discussions with DGP about struct Command: its ability to store state, its importance in Tcl, and the differences between it and struct Proc.

It also arose from consideration of scope, and the observation that the Command structure appears exclusively in static scope, whereas the Var structure appears in either. This disparity has direct bearing on the lambda problem, and consideration of the differences between Tcl_Obj (the values of variable names) and Command (the values of command names) might shed some light on the nature of the problem.

Looking closer at it, perhaps the appropriate comparison is between Var and Command, and Tcl_Obj and Proc, respectively.

On the scope page, CMcC, 2004-10-15, stated that upon reflection, this page is misgiven.