Updated 2013-10-27 16:55:17 by pooryorick

The fact is that not Everything is a list. Every command is a list, but individual words comprising the command are not necessarily themselves lists.

Description

On Everything is a string, LES said on 2004-04-27:
[..] the more I use Tcl, the more I am convinced that everything is a list. And that's one of the best features in Tcl.

Since this question is worth further exploration, I set up this page. In recent explorations of Tcl I became convinced that the EIAS point of view is not a shortcoming, and that it could even be a great benefit to consider that Everything is a list of strings.

First I want to discuss the execution model of Tcl. The Endekalogue (or Dodekalogue as of Tcl 8.5) strives to explain how Tcl works, but it is not sufficient for that, since it only describes syntactic conventions. Even so, the *dekalogues already teach us implicitly about lists. Now, how does Tcl work:

  • A Tcl script is a string that is first split into one or more commands, which are evaluated from first to last unless one of them alters this execution order.
  • Each command is split into a sequence of words, which are subject to substitutions.
  • The first word of the resulting sequence of words is used to look up the command implementation, which, when found, is applied to the rest of the words. Or in a different terminology: the rest of the words are given as arguments to the first, which is a command.

By virtue of this paraphrase we can already see that sequences are at the core of the Tcl language.
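A minimal sketch of that point, assuming Tcl 8.5 or later for {*}: a command line can be built as a list of words and then evaluated. The variable names cmd and result are only illustrative.

```tcl
# Build a command as a list of three words.
set cmd [list string length "hello world"]
puts [llength $cmd]        ;# 3

# Expanding the list yields the words of a command line: the first word
# names the command, the remaining words become its arguments.
set result [{*}$cmd]
puts $result               ;# 11
```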

Now to the data nature of lists in Tcl. Some selected citations from the EIAS page:

  • KJN 2004-11-04: The endekalogue does not define lists - it leaves it to commands to decide whether to interpret an argument as a list (and, implicitly, to define what is meant by a list).
  • RHS 2004-11-05: [..] you want to be able to say "this is a list" and "this is not a list". In Tcl, there's no such concept. You can only say "I want to treat this like a list" and "I want to treat this like some-other-thing-that-isn't-a-list".
  • NEM 2005-07-25: [..] what is at the heart of this debate is not [..], but rather a recognition that the notion of a "type" of a value is extrinsic to the value itself.

In fact, a command is free to interpret its arguments as any sort of thing, e.g., a string, list, integer, float, etc.

The built-in list commands can take a string and interpret it as a list, as shown in this slightly modified example from shimmering:
% set x {a  b c}
a  b c
% puts $x
a  b c

[puts] receives the value of x, which it interprets as a string. Consider the following:
% lappend x d
a b c d
% puts $x
a b c d
% set x
a b c d

[lappend] converts $x into a list, and then appends d to it. As a result of the conversion, the double space between the first two elements has been lost. puts expects a string, so it converts x back to a string. x has now been used as both a list and a string. Internally, Tcl may be caching the various representations for improved performance, but at the script level, what matters is that x can be used as both types.
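The same value can be inspected through both views from a script. This is only a sketch; the helper variables asString, asList and afterAppend are made up for the illustration.

```tcl
set x {a  b c}                     ;# two spaces between a and b
set asString [string length $x]    ;# string view: 6 characters
set asList   [llength $x]          ;# list view: 3 elements

lappend x d                        ;# list operation on x
set afterAppend $x                 ;# string form is regenerated from the list

puts "$asString $asList {$afterAppend}"    ;# 6 3 {a b c d}
```

The double space survives as long as x is only read; it disappears once a list command rewrites the variable.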

An Indecent Proposal by LEG

or: How to get rid of the {*} expansion prefix or: How to benefit most from the notion that Everything is a list (of strings).

Up to Tcl 8.4, [eval] was the only mechanism available to interpolate lists into a command. In Tcl 8.5 the syntax of the Tcl language has been changed, introducing the expansion prefix {*} which can be put in front of any word to provide this {expand}ing.
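For comparison, a small sketch of both styles. The option list opts is an arbitrary example, and the [linsert] form is just one common way to build a safe [eval] command line.

```tcl
set opts {-nocase HELLO hello}

# Tcl 8.4 and earlier: build the whole command as a list, then eval it.
set old [eval [linsert $opts 0 string equal]]

# Tcl 8.5 and later: expand the list in place.
set new [string equal {*}$opts]

puts "$old $new"    ;# 1 1
```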

LEG suggests the following language change:

  • every command returns a list of words, which may be empty.
  • command substitution expands the invoking command line with the list of words returned by the command
  • return returns the list of words on its command line to the invoking stack frame

The proposed behaviour can be emulated by prefixing any [bracket] expression with the {*} expansion prefix and enclosing the arguments of any return statement in a [list ..] construction (without the {*}) if there is more than one return value.
% proc valueOf x {return [list $x]}
% set x {a  b c}
a  b c
% valueOf $x
{a  b c}
% valueOf {*}[set x]
wrong # args: should be "valueOf x"

The problem is that any command can now return a number of values, while valueOf accepts only one. We rewrite valueOf as a variadic function, using the tautology [list {*}$args] = $args.
% proc valueOf args {return $args}
% valueOf {*}[set x]
a b c
% # remember!
% valueOf $x
{a  b c}
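The tautology can be checked directly. This is a sketch with a hypothetical helper named roundTrips. Note that it holds because args is always a well-formed list; an arbitrary string is normalised rather than preserved.

```tcl
proc roundTrips args {
    # Re-listing the elements of args gives back the same value.
    return [expr {[list {*}$args] eq $args}]
}
set ok [roundTrips a {b  c} {}]
puts $ok      ;# 1

# A string that is not in canonical list form does not round-trip.
set x {a  b c}
set same [expr {[list {*}$x] eq $x}]
puts $same    ;# 0
```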

Features:

  • list representation from [set x], string representation from $x
  • The idiom of using set x instead of return $x will break if x contains a list: the former returns a list, the latter returns a list with one element (which is a list).

An example, stressing the Functional Programming style:
namespace import ::tcl::mathop::*
proc map {prefix args} {
    set r {}
    foreach e $args {lappend r [uplevel $prefix $e]}
    set r
}
if {[& [map {file exists} [map {file join / etc} passwd shadow group]]]} {
   puts "We seem to be on a unix box with shadow passwords"
}

Which seems rather more natural than:
if {[& {*}[map {file exists} {*}[map {file join / etc} passwd shadow group]]]} {
...

Hint for people not used to Functional Programming: try to read the idiom from right to left: a list of names gets converted into a list of filesystem paths gets converted into a list of flags, which are all 'and'ed together.

A typical "old style" implementation for comparison:
set files {}
foreach file {passwd shadow group} {
    lappend files [file join / etc $file]
}
# start from 1 (true); starting from 0 would make the AND always 0
set flag 1
foreach file $files {
    set flag [expr {$flag & [file exists $file]}]
}
if {$flag} {
   puts "We seem to be on a unix box with shadow passwords"
}

NEM: Why not just not use a variadic function when you don't want one?
proc map {f xs} {
    set ys [list]
    foreach x $xs { lappend ys [{*}$f $x] }
    return $ys
}
if {[& [map {file exists} [map {file join / etc} {passwd shadow group}]]]} {
   ...
}
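For readers who want to run this in stock Tcl 8.5 or later, here is a sketch of the non-variadic version. One point is an observation about the built-in operator commands, not about the code above: ::tcl::mathop::& takes each operand as a separate argument, so a single {*} is still needed when handing it a list of flags. The variable names paths and flags are illustrative.

```tcl
proc map {f xs} {
    set ys [list]
    foreach x $xs { lappend ys [{*}$f $x] }
    return $ys
}

set paths [map {file join / etc} {passwd shadow group}]
set flags [map {file exists} $paths]

# & expects its operands as separate words, hence the one expansion here.
if {[::tcl::mathop::& {*}$flags]} {
    puts "We seem to be on a unix box with shadow passwords"
}
```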

LEG 2008-12-13: Just made a slight correction to the above to make it work (missing braces).

Note: While playing with this, I found some gotchas in the use of {*}:
% proc valueOf args {return $args}
% set x {a  b c}
% set y "{*}[valueOf $x]"
{*}{a  b c}
% set y "{*}$x"
{*}a  b c

I would have expected the result to be a b c and a b c respectively, since e.g. {*}$x should first be substituted by a b c, and then the '"' quotes stripped off. Reading the Dodekalogue, I realize that {*} expansion occurs only at the start of a word, which explains the behaviour shown.

Are you saying that one gotcha is that {*} behaves as documented?

LEG 2008-12-13: yes, who reads documentation? :) What is cool with Tcl is, that normally I'm able to write down a script by intuition and it 'just works' (tm)

Another thing I just found out:
% set x {*}{}
can't read "x": no such variable
% set x {*}[list]
can't read "x": no such variable

The empty list gets expanded into nothing and the resulting commandline is: set x. If you set a variable via the expansion prefix and your command eventually returns an empty list, you might get the above error message. Workaround: Always initialize variables before setting them via the {*} prefix.
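A sketch of the failure and of one possible alternative to pre-initialising the variable. This is an assumption about what is wanted, namely that an empty result should become an empty string.

```tcl
unset -nocomplain x

# With nothing to expand, the command line degenerates to "set x".
set failed [catch {set x {*}[list]} msg]
puts "$failed: $msg"       ;# 1: can't read "x": no such variable

# Collect the expanded words with [list] and pick the first one;
# an empty expansion then yields an empty string.
set x [lindex [list {*}[list]] 0]
puts [string length $x]    ;# 0
```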

Note that the proposed semantic change to [bracket] expansion would have to deal with a sensible interpretation of substituting a list of words returned by a [bracket] expression into a string. I guess the simplest one would be to use the string representation of the list.

The extensive use of variadic functions, as well as introducing (return) values for loops, would further advance the Simplification of the Tcl language and make scripts more understandable. See Simplification of the Tcl language for the respective discussion.

aricb 2008-12-10: I'm intrigued by the prospect of [return] being able to return more than one value, but it seems to me that your proposal to eliminate {*} is fundamentally flawed. You say that command substitution would expand arguments, so that $var and [set var] would behave differently. Consider this:

What if I want to return a list? In Tcl 8.5, I can do:
return [list $value1 $value2 $value3]

but under your proposal, if I read it correctly, this would be equivalent to
return $value1 $value2 $value3

[list], [dict create], [lreplace], etc. would no longer return lists but a bunch of scalar values. To make matters worse, what if I wanted to return two lists? The following hypothetical command:
return [list $value1 $value2] [list $value3 $value4]

would be equivalent to
return $value1 $value2 $value3 $value4

which is not at all correct.

You cannot allow square brackets to do argument expansion without seriously botching up the way Tcl handles lists. If you are going to allow [return] to return multiple values, you have to do it in such a way that the integrity of the values is preserved. In other words, when command substitution takes place, the command must be replaced with a number of words equal to the number of arguments to the [return] statement that terminated the execution of the command.
proc returnOneValue {} {
    return [list value1A value1B]
}

proc returnTwoValues {} {
    return value1 value2
}

proc acceptOneArg {arg} {
    puts "arg: $arg"
}

proc acceptTwoArgs {arg1 arg2} {
    puts "arg1: $arg1"
    puts "arg2: $arg2"
}
% acceptOneArg a
arg: a

% acceptOneArg [returnOneValue]
arg: value1A value1B

% acceptTwoArgs a b
arg1: a
arg2: b

% acceptTwoArgs [returnTwoValues]
arg1: value1
arg2: value2

% acceptOneArg [returnTwoValues]
wrong # args: should be "acceptOneArg arg"

% acceptTwoArgs [returnOneValue]
wrong # args: should be "acceptTwoArgs arg1 arg2"

In a Tcl interpreter where commands could return more than one value, you could do away with {*}$list, but not by replacing it with [set list]; you would need a new command which takes one argument (a list) and returns each member as a separate value: [expand $list].

It should be noted that allowing a variable number of return values would introduce a serious incompatibility regarding procs that call [return] with no arguments. In Tcl 8.5, this returns an empty string. If return were variadic, [return] with no arguments would truly return nothing. Any script that relies on argumentless [return] returning an empty string would break.
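The current behaviour is easy to confirm. The proc name nothing is just for this sketch.

```tcl
proc nothing {} { return }

set r [nothing]
set words [list [nothing]]

puts [string length $r]    ;# 0: the result is the empty string
puts [llength $words]      ;# 1: the substitution still produces one word
```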

In the end, as useful as multiple return values might be, I think we are better off with the syntax and semantics that are currently in place.

LEG 2008-12-11: Good point. I see that more would have to be done than I thought: [list $value1 $value2] would have to return just one value after expansion.

aricb 2008-12-12: The point was not that [list] should have a special behavior (it should not) but that square brackets must not trigger argument expansion. The fact that you frame the discussion in terms of expansion indicates that you are not really returning multiple values; you are returning a single value (a list) which must be expanded. So you are not proposing any new behavior for [return]. The new behavior comes because square brackets will now trigger argument expansion. In your proposal, to return multiple values, you return a list, which gets expanded on the other end.

But as you noted in your discussion, Tcl does not specify when something should or should not be treated as a list, and values which were intended to be scalar can be misinterpreted as lists if the programmer is not careful. To prevent misinterpretations, then, the programmer will be forced to always package return values as a list. So your proposal doesn't simplify anything at all; on the contrary, it complicates the task of programming by forcing programmers to type return [list $result] every time rather than return $result. [list ...] is seven characters. {*} is three. So:

  1. arguably you are not saving anybody any effort
  2. while you might remove one wart from the language ({*}), it is a wart that occurs relatively rarely in most people's code, and you are replacing it with a wart that would be orders of magnitude more frequent (return [list $result])

If you want to be able to return multiple values, the solution is not to make brackets perform argument expansion (and redefine a host of commands that return lists); the solution is to make [return] variadic and specify that a command surrounded by brackets would be substituted with one word for every argument passed to the command's [return] statement. So, if you want to return one value, you type [return $value]; if you want to return two values, you type [return $value1 $value2], etc. The catch (which is also a catch in your proposal) is that commands that return no values would get substituted with zero words, whereas in today's Tcl they are substituted with one word--an empty string. This catch is such a major issue that I think we are better off with the current idiom--use {*} to expand arguments as needed, and assume by default that when a command returns a list, it intended to return a list.

(a few hours later) Sorry, the above comes across in a harsher tone than I intended it. I really do think the idea of returning multiple values is interesting, and don't mean any hard feelings, despite the fact that I don't like the thought of having square brackets cause argument expansion.

LEG 2008-12-13: no problem, aricb. These are musings about what if everything were a list, so I just try to figure out how to make it work. My first reply also was somewhat short. I'll try to expand a little more.

As stated in the third point under "What if", return returns the list of words on its command line to the invoking stack frame.

All arguments to [return] would be wrapped into a list. Symbolically: return a b {c d} -> {a b {c d}}. If there are no arguments we get an empty list. A single argument is wrapped into a list with one element.

Upon expansion the unaltered argument list of return is given back to the caller. If there were just one argument the expansion is precisely this argument. return x -> {x} yielding x to the caller.

One problem arises, as stated above by aricb, and shown by the {*}{} gotcha: the expansion of the empty list returns 'nothing' to the caller. At least two solutions come to mind: return the empty string when expanding the empty list, or return the empty list itself, which IMO is more consistent. Anyway 'set x [list]' should keep setting x to {}.

The reciprocal operation to expansion is list, which takes a list of arguments and wraps them into a list. To obtain the same effect as now, list would need to double-wrap the arguments as in: list a b c -> {{a b c}}. After automatic bracket expansion we get: {a b c}.
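Using the emulation described earlier ({*} in front of every bracket expression), the double wrapping can be tried out in today's Tcl. The proc wrapList is a hypothetical stand-in for the redefined list, not an existing command.

```tcl
# Hypothetical stand-in for a double-wrapping [list].
proc wrapList args { return [list $args] }

# {*}[...] plays the role of automatic bracket expansion.
set words [list {*}[wrapList a b c]]

puts [llength $words]      ;# 1: the caller receives a single word
puts [lindex $words 0]     ;# a b c
```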

We see the disadvantage of automatic bracket expansion: the need to wrap and expand in a lot of function calls, if not in most of them. This disadvantage, however, could be nullified by optimization in the byte-code compiler.

Another question is whether script compatibility with current Tcl can be achieved at all with automatic bracket expansion. At the least, all other list-returning built-in commands like lrange, lappend, etc. would also have to double-wrap their results.