Updated 2017-03-16 13:42:24 by pooryorick

array is a built-in ensemble of commands that manipulates Tcl's array variables. Array variables can also be manipulated using arrayName(key) syntax.

Synopsis

array anymore arrayName searchId
array donesearch arrayName searchId
array exists arrayName
array get arrayName ?pattern?
array names arrayName ?mode? ?pattern?
array nextelement arrayName searchId
array set arrayName list
array size arrayName
array startsearch arrayName
array statistics arrayName
array unset arrayName ?pattern?

Documentation

man page

Description

A Tcl array is a container of variables. Tcl's two-component $arrayName(key) syntax can be used to substitute individual values of an array. The name of a variable in an array may be any string.

Unlike a dictionary, an array is not a single value. Instead, the name of the array is used as a handle, and is passed to array commands in order to perform some operation on the array such as reading the value of some variable in the array. The array itself, being a container, does not have a value that can be read. However, array get returns a dictionary representing part or all of the array.
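A minimal sketch of that distinction, assuming Tcl 8.5 or later for dict (the size key and the variable names failed, message and snapshot are made up for illustration): reading the array name alone fails, while array get produces a dictionary value that can be stored and passed around.

```tcl
array set balloon {color red size large}

# Reading the array itself fails, because the container has no value.
set failed [catch {set copy $balloon} message]
puts "$failed: $message"

# array get returns a dictionary value that can be stored and queried.
set snapshot [array get balloon]
puts [dict get $snapshot color]
```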

Internally, Tcl uses a hash table to implement an array. An array and a dictionary are similar in functionality, but each has qualities that distinguish it. trace and upvar can both operate on array member variables, but not on elements in a dictionary.

Neither arrays nor dictionaries are the direct script-level equivalent of Tcl's internal hash tables, but a dictionary is the more minimal interface to them. Prior to the introduction of dict, arrays were the Tcl script-level facility bearing the closest resemblance to such structures, so they were often conscripted to that end.

Array keys are not ordered. It isn't straightforward to get values out of an array in the same order that they were set. One common alternative is to get the names and then sort them. In contrast, values in a dictionary are ordered.
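The "get the names and then order them" approach can be sketched as follows. The colors array is the one used elsewhere on this page; the ordered variable is only there for illustration.

```tcl
array set colors {
    red   #ff0000
    green #00ff00
    blue  #0000ff
}

# Sort the names to get a predictable traversal order.
set ordered {}
foreach name [lsort [array names colors]] {
    lappend ordered $name
    puts "$name is $colors($name)"
}
```

This yields alphabetical order, not insertion order. If insertion order matters, the order has to be tracked separately or a dict used instead.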

Traces may be set on either an array container or an individual array variable.
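A small sketch of both kinds of trace, run at global scope. The logAccess procedure and the accessLog variable are hypothetical names chosen for this example.

```tcl
# Hypothetical logging callback. trace appends the variable name, the
# key, and the operation to the command it invokes.
proc logAccess {name key op} {
    lappend ::accessLog [list $name $key $op]
}

set accessLog {}
array set balloon {color red}

# Trace writes to any variable in the array.
trace add variable balloon write logAccess
# Trace reads of one particular variable in the array.
trace add variable balloon(color) read logAccess

set balloon(size) large
set c $balloon(color)
puts $accessLog
```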

See Also

Arrays / hash maps
more details about arrays
A simple database
container
a guide to containers in Tcl
parray
Arrays as cached functions
Arrays of function pointers
Memory costs with Tcl
for measurement of array/list element consumption in bytes.
Persistent arrays
Procedures stored in arrays
array name string matching extension
GUI for editing a Tcl array
Fitting a new Rear-End to Arrays
foreach
iterating through an array

Creating an Array

To create an array, set a variable within the array using the arrayName(key) form:
set balloon(color) red

or use array set:
array set balloon {color red}

To create multiple array keys and values in one operation, use array set.

To create an empty array:
array set myArray {}

Array names have the same restrictions as any other Tcl variable.

When using the braces syntax of variable substitution, include the parentheses and the name of the member variable within the braces:
array unset {this stuff}
set {this stuff(one)} 1
parray {this stuff}

A common beginner mistake is to over-quote the name of the member variable:
#warning, bad code ahead!
set a("key") value ;#-> value
array get a ;#-> {"key"} value

unset a

#better:
set a(key) value ;#-> value
array get a ;#-> key value

In Tcl, everything is a string. Quoting strings is mostly not necessary, and can even be a problem, as in the previous example.

In the following example the double quotes are not needed because the values don't contain any special characters:
#unnecessary double quotes
array set myArray {"element" "value"}

Better syntax would be:
array set myArray {element value} ;# or:
set myArray(element) value

More examples:
unset x                        ; # x doesn't exist at all anymore
unset x ; array set x {}       ; # x exists as an array but has no elements
array unset x                  ; # x doesn't exist at all anymore

foreach idx [array names x] {
   set x($idx) {}
}                              ; # array exists - all the elements still
                                 # exist, but values of each element are now
                                 # empty
array set colors {
    red   #ff0000
    green #00ff00
    blue  #0000ff
}
foreach name [array names colors] {
    puts "$name is $colors($name)"
}

Retrieving the Value of an array key

# use a literal key
set value1 $balloon(color)
# use a key within a variable
set value2 $balloon($key)

Iain B. Findleton 2004-06 asked whether there were easier ways to read an array element, given that the name of the array was in a variable and the array key was in a variable. His example was:
eval {set ${key}($item)}

DKF writes, Just remove eval from the outside (it just confuses things) and it'll be fine:
puts [set ${key}($item)]        ;# Read
set ${key}($item) $val          ;# Write

If you're in a procedure, use upvar to create a local reference to the array so you get something like this:
upvar $key v
puts $v($item)
set v($item) $val

It's also possible using upvar to link to an array element, but I don't recommend it (for example, it fails if you decide to set key equal to ::env since env-var management is done via a whole-array trace).

Determine whether a Key Exists

info exists array(key)

See info exists
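For illustration, a short sketch using the balloon array from earlier on this page (the size key and the fallback value unknown are assumptions of the example). It also shows array exists, which tests the container and not a key.

```tcl
array set balloon {color red}

puts [info exists balloon(color)]   ;# the key is present
puts [info exists balloon(size)]    ;# no such key
puts [array exists balloon]         ;# balloon is an array

# Guard a read so that a missing key does not raise an error.
if {[info exists balloon(size)]} {
    set size $balloon(size)
} else {
    set size unknown
}
puts $size
```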

Unset an Array

unset balloon

array unset also provides a way to unset only a subset of keys, namely those that match an optional pattern.
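A brief sketch of array unset with a pattern, using comma-separated keys like those in the multi-dimension examples on this page:

```tcl
array set data {
    foo,x ecks    foo,y why
    bar,x ECKS    bar,y WHY
}

# Remove only the keys that start with "foo,". The array itself and
# its other keys remain.
array unset data foo,*
puts [lsort [array names data]]
```

The pattern is matched in the style of string match, so keys containing glob characters such as * or [ may need escaping.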

Incrementing an Array, Creating it if it doesn't Exist

Since Tcl 8.5, incr creates the variable if it doesn't exist, so it already behaves this way. With older versions:
proc incrArrayElement {var key {incr 1}} {
    upvar $var a 
    if {[info exists a($key)]} {
        incr a($key) $incr
    } else {
        set a($key) $incr
    } 
}
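For comparison, a sketch of the same idea in Tcl 8.5 or later, where incr treats a missing variable as 0. The count array and the word list are made up for the example.

```tcl
# Requires Tcl 8.5 or later: incr creates count($word) on first use.
foreach word {red green red blue red} {
    incr count($word)
}
puts $count(red)
puts [lsort [array names count]]
```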

Simulating Multiple Dimensions

There are no multi-dimensional arrays in Tcl but they can be simulated by a naming convention:
set a(1,1) 0 ;# set element 1,1 to 0

This works if the keys used do not contain the ',' character. If the keys can be arbitrary strings then one can use the list of the indices as name of the variable in the array:
set a([list $i1 $i2 $i3]) 0; # set element (i1,i2,i3) of array a to 0

This is completely unambiguous, but might look a bit uglier than the comma solution. Also remember that
set a([list 1 2 3]) 0

is equivalent to
set {a(1 2 3)} 0

but not to
#wrong # args
set a(1 2 3) 0

because the last example passes four arguments to set.

AMG: To implement multidimensional arrays, I often use the convention given above (commas, not list, but that's a good idea), but it prevents me from easily getting a list of elements in any one dimension. For the following array:
array set data {
    foo,x ecks    foo,y why    foo,z zed
    bar,x ECKS    bar,y WHY    bar,z ZED
}

I'd like some means to get a list foo bar. How is this useful? I have written many server programs that use multidimensional arrays to keep track of state for all connected clients. To get a list of all client IDs, I have another variable or special array element listing the client IDs, but I have to always keep it in sync with the rest of the array. I dislike this.

What if multidimensional arrays were accessed using $name(dim1)(dim2)(dim3) syntax? Thanks to a bug, we once had multidimensional arrays, but the syntax was of course very very weird (I think it used uplevel 0). This is a bit cleaner-looking. But it has very bad interactions with array. How would the following be converted to use array set?
set data(foo)(x) ecks; set data(foo)(y) why; set data(foo)(z) zed
set data(bar)(x) ECKS; set data(bar)(y) WHY; set data(bar)(z) ZED

What should array get data return?

Lars H: Well, why don't you ask Tcl? :-) It would tell you that after the above commands, array get data returns
bar)(z ZED foo)(x ecks bar)(x ECKS foo)(y why bar)(y WHY foo)(z zed

and (as an aid to help overcome one's prejudices about how the above should be interpreted)
join [array names data] \n

returns
bar)(z
foo)(x
bar)(x
foo)(y
bar)(y
foo)(z

This is a recurring problem with attempts to extend Tcl syntax: the "new syntaxes" people come up with usually already mean something, even if that "something" looks rather silly.

AMG: in response to Lars: Wow, I didn't realize Tcl would accept such syntax! It turns out that I'm simply using )( as my dimension delimiter.

Alright, now let's think about how to get a list of all elements in a given dimension. This is easiest to do if the array indices are proper lists:
array set data [list                                         \
    [list foo x] ecks    [list foo y] why    [list foo z] zed\
    [list bar x] ECKS    [list bar y] WHY    [list bar z] ZED\
]

proc array_dimnames {array_var dim_index} {
    upvar 1 $array_var array
    set result [list]
    foreach name [lsort -unique -index $dim_index [array names array]] {
        lappend result [lindex $name $dim_index]
    }
    return $result
}

% array_dimnames data 0
bar foo
% array_dimnames data 1
x y z

That works. For other delimiters, each element of array names needs to be split before the list can be passed to lsort. Another job for lcomp I guess.
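For delimiter-joined keys, the splitting step might look like the sketch below. array_dimnames_delim is a hypothetical variant of array_dimnames, not something from the original discussion, and it assumes the delimiter never occurs inside an index.

```tcl
# Hypothetical variant of array_dimnames for keys joined by a delimiter.
proc array_dimnames_delim {array_var dim_index {delim ,}} {
    upvar 1 $array_var array
    set result [list]
    foreach name [array names array] {
        # Split each key into its indices and keep the requested one.
        lappend result [lindex [split $name $delim] $dim_index]
    }
    return [lsort -unique $result]
}

array set data {
    foo,x ecks    foo,y why    foo,z zed
    bar,x ECKS    bar,y WHY    bar,z ZED
}
puts [array_dimnames_delim data 0]
puts [array_dimnames_delim data 1]
```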

For really big arrays such as the enormous MV catchall array used in OpenVerse, I wonder if this costs too much, so much that it's worth it to separately maintain element lists rather than extract that information from the array names.

AMG: Continued from before: array names data should return foo bar, but $data(foo) wouldn't be valid, breaking old assumptions. Should set data(foo) dummy unset data(foo)(*)? And so on.

If array notation could be applied to dicts we'd be in great shape. Doesn't Jim do this?

Lars H: Why don't you just use nested dicts? It seems those will do precisely what you ask for above.
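A rough sketch of the nested dict approach, assuming Tcl 8.5 or later, with the same data as above:

```tcl
# Requires Tcl 8.5 or later for dict.
set data [dict create]
dict set data foo x ecks
dict set data foo y why
dict set data bar x ECKS
dict set data bar y WHY

puts [dict keys $data]                  ;# names in the first dimension
puts [dict keys [dict get $data foo]]   ;# second dimension under foo
puts [dict get $data bar y]
```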

AMG: I can do some things with arrays that I can't do with dicts: namely, traces and upvars and everything else that uses those features. So, I often use arrays when I need to use elements as -textvariables. Perhaps I should be using namespaces instead, preferably wrapped by snit.

LV: Over on comp.lang.tcl, 2007-02, Fredderic provides the following proc in response to someone who wanted to declare an empty array at the start of a Tcl script.
proc declare_array arrayName {
    upvar 1 $arrayName array
    catch {unset array}
    array set array {}
} 

The idea here is to catch the unset in case the variable was not already declared. Then, array set makes the variable an array, but without any members. That way, a later reference to the name in a non-array setting generates a variable is array error.
[HKASSEM]

proc array_sort {index val _foreach_sorting_array_ref foreachsorting_command} {
    # _foreach_sorting_array_ref names the caller's array, much like
    # passing &array in C
    upvar $_foreach_sorting_array_ref arrref
    upvar $index i
    upvar $val v

    # collect {key value} pairs so they can be sorted by value
    set x [list]
    foreach {k vl} [array get arrref] {
        lappend x [list $k $vl]
    }

    # visit the pairs in decreasing integer order of value
    foreach e [lsort -integer -decreasing -index 1 $x] {
        set i [lindex $e 0]
        set v [lindex $e 1]
        # ------- FOREACH BODY ------------<
        uplevel $foreachsorting_command
        # ------END FOREACH BODY----------->
    }
}

usage:
set myarr(1) 20
set myarr(2) 10
set myarr(3) 30
array_sort index value myarr {
  # actions
   puts "$index $value"
}

output:
3 30
1 20
2 10

Memory Usage

Arrays use more memory than lists. Arrays provide O(1) access due to their hashtable nature, while lists provide O(1) access only for numerical indices.

escargo 2002-11-11: Thinking about using arrays as sets got me wondering: Assuming the keys are what is important to me, I would want to take up the least amount of storage for the values. So, what's smallest? An integer (or zero specifically)? An empty string? The key itself?

Lars H:

This is a very tricky question (especially since Tcl does not provide much for Introspection into the matter). I had expected that any value (TclObject) which already exists should yield the same result, but it seems to matter:
Bytes allocated   Code
---------------   ----------------
         970752   for {set n 1} {$n<10000} {incr n} {set A($n) [expr 0]}
         729088   set zero [expr 0]; for {set n 1} {$n<10000} {incr n} {set A($n) $zero}
         729088   for {set n 1} {$n<10000} {incr n} {set A($n) 0}
         729088   for {set n 1} {$n<10000} {incr n} {set A($n) {}}
        1130496   for {set n 1} {$n< 10000} {incr n} {set A($n) $n}
         970752   for {set n 1} {$n<10000} {incr n} {set A([format %d $n]) $n}

These measurements were essentially obtained by comparing the vsize (as reported by ps) of tclsh before and after evaluating the above code, hence it is rather crude.

escargo: Those last two seem strange! Why would having the pure string as the name make such a difference in the storage? Makes me wonder what this would be.
???????   for {set n 1} {$n < 10000} {incr n} {set A($n) [format %d $n]}

Also, isn't there a fence post error here? Shouldn't the range start with set n 0? Otherwise I see 9999 instances being created, not 10,000.

Lars H: And 10000 instances would be more natural than 9999 for what reason? We're just trying to see what's best, and aren't particularly concerned with how good the best are.

As for that mysterious result when the key was used as value, I'm just as surprised as you are. But try it yourself. The code used for obtaining the measurements can be found on Measuring memory usage. I also set up Compact data storage for discussing matters of this kind.

escargo 2002-11-22: I would think that 10000 would be more natural than 9999 just in terms of thinking about averages. I would rather mentally try to divide a number by 10000 than worry about dividing by 9999.

Michael Schlenker: Trying to explain what's going on: Tcl arrays do not yet use Tcl_Obj* for the array keys (some code for it is in the core but #ifdef'ed out for compatibility reasons); instead they use char* as keys. So 10000 char* are created, with the string reps for 1-10000 for the first 4 examples, but with a larger string rep for the last two examples. Example 1 creates a new Tcl_Obj for every entry, as it cannot easily be shared. Examples 2, 3 and 4 create only one Tcl_Obj that is shared. Examples 5 and 6 create one unshared Tcl_Obj for each entry.

MS: Starting from Tcl 8.5, array keys are Tcl_Obj and not strings; also, the measurements above should be much improved in 8.5+.

Lars H: I might add that the reason that example 5 is more costly than example 6 is that each of the unshared objects in example 5 has a string representation (generated when the argument A($n) of set is substituted), whereas the unshared objects in example 6 do not (format makes do with the internal representation).

Efficiently Comparing Arrays

escargo 2002-11-19:

What is the most efficient way to compare the contents of two arrays?

If array get had an option to specify the method and order of the results, then a simpler comparison could be done.

In Icon a table can be turned into a list by its sort function, which can return the results in one of four ways:

  1. List of key, value pairs sorted by key.
  2. List of key, value pairs sorted by value.
  3. List of alternating key and value sorted by key.
  4. List of alternating key and value sorted by value.

This puts the table into a known canonical order. There appears to be no way to know that array get would linearize two arrays in the same way.

It makes me wish there was an [array compare] function that could easily answer the question.

Michael A. Cleverly 2002-11-19: Here's an [array compare] type proc:
proc array-compare {array1 array2} {
    upvar 1 $array1 foo $array2 bar

    if {![array exists foo]} {
        return -code error "$array1 is not an array"
    }

    if {![array exists bar]} {
        return -code error "$array2 is not an array"
    }

    if {[array size foo] != [array size bar]} {
        return 0
    }

    if {[array size foo] == 0} {
        return 1
    }

    set keys(foo) [lsort [array names foo]]
    set keys(bar) [lsort [array names bar]]
    set keys(keys) $keys(foo)

    if {![string equal $keys(foo) $keys(bar)]} {
        return 0
    }

    foreach key $keys(keys) {
        if {![string equal $foo($key) $bar($key)]} {
            return 0
        }
    }

    return 1
}

Michael Schlenker: If using Tcl 8.4 one can speed this up a bit, by optimizing the lsort:
proc array-compare2 {array1 array2} {
    upvar 1 $array1 foo $array2 bar

    if {![array exists foo]} {
        return -code error "$array1 is not an array"
    }
    if {![array exists bar]} {
        return -code error "$array2 is not an array"
    }
    if {[array size foo] != [array size bar]} {
        return 0
    }
    if {[array size foo] == 0} {
        return 1
    }

    ;# some 8.4 optimization using the lsort -unique feature 
    set keys [lsort -unique [concat [array names foo] [array names bar]]]
    if {[llength $keys] != [array size foo]} {
       return 0
    }

    foreach key $keys {
        if {$foo($key) ne $bar($key)} {
            return 0
        }
    }
    return 1
}

escargo 2002-11-20: So, just to summarize: Arrays are equal iff (if and only if)

  1. They are equal size.
  2. They have the same names.
  3. For all the names the values (associated with each name in each array) are equal.

Is there a significant performance or space penalty for having to call lsort external to array names instead of having array names have a parameter that does the sorting internally?

The performance and space penalty is insignificant if lsort is used as in the above example.
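As a side note on the canonical-order question raised above, here is a sketch that assumes Tcl 8.6 or later, where lsort accepts -stride. The helper name array_canonical is hypothetical. It sorts the output of array get by key, so two arrays with equal contents yield the same list.

```tcl
# Requires Tcl 8.6 or later for lsort -stride.
# Hypothetical helper: key/value pairs of an array, sorted by key.
proc array_canonical {arrayName} {
    upvar 1 $arrayName arr
    return [lsort -stride 2 -index 0 [array get arr]]
}

array set first  {a 1 b 2 c 3}
array set second {c 3 a 1 b 2}
array set third  {a 1 b 2 c 4}

puts [expr {[array_canonical first] eq [array_canonical second]}]
puts [expr {[array_canonical first] eq [array_canonical third]}]
```

Unlike the procedures above, this builds a full copy of both arrays before comparing, so it cannot return early on the first mismatch.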

Copying an Array

Lars H: Usually using array get and array set, like so:
array set copy [array get original]
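A short sketch showing that the copy is independent of the original (the array contents are made up for the example). Note that array set merges into an existing array, so if copy might already hold elements, unset it first.

```tcl
array set original {color red size large}

# Start from a clean slate in case copy already exists.
array unset copy
array set copy [array get original]

# Changing the copy leaves the original untouched.
set copy(color) blue
puts "$original(color) $copy(color)"
```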

Passing arrays to procedures

See How to pass arrays
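In brief, the two usual approaches are passing the array's name and linking to it with upvar, or passing its contents as a list obtained from array get. The sketch below illustrates both; describe and describeValue are hypothetical procedure names.

```tcl
# Pass by name: the procedure links to the caller's array with upvar.
proc describe {arrayName} {
    upvar 1 $arrayName arr
    set lines {}
    foreach key [lsort [array names arr]] {
        lappend lines "$key=$arr($key)"
    }
    return [join $lines ", "]
}

# Pass by value: the caller serializes the array with array get, and
# the procedure rebuilds a local array from the list.
proc describeValue {pairs} {
    array set arr $pairs
    return [array size arr]
}

array set balloon {color red size large}
puts [describe balloon]
puts [describeValue [array get balloon]]
```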

Why arrays are handles instead of values

The simple reason is that that's not how they were implemented.

RS: Also, arrays are collections of variables (so: not a value), and have been in Tcl for a long time. Given modern dict and namespace, they might not even have been invented...

RHS: Would it be unreasonable to treat $arrayName the same as array get $arrayName? One could shimmer between the array rep and the list (and other) representations by how they are accessed. In that vein, you could do something like:
set bob [list a 1 b 2 c 3]
puts $bob(a)

...and it would shimmer the list to an array. The only "gotcha" I can think of would be that the list order might? change when you modified the variable as an array, but I don't think that would be unreasonable.

I can see namespaces being the preferred method for encapsulation. Still not understanding dict, I don't understand the pros and cons of dicts vs arrays for randomly accessible hash type data structures.

KJN: Yes, array get $arrayName is a good string representation. What makes me slightly uncomfortable is that Tcl has two types of compound variables (lists and arrays) that are appropriate in different situations and need different handling (with arrays arguably not first-class objects). I wasn't aware of the dict (in Tcl 8.5).

This would be most useful if it could do everything that lists and arrays can do now, so that lists and arrays can either be deprecated, or implemented in terms of a dict.

RS protests - lists are the most versatile containers (for structs, vectors, matrices, trees, stacks, queues, ...), while dicts are more specialized (but can take over most jobs of arrays, except for traces on array elements). I'd like to have both of them in the future :)

LV: Some might say that using lists for vectors and matrices is a bit like using duct tape to hold a boat together... BLT's vector data type is often mentioned as being a useful data structure when vectors are intended for visualization. Also I guess I misremembered dicts as having more restrictions than just traces.

RS: Hm.. vectors are one-dimensional containers for elements - as are lists. Matrices are two-(or more-)dimensional containers for elements - as are lists of lists. Tcl lists are implemented in C as Tcl_Obj*[], costing ~12 bytes of overhead per list element. Restricted vectors or matrices could be implemented slightly more efficiently, but would needlessly enlarge the variety of data types that is seen as a problem on this page. Tcl isn't an extreme-performance language (C or Assembler are much better at that), but it has great abstractions (like lists and arrays) to boast.

So I'd not call lists just "duct tape", but rather: simple yet powerful abstractions of containers. More like Swiss Army Knives :)

AM: I consider Tcl's lists to be very similar to C's arrays and Fortran's one-dimensional arrays, with the added advantages of bound checking, automatic size management and heterogeneous content. That makes them more versatile than either of Tcl's arrays or dicts in many ways, but these have their advantages too ...

Compare this to the wealth of data structures that is described in literature! If you only look at the different ways of specialising tree structures! Of course you can do a lot with just C-style arrays. But it does not mean that other structures are not useful from time to time.

DKF: Tcl arrays have a lot in common with Java's java.util.HashMap class, as do dicts. Tcl lists are more like ArrayLists.

LV: I guess I would see the connection between lists and vectors and matrices easier if there were built in syntactical sugar to allow accessing the elements of a list simpler. For instance, maybe something like
set l [list this is a series of vector elements]
set m [$l{3}]         ;# Sugar for [lindex $l 3]
set l{2:4} [list not just any]        ;# Results in [list this is not just any vector elements]

or something else that made things seem a bit cleaner.

DKF: Well, I'm thinking of fitting a new rear-end to arrays which might make such things easier.

To Do: Ordered Array

PYK 2015-03-06: Now that ordered dicts exist, arrays could probably also become ordered. Since they pre-date dicts, chances are just that no one has gotten around to doing it yet.

AMG: I am not aware of any motivation for this feature, plus it is expensive to implement. Adding it would add cost to all existing scripts for no benefit, and new scripts needing to track ordering can just use [dict].

The [dict] code maintains a doubly linked list stringing together the hash table keys, and this is quite necessary for constructing the string and list representations as well as for determining iteration order. But since arrays don't have a string representation in the technical sense, they never needed to track element creation order, so they never did. What benefit would there be to add it now? It would cost RAM and CPU time, and the theoretical benefit is already provided by [dict].

Presumably, array ordering would affect the following commands:

So that's two commands worth considering. If you really want this feature, you can implement it yourself by wrapping the [array] commands with code that tracks the ordering in a list. The list can even be put in a reserved array element, e.g. the one whose name is empty string.
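A rough sketch of such a wrapper, keeping the insertion order in the element whose name is the empty string. ordered_set and ordered_names are hypothetical helper names, and the sketch assumes the empty string is never used as a real key.

```tcl
# Hypothetical helpers: the empty-string key holds the insertion order.
proc ordered_set {arrayName key value} {
    upvar 1 $arrayName arr
    # Record the key only the first time it is set.
    if {![info exists arr($key)]} {
        lappend arr() $key
    }
    set arr($key) $value
}

proc ordered_names {arrayName} {
    upvar 1 $arrayName arr
    if {[info exists arr()]} {
        return $arr()
    }
    return {}
}

ordered_set fruit cherry 3
ordered_set fruit apple 1
ordered_set fruit banana 2
ordered_set fruit cherry 30
puts [ordered_names fruit]
```

Plain array names and array get still report the reserved element, and unsetting a key would need a matching wrapper to remove it from the list.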

See my (AMG's) section on forward-compatible dict for related code that provides a [dict]-compatible API yet uses [array] commands for internal manipulation of data.

Misc

AB: Is there a boolean function or command that identifies if an index of an array (or the element of a list) is empty? For instance, if xcrd(1) = {}, is there a boolean function that'll take in xcrd(1) and return 1, confirming that it's an empty index?

LES: Does that help?
proc isempty foo {
    regexp  {^([^(]+)\(([^(]+)\)}  $foo  =>  array  key
    global  $array
    if  {[info exists $array] == 0} { 
        return "$array? There is no $array array."
    }
    if  {[array get $array $key] eq {}}   { 
        return "$array exists, but there is no $key key in $array array"
    }
    if  {[string length [lindex [array get $array $key] 1]] == 0}  { 
        return "YES - [join "$array ( $key )" {}] exists and IT IS EMPTY"
    }
    return "NO - [join "$array ( $key )" {}] exists and IT IS NOT EMPTY"
}

Testing:
set xcrd(1) this
set xcrd(2) that
set xcrd(3) {} 

puts [isempty xcrd(1)]
puts [isempty xcrd(2)]
puts [isempty xcrd(3)]
puts [isempty xcrd(4)]
puts [isempty blah(4)]

MG: The regexp above is actually a little wrong - set myArray(key() value sets the "key(" element of myArray to value, but LES's regexp won't match it (or the 'empty variable', $()). You can even use [set array(key(name)) value] and get an element in array called key(name). So I think the regexp pattern would need to be:
regexp {^([^(]*)\((.*)\)$} $foo => array key

though there's probably a hole in that, somewhere, too (and it's untested, at 20 to midnight, so may not do what I meant anyway ;)

MG offers an alternative which works on lists, as well as arrays. It treats non-existent array elements as empty, rather than raising an error.
proc isempty2 {_var elem} {
    upvar $_var var
    if {![info exists var]} {
        return -code error "variable \"$_var\" is not set";
    }
    if {![array exists var]} {
        if {![string is integer -strict $elem]} {
            return -code error "second arg must be a number, for non-arrays";
        }
        set text [lindex $var $elem]
    } elseif {![info exists var($elem)]} {
        return 1; # empty - element doesn't exist
    } else {
        set text $var($elem)
    }
    return [expr {$text eq {}}];
};# isempty2
set list [list 0 {} 2]
set a(zero) value
set a(one)  {} 
set a(two)  value

% isempty2 list 0
0
% isempty2 list 1
1
% isempty2 a zero
0
% isempty2 a one
1

When using a list, instead of an array, the second argument has to be a number.

LV 2006-11-16: Looks like Wikipedia's page on associative arrays covers only the minimal aspects of Tcl's contribution [1].

AMG: This section has been moved: [2].

AMG: Tcl's array notation presents a gotcha when the key contains a close parenthesis.
% set array(key()) hello
hello
% set array(key())
hello
% array names array
key()
% array get array
key() hello
% set keyname key()
key()
% puts $array($keyname)
hello
% puts $array(key())
can't read "array(key()": no such element in array

The problem is that in $array(key) notation, the key extends until, but does not include, the first close parenthesis. This is in contrast to directly naming the array element without using $ in which the key extends until, but does not include, the final character (which must be close parenthesis).

This is perhaps most hurtful in the case of generated code:
proc $name i [string map [list %outVar% $outVar] {
    set table $::tableArray(%outVar%)
    interpolate [lindex $table 3] [locate [lindex $table 1 0] $i]
}]

The above fails, but the following works because it avoids the $ form:
proc $name i [string map [list %outVar% $outVar] {
    set table [set ::tableArray(%outVar%)]
    interpolate [lindex $table 3] [locate [lindex $table 1 0] $i]
}]

Though to be honest, I imagine this would fail too if $outVar were to be braced by list. I can't worry about that right now, got other problems. :^(

pyk 2014-05-30: I think the following would work in all cases:
proc $name i [string map [list %outVar% [list ::tableArray($outVar)]] {
    set table [set %outVar%]
    interpolate [lindex $table 3] [locate [lindex $table 1 0] $i]
}]