Richard Suchenwirth 2004-08-07 - I often need to look up data from tables, for instance country codes - would you have known that MH stands for the Marshall Islands? One can implement such tables with arrays:
array set country {DE Germany FR France MH {Marshall Islands} NL Netherlands ... ...}
puts $country(MH)
but such an array cannot be passed by value, and searching the other way ("what is the country code for Marshall Islands?") is less convenient:
proc arr'get {arrName query} {
upvar 1 $arrName arr
foreach {key value} [array get arr] {
if {$value eq $query} {return $key}
}
}
puts [arr'get country {Marshall Islands}]
Note that
arr'get is still pretty compact. What happens if a value is not contained in the table? Then the
foreach loop eventually ends, and the
proc body ends too, returning the last result, which is the constant {} from
foreach. Some people might want to add the line
return {}
after the
foreach, but the effect without it is absolutely the same, so I'm for
simple.
From Tcl 8.5 one can use a
dict for such purposes, but I'm not there yet (all my installations are 8.4, occasionally even an 8.3). What we have since time immemorial, however, is
lists, just as the last argument to the
array set command above (in fact, it's a string to begin with, which can be interpreted as a list). We can use this value directly, and do two-way lookup with
lsearch. In the country code example,
lsearch may have three kinds of result:
- (a) -1, if nothing was found
- (b) an even number (0,2,4,..) if a country code was found
- (c) an odd number (1,3,5..) if a country name was found
In case (a), {} is a suitable response to an unsuccessful query. In case (b) the access function should return the element following the index of the query, while for (c) we'd like to see the preceding element. This specification can be implemented in a pretty compact way:
proc lquery {list query} {
set pos [lsearch $list $query]
lindex $list [expr {$pos+1-2*($pos%2)}]
}
Case (a) is silently covered because for pos=-1, ($pos%2) yields 1 (for odd), which makes the index arithmetic turn out -2, on which
lindex raises no error, but returns the empty string, as specified. We can use lquery directly for lists having an {a b a b ...} pattern, where querying for an a gives its b, and vice versa:
set country {DE Germany FR France MH {Marshall Islands} NL Netherlands ... ...}
puts [lquery $country France]
Testing:
%lquery $country France
FR
% lquery $country FR#
% lquery $country FR
France
As
lsearch by default takes a
glob pattern, we can do wildcard queries, and receive the first entry matching the query:
% lquery $country F*
France
% lquery $country Ma*
MH
Or we can raise an error for an ambiguous pattern, to avoid bugs coming from an unexpected result, most simply by giving the -all switch:
proc lquery {list query} {
set pos [lsearch -all $list $query]
if [llength $pos] {lindex $list [expr {$pos+1-2*($pos%2)}]}
}
pos will now be {} on "not found", so we can call the
lindex part only if it is non-empty. If
lsearch finds more than one match, it returns a list of the matching positions, which
expr can't digest:
% lquery $country *a*
can't use non-numeric string as operand of "+"
The error message isn't too intuitive, however, so let's add one line to improve that:
proc lquery {list query} {
set pos [lsearch -all $list $query]
if {[llength $pos]>1} {error "ambiguous query $query"}
if [llength $pos] {lindex $list [expr {$pos+1-2*($pos%2)}]}
}
% lquery $country *a*
ambiguous query *a*
The data list can be kept in a
global variable, so it is available all the time, but a niftier way is to shrink-wrap it into an
interp alias:
interp alias {} country {} lquery $country
% country FR
France
% country Mar*
MH
So five lines of code give us a cute little data retrieval tool, which can however deal with quite voluminous data sets. Just that linear
lsearch gets slower with longer lists. Let's finally test the timing behavior (on my old 200MHz box at home):
foreach n {10 100 1000 10000} {
set data {}
for {set i 0} {$i<$n} {incr i} {
lappend data a$i b$i
}
set average [expr $n/2] ;# middle of the list
puts "average $i: [time {lquery $data a$average} 100]"
set last [lindex $data end]
puts "last $i: [time {lquery $data $last} 100]"
puts "missing $i: [time {lquery $data not'there} 100]"
}
Funny that searching for an element in the middle, or end, takes longer than for one that isn't there (probably due to string comparison optimization):
average 10: 303 microseconds per iteration
last 10: 279 microseconds per iteration
missing 10: 177 microseconds per iteration
average 100: 534 microseconds per iteration
last 100: 510 microseconds per iteration
missing 100: 354 microseconds per iteration
average 1000: 2720 microseconds per iteration
last 1000: 2825 microseconds per iteration
missing 1000: 2050 microseconds per iteration
average 10000: 27310 microseconds per iteration
last 10000: 27242 microseconds per iteration
missing 10000: 22187 microseconds per iteration
But it is evident that you need not fear tables of 10000 entries, if you just got 27 ms to spare (or less, on today's much faster boxes). If that's too slow, you can still go back to
arrays - just be aware that they take more memory: 42 bytes overhead per entry, as opposed to 12 bytes per list element. So even for large look-up tables,
lquery has its advantages - and in any case it gave me another fun project :)