Updated 2008-01-06 14:20:42 by dkf

Vincent Wartelle (mailto:vwartelle@oklin.com): my temporary conclusions, and where they come from.

A. OBJECTIVE CONCLUSIONS

On Tcl 8.3, 32-bit machines only:
 each tcl object =
    any new string, number, or other value      --> 24 bytes + content size
    any pre-existing (shared) string or value   --> 4 bytes (pointer only)

 content size =
    depends on encoding and data type:
    one or two bytes per char for string values
    may be 0 for a number (integer/double) that is never used as a
    string, since the number is then stored inside the core Tcl object

Jeffrey Hobbs comments: UTF-8 can take up to 3 bytes per char for the 2-byte Unicode that Tcl uses internally. Also, content size can be greater for UnicodeString objects, List objects, and others that all malloc extra space for their internal representations.

 each variable =
    48 bytes + "content size" of the name +
              "tcl object size" of the content

 each hash key entry =
    48 bytes + "content size" of the key  +
              "tcl object size" of the value

 each list =
    32 bytes + size of each list entry

 each list entry =
    4 bytes + "tcl object size" of the content

B. SUBJECTIVE CONCLUSIONS

  • When using Tcl, don't emulate pointer mechanisms. Copy the complete data when needed; Tcl stores such copies as pointers to the same value rather than duplicating it.
  • Each distinct "thing" (value) in a Tcl program costs at least 24 bytes.
  • Variables and hash tables are costly:
    about 52 bytes of overhead for each variable,
    about 52 bytes of overhead for each hash table key.

  • Lists are not costly: 4 bytes overhead for each element. (Yes, far more if each element is itself a list...)
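The first point (copy freely, Tcl shares the data) can be observed directly in Tcl 8.6 or later with `::tcl::unsupported::representation`. That command is unsupported introspection, so its name and output format may change between releases; treat this as a sketch, not a stable technique. The helper name `objPointer` is hypothetical. The script shows that `set b $a` makes both variables refer to the same Tcl_Obj (only a pointer is stored), and that a real copy is made only when one of them is modified (copy-on-write).

```tcl
# Requires Tcl 8.6+: ::tcl::unsupported::representation is an unsupported
# introspection command whose output format may change.
# Hypothetical helper: return the address of the Tcl_Obj holding $value.
proc objPointer {value} {
    regexp {object pointer at ([^,]+)} \
        [::tcl::unsupported::representation $value] -> ptr
    return $ptr
}

set a [string repeat x 1000]   ;# one 1000-character string object
set b $a                        ;# "copy": b points to the same object
set samePtr [expr {[objPointer $a] eq [objPointer $b]}]

append b y                      ;# b's object is shared, so Tcl duplicates it first
set afterAppend [expr {[objPointer $a] eq [objPointer $b]}]

puts "shared after set: $samePtr, shared after append: $afterAppend"
```

So copying a large value costs one pointer until someone writes to it; emulating pointers by passing variable names around saves nothing here.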

C. INFORMATION FROM NEWSGROUPS

1. Excerpt from [1]:

 >    On a 32 bit machine where alignment is 4 byte boundary
 >    and the types have the following sizes,
 >            long    4 bytes
 >            int     4 bytes
 >            char *  4 bytes
 >            double  8 bytes
 >            void *  4 bytes
 >    sizeof (Tcl_Obj) = 4 + 4 + 4 + 4 + MAX (4, 8, 4, 4 + 4)
 >                     = 24 bytes

2. Excerpt from [2]:
 >> [experiment shows that...] approximately 54 bytes for each key. [...]

Well, it takes a certain amount of space to store the hash entry (four words plus the size of the key; median about 20 bytes in your case on a 32-bit machine) and more to store the variable (each entry in an array is an independent variable that can support its own traces, etc.) which adds another 8 words or 32 bytes. This gives about 52 bytes per array member; pretty close to what you report...

D. MY EXPERIMENTS

tclsh 8.3.2 built with memory debugging (TCL_MEM_DEBUG), on Windows Millennium, 32-bit machine
 1. hashtable with empty values

        % memory info
        current bytes allocated       152681
        ...

        % for {set i 0} {$i < 10000 } { incr i } {
        set t($i) ""
        }
        % memory info
        current bytes allocated       698453
        ...
        698453 - 152681 = 545772

        approx 54 bytes per key.

 2. hashtable with constant value

        % memory info
        current bytes allocated       152550
        ...

        % for {set i 0} {$i < 10000 } { incr i } {
        set t($i) "abcd"
        }
        % memory info
        current bytes allocated       698363
        ...

        698363 - 152550 = 545813

        approx 54 bytes per key.

 3. hashtable with variable value

        % memory info
        current bytes allocated       152550
        ...

        % for {set i 0} {$i < 10000 } { incr i } {
        set t($i) "abcd_$i"
        }
        % memory info
        current bytes allocated      1037220
        ...

        1037220  - 152550 = 884670

        approx 89 bytes per key.

 4. empty global variables

        % memory info
        current bytes allocated       152550
        ...
        % for { set i 1 } { $i <= 10000 } { incr i } {
        set ::a[set i] ""
        }
        % memory info
        current bytes allocated      729761
        ...
        729761 - 152550 = 577211

        approx 57 bytes per variable

 5.  global variables with the same value

        % memory info
        ...
        current bytes allocated       152550

        % for { set i 1 } { $i <= 10000 } { incr i } {
        set ::a[set i] "abcd"
        }
        % memory info
        ...
        current bytes allocated       708202
        708202 -  152550 = 555652

        approx. 55 bytes per variable.

 6.  global variables with different values

        % memory info
        ...
        current bytes allocated       152550

        % for { set i 1 } { $i <= 10000 } { incr i } {
        set ::a[set i] "abcd_$i"
        }
        % memory info
        ...
        current bytes allocated     1047070
        1047070 -152550 = 894520

        approx 89 bytes per variable.

 7.  empty list entries

        % memory info
        ...
        maximum bytes allocated       152550

        % for {set i 1 } { $i <= 10000 } { incr i } {
        lappend l ""
        }
        % memory info
        ...
        current bytes allocated       202179
        202179 - 152550 = 49629

        approx 5 bytes per list entry.

 8. identical list entries

        % memory info
        ...
        current bytes allocated       152550

        % for {set i 1 } { $i <= 10000 } { incr i } {
        lappend ::l "abcd"
        }
        % memory info
        ...
        current bytes allocated       202215
        202215 - 152550 = 49665

        approx 5 bytes per list entry.

 9. different list entries

        % memory info
        ...
        current bytes allocated       152550

        % for {set i 1 } { $i <= 10000 } { incr i } {
        lappend ::l "abcd_$i"
        }
        % memory info
        ...
        current bytes allocated     541083
        541083 - 152550 = 388533

        approx 39 bytes per list entry.
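All nine experiments follow the same pattern: read "current bytes allocated" from `memory info`, create 10000 items, read it again, and divide the difference by 10000. A minimal sketch of that pattern as reusable procs is below. The names `currentBytes` and `bytesPerItem` are hypothetical, and the sketch assumes a tclsh built with memory debugging (TCL_MEM_DEBUG), where the `memory` command exists and `memory info` returns its report as the command result. The report strings themselves allocate a little memory, so expect a few bytes of noise per measurement.

```tcl
# Hypothetical helper: pull the "current bytes allocated" figure out of
# the text report produced by [memory info].
proc currentBytes {report} {
    if {![regexp {current bytes allocated\s+(\d+)} $report -> bytes]} {
        error "no 'current bytes allocated' line in report"
    }
    return $bytes
}

# Hypothetical helper: run $script at global level and return the average
# growth of allocated memory per item, assuming the script creates $count
# items. Needs the [memory] command of a memory-debugging build.
proc bytesPerItem {count script} {
    set before [currentBytes [memory info]]
    uplevel #0 $script
    set after [currentBytes [memory info]]
    return [expr {double($after - $before) / $count}]
}

# Example: repeat experiment 3 (array elements with distinct values),
# but only if this interpreter actually has the [memory] command.
if {[llength [info commands memory]]} {
    set perKey [bytesPerItem 10000 {
        for {set i 0} {$i < 10000} {incr i} {
            set t($i) "abcd_$i"
        }
    }]
    puts [format "approx %.1f bytes per key" $perKey]
}
```

Running each experiment in a fresh tclsh keeps the baseline comparable, since earlier experiments leave their variables allocated.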

Open questions: what does an interp cost? What does an interp alias cost?

DKF - note that dict (as proposed in TIP #111 [3]) will give hash access for memory costs much closer to those of a list than to those of an array.
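To try DKF's point, experiment 3 can be rewritten with a dict instead of an array. This sketch assumes Tcl 8.5 or later, where `dict` is built in; no measured figures are claimed here. To measure it, bracket the loop with `memory info` in a memory-debugging build, exactly as in the experiments above. The plausible reason for the saving is the one given in the newsgroup excerpt: each array element is an independent variable that can carry its own traces, whereas dict entries are just key/value pairs inside one Tcl value.

```tcl
# Requires Tcl 8.5 or later ([dict] is built in from 8.5 on).
# Same data as experiment 3, stored in one dict value instead of an array.
set d [dict create]
for {set i 0} {$i < 10000} {incr i} {
    dict set d $i "abcd_$i"   ;# key $i maps to "abcd_$i"
}

# Hash-based lookup, comparable to reading an array element.
puts [dict get $d 42]        ;# prints abcd_42
puts [dict size $d]          ;# prints 10000
```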
