When multiple variables have the same value, Tcl saves space by storing that value in memory only once, and copies it only when necessary. This is known as copy-on-write. It's useful to keep this in mind when working with large values or with data structures. Each item in a pure list, for example, is a separate value, so extracting it with lindex won't result in the value being copied. In the following example,
    # 2 ** 20 characters: a 1 MB string
    set value [string repeat a [expr {2 ** 20}]]
    for {set i 0} {$i < 60} {incr i} {
        lappend list1 $value
    }
    # Items taken out of list1 are still the same shared value
    foreach item $list1 {
        lappend list2 $item
    }
$value is stored only once in memory, occupying 1 MB and not, as might be supposed, 120 MB. All the items in $list1 and $list2 share the same value in memory.
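One way to observe this sharing from a script is to compare the object addresses reported by tcl::unsupported::representation (assumed here to be available, as in Tcl 8.6 and later). This is a rough sketch rather than a supported technique: the command is explicitly unsupported, the format of its output is not guaranteed, and objPtr is a hypothetical helper invented for this illustration.

```tcl
# Hypothetical helper: extract the Tcl_Obj address from the text
# returned by tcl::unsupported::representation. The format of that
# text is not guaranteed, so treat this as a debugging aid only.
proc objPtr {val} {
    set desc [::tcl::unsupported::representation $val]
    if {![regexp {object pointer at ([^,]+)} $desc -> ptr]} {
        error "unexpected representation format: $desc"
    }
    return $ptr
}

set value [string repeat a [expr {2 ** 20}]]
set list1 {}
set list2 {}
for {set i 0} {$i < 60} {incr i} {
    lappend list1 $value
}
foreach item $list1 {
    lappend list2 $item
}

# Compare the address of every list element with that of $value.
set sameObject 1
foreach a $list1 b $list2 {
    if {[objPtr $a] ne [objPtr $value] || [objPtr $b] ne [objPtr $value]} {
        set sameObject 0
    }
}
puts "all 120 items share one object: $sameObject"
```

If the elements had been copied, the addresses would differ and sameObject would be reset to 0.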
A fundamental concept of Tcl is that everything is a string. In a Tcl script, therefore, there are inevitably a large number of strings, and a naive implementation that copied these strings a lot would perform badly. Copy-on-write is the mechanism by which the Tcl C implementation avoids unnecessary copies. Each value (Tcl_Obj) has a reference count. Whenever the value is passed to a command or assigned to a variable, the reference count is incremented and no copy is made. When a value is to be changed, the implementation first checks the reference count. A count of 1 indicates that there is no other reference to the value, so it is safe to modify it in place. A count greater than 1 indicates that there are other references to the value, so, to preserve Tcl semantics, it must be copied before being modified.
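The effect of the reference-count check can be observed, at least roughly, from the script level. The following sketch assumes Tcl 8.6 or later and relies on the unsupported tcl::unsupported::representation command; objPtr is a hypothetical helper that parses its output, whose format is not guaranteed.

```tcl
# Hypothetical helper: extract the Tcl_Obj address from the text
# returned by tcl::unsupported::representation (format not guaranteed).
proc objPtr {val} {
    set desc [::tcl::unsupported::representation $val]
    if {![regexp {object pointer at ([^,]+)} $desc -> ptr]} {
        error "unexpected representation format: $desc"
    }
    return $ptr
}

set a [list 1 2 3]
set b $a        ;# no copy: both variables now refer to one value
set sharedBefore [expr {[objPtr $a] eq [objPtr $b]}]

lappend b 4     ;# the value is shared, so it is copied before the append
set sharedAfter [expr {[objPtr $a] eq [objPtr $b]}]

puts "shared before modification: $sharedBefore"
puts "shared after modification:  $sharedAfter"
puts "a = $a"
puts "b = $b"
```

The variable a keeps its original contents because the append operated on a copy, which is the script-level behaviour that the reference count exists to preserve.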
An understanding of Tcl's copy-on-write mechanism is often needed when writing a C extension. A good place to start is with the documentation for Tcl_Obj.
An extension should always take advantage of Tcl's copy-on-write mechanism where appropriate, so that user expectations at the script level regarding memory usage are met. Correct behaviour by an extension can often be verified using representation (the tcl::unsupported::representation command).
The Call-By-Reference / Call-by-Value Story
Those just learning the concepts of call-by-reference and call-by-value may wonder how to understand Tcl in those terms, and how those concepts relate to Tcl facilities such as upvar.
Tcl command arguments are always passed by value at the Tcl script level and by reference at the C implementation level. Passing a command the name of a variable or of another command allows it to reference the named variable or command. This script-level facility can be used for the same reasons that a pointer might be used in C, but the implementation is different, and only tangentially related to copy-on-write.
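The difference can be sketched with two hypothetical procedures, appendByValue and appendByName (both names are invented here for illustration). The first receives the list value itself; the second receives the name of a variable and links to it with upvar.

```tcl
# Receives a value. Modifying the local variable lst does not affect
# the caller's variable.
proc appendByValue {lst item} {
    lappend lst $item
    return $lst
}

# Receives a variable name. upvar links the local name lst to the
# caller's variable, so the modification is visible to the caller.
proc appendByName {varName item} {
    upvar 1 $varName lst
    lappend lst $item
    return
}

set data {a b}
set result [appendByValue $data c]
puts "after appendByValue: data = $data, result = $result"

appendByName data c
puts "after appendByName:  data = $data"
```

In appendByValue the argument is shared with the caller on entry and is copied only when lappend modifies it. In appendByName the caller's own variable is modified, and the value may be changed in place if nothing else refers to it.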
Through The Looking Glass, and What Alice Found There
"The name of the song is called 'Haddocks' Eyes.'"
"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.
"No, you don't understand," the Knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man.'"
"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention."
C/C++ programmers discovering copy-on-write initially tend to see it as a guaranteed performance improvement, and beginners' C++ books are full of examples of string classes using it. After a while they use their copy-on-write implementations in a multi-threaded application, and then things start to go wrong. The problem is that checking the reference count and copying need to be performed in a thread-safe manner, which typically requires the use of mutexes. Since the mutex must be locked for every access, not just the ones that turn out to need a copy, the performance tends to be worse than that of a straightforward implementation that always copies.
Threaded Tcl avoids this problem by never sharing values between threads.