string bytelength string
See Also
- string
- string length
Description
Returns a decimal string giving the number of bytes used to represent string in memory. Because UTF-8 uses one to three bytes to represent Unicode characters, the byte length will not be the same as the character length in general. The cases where a script cares about the byte length are rare. Refer to the Tcl_NumUtfChars manual entry for more details on the UTF-8 representation.
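To make the difference between character length and byte length concrete, here is a minimal sketch. It assumes Tcl 8.5 or later and measures the UTF-8 byte count with [encoding convertto utf-8] rather than [string bytelength]; that produces true UTF-8, so the result can differ from [string bytelength] for some values. The sample string is only an illustration.

```tcl
# "café": four characters, the last of which (U+00E9) needs two bytes in UTF-8.
set s "caf\u00e9"

# Character count.
puts "chars: [string length $s]"      ;# chars: 4

# Convert to a UTF-8 byte sequence, then count the bytes.
set utf8 [encoding convertto utf-8 $s]
puts "bytes: [string length $utf8]"   ;# bytes: 5
```

The converted value is binary data, so [string length] applied to it reports bytes, not characters.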
In almost all cases, you should use [string length] instead, including when determining the length of a Tcl ByteArray object. An example on tcom purports to need [string bytelength] when generating a binary blob, in order to get the length of the blob without forcing generation of an internal string representation by [string length]. However, [string length] does not force a string representation when the value has a pure bytearray internal representation.
[string bytelength] should not be used with binary data. This command measures how long the UTF-8 representation of a string is in bytes. For binary data you don't want conversion to UTF-8, so you don't want [string bytelength] either. Use [string length] instead.
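A short sketch of why a UTF-8 based measure is wrong for binary data, assuming Tcl 8.5 or later. The four-byte value is arbitrary, and [encoding convertto utf-8] stands in here for the UTF-8 conversion that a byte-length measurement implies.

```tcl
# Build a 4-byte binary value; every byte is 0x80 or above.
set blob [binary format H* "deadbeef"]

# [string length] on binary data reports the number of bytes.
puts "length    : [string length $blob]"      ;# length    : 4

# Treating the same bytes as characters and encoding them as UTF-8
# doubles the size, because each of U+00DE, U+00AD, U+00BE and U+00EF
# takes two bytes in UTF-8.
set asUtf8 [encoding convertto utf-8 $blob]
puts "utf-8 size: [string length $asUtf8]"    ;# utf-8 size: 8
```

The first number is the one a script writing the blob to a file or socket needs.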
US: Proof for the sceptical:
for {set n 0} {$n < 256} {incr n} {
    lappend cl $n
}
set str [binary format c* $cl]
puts "len : [string length $str]"
puts "blen: [string bytelength $str]"
DKF: It's not even real UTF-8. It's the length of Tcl's internal encoding, which is almost-UTF8 (i.e., it is consistently denormalized in certain ways). The only possible use of [string bytelength] is answering the question “How much memory is allocated to hold this value's bytes field?”
Basic Example
string bytelength abc
Output : 3
Questions
AMG: "UTF-8 uses one to three bytes to represent Unicode characters." This is true only for the BMP. For characters above FFFF, UTF-8 characters can be up to six bytes each. Does Tcl support such yet?
DKF: No. This is one of the things we plan to fix in Tcl 9.0.