See Also
Description
Richard Suchenwirth 1999-07-23: In order to auto-detect and read files in 16-bit Unicode representation (e.g. Tcl sources with Hebrew literals), I wrote the following:

aspect 2014-02-12: WARNING: The following code is (unusually for RS!) rather buggy. It's a good example for newcomers of some easy mistakes to make:

- the initial gets reads $fn in the system encoding, which could be anything (and might not let BOMs through unscathed)
- the main read expects a number of characters in its second argument, but file size counts bytes
- comparing strings with == instead of eq
- info tclversion is not documented as returning a number (see Donald Porter's comment below)
    proc file:uread {fn} {
        set encoding ""
        set f [open $fn r]
        if {[info tclversion]>=8.1} {
            gets $f line
            if {[regexp \xFE\xFF $line]||[regexp \xFF\xFE $line]} {
                fconfigure $f -encoding unicode
                set encoding unicode
            }
            seek $f 0 start ;# rewind -- real reading is still to come
        }
        set text [read $f [file size $fn]]
        close $f
        if {$encoding=="unicode"} {
            regsub -all "\uFEFF|\uFFFE" $text "" text
        }
        return $text
    }

Works both on ASCII and Unicode files (but not on byte-swapped files, though: FFFE seems to be handled in the code, but the swapping is not done yet ;-( ). See also: Unicode and UTF-8
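One way to address the points in aspect's warning is sketched below. It is not RS's code: it reuses the name file:uread, assumes Tcl 8.4 or later (for eq, so the version test is dropped), and, like the original, only decodes files whose BOM matches the platform's native byte order, because Tcl's unicode encoding is UTF-16 in native order.

```tcl
proc file:uread {fn} {
    set f [open $fn r]
    # Inspect the first two bytes in binary mode, so the system encoding
    # cannot alter a byte-order mark (BOM) before we see it.
    fconfigure $f -translation binary
    set head [read $f 2]
    # U+FEFF encoded with "unicode" (native-order UTF-16) gives exactly
    # the BOM that this encoding can decode directly.
    if {$head eq [encoding convertto unicode \uFEFF]} {
        # Continue after the BOM, now decoding native-order UTF-16.
        fconfigure $f -encoding unicode -translation auto
    } else {
        # No native-order BOM: rewind and read as an ordinary text file.
        # (A byte-swapped file would still need swapping, as before.)
        seek $f 0 start
        fconfigure $f -encoding [encoding system] -translation auto
    }
    # [read] with no count reads to end of file, so a character count
    # is never confused with the byte count from [file size].
    set text [read $f]
    close $f
    return $text
}
```

Since the two BOM bytes are consumed before the channel switches to the unicode encoding, the regsub cleanup of the original is no longer needed.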
Frank Pilhofer contributed the following swapper in comp.lang.tcl; it operates on a string $data that may hold a whole Unicode file:

Fortunately, swapping is pretty easy in Tcl, at least in LOC:
    private method wordswap {data} {
        binary scan $data s* elements
        return [binary format S* $elements]
    }

jima: I think it is better to use:

    binary scan $data c* elements

Can any expert check my (jima's) point?

So I'm now using the following code for reading:
    global tcl_platform
    if {[binary scan $data S bom] == 1} {
        if {$bom == -257} {
            # FE FF: big-endian BOM
            if {$tcl_platform(byteOrder) eq "littleEndian"} {
                set data [wordswap [string range $data 2 end]]
            } else {
                set data [string range $data 2 end]
            }
        } elseif {$bom == -2} {
            # FF FE: little-endian BOM
            if {$tcl_platform(byteOrder) eq "littleEndian"} {
                set data [string range $data 2 end]
            } else {
                set data [wordswap [string range $data 2 end]]
            }
        } elseif {$tcl_platform(byteOrder) eq "littleEndian"} {
            # no BOM: assume big-endian data
            set data [wordswap $data]
        }
    }
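For experimenting outside an [incr Tcl] class, Frank's method and reading code can be combined into plain procs, as in this sketch. The name utf16:decode is hypothetical; data is assumed to hold raw bytes (for example a whole file read with -translation binary), and the result is decoded with Tcl's unicode encoding, which is UTF-16 in the platform's native byte order.

```tcl
# Swap the two bytes of every 16-bit unit: scan as little-endian
# shorts (s*), write back as big-endian shorts (S*).
# A trailing odd byte is dropped.
proc wordswap {data} {
    binary scan $data s* elements
    return [binary format S* $elements]
}

# Hypothetical helper: turn raw UTF-16 bytes into a Tcl string,
# honouring a byte-order mark and assuming big-endian without one.
proc utf16:decode {data} {
    set little [expr {$::tcl_platform(byteOrder) eq "littleEndian"}]
    if {[binary scan $data S bom] == 1} {
        if {$bom == -257} {
            # FE FF: big-endian data; drop the BOM
            set data [string range $data 2 end]
            if {$little} {set data [wordswap $data]}
        } elseif {$bom == -2} {
            # FF FE: little-endian data; drop the BOM
            set data [string range $data 2 end]
            if {!$little} {set data [wordswap $data]}
        } elseif {$little} {
            # no BOM: treat the data as big-endian
            set data [wordswap $data]
        }
    }
    # the bytes are now in native order, which "unicode" expects
    return [encoding convertfrom unicode $data]
}
```

A file could then be read with fconfigure $f -translation binary followed by utf16:decode [read $f]. As for jima's question: with c* each element is a single byte, so formatting the elements back with S* would write every byte as a two-byte value and double the data instead of swapping it.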
Donald Porter: Slightly off-topic note: the code example above tests for the Tcl version with

    if {[info tclversion] >= 8.1} ...

A better way of testing that is to use:

    if {[package vcompare [package provide Tcl] 8.1] >= 0} ...

That will continue working if Tcl releases are ever labeled with version numbers more than two levels deep, or if/when a minor release > 9 is released.
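Why the numeric test can go wrong is easy to see in tclsh. The 8.10 below is only a hypothetical future version number, used to illustrate Donald's point:

```tcl
# A numeric comparison, as with [info tclversion] >= 8.1, reads "8.10"
# as the number 8.1, so a Tcl 8.10 would look older than 8.9:
puts [expr {8.10 >= 8.9}]           ;# prints 0

# package vcompare compares each dot-separated part as an integer:
puts [package vcompare 8.10 8.9]    ;# prints 1

# The recommended test against the running interpreter:
if {[package vcompare [package provide Tcl] 8.1] >= 0} {
    puts "Tcl 8.1 or later"
}
```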
RS: Sure. I admit yours is The Right Way ;-), only it's about twice as long as mine... Maybe I'm pampered, but I've grown to expect it could be done even more simply, so that frequent constructs are nicely wrapped:
    proc version {"of" pkg op vers} {
        expr [package vcompare [package provide $pkg] $vers] $op 0
    }

Then we can write this sugar (cf. Salt and Sugar):

    if [version of Tcl >= 8.1] {...}
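On Tcl 8.5 or later, a variant of this sugar can call the operator commands in the ::tcl::mathop namespace instead of building an unbraced expr. This is a sketch, not RS's code; it keeps his calling convention, including the dummy "of" argument:

```tcl
proc version {"of" pkg op vers} {
    # -1, 0 or 1 from package vcompare; [package provide $pkg] is
    # empty (and vcompare raises an error) if $pkg is not loaded.
    set cmp [package vcompare [package provide $pkg] $vers]
    # e.g. op ">=" calls ::tcl::mathop::>= $cmp 0; an unknown
    # operator fails with "invalid command name".
    ::tcl::mathop::$op $cmp 0
}

if {[version of Tcl >= 8.1]} {
    puts "Tcl 8.1 or later"
}
```

Because the operator is dispatched as a command rather than spliced into an expression string, the condition can also be braced as usual.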