    package require Tclx

    set fd [open $bigFlatFile r]
    # We know this file is utf-8 encoded, but we want to read a
    # certain number of bytes, not chars...
    fconfigure $fd -encoding binary

    pipe out in
    fconfigure $in -encoding binary -blocking 0 -buffering none
    fconfigure $out -encoding utf-8 -blocking 0 -buffering none

    seek $fd $offset
    puts $in [read $fd $numBytes]
    read -nonewline $out

    close $fd
    close $in
    close $out

Unfortunately, on big chunks of text (>8192), there seems to be a bug in pipe that obstructs this solution; in fact, it makes the Tcl interpreter hang.

In any case, Lars H pointed out that this could be done in a much cleaner way using encoding. Here is the final solution (so far):
    # This proc is supposed to work just like [read $fileHandle $numChars],
    # except that the size of the chunk to read is specified in bytes, not in
    # chars. This is useful in connection with [seek] and [tell] which always
    # measure in bytes. The proc is supposed to respect the fileHandle's
    # configuration w.r.t. encoding, but it will not respect the configuration
    # w.r.t. eol convention, I think.
    proc readBytes { fileHandle numBytes } {
        # Record the original configuration:
        set enc [fconfigure $fileHandle -encoding]
        # Special treatment of encoding "binary", since this encoding is not
        # accepted by [encoding convertfrom]. But this case is trivial:
        if { $enc eq "binary" } {
            return [read $fileHandle $numBytes]
        }
        # We are going to reconfigure the channel. If anything goes wrong, at
        # least we should restore the original configuration, hence the catch:
        if { [catch {
            # Configure for binary read:
            fconfigure $fileHandle -encoding binary
            set binaryData [read $fileHandle $numBytes]
            set txt [encoding convertfrom $enc $binaryData]
            # And restore the original configuration:
            fconfigure $fileHandle -encoding $enc
        } err] } {
            fconfigure $fileHandle -encoding $enc
            error $err
        } else {
            return $txt
        }
    }

Older remark: it would be really nice (and quite logical, in view of the functionality provided by seek and tell) if read could accept a -bytes flag. The only thing needed is a convention for handling the case where the requested number of bytes does not end on a complete char. One convention could be to finish the char in that case. Another could be to discard the incomplete char. A third is simply to leave the fractional char as binary debris. It is then up to the caller to make sure this does not happen, and in examples like the one above this comes about naturally.
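The "discard the incomplete char" convention mentioned in the remark above can be sketched for UTF-8 data. This is only a minimal sketch and not part of the solution above: the proc names utf8TruncatedTail and trimToCharBoundary are hypothetical, and the code assumes that the bytes are otherwise valid UTF-8 and that Tcl 8.5 or later is available (for the unsigned `cu` format of binary scan).

```tcl
# Hypothetical helper: given raw bytes that are meant to be UTF-8, return
# how many bytes at the end belong to a char that was cut off.
# Returns 0 when the data ends on a char boundary (or is empty).
proc utf8TruncatedTail { bytes } {
    set len [string length $bytes]
    # A UTF-8 char is at most 4 bytes long, so look back at most 4 bytes.
    for { set back 1 } { $back <= 4 && $back <= $len } { incr back } {
        binary scan [string index $bytes end-[expr {$back - 1}]] cu b
        if { ($b & 0xC0) == 0x80 } {
            # Continuation byte: keep looking backwards for the lead byte.
            continue
        }
        # Found an ASCII or lead byte; work out the full sequence length.
        if { $b < 0x80 } {
            set need 1
        } elseif { ($b & 0xE0) == 0xC0 } {
            set need 2
        } elseif { ($b & 0xF0) == 0xE0 } {
            set need 3
        } elseif { ($b & 0xF8) == 0xF0 } {
            set need 4
        } else {
            # Not a valid lead byte; this sketch does not try to repair it.
            return 0
        }
        # The sequence is incomplete if fewer bytes are present than needed.
        return [expr { $need > $back ? $back : 0 }]
    }
    return 0
}

# Hypothetical helper implementing the "discard" convention: drop the
# trailing bytes of a cut-off char, if there are any.
proc trimToCharBoundary { bytes } {
    set cut [utf8TruncatedTail $bytes]
    if { $cut == 0 } {
        return $bytes
    }
    return [string range $bytes 0 end-$cut]
}

# Example: "café" is 5 bytes in UTF-8; keeping only the first 4 bytes
# splits the two-byte "é", so the trimmed chunk decodes to "caf".
set raw     [encoding convertto utf-8 "caf\u00e9"]
set chunk   [string range $raw 0 3]
set decoded [encoding convertfrom utf-8 [trimToCharBoundary $chunk]]
```

A caller reading from a channel could combine this with seek, moving the file position back by the number of discarded bytes so that the next read starts on a char boundary. The "finish the char" convention would instead read the missing bytes before decoding.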
See Also