Updated 2010-11-12 20:04:47 by AMG

Purpose: Demonstrate the use of the [binary] and [fconfigure] commands to process a binary file.

KBK - The following useful Tcl script produces a hex/ASCII dump of a binary file whose name is specified on the command line.
 package require Tcl 8.3

 #----------------------------------------------------------------------
 #
 # dumpFile --
 #
 #       Produce a hex/ASCII dump of a file.
 #
 # Parameters:
 #       fileName -- Path name of the file
 #       channel  -- (Optional) Channel on which to produce the dump.
 #                   Default is stdout.
 #
 # Results:
 #       None.
 #
 # Side effects:
 #       The file is opened, dumped to the specified channel, and
 #       closed again.
 #
 #----------------------------------------------------------------------

 proc dumpFile { fileName { channel stdout } } {

     # Open the file, and set up to process it in binary mode.

     set f [open $fileName r]
     fconfigure $f \
         -translation binary \
         -encoding binary \
         -buffering full -buffersize 16384

     while { 1 } {

         # Record the seek address.  Read 16 bytes from the file.

         set addr [tell $f]
         set s [read $f 16]

         # Convert the data to hex and to characters.

         binary scan $s H*@0a* hex ascii

         # Replace non-printing characters in the data.

         regsub -all -- {[^[:graph:] ]} $ascii {.} ascii

         # Split the 16 bytes into two 8-byte chunks

         set hex1   [string range $hex   0 15]
         set hex2   [string range $hex  16 31]
         set ascii1 [string range $ascii 0  7]
         set ascii2 [string range $ascii 8 16]

         # Convert the hex to pairs of hex digits

         regsub -all -- {..} $hex1 {& } hex1
         regsub -all -- {..} $hex2 {& } hex2

         # Put the hex and Latin-1 data to the channel

         puts $channel [format {%08x  %-24s %-24s %-8s %-8s} \
                            $addr $hex1 $hex2 $ascii1 $ascii2]

         # Stop if we've reached end of file

         if { [string length $s] == 0 } {
             break
         }
     }

     # When we're done, close the file.

     close $f
     return
 }

 #----------------------------------------------------------------------
 #
 # Main program
 #
 #----------------------------------------------------------------------

 if { [info exists argv0] && [string equal $argv0 [info script]] } {
     foreach file $argv {
         puts "$file:"
         dumpFile $file
     }
 }

A slightly different version, derived from the above (not better, just a tiny bit different):
 proc hexdump { filename { channel stdout } } {
     # This is derived from the Tcler's WIKI, page 1599,
     # original author unknown, possibly Kevin Kenny.

     if { [ catch {
        # Open the file, and set up to process it in binary mode.
        set fid [open $filename r]
        fconfigure $fid -translation binary -encoding binary

        while { ! [ eof $fid ] } {

           # Record the seek address. Read 16 bytes from the file.
           set addr [ tell $fid ]
           set s    [read $fid 16]

           # Convert the data to hex and to characters.
           binary scan $s H*@0a* hex ascii

           # Replace non-printing characters in the data.
           regsub -all -- {[^[:graph:] ]} $ascii {.} ascii

           # Split the 16 bytes into two 8-byte chunks
           regexp -- {(.{16})(.{0,16})} $hex -> hex1 hex2

           # Convert the hex to pairs of hex digits
           regsub -all -- {..} $hex1 {& } hex1
           regsub -all -- {..} $hex2 {& } hex2

           # Put the hex and Latin-1 data to the channel
           puts $channel [ format {%08x %-24s %-24s %-16s} \
              $addr $hex1 $hex2 $ascii ]
        }
     } err ] } {
        catch { ::close $fid }
        return -code error $err
     }
     # When we're done, close the file.
     catch { ::close $fid }
     return
 }

Hmm. What version of TCL was this written for? I'm currently using 8.0, and it doesn't know anything about "string equal", or the "-encoding" option to fconfigure (which option Working with binary data says is unnecessary, anyway.) Other than that, and the fact that my regsub doesn't know anything about [:graph:], I like this program... -EE

The above code is written for Tcl 8.3.x .

The bug features^Wfixes and added features of the 8.3 series is well worth the upgrade.

glennj: for 8.0, an equivalent regular expression would be something like:
  {[^ -~]}

But this does not accomodate other locales, only ascii characters.

TclKit appears to build in a copy of hexdump, as above.

JCW - Not any longer as of May 2002 (I've been simplifying TclKit) ... the code has been moved to CritLib, see Critcl and [1].

July 31, 2002:

Note that the hexdump proc above doesn't appear to do the same thing as the dumpfile proc earlier. Here is some output using my /etc/motd:

dumpfile produces:
 /etc/motd:
 00000000  53 75 6e 20 4d 69 63 72  6f 73 79 73 74 65 6d 73  Sun Micr osystems
 00000010  20 49 6e 63 2e 09 53 75  6e 4f 53 20 35 2e 36 09   Inc..Su nOS 5.6.
 00000020  47 65 6e 65 72 69 63 09  41 75 67 75 73 74 20 31  Generic. August 1
 00000030  39 39 37 0a                                       997.
 00000034

hexdump produces:
 /etc/motd:
 00000000 53 75 6e 20 4d 69 63 72  6f 73 79 73 74 65 6d 73  Sun Microsystems
 00000010 20 49 6e 63 2e 09 53 75  6e 4f 53 20 35 2e 36 09   Inc..SunOS 5.6.
 00000020 47 65 6e 65 72 69 63 09  41 75 67 75 73 74 20 31  Generic.August 1
 00000030 47  6 5  6e  6 5  72  6 9  63  0 9   41  7 5  67  7 5  73  7 4  20  3 1   997.

I don't think that last line is what the writer intended.

October 11 2002 -- It was fixed almost immediately, then somebody came and reversed the order of the regsub args "-all --" to "-- -all" and broke it real good @#$^#Q!! It works fine now. -PSE

See also hexadecimal conversions for a tiny string2hex - just wrap a file reader around ;-)

SV - proc hexdump above - Still not good, change line from:
 regexp -- {(.{16})(.{0,16})} $hex -> hex1 hex2

to:
 regexp -- {(.{0,16})(.{0,16})} $hex -> hex1 hex2

You missed this 0 in regexp

RS 2005-05-28: Here's my take:
 proc file'hexdump filename {
    set fp [open $filename]
    fconfigure $fp -translation binary
    set n 0
    while {![eof $fp]} {
        set bytes [read $fp 16]
        regsub -all {[^\x20-\xfe]} $bytes . ascii
        puts [format "%04X %-48s %-16s" $n [hexdump $bytes] $ascii]
        incr n 16
    }
    close $fp
 }
 proc hexdump string {
    binary scan $string H* hex
    regexp -all -inline .. $hex
 }
 foreach file $argv {file'hexdump $file}

Sample output, the script applied to itself:
 /_Ricci/sep> tclsh hexdump.tcl hexdump.tcl
 0000 0d 0a 20 70 72 6f 63 20 66 69 6c 65 27 68 65 78  .. proc file'hex
 0010 64 75 6d 70 20 66 69 6c 65 6e 61 6d 65 20 7b 0d  dump filename {.
 0020 0a 20 20 20 20 73 65 74 20 66 70 20 5b 6f 70 65  .    set fp [ope
 0030 6e 20 24 66 69 6c 65 6e 61 6d 65 5d 0d 0a 20 20  n $filename]..
 ...

JJS 2007-0126: Having just become aware of the [subst [regsub ...]] idiom, I came up with this little procedure:
 proc hexDump {data {chunkSize 10}} {
    # Dump data in hex and ASCII with decimal offset in the form
    # 0000: 49204c6f76652054636c  I Love Tcl
    # Control characters are replaced with "." in the ASCII.
    set n -$chunkSize                   ;# Start "one chunk down"...
    set offset "\[incr n $chunkSize]"   ;# ...so first [incr] gives 0
    set width [expr {$chunkSize * 2}]   ;# Because 2 hex chars/byte
    # Helper script to convert hex to clean printable ASCII.
    set ascii "\[regsub -all {\[\[:cntrl:]]} \[binary format H* \\0] .]"
    # Regsub script to format one chunk of hex data.
    set subSpec "\[format {%04d: %-*s  %s} $offset $width \\0 $ascii]\n"
    # Work with hex data to avoid ASCII quoting hell.
    binary scan $data H* hexData        ;# Practice safe hex!
    # Chunk the data, format the chunks, remove trailing newline.
    string trimright [subst [regsub -all .{1,$width} $hexData $subSpec]]
 }

Here is an example:
 (smith) 3 % hexDump [array get tcl_platform] 20
 0000: 6f7356657273696f6e20352e3020627974654f72  osVersion 5.0 byteOr
 0020: 646572206c6974746c65456e6469616e20746970  der littleEndian tip
 0040: 2c32363820312074687265616465642031206d61  ,268 1 threaded 1 ma
 0060: 6368696e6520696e74656c20706c6174666f726d  chine intel platform
 0080: 2077696e646f7773206f73207b57696e646f7773   windows os {Windows
 0100: 204e547d207469702c3238302031207573657220   NT} tip,280 1 user
 0120: 687473703133393520776f726453697a652034    htsp1395 wordSize 4

JZG 2007-0201: Another inline hex dump (takes binary, returns formatted text) - except this one looks more like the earlier examples that take a file.
 # data        = binary data
 # start_addr  = starting address of the first output column
 #               (affects address column display only)
 # width       = bytes to display per line
 proc hexdump {data {start_addr 0} {width 8}} {
     set out ""
     if { [ catch {
         # Convert the data to hex and to characters.
         binary scan $data H*@0a* hex ascii
         # Replace non-printing characters in the data.
         regsub -all -- {[^[:graph:] ]} $ascii {.} ascii
         set nbytes [string length $ascii]
         for {set pos 0} {$pos < $nbytes} {incr pos $width} {
             set addr [expr $pos + $start_addr]
             set s_hex [string range $hex [expr $pos * 2] [expr ($pos + $width)*2 - 1]]
             set s_ascii [string range $ascii $pos [expr $pos + $width - 1]]

             # Convert the hex to pairs of hex digits
             regsub -all -- {..} $s_hex {& } fmt_hex

             # Put the hex and Latin-1 data to the channel
            append out [format "%06x %-24s %-8s\n" $addr $fmt_hex $s_ascii]
         }
     } err ] } {
         return -code error $err
     }
     return $out
 }

Example:
 0001e0 00 52 75 6e 6e 69 6e 67  .Running
 0001e8 00 00 00 00 00 00 00 00  ........

Here is a little Tcl command to read binary data stored in a string and escape each byte as a hex escape.
 proc escape_bytes { bytes } {
   set len [string length $bytes]
   set escaped ""
   for {set i 0} {$i < $len} {incr i} {
     set c [string index $bytes $i]
     if {[scan $c %c c_int] != 1} {
       error "scan char failed"
     }
     append escaped [format "\\x%x" $c_int]
   }
   return $escaped
 }

 set bytes "ABC"

 % escape_bytes $bytes
 \x41\x42\x43

MSH You can replace the two lines
   for {set i 0} {$i < $len} {incr i} {
     set c [string index $bytes $i]

by
   foreach c [split $bytes {}] {

RS 2008-03-19: Also, as split on "" is guaranteed to deliver single characters, scan cannot fail. So a simpler equivalent of the above is
 proc escape_bytes { bytes } {
    set res ""
    foreach c [split $bytes ""] {
      append res [format \\x%x [scan $c %c]]
    }
    return $res
  }

I always find that less Tcl is more fun - better reading, faster execution... :^)

AMG: I made this the other day, when I didn't have xxd at hand:
set data [read stdin]
for {set i 0} {$i < [string length $data]} {incr i 16} {
    set row [string range $data $i [expr {$i + 15}]]
    binary scan $row H* hex
    set hex [regsub -all {(.{4})} [format %-32s $hex] {\1 }]
    set row [regsub -all {[^[:print:]]} $row .]
    puts [format "%08x: %s %-16s" $i $hex $row]
}

It gives output like this:
[andy@toaster|~]$ ls / | ./hex.tcl
00000000: 6269 6e2f 0a62 6f6f 742f 0a63 6f6d 6d61  bin/.boot/.comma
00000010: 6e64 2f0a 6465 762f 0a65 7463 2f0a 6574  nd/.dev/.etc/.et
00000020: 632e 736c 6163 6b2f 0a68 6f6d 652f 0a6c  c.slack/.home/.l
00000030: 6962 2f0a 6c6f 7374 2b66 6f75 6e64 2f0a  ib/.lost+found/.
00000040: 6d65 6469 612f 0a6d 6e74 2f0a 6f70 742f  media/.mnt/.opt/
00000050: 0a70 6163 6b61 6765 2f0a 7072 6f63 2f0a  .package/.proc/.
00000060: 7075 626c 6963 2f0a 726f 6f74 2f0a 7362  public/.root/.sb
00000070: 696e 2f0a 7365 7276 6963 652f 0a73 7276  in/.service/.srv
00000080: 2f0a 7379 732f 0a74 6d70 400a 7573 722f  /.sys/.tmp@.usr/
00000090: 0a76 6172 2f0a                           .var/.