Updated 2015-04-09 12:24:48 by dbohdan

Daniele Alberto Galliano

This VFS is based upon TclVFS, and allows accessing the content of a Compiled Html (CHM) file, as those used as help for Windows applications.

My intent was allowing the deployment of a whole site, by mean of a single file, yet available in such format. E.g. I don't want to download the whole html help for ActiveTcl, create a war, and deploy it. This comes from the need to distribute useful documentation among my development team, and also from the habit of downloading sites to my Palm, using [Plucker] and the like.

CHMvfs is not thoroughly tested, but worked fine with version ActiveTcl8.4.11.0.162119-html and ActiveTcl8.4.12.0.226725-html of documentation.

Main namespace is related to chm, as a filesystem. I'd like to add access to chm-specific resources, like search for indexes.

lzx namespace is a porting from chmtools, implemented by Matthew T. Russotto, whom I thank a lot, because his work was absolutely necessary to end this program. This LZX decruncher was pulled out of the program cabextract 0.2 by Stuart Caie <kyzer@4u.net> and modified to be useful as an LZX decruncher outside the context of CAB files. I simplified it even more.

The difference between CHMvfs and chmtools, is that I needed a Random Access, which I achieved using the block index at the end of the Reset Table: reading the whole file each time, uncompressing it, required a huge time.

If You like it, You can extend with all the things I forgot, and pulling all the loose strings. You can also use this as a starting point for a CABvfs.

I am not interested in using chm files other than for reading, so it is a read-only filesystems, like zips and other archives.

Code edit

  package provide vfs::chm 0.5
  set ::verbose 0
  package require vfs 1.0
  package provide chmvfs 0.5
  
  set ::vfs::debug 1
  # Basic idea is having access to a site structure as saved in a chm file, then
  # serving it via a tcl-based http server.
  
  namespace eval vfs::chm {
    variable status
    array set status [list / /]
  }
  
  namespace eval ::lzx {
    variable bitbuf ""
    variable bufsiz 0
    variable bufpos 0
    variable status
    array set status [list / /]
    variable LZX_FRAME_SIZE 32768
    variable LZX_CHECK_BLOCK [expr (32768 + 6144)*8]
    variable LZX_MIN_MATCH                2
    variable LZX_MAX_MATCH                257
    variable LZX_NUM_CHARS                256
  #define LZX_BLOCKTYPE_VERBATIM       (1)
  #define LZX_BLOCKTYPE_ALIGNED        (2)
  #define LZX_BLOCKTYPE_UNCOMPRESSED   (3)
    variable LZX_PRETREE_NUM_ELEMENTS  20
    variable LZX_ALIGNED_NUM_ELEMENTS  8
    variable LZX_NUM_PRIMARY_LENGTHS   7
    variable LZX_NUM_SECONDARY_LENGTHS    249
    variable LZX_LENGTH_MAXSYMBOLS   [expr $LZX_NUM_SECONDARY_LENGTHS + 1]
    variable LZX_LENGTH_TABLEBITS    12
    variable LZX_PRETREE_MAXSYMBOLS  $LZX_PRETREE_NUM_ELEMENTS
    variable LZX_PRETREE_TABLEBITS   6
    variable LZX_MAINTREE_MAXSYMBOLS [expr $LZX_NUM_CHARS + 50 * 8]
    variable LZX_MAINTREE_TABLEBITS  12
    variable LZX_ALIGNED_MAXSYMBOLS  $LZX_ALIGNED_NUM_ELEMENTS
    variable LZX_ALIGNED_TABLEBITS   7
    variable extra_bits
    variable position_base
  
    # Initialize LZX lookup tables
    for { set i 0; set j 0} { $i < 51 } { incr i 2 } {
      # 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7...
      set extra_bits($i) $j
      set extra_bits([expr $i + 1]) $j
      # 0,0,1,2,3,4...15,16,17,17,17,17...
      if { ($i != 0) && ($j < 17) } { incr j }
    }
  
    for { set i 0; set j 0 } { $i < 51 } { incr i } {
      # 0,1,2,3,4,6,8,12,16,24,32,...
      set position_base($i) $j
      # 1,1,1,1,2,2,4,4,8,8,16,16,32,32,...
      incr j [expr 1 << $extra_bits($i)]
    }
  }
  
  proc vfs::chm::Mount {chm local} {
    if {![file exists $chm] || ![file isfile $chm] } {
      error "No such file $chm"
    }
    
    set fileHeader(size) 56
    set fileHeader(structure) {        {n type f a4} {n version f i}
        {n totLength f i} {n unknown f i}
        {n timestamp f i} {n langId f i}
        {n guid1 f h32} {n guid2 f h32}}
  
    set recordFormat "h32"
    set fl [::open $chm r]
    fconfigure $fl -translation binary
  
    set fmt ""
    set varList {}
    foreach headerElement $fileHeader(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList fileHeader($element(n))]
    }
  
    set content [read $fl $fileHeader(size)]
    eval binary scan \$content $fmt $varList
  
    if { $fileHeader(version) != 2 } {
        set hst(size) 40
        set hst(structure) { {n offs0 f w} {n len0 f w}
            {n offs1 f w} {n len1 f w} {n offs2 f w}}
    } else {
        set hst(size) 32
        set hst(structure) { {n offs0 f w} {n len0 f w}
            {n offs1 f w} {n len1 f w}}
    }
  
    set fmt ""
    set varList {}
    foreach headerElement $hst(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList hst($element(n))]
    }
  
    set content [read $fl $hst(size)]
    set offset $fileHeader(size)
    eval binary scan \$content $fmt $varList
  
    set hSect0(size) $hst(len0)
    set hSect0(structure) {        {n unkn0  f H4} {n unkn1 f i}
        {n fileSize f w } {n unkn2 f i} {n unkn3 f i}}
  
    set fmt ""
    set varList {}
    foreach headerElement $hSect0(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList hSect0($element(n))]
    }
  
    set content [read $fl $hSect0(size)]
    eval binary scan \$content $fmt $varList
  
    set hSect1(size) 84
    set hSect1(structure) { { n type f a4 } { n version f i } { n length f i } { n unkn0 f i }
        { n chunksz f i } { n density f i } { n depth f i } { n rtchunkno f i } { n PGML1 f i }
        { n PMGLls f i } { n unkn1 f i } { n dirChunks f i } { n wLangId f i } { n guid f h32 }
        { n len2 f i } { n unkn2 f i } { n unkn3 f i } { n unkn4 f i } }
  
    set fmt ""
    set varList {}
    foreach headerElement $hSect1(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList hSect1($element(n))]
    }
  
    seek $fl $hst(offs1)
    set content [read $fl $hSect1(size)]
    eval binary scan \$content $fmt $varList
    set offset [expr $hst(offs1) + $hSect1(length)]
    seek $fl $offset
  
    for {set chunk 0} { $chunk < $hSect1(dirChunks) } { incr chunk } {
      set chOffset 0
      set content [read $fl $hSect1(chunksz)]
      incr offset $hSect1(chunksz)
      set fmt a4
      eval binary scan \$content $fmt chunkType
  
      if { $chunkType == "PMGL" } {
        set begLstChunk(size) 20
        set begLstChunk(structure) { { n type f a4 } { n length f i } { n zero f i }
            { n prev f i } { n next f i } }
  
        set fmt ""
        set varList {}
        foreach headerElement $begLstChunk(structure) {
            array set element $headerElement
            set fmt "$fmt$element(f)"
            set varList [concat $varList begLstChunk($element(n))]
        }
  
        eval binary scan \$content $fmt $varList
        set quickref [expr $chOffset + $hSect1(chunksz) - $begLstChunk(length)]
        set end [expr $chOffset + $hSect1(chunksz) - 2]
        set fmt s
        eval binary scan \$content @${end}$fmt begLstChunk(entryNum)
        lappend begLstChunk(structure) {n entryNum f s}
  
        set chunks($chunk) [array get begLstChunk]
  
        incr chOffset $begLstChunk(size)
        set chunkList {}
        for {set entryno 0} { $entryno < $begLstChunk(entryNum) } {incr entryno} {
          set chOffset [readEntry $content $chOffset entry]
          array set temp $entry
          set name [string trim $temp(NAME) /]
          if { $name == "/" } { set name "/" }
          set ::vfs::chm::tree${chm}($name) $entry
          lappend chunkList $entry
        }
        set chunks(lst$chunk) $chunkList
  
      } else {
        if { $chunkType == "PMGI" } {
        } else {
          error "Unknown chunk type $chunkType"
        }
      }
    }
  
    if { $fileHeader(version) != 2 } {
      set offset $hst(offs2)
    } else {
      incr offset $hSect1(chunksz)
    }
    close $fl
  
    if { $fileHeader(version) != 2 } {
      set contentStart $hst(offs2)
    } else {
      set contentStart $offset
    }
    set ::vfs::chm::tree${chm}(contentStart) $contentStart
    set nlfid [open $chm ::DataSpace/NameList r 438]
    set content [read $nlfid]
    close $nlfid
    binary scan $content ss len num
    set offset 4
    for { set idx 0 } { $idx < $num } { incr idx } {
      binary scan $content @${offset}s fileNameLen
      incr offset 2
      set beg $offset
      incr offset [expr 2 * $fileNameLen]
      if { $idx > 0 } {
        set fname [string range $content $beg $offset]
        set fname [join [split $fname "\0"] ""]
        array set sectData [set ::vfs::chm::tree${chm}(::DataSpace/Storage/$fname/Content)]
        incr sectData(OFFSET) $contentStart
        set ctrlfd [open $chm ::DataSpace/Storage/$fname/ControlData r 438]
        set control [read $ctrlfd]
        close $ctrlfd
        binary scan $control ia4iiiii wnum sectData(SIGN) sectData(VERSION) sectData(RSTINT) sectData(WINSZ) sectData(CACSZ) zero
        if { $sectData(VERSION) == 2 } {
          set sectData(WINSZ) [expr $sectData(WINSZ) * 32768]
          set sectData(RSTINT) [expr $sectData(RSTINT) * 32768]
        }
        switch $sectData(WINSZ) {
           32768 { set sectData(WINBT) 15 }
           65536 { set sectData(WINBT) 16 }
          131072 { set sectData(WINBT) 17 }
          262144 { set sectData(WINBT) 18 }
          524288 { set sectData(WINBT) 19 }
         1048576 { set sectData(WINBT) 20 }
         2097152 { set sectData(WINBT) 21 }
          default { error "bad controldata window size" }
        }
        set ctrlFileName "::DataSpace/Storage/$fname/Transform/{7FC28940-9D31-11D0-9B27-00A0C91E9C7C}/InstanceData/ResetTable"
        set ctrlfd [open $chm $ctrlFileName r 438]
        set control [read $ctrlfd]
        close $ctrlfd
        binary scan $control iiiiwwwww sectData(unk) sectData(RSTBEN) sectData(RSTBSZ) sectData(RSTBLN) sectData(RSTBULN) sectData(RSTBCLN) sectData(RSTBBSZ) sectData(RSTBZRO) sectData(RSTBBND)
        array set rstTable [set ::vfs::chm::tree${chm}($ctrlFileName)]
        set sectData(RSTBLOFFS) $rstTable(OFFSET)
        set ::vfs::chm::tree${chm}(Section$idx) [array get sectData]
      }
      incr offset 2
    }
    set ::vfs::chm::status($local) $chm
    set ::vfs::chm::tree${chm}(mountPoint) $local
    ::vfs::log "chm $chm mounted at $local"
    vfs::filesystem mount $local [list vfs::chm::handler $chm]
    vfs::RegisterMount $local [list vfs::chm::Unmount]
    return $local
  }
  
  proc vfs::chm::Unmount {local} {
    set chm $::vfs::chm::status($local)
    unset ::vfs::chm::tree$chm
    unset ::vfs::chm::status($local)
    vfs::filesystem unmount $local
  }
  
  proc vfs::chm::handler {chm cmd root relative actualpath args} {
    if {$cmd == "matchindirectory"} {
      eval [list $cmd $chm $relative $actualpath] $args
    } else {
      eval [list $cmd $chm $relative] $args
    }
  }
  
  # If we implement the commands below, we will have a perfect
  # virtual file system for Compiled HTML archives.
  
  proc vfs::chm::stat {chm name} {
      ::vfs::log "stat $name"
      if { $name == ""} {
        return [list type directory size 0 mode 0555 \
          ino -1 depth 0 name $name atime 0 ctime 0 mtime 0 dev -1 \
          uid -1 gid -1 nlink 1]
            }
      if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
        error "No such file: $x"
      }
      array set entry $x
      if { $entry(LENGTH) == 0 } {
        return [list type directory size 0 mode 0111 \
          ino -1 depth 0 name $name atime 0 ctime 0 mtime 0 dev -1 \
          uid -1 gid -1 nlink 1]
      }
      return [list type file size $entry(LENGTH) mode 0111 \
        ino -1 depth 0 name $name atime 0 ctime 0 mtime 0 dev -1 \
        uid -1 gid -1 nlink 1]
  }
  
  proc vfs::chm::access {chm name mode} {
    if {$mode & 2} {
      vfs::filesystem posixerror $::vfs::posix(EROFS)
    }
    if { $name == ""} { return 1 }
    if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
      vfs::filesystem posixerror $::vfs::posix(ENOENT)
    }
          return 1
  }
  
  proc vfs::chm::exists {chm name} {
    if { $name == ""} { return 1 }
    if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
      vfs::filesystem posixerror $::vfs::posix(ENOENT)
    }
          return 1
  }
  
  proc vfs::chm::matchindirectory {chm path actualpath pattern type} {
  
      set pattern [file join $path $pattern]
      set biggest [array name ::vfs::chm::tree${chm} $pattern]
      set root [set ::vfs::chm::tree${chm}(mountPoint)]
      set bigger [list]
      foreach p $biggest {
        if { [string match "${pattern}/*" $p ]} continue
        lappend bigger [file join $root $p]
      }
      #::vfs::log "got $newres"
      return [::vfs::matchCorrectTypes $type $bigger]
  }
  
  proc vfs::chm::open {chm name mode permissions} {
    switch -- $mode {
      "" -
      "r" {
        if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
          vfs::filesystem posixerror $::vfs::posix(ENOENT)
        }
        array set entry $x
        if { !$entry(LENGTH) } {
          # There are no empty files: they are folders!
          vfs::filesystem posixerror $::vfs::posix(EPERM)
        }
  
        set nfd [vfs::memchan]
        fconfigure $nfd -translation binary
        set chmfd [::open $chm r]
        fconfigure $chmfd -translation binary
  
        if { $entry(SECTION) } {
          # needs decompression
          array set sectData [set ::vfs::chm::tree${chm}(Section${entry(SECTION)})]
          set entryNum [expr $entry(OFFSET) / $sectData(RSTBBSZ) / 2]
          set entry(NUM) [expr $entryNum * 2]
          if { $entryNum > $sectData(RSTBEN) } {
            error "Entry $entryNum not present in Reset Table($sectData(RSTBEN))"
          }
          set entryOffset [expr [set ::vfs::chm::tree${chm}(contentStart)] + $sectData(RSTBLOFFS) + $sectData(RSTBLN) + $entry(NUM) * $sectData(RSTBSZ)]
          set fl [::open $chm r]
          fconfigure $fl -translation binary
          seek $fl $entryOffset
          set addr [read $fl $sectData(RSTBSZ)]
          if { $sectData(RSTBSZ) == 4 } {
            binary scan "$addr" i compOffset
          } else {
            binary scan "$addr" w compOffset
          }
          set entry(OFFS) $compOffset
          set skip [expr $entry(OFFSET) - $entry(NUM) * $sectData(RSTBBSZ)]
          seek $chmfd $sectData(OFFSET)
          seek $chmfd $entry(OFFS) current
          ::lzx::decompress $chmfd $nfd $entry(LENGTH) $skip $sectData(WINBT)
        } else {
          seek $chmfd [set ::vfs::chm::tree${chm}(contentStart)] start
          seek $chmfd $entry(OFFSET) current
          set data [read $chmfd $entry(LENGTH)]
          puts -nonewline $nfd $data
        }
  
        fconfigure $nfd -translation auto
        seek $nfd 0
        return [list $nfd]
      }
      default {
        vfs::filesystem posixerror $::vfs::posix(EROFS)
      }
    }
  }
  
  proc vfs::chm::createdirectory {chm name} {
      #::vfs::log "createdirectory $name"
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::removedirectory {chm name recursive} {
      #::vfs::log "removedirectory $name"
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::deletefile {chm name} {
      #::vfs::log "deletefile $name"
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::fileattributes {chm name args} {
      #::vfs::log "fileattributes $args"
      switch -- [llength $args] {
          0 {
              # list strings
              return [list NAME SECTION OFFSET LENGTH]
            }
          1 {
              # get value
        if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
          vfs::filesystem posixerror $::vfs::posix(ENOENT)
        }
        array set entry $x
              set index [lindex $args 0]
              set name [lindex [list NAME SECTION OFFSET LENGTH] $index]
              return $entry($name)
          }
          2 {
              # set value
              set index [lindex $args 0]
              set val [lindex $args 1]
              vfs::filesystem posixerror $::vfs::posix(EROFS)
          }
      }
  }
  
  proc vfs::chm::utime {fd path actime mtime} {
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::readEntry { content offset lst } {
    set l 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set l [expr ( $l << 7 ) | ($b & 127) ]
    }
    eval binary scan \$content @${offset}a$l n
    incr offset $l
    set s 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set s [expr ( $s << 7 ) | ($b & 127) ]
    }
    set o 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set o [expr ( $o << 7 ) | ($b & 127) ]
    }
    set l 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set l [expr ( $l << 7 ) | ($b & 127) ]
    }
    upvar $lst x
    set x [list NAME $n SECTION $s OFFSET $o LENGTH $l]
    return $offset
  }
  
  proc ::lzx::checkBuffer { fds } {
    variable bufpos
    variable bufsiz
    variable bitbuf
  
    # How many bytes to be skipped?
    set bigSkip [expr ($bufpos >> 3 ) & -2]
    # How many bits to be skipped?
    set smlSkip [expr $bufpos & 15]
    # In case of exhausted buffer, read some
    # We should have at least LZX_CHECK_BLOCK = ( LZX_FRAME_SIZE + 6k ) * 8 bits,
    # the maximum size an uncompressed block can achieve, but we
    # read a MegaByte at the time
    if { $bufsiz < $::lzx::LZX_CHECK_BLOCK } {
      set bytes [read $fds 1048576]
      set bitbuf [string range $bitbuf $bigSkip end]
      append bitbuf $bytes
      # Use real bytes length, in case of less input data
      incr bufsiz [expr [string length $bytes] * 8]
      set bufpos $smlSkip
      set bigSkip 0
    }
  }
  
  proc ::lzx::bits { len } {
    variable bufpos
    variable bufsiz
    variable bitbuf
    
    # How many bytes to be skipped?
    set bigSkip [expr ($bufpos >> 3 ) & -2]
    # How many bits to be skipped?
    set smlSkip [expr $bufpos & 15]
    # I read an entire DWORD: it should be enough
    if { [binary scan $bitbuf @${bigSkip}ss num1 num2] < 2} {
      error "Exhausted input"
    }
    set num [expr ($num1 << 16) | ($num2 & 65535)]
    set res [expr $num >> ( 32 - $smlSkip - $len ) ]
    set res [expr $res & ( ( 1 << $len ) - 1 ) ]
    incr bufpos $len
    incr bufsiz -$len
    return $res
  }
  
  proc ::lzx::build_table { obj } {
    set nsyms [set ::lzx::LZX_${obj}_MAXSYMBOLS ]
    set nbits [set ::lzx::LZX_${obj}_TABLEBITS ]
    set table_mask  [expr 1 << $nbits]
    set bit_mask [expr $table_mask >> 1]
    set next_symbol $bit_mask
    set bit_num 1
    set pos 0
  
    while { $bit_num <= $nbits } {
      for { set sym 0 } { $sym < $nsyms } { incr sym } {
        if { [set ::lzx::status(${obj}_len$sym)] == $bit_num } {
          set leaf $pos
  
          incr pos $bit_mask
          if { $pos > $table_mask } {
            error "Table overrun"
          }
  
          set fill $bit_mask
          while { $fill > 0 } {
            set ::lzx::status(${obj}_table$leaf) $sym
            incr fill -1
            incr leaf
          }
        }
      }
      set bit_mask [expr $bit_mask >> 1]
      incr bit_num
    }
  
    if { $pos != $table_mask } {
      for { set sym $pos } { $sym < $table_mask } { incr sym } {
        set ::lzx::status(${obj}_table$sym) 0
      }
  
      set pos [expr $pos << 16]
      set table_mask [expr $table_mask << 16]
      set bit_mask $::lzx::LZX_FRAME_SIZE
  
      while { $bit_num <= 16 } {
        for { set sym 0 } { $sym < $nsyms } { incr sym } {
          if { [set ::lzx::status(${obj}_len$sym)] == $bit_num } {
            set leaf [expr $pos >> 16]
            for { set fill 0 } { $fill < $bit_num - $nbits } { incr fill } {
              if { $::lzx::status(${obj}_table$leaf) == 0 } {
                set s [expr $next_symbol << 1]
                set ::lzx::status(${obj}_table$s) 0
                incr s
                set ::lzx::status(${obj}_table$s) 0
                set ::lzx::status(${obj}_table$leaf) $next_symbol
                incr next_symbol
              }
              set leaf [expr $::lzx::status(${obj}_table$leaf) << 1]
              if { ($pos >> (15-$fill)) & 1 } { incr leaf }
            }
            set ::lzx::status(${obj}_table$leaf) $sym
            incr pos $bit_mask
            if { $pos > $table_mask } { error "table overflow" }
          }
        }
        set bit_mask [expr $bit_mask >> 1]
        incr bit_num
      }
    }
  
    if { $pos == $table_mask } return
  }
  
  proc ::lzx::read_huffsym { obj } {
    set mp $::lzx::bufpos
    set ms $::lzx::bufsiz
    set bits [set ::lzx::LZX_${obj}_TABLEBITS ]
    set r [::lzx::bits $bits]
  
    set i $::lzx::status(${obj}_table$r)
    if { $i >= [set ::lzx::LZX_${obj}_MAXSYMBOLS ] } {
      set j [expr 1 << (32 - $bits)]
      set go 1
      while { $go || $i >= [set ::lzx::LZX_${obj}_MAXSYMBOLS ] } {
        set j [expr $j >> 1]
        set i [expr $i << 1]
        set i [expr $i | [::lzx::bits 1]]
        if {!$j} { error "Illegal data" }
        set go 0
        set i $::lzx::status(${obj}_table$i)
      }
    }
    set j $::lzx::status(${obj}_len$i)
    set ::lzx::bufpos $mp
    set ::lzx::bufsiz $ms
    incr ::lzx::bufpos $j
    incr ::lzx::bufsiz -$j
    return $i
  }
  
  proc ::lzx::read_lengths { obj first last } {
    for { set x 0 } { $x < 20 } { incr x } {
      set ::lzx::status(PRETREE_len$x) [::lzx::bits 4]
    }
    build_table PRETREE
  
    for { set x $first } { $x < $last} {} {
      set z [::lzx::read_huffsym PRETREE]
      if { $z == 17 } {
        set y [::lzx::bits 4]; incr y 4
        while { $y} {
          incr y -1
          set ::lzx::status(${obj}_len$x) 0
          incr x
        }
      } else {
        if { $z == 18 } {
          set y [::lzx::bits 5]; incr y 20
          while { $y} {
            incr y -1
            set ::lzx::status(${obj}_len$x) 0
            incr x
          }
        } else {
          if { $z == 19 } {
            set y [::lzx::bits 1]; incr y 4
            set z [::lzx::read_huffsym PRETREE]
            set z [expr $::lzx::status(${obj}_len$x) - $z]
            if { $z < 0 } { incr z 17 }
            while { $y} {
              incr y -1
              set ::lzx::status(${obj}_len$x) $z
              incr x
            }
          } else {
            set z [expr $::lzx::status(${obj}_len$x) - $z]
            if { $z < 0 } { incr z 17 }
            set ::lzx::status(${obj}_len$x) $z
            incr x
          }
        }
      }
    }
  }
  
  proc ::lzx::decompress {infd askfd len skip wndbit } {
    variable bufsiz
    variable bufpos
    variable bitbuf
  
    set bufsiz 0
    set bufpos 0
    set bitbuf ""
  
    if { $wndbit == 20 } {
      set posn_slots 42
    } else {
      if { $wndbit == 21 } {
        set posn_slots 50
      } else {
        set posn_slots [expr $wndbit * 2]
      }
    }
  
    if { $skip } {
      set outfd [vfs::memchan]
      fconfigure $outfd -translation binary
    } else {
      set outfd $askfd
    }
  
    # decompress as much as needed, but outputs only the content, skip previous
    # the idea is to start with initial values: LZX status is not used nor needed
  
    set window_posn 0
    set last_window_posn 0
    set window_size [expr 1 << $wndbit]
    set bufsiz 0
    set main_elements [expr $::lzx::LZX_NUM_CHARS + ($posn_slots << 3)]
  
    set todo [expr $len + $skip]
  
    while { $todo > 0 } {
    ::lzx::checkBuffer $infd
      if { $window_posn == $window_size } {
        set window_posn 0
      }
      if { $window_posn == 0 } {
              set header_read 0
              for { set i 0 } { $i < $::lzx::LZX_MAINTREE_MAXSYMBOLS } { incr i } {
          set ::lzx::status(MAINTREE_len$i) 0 }
              for { set i 0 } { $i < $::lzx::LZX_LENGTH_MAXSYMBOLS } { incr i } {
          set ::lzx::status(LENGTH_len$i) 0 }
        set R0 1; set R1 1; set R2 1
      }
      if { !$header_read } {
        set i 0; set j 0
        set k [::lzx::bits 1]
        set intel_filesize 0
        if { $k } { set intel_filesize [::lzx::bits 32] }
        set header_read 1
      }
      set block_type [::lzx::bits 3]
      set block_remaining [::lzx::bits 24]
      set block_length $block_remaining
  
      switch $block_type {
        1 { # LZX_BLOCKTYPE_VERBATIM
          read_lengths MAINTREE 0 256
          read_lengths MAINTREE 256 $main_elements
          build_table MAINTREE
          if { [set ::lzx::status(MAINTREE_len232)] != 0} { set intel_started 1 }
  
          read_lengths LENGTH 0 $::lzx::LZX_NUM_SECONDARY_LENGTHS
          build_table LENGTH
        }
        2 { # LZX_BLOCKTYPE_ALIGNED
          for { set i 0 } { $i < 8 } { incr i } {
            set ::lzx::status(ALIGNED_len$i) [::lzx::bits 3]
          }
          build_table ALIGNED
          read_lengths MAINTREE 0 256
          read_lengths MAINTREE 256 $main_elements
          build_table MAINTREE
          if { [set ::lzx::status(MAINTREE_len232)] != 0} { set intel_started 1 }
  
          read_lengths LENGTH 0 $::lzx::LZX_NUM_SECONDARY_LENGTHS
          build_table LENGTH
        }
        3 { # LZX_BLOCKTYPE_UNCOMPRESSED
          set intel_started 1
          set b [expr $::lzx::bufpos & 15]
          if { $b } {
            set b [expr 16 - $b]
            incr bufpos $b
            incr bufsiz -$b
          }
          set R0 [::lzx::bits 32]
          set R1 [::lzx::bits 32]
          set R2 [::lzx::bits 32]
        }
        default {
          error "Illegal block type $block_type"
        }
      }
      set this_run $block_remaining
      while { $this_run > 0 && $todo > 0 } {
        if { $this_run > $todo } { set this_run $todo }
        incr todo -$this_run
        incr block_remaining -$this_run
  
        set last_window_posn $window_posn
        set window_posn [expr $window_posn & ( $window_size - 1 )]
  
        if { ($window_posn + $this_run) > $window_size } {
          error "Invalid format"
        }
  
        switch $block_type {
          1 { # LZX_BLOCKTYPE_VERBATIM
            while { $this_run > 0 } {
              set main_element [read_huffsym MAINTREE]
              if { $main_element < $::lzx::LZX_NUM_CHARS } {
                puts -nonewline $outfd [binary format c $main_element]
                incr window_posn
                incr this_run -1
              } else {
                incr main_element -$::lzx::LZX_NUM_CHARS
  
                set match_length [expr $main_element & $::lzx::LZX_NUM_PRIMARY_LENGTHS]
                if { $match_length == $::lzx::LZX_NUM_PRIMARY_LENGTHS } {
                  incr match_length [read_huffsym LENGTH]
                }
                incr match_length $::lzx::LZX_MIN_MATCH
  
                set match_offset [expr $main_element >> 3]
  
                if { $match_offset > 2 } {
                  if { $match_offset != 3 } {
                    set extra $::lzx::extra_bits($match_offset)
                    set verbatim_bits [::lzx::bits $extra]
                    set match_offset [expr $::lzx::position_base($match_offset) - 2 + $verbatim_bits]
                  } else {
                    set match_offset 1
                  }
  
                  set R2 $R1; set R1 $R0; set R0 $match_offset
                } elseif { $match_offset == 0 } {
                    set match_offset $R0
                } elseif { $match_offset == 1 } {
                    set match_offset $R1
                    set R1 $R0; set R0 $match_offset
                } else {
                  set match_offset $R2
                  set R2 $R0; set R0 $match_offset
                }
  
                set length $match_length
                while { $length } {
                  set rundest [tell $outfd]
                  seek $outfd -$match_offset end
                  set match [read $outfd $length]
                  seek $outfd $rundest
                  puts -nonewline $outfd $match
                  set l [string length $match]
                  incr length -$l
                }
                incr window_posn $match_length
                incr this_run -$match_length
              }
                    if { ($window_posn % 32768 ) == 0 && $window_posn != 0 } {
                set b [expr $::lzx::bufpos & 15]
                if { $b } {
                  set b [expr 16 - $b]
                  incr ::lzx::bufpos $b
                  incr bufsiz -$b
                }
                    }
            }
          }
          2 { # LZX_BLOCKTYPE_ALIGNED
            while { $this_run > 0 } {
              set main_element [read_huffsym MAINTREE]
  
              if { $main_element < $::lzx::LZX_NUM_CHARS } {
                puts -nonewline $outfd [binary format c $main_element]
                incr window_posn
                incr this_run -1
              } else {
                incr main_element -$::lzx::LZX_NUM_CHARS
  
                set match_length [expr $main_element & $::lzx::LZX_NUM_PRIMARY_LENGTHS]
                if { $match_length == $::lzx::LZX_NUM_PRIMARY_LENGTHS } {
                  incr match_length [read_huffsym LENGTH]
                }
                incr match_length $::lzx::LZX_MIN_MATCH
  
                set match_offset [expr $main_element >> 3]
  
                if { $match_offset > 2 } {
                  # It is not a repeated offset
                  set extra $::lzx::extra_bits($match_offset)
                  set match_offset [expr $::lzx::position_base($match_offset) - 2]
                  if { $extra > 3 } {
                    incr extra -3
                    set verbatim_bits [::lzx::bits $extra]
                    incr match_offset [expr $verbatim_bits << 3]
                    set aligned_bits [::lzx::read_huffsym ALIGNED]
                    incr match_offset $aligned_bits
                  } elseif { $extra == 3 } {
                    set aligned_bits [::lzx::read_huffsym ALIGNED]
                    incr match_offset $aligned_bits
                  } elseif { $extra > 0 } {
                    set verbatim_bits [::lzx::bits $extra]
                    incr match_offset $verbatim_bits
                  } else {
                    set match_offset 1
                  }
  
                  set R2 $R1; set R1 $R0; set R0 $match_offset
                } elseif { $match_offset == 0 } {
                  set match_offset $R0
                } elseif { $match_offset == 1 } {
                  set match_offset $R1
                  set R1 $R0; set R0 $match_offset
                } else {
                  set match_offset $R2
                  set R2 $R0; set R0 $match_offset
                }
  
                set length $match_length
                while { $length } {
                  set rundest [tell $outfd]
                  seek $outfd -$match_offset end
                  set match [read $outfd $length]
                  seek $outfd $rundest
                  puts -nonewline $outfd $match
                  set l [string length $match]
                  incr length -$l
                }
                incr window_posn $match_length
                incr this_run -$match_length
               }
                    if { ($window_posn % 32768 ) == 0 && $window_posn != 0 } {
                set b [expr $::lzx::bufpos & 15]
                if { $b } {
                  set b [expr 16 - $b]
                  incr bufpos $b
                  incr bufsiz -$b
                }
                    }
                  }
          }
          3 { # LZX_BLOCKTYPE_UNCOMPRESSED
            set tt [expr $::lzx::bufpos / 8]
            set part [string range $::lzx::bitbuf $tt [expr $tt + $this_run - 1]]
            puts -nonewline $outfd $part
            incr ::lzx::bufpos [expr $this_run * 8]
            incr bufsiz [expr -$this_run * 8]
            incr window_posn $this_run
            if { $::lzx::bufpos & 8 } {
              incr ::lzx::bufpos 8
              incr bufsiz -8
            }
          }
          default {
            error "Illegal block type $block_type"
          }
        }
        set this_run $block_remaining
      }
    }
    if { $skip } {
      seek $outfd $skip
      set tmp [read $outfd $len]
      puts -nonewline $askfd $tmp
    }
  }
  
  proc ::vfs::chm::test {} {
    ::vfs::chm::Mount c:/Tcl/doc/ActiveTclHelp8.4.chm c:/chm
    cd c:/chm
    set d [glob -types d -directory c:/chm -tails ActiveTcl8.*]
    cd $d
    puts [pwd]
    puts [join [lsort [glob *]] \n]
    catch "file mkdir test" err
    puts $err
    set fid [::open c:/chm/$d/aspn.css r]
    set content [read $fid]
    close $fid
    set fid [::open c:/aspn.css w]
    fconfigure $fid -translation binary
    puts $fid $content
    close $fid
    cd c:/
    ::vfs::chm::Unmount c:/chm
  }

CHM Web server edit

This is just a DustMote modification to browse the CHM file as a site.
  package require vfs::chm
  #source chmvfs.tcl
  
    ::vfs::chm::Mount c:/Tcl/doc/ActiveTclHelp8.4.chm c:/chm
  
   set d [glob -types d -directory c:/chm -tails ActiveTcl8.*]
  
   set root "c:/chm/$d"
   set default "at.toc.html"
   set port 80
  
   proc bgerror {trouble} {puts stdout "bgerror: $trouble"}
  
   proc answer {socketChannel host2 port2} {
     fileevent $socketChannel readable [list readIt $socketChannel]
   }
  
   proc readIt {socketChannel} {
     global root default
     fconfigure $socketChannel -blocking 0
     set gotLine [gets $socketChannel]
     if { [fblocked $socketChannel] } then {return}
     fileevent $socketChannel readable ""
     set shortName "/"
     regexp {/[^ ]*} $gotLine shortName
     set many [string length $shortName]
     set last [string index $shortName [expr $many-1] ]
     if {$last=="/"} then {set shortName $shortName$default }
     set wholeName $root$shortName
  set err ""
  puts "Serving $wholeName"
     if [catch {set fileChannel [open $wholeName RDONLY] } err ] {
       puts $socketChannel "HTTP/1.0 404 Not found"
       puts $socketChannel ""
       puts $socketChannel "<html><head><title><No such URL.></title></head>"
       puts $socketChannel "<body><center>"
       puts $socketChannel "The URL you requested does not exist on this site."
       puts $socketChannel "</center>$err</body></html>"
  puts $err
     } else {
       fconfigure $fileChannel -translation binary
       fconfigure $socketChannel -translation binary -buffering full
       puts $socketChannel "HTTP/1.0 200 OK"
       puts $socketChannel ""
       set work [read $fileChannel]
  
       puts $socketChannel $work
       close $fileChannel
     }
  
     close $socketChannel
   }
  
   socket -server answer $port
   vwait forEver

Discussion edit


AK: A number of questions.

  • The history of the lzx code is not fully clear to me from the description. Does it come from cabextract, or from chmtools?
  • Am I right in my assumption that with porting you mean that the original code is in C(++), and you ported it to Tcl?
  • What licenses was the original code under, and what license is your code under?
  • Do you know where it is possible to find an lzx compressor in Tcl or C(++) ? (For people interested in doing the write part of the VFS).
  • Can you provide links to web pages which describe the structure of CHM and CAB files in detail? I.e., a good specification of the file format.

DAG: A number of answers.

  • It does come from chmtools. In the source code I found, and copied, the reference.
  • You're perfectly right: it was coded in C, and I ported in Tcl. It was just for fun, and to make it much more portable, without the need of a binary library: I love Tcl-only packages.
  • It was GNU GPL by Caie. Russotto didn't claim any right on modifications. For me is just the same.
  • See below.
  • My ending point, after a long research was [1], the reason why I thanked Matthew.
  • Here is another place of information [2]. It is more related to CHM format, and can be used to implement next features.

DDG: Very interesting work. However I can't read the aspn.css file. Getting the "illegal block type 4" error. The file exists however as the glob results shows. Any suggestions?

DAG: How unfortunate! May I know which version of file do you use? I am afraid the versions I own have only VERBATIM and ALIGNED blocks, so UNCOMPRESSED block part could be not tested at all.

Anyway, last night I found that there is still a bug related to Random Access: it seems probably something about odd blocks. I am afraid I access those with some misalignment. I am looking into it: next tests will copy all files from archive to another location.

Another thing I fixed is the buffer reset at the beginning of decompress proc.

DDG Thanks for your response. I use the file D:/ActiveTcl8.4.2/doc/ActiveTclHelp.chm. I can give you a link for downloading "my" chm file. It would be nice to use chmvfs for my dgHelpBrowser like the metakit files.

DAG: That's a good idea. Have a try with this last version: it works fine on both my versions, but for XOTcl pdf files. I'm working on it, but I think that probably it is still related to UNCOMPRESSED blocks. Even chmtools seems to have some problems with those pdf files, though.

I also added a small modification of DustMote, which I use to browse the CHM file as a site. In case it is not working for You, let me know where I can download Your file from.

DDG It now works also for my chm file. Great! I will see that I update dgHelpBrowser for reading chm-files on Unixes,Win and OS-X etc. I once tried to adjust chmlib for this purpose using Swig but I failed. Thanks.

DAG: I thought I found the last problem in UNCOMPRESSED blocks handling, realigning to 16-bit boundaries after each of them, if needed. Nonetheless, it seems that pdf extracted are not always identical to stored files. Still investigating.

DAG: I did it. At last I found the bug in UNCOMPRESSED blocks handling. Nevertheless, I discovered a .chm that puzzled me again. Filenames are stored in all lower-cases, but referred also with upper cases. This means that I need to add an option to mount the VFS in both modes: case-sensitive or case-insensitive. I think that, coming from Windows, the latter should be the default.