Updated 2016-06-07 21:50:45 by lh

chan pending, a chan action, retrieves the about of data pending in a channel's buffer.

Synopsis  edit

chan pending mode channelId

Documentation  edit

TIP 287
resulted in chan pending

Description  edit

Returns the number of bytes in Tcl's buffers for the channel channelId in the direction specified by mode (which must be input or output).

chan pending makes it safe to use chan gets with sockets, which can now be inspected to detect an excessively long line, providing an opportunity to prevent memory exhaustion.

How to Use  edit

chan pending is designed for use with non-blocking channels, especially sockets, and it is used to deal with the case where the source is producing far more data than was expected before producing some particular delimiter, e.g., a newline character for gets.
proc accept {cb channel args} {
    global timeouts
    chan configure $channel -blocking 0 -buffersize 4096
    chan event $channel readable [list doGets $channel $cb]
    set ::timeouts($channel) ""
}

proc doGets {channel callback} {
    global timeouts
    if {[chan gets $channel line] >= 0} {
        after cancel $timeouts($channel)
        set timeouts($channel) ""
        {*}$callback $channel $line
    } elseif {[chan eof $channel]} {
        chan close $channel
        after cancel $timeouts($channel)
        unset timeouts($channel)
    } else {
        # Must be blocked; check for excessive buffering
        if {[chan pending input $channel] > 1024} {
            # must be a line longer than a kilobyte; naughty! reject
            chan close $channel
            after cancel $timeouts($channel)
            unset timeouts($channel)
        } elseif {$timeouts($channel) eq ""} {
            # no line timeout watcher; install one that waits 10 seconds
            # from when the start of the line arrives to when the end of the
            # line arrives before killing the channel
            set timeouts($channel) [after 10000 killChannel $channel]
        }
    }
}

proc killChannel channel {
    global timeouts
    unset timeout($channel)
    chan close $channel
}

# Make the (server) socket
socket -server [list accept processLineCallback] 12345
# This is where you'd add your code to handle each line
proc processLineCallback {channel line} {
    # ... whatever ...
    puts $channel-->$line
}

LH 2016-06-07 The above sample code does not work as intended. In my tests, the socket server accepted excessively long lines, close to 1MB, easily braking the 1KB limit used in the code. In fact, chan pending can't deal with the case where a client is producing far more data than was expected. To do so, it should be called before gets is called, and not after, as in the sample code. The reason for this is explained on the gets wiki page, in the "Show discussion" part. In short, the moment gets is called we loose control of how many characters will be returned.

Unfortunately, calling chan pending before gets will not help much because it only reports bytes pending in the internal Tcl buffer, and ignores bytes pending in the OS buffer. For example, in my tests, it consistently reported 0 before the first gets on a fresh client socket (I removed this redundant line from my test code below, so add it back, if you wish). Even if chan pending was taking into account the bytes pending in the OS buffer, we wouldn't know if an end-of-line character (or sequence) is among the pending bytes or not, so we couldn't make a decision if calling gets is safe anyway.

Even if it worked, the above sample code has a fundamental deficiency: it relies on gets failing (I mean, returning -1) before chan pending can recognize a line that is too long. But gets fails only if all buffering resources have been exhausted, and this may cost megabytes of wasted memory. Not a good idea for a server handling thousands of clients.

The only clean and reliable solution to the "excessively long line" problem I'm aware of is to modify gets by adding an optional "-max size" argument, in the spirit of read command. With the current Tcl I/O commands, the only reliable, but dirty, solution I'm aware of is to add to a socket server app the third layer of buffering, on top of the Tcl and OS buffers, with the help of [read channel size] command on binary sockets.

Here is the code of my tests, run on Windows 8.1 with Wish 8.6.4. I removed the timeout and callback parts from the sample code, since they are irrelevant for this experiment. If you run this code on your platform, your millage may vary because the sample code is highly non-deterministic (different OS buffering strategies and many race conditions):
# utils.tcl

set port 12345

proc ShowConsole args {
  if {[catch { console show }]} return
  set appName [file tail [file root [info script]]]
  array set "" [list -title $appName -exit exit]
  array set "" $args
  console title $(-title)
  set exit [list consoleinterp eval $(-exit)]
  console eval [list wm protocol . WM_DELETE_WINDOW $exit]
  wm withdraw .
  update
}

proc UseLogFile args {
  array set "" [list -file [file root [info script]].log -mode a]
  array set "" $args
  set ::logFile [open $(-file) $(-mode)]
  set timestamp [clock format [clock seconds] -format {%Y-%m-%d %H:%M:%S}]
  puts $::logFile \n$timestamp
}

proc log msg {
  puts $msg
  catch { update idletasks }
  if {![info exists ::logFile]} return
  puts $::logFile $msg
  flush $::logFile
} 

proc prefix {str {length 60}} {
  if {[string length $str] <= $length} { return $str }
  return [string range $str 0 $length-1]...
}
# server.tcl

source utils.tcl
UseLogFile
ShowConsole -exit {
  foreach socket [array names ::CLIENT] { CloseClient $socket EXIT }
  close $::serverSocket
  exit
}

proc CloseClient {socket {reason ""}} {
  log "--- closing client $::NUM($socket) ($reason)"
  unset ::CLIENT($socket)
  unset ::NUM($socket)
  close $socket
}


if {[catch {socket -server accept $port} serverSocket]} {
  log "*** can't open server socket on port $port: $serverSocket"
  exit
}


proc accept {channel args} {
  chan configure $channel -blocking 0 -buffersize 4096
  chan event $channel readable "serve $channel"
  set ::CLIENT($channel) ""
  set ::NUM($channel) [incr ::clientNum]
  log "\n+++ new client $::NUM($channel)"
}

proc serve channel {
  if {[chan gets $channel line] >= 0} {
    log "< ([string length $line]) [prefix $line]"
    return
  }
  if {[chan eof $channel]} {
    CloseClient $channel EOF
    return
  }
  set pending [chan pending input $channel]
  if {$pending > 1024} {
    CloseClient $channel "LineTooLong, pending $pending"
  }
}
# client.tcl

source utils.tcl
UseLogFile
ShowConsole

proc try {cmd args} {
  if {![catch { {*}$cmd {*}$args } result]} { return $result }
  log "$cmd: $result"
}

proc emit line {
  set socket [try socket localhost $::port]
  log "> ([string length $line]) [prefix $line]"
  try puts $socket $line
  try flush $socket
  try close $socket
}

proc line length {
  set chunk [string repeat 0123456789 100]
  append line [string repeat $chunk [expr {$length/1000}]]
  append line [string range $chunk 0 [expr {$length%1000}]-1]
  return $line
}

emit [line 1000]
emit [line 5000]
emit [line 10000]
emit [line 50000]
emit [line 100000]
emit [line 500000]
emit [line 1000000]
emit [line 5000000]
log done

Here is the client's log showing that 1,000,000 character long line was still accepted by the server, while the 5,000,000 character long line was rejected:
2016-06-07 19:42:51
> (1000) 012345678901234567890123456789012345678901234567890123456789...
> (5000) 012345678901234567890123456789012345678901234567890123456789...
> (10000) 012345678901234567890123456789012345678901234567890123456789...
> (50000) 012345678901234567890123456789012345678901234567890123456789...
> (100000) 012345678901234567890123456789012345678901234567890123456789...
> (500000) 012345678901234567890123456789012345678901234567890123456789...
> (1000000) 012345678901234567890123456789012345678901234567890123456789...
> (5000000) 012345678901234567890123456789012345678901234567890123456789...
puts: error writing "sock0000000004183240": connection reset by peer
done

Finally, here is the server's log:
2016-06-07 19:42:46

+++ new client 1
< (1000) 012345678901234567890123456789012345678901234567890123456789...
--- closing client 1 (EOF)

+++ new client 2
< (5000) 012345678901234567890123456789012345678901234567890123456789...

+++ new client 3
< (10000) 012345678901234567890123456789012345678901234567890123456789...
--- closing client 2 (EOF)

+++ new client 4
< (50000) 012345678901234567890123456789012345678901234567890123456789...
--- closing client 3 (EOF)

+++ new client 5
< (100000) 012345678901234567890123456789012345678901234567890123456789...
--- closing client 4 (EOF)

+++ new client 6
< (500000) 012345678901234567890123456789012345678901234567890123456789...
--- closing client 5 (EOF)
--- closing client 6 (EOF)

+++ new client 7
< (1000000) 012345678901234567890123456789012345678901234567890123456789...
--- closing client 7 (EOF)

+++ new client 8
--- closing client 8 (LineTooLong, pending 28672)