Here's the package to read and parse
[RFC 822
]-like (or
RFC 2822-like) headers, which are used in a variety of applications nowadays (e-mail, news and Debian Packages.gz files come to mind).
When developing the package, the ability to re-produce a header just read is important. E.g. the following code should produce a
header file containing the part of the
message file until (and including) the first empty line:
## get header from `message':
array unset header
set chan [ open "message" r ]
set order [ read-parse-header $chan header ]
close $chan
## write header to `header':
set chan [ open "header" w ]
write-header $chan $order header
puts $chan ""
close $chan
Please note, however, that in mailbox files the header is often prepended with
From (note the absence of colon) line, which isn't part of the RFC 822 syntax, so the
read-parse-header procedure will signal an error on seeing this sort of input.
One approach to this problem can be seen in
Mbox to MH directory conversion tool.
- Namespace
- ::rfc-822
The
procs below belong to this namespace, exported and ready to use.
- Function
- header-fields-order arrayName
Returns the order of fields in header, which is currently stored in array element with key
:order. Empty list is returned if no such element exists.
- Procedure
- header-field-finish arrayName
This procedure is called to signal that the header processing is done.
To allow header field continuation, an implementation of
header-parse-line must preserve some information between the calls. The information is currently stored in array elements with keys beginning with a couple of colons. This procedure flushes such a temporary information (if any), enforcing the array to actually represent the header read.
This procedure is automagically called by
header-parse-line when it sees an empty line, which is a common convention of terminating the header.
- Function
- header-parse-line arrayName line
Parses the
line given, modifying the array
arrayName accordingly. Given an empty line calls
header-field-finish and returns
1, otherwise returns
0.
See the description of the
read-parse-header for more information.
- Function
- read-parse-header ?channelId? arrayName
Reads and parses RFC 822 header fields from
channelId (defaulting to
stdin if omitted), filling the array specified by
arrayName. Returns the order of fields read (as a list).
Header fields are keyed by header name and stored as lists, with each occurence of the header field resulting in exactly one element in the list. Array isn't cleared before processing, it's left to the caller. Leading whitespace is
not removed. Header values continued over several lines result in newlines being embedded in the list elements.
Returned order of fields will contain several occurences of the same header name, in case there were multiple occurences of the header.
- Procedure
- write-header ?channelId? order arrayName
Writes header fields specified by
order argument to the given
channelId (defaulting to
stdout if omitted), obtaining values from the array specified by
arrayName.
NB: Header fields not included in
order will
not be output. This can be used to strip away unwanted headers.
### rfc822h.tcl --- RFC 822 (and alike) headers -*- Tcl -*-
## $Id: 15254,v 1.5 2006-01-16 07:00:23 jcw Exp $
### Copyright (C) 2005, 2006 Ivan Shmakov
## This library is free software; you can redistribute it and/or modify
## it under the terms of the GNU Lesser General Public License as
## published by the Free Software Foundation; either version 2.1 of the
## License, or (at your option) any later version.
## This library is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## Lesser General Public License for more details.
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
## USA
### Code:
namespace eval ::rfc-822 {
variable re-header-name "\[\\041-\\071\\073-\\177\]+"
variable re-header-del ":"
namespace export \
header-fields-order \
header-field-finish \
header-parse-line \
read-parse-header \
write-header
}
## In the code below, header is an array keyed by header field name,
## and whose values are corresponding header field values. Because RFC
## 822 header field name cannot contain colons, the convention is to
## put additional information (such as the order of fields) into array
## elements with keys beginning with one or more colons. Eg. the order
## of fields is stored in the element with key ":colon". The keys of
## the private fields of the package begin with two colons.
## : Get the value of a variable, or default if it's not existent
proc ::rfc-822::safe-get { varn { default "" } } {
upvar 1 $varn var
## .
expr { [ info exists var ] ? [ set var ] : $default }
}
## : Get the order of fields in the header
proc ::rfc-822::header-fields-order { arrayName } {
## .
safe-get header(:order) [ list ]
}
## : Store an accumulated header line (if any) in the header
proc ::rfc-822::header-field-finish { arrayName } {
upvar 1 $arrayName header
set varnf header(::last-field)
set varnv header(::last-value)
if { ! [ info exists $varnf ] } {
## no header field to finish
## .
return
}
set field [ set $varnf ]
lappend header($field) [ set $varnv ]
lappend header(:order) $field
unset $varnf $varnv
## .
return
}
## : Parse just one given header line, modifying the header
proc ::rfc-822::header-parse-line { arrayName line } {
## => 0 | 1 (done)
variable re-header-name
variable re-header-del
upvar 1 $arrayName header
if { [ string length $line ] == 0 } {
## end of header, store accumulated field, if any
header-field-finish header
return 1
} elseif { [ string match "\[ \t\]*" $line ] } {
## accumulate continuation of a header field
if { [ info exists header(::last-field) ] == 0 } {
error "no header to continue"
}
append header(::last-value) "\n" $line
} elseif { [ regexp -- \
"^(${re-header-name})${re-header-del}(.*)\$" \
$line \
dummy n v ] } {
## new header field
header-field-finish header
set header(::last-field) $n
set header(::last-value) $v
} else {
error "cannot parse header line"
}
## .
return 0
}
## : Read lines from the channel, parsing them as the RFC 822 header
proc ::rfc-822::read-parse-header { { channelId stdin } arrayName } {
upvar 1 $arrayName header
set done 0
if { ! [ info exists header(:order) ] } {
set header(:order) [ list ]
}
while { ! $done } {
if { [ gets $channelId line ] < 0 } {
error "eof or no data while reading header"
}
if { [ header-parse-line header $line ] } {
set done 1
}
}
## .
set header(:order)
}
## : Write lines to the given channel, formatting them as the header
proc ::rfc-822::write-header { { channelId stdout } order arrayName } {
## FIXME: should it write header fields not in ORDER?
upvar 1 $arrayName header
array unset is
foreach hf $order {
if { ! [ info exists is($hf) ] } {
set is($hf) 0
}
set i $is($hf)
set l $header($hf)
if { $i >= [ llength $l ] } {
## silently ignore this case
continue
}
puts $channelId \
[ format "%s:%s" $hf [ lindex $l $i ] ]
incr is($hf)
}
## .
return
}
package provide rfc822::headers 0.3
### Emacs stuff
## Local variables:
## fill-column: 72
## indent-tabs-mode: nil
## ispell-local-dictionary: "english"
## mode: outline-minor
## outline-regexp: "###\\|proc"
## End:
## LocalWords:
### rfc822h.tcl ends here