One of the applications of the code, described in
Reading and parsing RFC 822 headers, is to manage a news archive. Messages in the newsgroups are formatted according to the
RFC 1036 [
1], which is a variant of
[RFC 822
], but somewhat more restrictive.
The code below is to provide some sanity checks (e. g., multiple instances of the same header are disallowed) and parsing for the values of a couple of RFC 1036 header fields (
Message-ID and
Xref.)
Not much, but still better than nothing.
- Namespace
- ::rfc-1036
The
procs below belong to this namespace, exported and ready to use.
- Function
- trim string
Returns
string with any leading spaces and tabs removed.
- Function
- get-field arrayName field
Returns the value of the header field
field, as stored in array
arrayName, with any leading spaces and tabs removed. If there're no, or multiple, instances of the field, an error is signaled.
- Function
- parse-Message-ID value ?checkOnlyP?
Parses
value as a
Message-ID and returns the list of two elements: the unique part and the host part. If
value cannot be parsed as a
Message-ID, an error is signaled. If
checkOnlyP is given and is a
true value, an empty string is returned, but the error checking is nevertheless performed.
- Function
- parse-Xref value ?checkOnlyP?
Parses
value as an
Xref and returns the list of two values: the host part and the list of newsgroup-article number pairs, suitable for a later
foreach:
foreach { group number } $pairs {
...
}
If
value cannot be parsed as an
Xref, an error is signaled. If
checkOnlyP is given and is a
true value, an empty string is returned, but the error checking is nevertheless performed.
TODO: implement parsing of more RFC 1036 header fields;
Newsgroups,
References probably will be most useful, and
Path is probably easiest to implement.
### rfc1036p.tcl --- Parsing RFC 1036 headers -*- Tcl -*-
## $Id: 16304,v 1.1 2006-08-21 18:00:33 jcw Exp $
### Copyright (C) 2006 Ivan Shmakov
## This library is free software; you can redistribute it and/or modify
## it under the terms of the GNU Lesser General Public License as
## published by the Free Software Foundation; either version 2.1 of the
## License, or (at your option) any later version.
## This library is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## Lesser General Public License for more details.
## You should have received a copy of the GNU Lesser General Public
## License along with this program; if not, write to the Free Software
## Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
## USA
### Code:
namespace eval ::rfc-1036 {
## basic REs
## FIXME: the RFC says nothing about these two
variable group-re "\[0-9a-zA-Z_.+-\]+"
variable host-re "\[0-9a-zA-Z.-\]+"
## header value REs
variable message-id-re \
"^<(\[^<@> \t\]+)@(\[^<@> \t\]+)>\$"
variable xref-pair-re "(${group-re}):(\[0-9\]+)"
variable xref-re \
"^(${host-re})((?:\[ \t\]+${xref-pair-re})+)\$"
## exports
namespace export \
trim \
get-field \
parse-Message-ID \
parse-Xref
}
## : Return STR with any leading spaces and tabs removed
proc ::rfc-1036::trim { str } {
## .
string trimleft $str " \t"
}
## : Return the value of FIELD of the header
proc ::rfc-1036::get-field { arrayName field } {
upvar 1 $arrayName header
set varn header($field)
if { ! [ info exists $varn ] } {
error "no `$field' header field"
}
set lst [ set $varn ]
## NB: RFC 1036 says nothing about this case;
## at least my INN rejects articles with duplicate headers
if { [ llength $lst ] != 1 } {
error "multiple `$field' header fields"
}
## .
trim [ lindex $lst 0 ]
}
## : Return the unique part and the host part of VALUE
proc ::rfc-1036::parse-Message-ID { value { check-only? 0 } } {
variable message-id-re
if { ! [ regexp -- ${message-id-re} $value \
dummy unique host ] } {
error "`Message-ID' does not match the pattern"
}
if { ${check-only?} } {
## .
return
}
## .
list $unique $host
}
## : Return the host part and the list of locations of VALUE
proc ::rfc-1036::parse-Xref { value { check-only? 0 } } {
variable xref-pair-re
variable xref-re
if { ! [ regexp -- ${xref-re} $value \
dummy host rest ] } {
error "`Xref' does not match the pattern"
}
if { ${check-only?} } {
## .
return
}
set pairs [ list ]
foreach s [ split $rest " \t" ] {
if { ! [ string length $s ] } { continue }
if { ! [ regexp -- ${xref-pair-re} $s \
dummy group number ] } {
error "unreachable"
}
lappend pairs $group $number
}
## .
list $host $pairs
}
package provide rfc1036::parse 0.1.1
### Emacs stuff
## Local variables:
## fill-column: 72
## indent-tabs-mode: nil
## ispell-local-dictionary: "english"
## mode: outline-minor
## outline-regexp: "###\\|proc"
## End:
## LocalWords:
### rfc1036p.tcl ends here