Marty Backe - 30 Dec 2003
To keep current with the various open source projects, I subscribe to the
Freshmeat e-mail newsletter, a daily listing (in digest format) of open source announcements.
Here's an example entry from the newsletter:
[066] - TclCurl 0.10.8 (Development)
by Andres Garcia (http://freshmeat.net/users/andresgarci/)
Monday, December 29th 2003 09:55
Internet :: File Transfer Protocol (FTP)
Internet :: WWW/HTTP
Software Development :: Libraries
About: TclCurl provides a binding for libcurl. It makes it possible to
download and upload files using protocols like FTP, HTTP, HTTPS, LDAP,
telnet, dict, gopher, and file.
Changes: The binding was updated for libcurl 7.0.18.
License: BSD License (revised)
URL: http://freshmeat.net/projects/tclcurl/
Now say I'm interested in finding out a bit more about this project. I click on the URL link, which takes me to a
Freshmeat page, from which I can then click another link to get to the project's homepage. That's a lot of clicking!
I wrote a filter application (see listing below) that visits the
Freshmeat page for each project in the newsletter, extracts the homepage URL, and adds it to the newsletter just below the URL line.
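For the TclCurl entry above, the filtered entry would end up looking like this (the Homepage value shown here is only a placeholder for whatever the filter finds on the project's Freshmeat page):
URL: http://freshmeat.net/projects/tclcurl/
Homepage: <the project's own homepage URL>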
Since I run my own mail server, I am able to insert this filter application between my mail delivery agent (Procmail) and my mailbox. Now, as I read the newsletter, I'm just one click away from any given project's homepage.
I used
Snit primarily because I wanted to gain a little exposure to its use. It's certainly a very simple
Snit application.
To see it in action, you can grab a newsletter from the archive (see
http://freshmeat.net/newsletter) and pass it through the filter:
cat newsletter.txt | FreshmeatMailFilter.tcl > converted.txt
#!/bin/sh
#\
exec tclsh8.4 "$0" "$@"
################################################################################
################################################################################
#
# Written by Marty Backe
#
# Freshmeat newsletter filter.
#
# Rev Date Changed By Comments
# ----- ----------- ---------- -----------------------------------------------
# 1.0 30 Dec 2003 M Backe Initial release.
#
################################################################################
################################################################################
#
# Load required packages
# Snit is only used because I wanted to get acquainted with it.
#
set packageList {
{http}
{snit}
}
foreach package $packageList {
if {[catch {eval package require $package} errorMsg]} {
puts "FreshmeatMailFilter requires '$package' or above. The following"
puts "error occurred: \"$errorMsg\""
exit
}
}
# ------------------------------------------------------------------------------
#
# Type: FreshmeatMailFilter
#
# Summary: This program is designed as a filter for the daily Freshmeat
# e-mail newsletters. The URL's for each project currently
# specify a Freshmeat webpage. This requires the reader of the
# Freshmeat newsletter to first go to the Freshmeat page, find
# the project homepage link, and then click on that to get to the
# project homepage.
# This program extracts the actual project homepage and adds it
# as an additional link in the newsletter, below the existing
# URL link.
#
# This program reads stdin, looks for lines that contain
# the URL, retrieves the necessary Freshmeat webpages to
# extract the project homepage url, and inserts the url in
# a line below the URL line.
#
# Usage: From a Procmail recipe, pipe the e-mail through this program.
# Example:
# :0:
# | /home/johndoe/MailFilters/FreshmeatMailFilter.tcl |
# /usr/local/bin/dmail +"Mail/Freshmeat"
# Example:
# cat freshmeat_message.txt | FreshmeatMailFilter.tcl >
# freshmeat_message2.txt
#
# ------------------------------------------------------------------------------
::snit::type FreshmeatMailFilter {
variable mailMessage ""
constructor {} {
set mailMessage [$self readInput]
foreach line $mailMessage {
#
# Look for the URL line that is provided for each project.
#
if {[string first "URL: http://freshmeat.net/" $line 0] == 0} {
#
# Use regexp here but not above because 'string first' is
# much faster and therefore is a better choice if used on every
# line of the file, which it is in this case.
#
set urlString ""
regexp {^URL: (http://.*)$} $line matchString urlString
puts $line
if {$urlString != ""} {
set homepageUrl [$self getHomepageUrl $urlString]
if {$homepageUrl != ""} {
puts "Homepage: $homepageUrl"
}
}
} else {
puts $line
}
}
}
# --------------------------------------------------------------------------
#
# Method: readInput
#
# Summary: Reads stdin. A list is built, where each list item
# is a line from the stdin.
#
# Input:
# Output: A list
#
# Uses:
#
# --------------------------------------------------------------------------
method readInput {} {
set tmpFileBuffer ""
while {-1 != [gets stdin inputline]} {
lappend tmpFileBuffer $inputline
}
close stdin
return $tmpFileBuffer
}
# --------------------------------------------------------------------------
#
# Method: getHomepageUrl
#
# Summary: The provided URL is that which corresponds to the Project
# URL provided in the newsletter.
# The Freshmeat URL is redirected (http return code 302) to
# a webpage that contains another redirected URL. Following
# that URL gets us to the actual project homepage website.
#
# Therefore, acquiring the actual project homepage requires
# three downloads from Freshmeat.
#
# If any errors occur along the way (invalid url, timeouts,
# etc.) a null string is returned.
#
# Input: <Freshmeat project URL>
# Output: <Project homepage URL>
# null string if the project homepage URL could not be found
#
# Uses:
#
# --------------------------------------------------------------------------
method getHomepageUrl {url} {
#
# Get the webpage specified in the Newsletter URL link. This is
# expected to be a redirect (http return code 302).
#
if {![catch {set urlToken [http::geturl $url -timeout 10000]} \
errorMsg]} {
if {[http::status $urlToken] != "ok"} {
#
# We get here if a timeout occurred.
#
http::cleanup $urlToken
return ""
}
if {[http::ncode $urlToken] == 302} {
#
# A redirection occurred (which is expected for these URL's).
# Grab the new URL and retrieve the webpage contents. If this
# times out or some other error occurs, give up.
#
upvar #0 $urlToken state ;# See docs for ::http
array set meta $state(meta) ;# See docs for ::http
http::cleanup $urlToken
if {[catch {set urlToken [http::geturl $meta(Location) \
-timeout 10000]} errorMsg]} {
return ""
} else {
if {[http::status $urlToken] != "ok"} {
http::cleanup $urlToken
return ""
}
}
}
set webpage [http::data $urlToken]
http::cleanup $urlToken
set url ""
#
# Search the webpage for the homepage URL. Note that Freshmeat
# again provides a URL that causes redirection.
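            # The pattern keys off the '<b>Homepage:</b><br>' label in the page
            # and captures the href of the <a> tag that follows it. That href is
            # site-relative, which is why 'http://freshmeat.net' is prepended to
            # it before it is fetched below.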
#
regexp {(?:<b>Homepage:</b><br>[[:space:]]*<a href=\"([^[:space:]]*)\">http://.*</a><br>)+?} $webpage match url
set url "http://freshmeat.net$url"
if {[catch {set urlToken [http::geturl $url -timeout 10000]} \
errorMsg]} {
return ""
}
if {[http::ncode $urlToken] != 302} {
http::cleanup $urlToken
return ""
}
#
# The redirected URL is the one we finally want.
#
upvar #0 $urlToken state
array set meta $state(meta)
http::cleanup $urlToken
return $meta(Location)
} else {
#
# There was an error (catch thrown) in retrieving the Freshmeat
# webpage.
#
return ""
}
}
}
FreshmeatMailFilter freshmeat
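If you want to try the redirect-following part by itself before hooking the filter into your mail setup, something along these lines works from an interactive tclsh session. This is only a quick sketch: the project URL is the one from the sample entry above, and it assumes Freshmeat answers that URL with the 302 redirect that getHomepageUrl relies on.

package require http

# Fetch the Freshmeat project page and report where it redirects to.
set token [http::geturl http://freshmeat.net/projects/tclcurl/ -timeout 10000]
if {[http::status $token] == "ok" && [http::ncode $token] == 302} {
    # The Location header of the 302 response points at the next hop.
    upvar #0 $token state
    array set meta $state(meta)
    puts "Freshmeat redirects to: $meta(Location)"
} else {
    puts "No redirect (status [http::status $token], code [http::ncode $token])"
}
http::cleanup $token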