Updated 2012-11-02 14:05:21 by ninovillari

Richard Suchenwirth 2007-06-27 - http://www.fallingrain.com/world/ (Copyright 1996-2004 by Falling Rain Genomics, Inc.) provides a very large, publicly accessible gazetteer of the world's cities and airports - there must be millions of entries, all available in HTML format. To keep pages from getting too big, they use a URL tree that is very deep in places. For instance, to locate my city Konstanz, the URL is
 http://www.fallingrain.com/world/a/K/o/n/s/t/a/n/z/

In other cases, short prefixes are sufficient: for example, all 131 airports whose code starts with ED (plus some others) are delivered by the URL
 http://www.fallingrain.com/world/a/E/D/
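The URL scheme itself can be tried out without any network access. The following is a small sketch (the proc name fallingrain'url is hypothetical, not part of the site or of any package) that builds the directory URL for a name prefix using the same per-character rule as the geo'get'rain proc further down: one path segment per character, with characters below "A" (code 65) or above 127 replaced by their decimal character code.

```tcl
# Hypothetical helper: build the fallingrain.com directory URL for a name
# prefix, one path segment per character. Characters below "A" (65) or
# above 127 are replaced by their decimal code, as in geo'get'rain.
proc fallingrain'url {prefix} {
    set url http://www.fallingrain.com/world/a/
    foreach c [split $prefix ""] {
        set i [scan $c %c]
        if {$i < 65 || $i > 127} {set c $i}
        append url $c/
    }
    return $url
}

puts [fallingrain'url Konstanz]
puts [fallingrain'url ED]
```

For Konstanz this yields exactly the deep URL shown above, and for the prefix ED the airport URL; a non-ASCII name such as Zürich would get the segment 252 for the "ü".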

So to search for a place, one has to walk down the URL tree, appending one character after another (or, for non-ASCII characters and anything below "A" such as spaces and digits, its decimal character code) and checking each page for matching entries. Here's a proc that does this: called with a place name, it returns a list of hits, where each hit is a list of
 name type region country lat lon elevation(ft) population(est)

 #!/usr/bin/env tclsh
 package require http
 proc geo'get'rain placename {
    set url http://www.fallingrain.com/world/a/
    set res {}
    foreach c [split $placename ""] {
        #-- Characters from "A" (65) up to 127 go into the URL as-is;
        #-- anything else (non-ASCII, spaces, digits...) as its decimal code
        set i [scan $c %c]
        if {$i < 65 || $i > 127} {set c $i}
        append url $c/
        set token [http::geturl $url]
        set page [http::data $token]
        http::cleanup $token
        foreach line [split $page \n] {
            if {[string match <tr* $line]} {
                #-- turn cell starts into \x80 separators, drop row end and bracket debris
                set line [string map {<td> \x80 </tr> "" ")) (( " ""} $line]
                set fields [split $line \x80]
                #-- skip rows without a linked place name, e.g. header rows
                if {![regexp {<a.+>(.+)</a>} [lindex $fields 1] -> name]} continue
                if {[string match $placename* $name]} {
                    lappend res [linsert [lrange $fields 2 end] 0 $name]
                }
            }
        }
    }
    set res
 }
 #-- If this script is called as toplevel, the function is called, and results displayed.
 #-- [join $argv] allows multi-word names, whether quoted or not.
 if {[file tail [info script]] eq [file tail $argv0]} {
    puts [join [geo'get'rain [join $argv]] \n]
 }

Testing this as a command-line tool:
 /_Ricci> geo_rain.tcl Stockel
 Stockel city {Province de )) (( Brabant} Belgium 50.8333333 4.45 262 309844
 Stockelanda city {(( Alvsborgs Lan ))} Sweden 58.65 12 354 935
 Stockels city {Land Hessen} Germany 50.5666667 9.7333333 1049 14658
 Stockelsberg city {Land Bayern} Germany 49.3833333 11.2666667 1312 22990
 Stockelsdorf city {Land Schleswig-Holstein} Germany 53.9 10.65 49 49614
 Stockelweingarten city {Bundesland Karnten} Austria 46.6694444 13.9377778 1558 9777

The repeated queries take their time, but the result is worth the wait :^) The population figures are sometimes a bit high, because the estimate reportedly covers a 7 km radius around the point.
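Because every run costs several HTTP round trips, it can pay to test the parsing step offline. The sketch below factors the row parsing of geo'get'rain into a hypothetical proc fallingrain'row. The sample row is made up: its markup is only assumed from what the string map and regexp in geo'get'rain expect, and is not copied from the real site.

```tcl
# Hypothetical helper: parse one <tr> line of a result page into a hit
# {name type region country lat lon elev pop}, using the same string
# surgery as geo'get'rain. Returns "" for rows without a linked name.
proc fallingrain'row {line} {
    set line [string map {<td> \x80 </tr> "" ")) (( " ""} $line]
    set fields [split $line \x80]
    if {![regexp {<a.+>(.+)</a>} [lindex $fields 1] -> name]} {
        return ""
    }
    linsert [lrange $fields 2 end] 0 $name
}

# Made-up sample row in the assumed markup (not real site output)
set row {<tr><td><a href="/world/GM/05/Stockels.html">Stockels</a><td>city<td>Land Hessen<td>Germany<td>50.5666667<td>9.7333333<td>1049<td>14658</tr>}
puts [fallingrain'row $row]
```

This prints the hit in the same shape as the Stockels line of the command-line test above.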

Also, the "region" field contains nonsense for some countries, e.g. the UK (almost always Aberdeen), France (usually Alsace) and Liechtenstein (always Balzers); it looks like the alphabetically first region is returned where the data is missing. For Germany, the US, etc., things look better.
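Given that the region field cannot always be trusted, post-processing is perhaps safer when keyed on the country field. Here is a minimal sketch (the proc name hits'in'country is hypothetical) that unpacks each hit with lassign, which assumes Tcl 8.5 or later, and keeps name, latitude, longitude and population for one country. The sample data is three of the hits from the Stockel query above.

```tcl
# Hypothetical helper: keep only the hits from one country. Each hit is a
# list {name type region country lat lon elev pop} as returned by
# geo'get'rain; lassign (Tcl 8.5+) unpacks the fields by position.
proc hits'in'country {hits country} {
    set res {}
    foreach hit $hits {
        lassign $hit name type region ctry lat lon elev pop
        if {$ctry eq $country} {
            lappend res [list $name $lat $lon $pop]
        }
    }
    return $res
}

# Sample data: three hits from the Stockel query shown above
set hits {
    {Stockel city {Province de )) (( Brabant} Belgium 50.8333333 4.45 262 309844}
    {Stockels city {Land Hessen} Germany 50.5666667 9.7333333 1049 14658}
    {Stockelsdorf city {Land Schleswig-Holstein} Germany 53.9 10.65 49 49614}
}
puts [join [hits'in'country $hits Germany] \n]
```

With this data, only the two German places Stockels and Stockelsdorf are printed.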

DKF: Note that for large cities, the population returned can also be too small.

[ninovillari] - 2012-11-02 14:05:21

Hello, I have noticed that Estonia does not appear in http://www.fallingrain.com/world/ (I started by searching for EE, as I thought that abbreviation might have been used), while Tallinn can be found at http://www.fallingrain.com/icao/EETN.html.