Updated 2012-07-08 22:19:18 by RLE

Richard Suchenwirth - In Tclworld I want to have plenty of detailed geographic data, both for display on the map and for browsing additional facts. (The two are interrelated: for sensible map display, e.g. of cities, it helps to know the population and whether the city is a capital. Population helps in assigning display level, font size...; names of capitals might be underlined and/or given a different map mark.)

One problem is that not all geographic names are unique - which would be required for using them as array indices in the database. A city named Hamilton exists for instance in England, but also in Australia, Canada, New Zealand (colonialists sometimes have limited imagination ;-), as well as in the US states of Alabama, Montana, and Ohio. In the US case, the typical solution is to append the state name, often in the two-letter USPS abbreviation. For the countries of the world (and dependent territories), there are also two-letter codes in ISO 3166, but not disjoint from the US state codes (e.g. CA is both California and Canada - see language/country name servers). However, one could introduce a hierarchic scheme, e.g. from specific to generic, like Web domains:
 Hamilton,GB
 Hamilton,OH.US
 Hamilton,CA  --- Canada, not California, that would be: Hamilton,CA.US

or, to make the tree structure clearer (e.g. in sorted lists):
 GB:Hamilton
 US.OH:Hamilton

in which the colon marks the place where the "display name" (to be put on the map, or database browser) begins. Prefix and display name are easily extracted with the regexp
 regexp {((.+):)?(.+)} $qualifiedname -> - prefix displayname
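
As a small illustration, the regexp can be wrapped in a helper and tried on a few qualified names. This is only a sketch: the proc name splitname is made up for this example, and lassign assumes Tcl 8.5 or later.

```tcl
# Hypothetical helper (not from the original page): split a qualified
# name into a two-element list {prefix displayname}.
proc splitname {qualifiedname} {
    set prefix ""; set displayname ""
    regexp {((.+):)?(.+)} $qualifiedname -> - prefix displayname
    list $prefix $displayname
}

foreach name {US.OH:Hamilton GB:Hamilton CA.ON:Hamilton Hamilton} {
    lassign [splitname $name] prefix displayname
    puts "prefix=<$prefix> displayname=<$displayname>"
}

# Sorting the qualified names groups places by country, then region.
set sorted [lsort {US.OH:Hamilton GB:Hamilton CA.ON:Hamilton US.AL:Hamilton}]
puts $sorted
```

A name without a colon simply yields an empty prefix, so unqualified names can live in the same list as qualified ones.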

Two-letter codes for administrative subdivisions are also usual in Canada:
 + CA.AB = Alberta
 + CA.BC = {British Columbia}
 + CA.MB = Manitoba
 + CA.NB = {New Brunswick}
 + CA.NF = Newfoundland
 + CA.NT = {Northwest Territories}
 + CA.NS = {Nova Scotia}
 + CA.NU = Nunavut
 + CA.ON = Ontario
 + CA.PE = {Prince Edward Island}
 + CA.QC = Quebec
 + CA.SK = Saskatchewan
 + CA.YT = Yukon

They are likewise usual in Switzerland and Italy, and defined but rarely used in China, Germany, etc. (Others: please add!) For France, one might use the two-digit département codes as used on number plates (e.g. 13: Bouches-du-Rhône (Marseille); 75: Paris).

Resolution of such "fully qualified paths" could be done with database entries like
 + GB = {Great Britain}
 + US = {United States of America}
 + US.OH = Ohio

(+ is an alias for the database, to make these lines valid Tcl commands - see A little database API). Here's how to resolve such geocodes to human-readable form:
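 proc explain {db code} {
    set res ""
    # country: two-letter code; region (after ".") and name (after ":") are
    # both optional, so that e.g. GB:Hamilton is handled as well
    if {![regexp {^(..)([.]([^:]+))?(:(.+))?$} $code -> country - region - name]} {
        return $code ;# not a geocode: hand it back unchanged
    }
    if {$name!=""} {append res "$name in "}
    # replace codes by their full names where the database knows them
    if {$region!="" && [set dbRegion [$db $country.$region =]] != ""} {
        set region $dbRegion
    }
    if {$region!=""} {append res "$region, "}
    if {[set dbCountry [$db $country =]] != ""} {
        set country $dbCountry
    }
    append res $country
 }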
 % explain + GB
 Great Britain
 % explain + US.OH
 Ohio, United States of America
 % explain + US.OH:Hamilton
 Hamilton in Ohio, United States of America
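
To try such a session without the real database, a minimal array-backed stand-in for + might look like the following. It is only a sketch under assumptions: the actual interface is the one described on A little database API and may well differ; the global array name geodb and the rule "an unknown key yields the empty string" are made up here to fit the way explain calls the database.

```tcl
# Hypothetical stand-in for the database alias "+" (an assumption,
# not the API from "A little database API").
array set geodb {}

proc + {key eq args} {
    global geodb
    if {$eq ne "="} {error "usage: + key = ?value?"}
    if {[llength $args] == 0} {
        # "+ key ="  : query; unknown keys give the empty string
        if {[info exists geodb($key)]} {return $geodb($key)}
        return ""
    }
    # "+ key = value" : store the value under the key
    set geodb($key) [lindex $args 0]
}

+ GB = {Great Britain}
+ US = {United States of America}
+ US.OH = Ohio

puts [+ US.OH =]
```

Because the proc really is named +, the entry lines shown earlier can be sourced as ordinary Tcl commands, and passing + as the db argument makes explain call it as "+ key =".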

One might consider a level above countries, which would of course be continents. One letter is too short, as the majority of continent names start with A, so for instance
 + AFR = Africa
 + AMN = {North America}
 + AMS = {South America}
 + ANT = Antarctica
 + ASI = Asia
 + AOC = {Australia & Oceania}
 + EUR = Europe

and a pedant might even construct "fully qualified names" like this:
 Terra.AMN.US.CO.Denver

to be prepared for interstellar extensions - but we had better keep the data compact; using the country code as "top-level domain" should be sufficient for the foreseeable future.

AM While unusual in the Netherlands ever since official postal codes were introduced, the following is an understandable collection of two-letter abbreviations for Dutch provinces:

 GR = Groningen
 FR = Friesland
 DR = Drenthe
 OV = Overijssel
 GL = Gelderland
 UT = Utrecht
 NH = Noord-Holland
 ZH = Zuid-Holland
 ZL = Zeeland
 NB = Noord-Brabant
 LB = Limburg
 FL = Flevoland

(Sorry, there may be official English names, but I would not know them. Historically speaking, we might need to add the seven original "gewesten", but this should do for any map later than 1850 :))
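
To bring these abbreviations into the hierarchic scheme of this page, one could generate database entry lines from them. The sketch below assumes NL (the ISO 3166 code for the Netherlands) as country prefix and keeps the informal province codes exactly as listed above; the variable names are made up for the example.

```tcl
# Informal two-letter province abbreviations as listed above (not official codes)
set provinces {
    GR Groningen      FR Friesland     DR Drenthe
    OV Overijssel     GL Gelderland    UT Utrecht
    NH Noord-Holland  ZH Zuid-Holland  ZL Zeeland
    NB Noord-Brabant  LB Limburg       FL Flevoland
}

# Build one "+ NL.xx = name" line per province; list takes care of
# quoting, should a name ever contain spaces.
set entries {}
foreach {code name} $provinces {
    lappend entries [list + NL.$code = $name]
}
puts [join $entries \n]
```

The printed lines have the same shape as the Canadian entries, so they could be pasted into the same data file.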