Bag of number/time spellers

Summary edit

Some "few-liner" code examples, started by Richard Suchenwirth. Help yourself! Add comments when you know it better!

This page was split off from the more general Bag of algorithms page.

Description edit

gold Number spelling in India lead to a fuller sense of the zero, so number spelling is not as trivial as it looks.

The Story of Indian Zero

(dead link - see a archived version at the Wayback Machine

A Brief History of Zero

, Kristen McQuillin ,1997-07 (revised 2004-01)

Indian number speller edit

e.g., 1760000000000: One Lakh Seventy Six Thousand Crore

code available at Nagu's Tcl Blog

English number speller edit

e.g., en:num 29 => twenty-nine

proc en:num {n {optional 0}} {
    #---------------- English spelling for integer numbers
    if {[catch {set n [expr $n]}]}  {return $n}
    if {$optional && $n==0} {return ""}
    array set dic {
        0 zero 1 one 2 two 3 three 4 four 5 five 6 six 7 seven 
        8 eight 9 nine 10 ten 11 eleven 12 twelve
    }
    if [info exists dic($n)] {return $dic($n)}
    foreach {value word} {1000000 million 1000 thousand 100 hundred} {
        if {$n>=$value} {
            return "[en:num $n/$value] $word [en:num $n%$value 1]"
        }
    } ;#--------------- composing between 13 and 99...
    if $n>=20 {
        set res $dic([expr $n/10])ty
        if  $n%10 {append res -$dic([expr $n%10])}
    } else {
        set res $dic([expr $n-10])teen
    } ;#----------- fix over-regular compositions
    regsub "twoty" $res "twenty" res
    regsub "threet" $res "thirt" res
    regsub "fourty"  $res  "forty"  res
    regsub "fivet"  $res  "fift"  res
    regsub  "eightt"   $res  "eight" res
    set res
} ;#RS

See also the converse English number reader.

I have more such spellers for Arab, Hebrew, Chinese, Thai, but can't paste them here because of Unicode constants (not \u.. escaped). Tcl 8.1+ lets us look into a world where Unicodes can be cut and pasted like everything -- but the future is some time after today ;-)

Better English speller :) edit

keith lea: It's a bit longer, but it supports up to 65 digits (or so. I forget :)

set pronounce {vigintillion novemdecillion octodecillion \
        septendecillion sexdecillion quindecillion quattuordecillion \
        tredecillion duodecillion undecillion decillion nonillion \
        octillion septillion sextillion quintillion quadrillion \
        trillion billion million thousand ""}
proc get_num num {
    foreach {a b} {0 {} 1 one 2 two 3 three 4 four 5 five 6 six 7 seven \
            8 eight 9 nine 10 ten 11 eleven 12 twelve 13 thirteen 14 \
            fourteen 15 fifteen 16 sixteen 17 seventeen 18 eighteen 19 \
            nineteen 20 twenty 30 thirty 40 forty 50 fifty 60 sixty 70 \
            seventy 80 eighty 90 ninety} {if {$num == $a} {return $b}}
    return $num
}

proc revorder list {
    for {set x 0;set y [expr {[llength $list] - 1}]} {$x < $y} \
            {incr x;incr y -1} {
        set t [lindex $list $x]
        set list [lreplace $list $x $x [lindex $list $y]]
        set list [lreplace $list $y $y $t]
    }
    return $list
}

proc pron_form num {
    global pronounce
    set x [join [split $num ,] {}]
    set x [revorder [split $x {}]]
    set pron ""
    set ct [expr {[llength $pronounce] - 1}]
    foreach {a b c} $x {
        set p [pron_num $c$b$a]
        if {$p != ""} {
            lappend pron "$p [lindex $pronounce $ct]"
        }
        incr ct -1
    }
    return [join [revorder $pron] ", "]
}

proc pron_num num {
    set num [string trimleft $num 0-]
    set hundred ""
    set ten ""
    set len [string length $num]
    if {$len == 3} {
        set hundred "[get_num [string index $num 0]] hundred"
        scan [string range $num 1 end] %d num
    }
    if {$num > 20 && $num != $num/10} {
        set tens [get_num [string index $num 0]0]
        set ones [get_num [string index $num 1]]
        set ten [join [concat $tens $ones] -]
    } else {
        set ten [get_num $num]
    }
    if {[string length $hundred] && [string length $ten]} {
        return [concat $hundred and $ten]
    } else {
        # One of these is empty, but don't bother to work out which!
        return [concat $hundred $ten]
    }
}

The result of the test below has been broken up over a few lines. :^)

% pron_form 12345678901234567890123456789012345678901234567890123456789012344
twelve vigintillion, three hundred and forty-five novemdecillion,
six hundred and seventy-eight octodecillion, nine hundred and one 
septendecillion, two hundred and thirty-four sexdecillion, five 
hundred and sixty-seven quindecillion, eight hundred and ninety 
quattuordecillion, one hundred and twenty-three tredecillion, four 
hundred and fifty-six duodecillion, seven hundred and eighty-nine 
undecillion, twelve decillion, three hundred and forty-five nonillion, 
six hundred and seventy-eight octillion, nine hundred and one
septillion, two hundred and thirty-four sextillion, five hundred
sixty-seven and quintillion, eight hundred and ninety quadrillion,
one hundred and twenty-three trillion, four hundred fifty-six
billion, seven hundred and eighty-nine million, twelve thousand,
three hundred and forty-four
%

neat, eh?

DKF: modified to put the word "and" in between a hundreds phrase and a tens-and-units phrase

according to my grade-school math teacher, that would be a bug. The word 'and' is ONLY used to signify the decimal place.

DKF: Your grade-school math teacher is wrong. Or at least not aware of the rules in all English-speaking locales. (Guess that's why they are a grade-school math teacher)

JBR: Your grade school math teacher is correct for American English although this usage is no longer main stream. The "and" after hundreds and before tens and units is not typical usage in American English.

kpv: fixed octal bug. The number 108 came back as "one hundred and 08"

French number speller edit

fr:num 99 => quatrevingt dix-neuf

proc fr:num {n {optional 0}} {
    if {[catch {set n [expr $n]}]}  {return $n}
    if {$optional && $n==0} {return ""}
    array set dic {
        0 zero 1 un 2 deux 3 trois 4 quatre 5 cinq 6 six 7 sept 
        8 huit 9 neuf 10 dix 11 onze 12 douze 13 treize 14 quatorze
        15 quinze 16 seize 20 vingt 30 trente 40 quarante 50 cinquante
        60 soixante 80 quatre-vingt
    }
    if [info exists dic($n)] {return $dic($n)}
    foreach {value word} {1000000 million 1000 mille 100 cent} {
        if {$n>=$value} {
            return "[fr:num $n/$value] $word [fr:num $n%$value 1]"
        }
    } ;#--------------- composing between 13 and 99...
    if $n>=80 {
        set res $dic(80)
        if  $n>80 {append res -[fr:num $n-80]}
    } elseif $n>=60 {
        set res $dic(60)
        if  $n>60 {append res -[fr:num $n-60]}
    } elseif $n>=20 {
        set res $dic([expr $n/10]0)
        if  $n%10 {append res -$dic([expr $n%10])}
    } else {
        set res dix-[fr:num $n-10]
    }
    set res
} ;#RS

German number speller edit

proc de:num {n {optional 0}} {
    #---------------- German spelling for integer numbers
    if {[catch {set n [expr $n]}]}  {return $n}
    if {$optional && $n==0} {return ""}
    array set dic {
        0 null 1 ein 2 zwei 3 drei 4 vier 5 fünf 6 sechs 7 sieben 
        8 acht 9 neun 10 zehn 11 elf 12 zwölf
    }
    if [info exists dic($n)] {return $dic($n)}
    foreach {value word} {1000000 Million 1000 _tausend 100 _hundert_} {
        if {$n>=$value} {
            set res "[de:num $n/$value] $word [de:num $n%$value 1]"
            regsub " _" $res "" res
            regsub "_ " $res "" res
            return $res
        }
    } ;#--------------- composing between 13 and 99...
    if $n>=20 {
        set res $dic([expr $n/10])zig
        if  $n%10 {set res $dic([expr $n%10])und$res}
    } else {
        set res $dic([expr $n-10])zehn
    } ;#----------- fix over-regular compositions
    regsub "chsz" $res "chz" res
    regsub "benz" $res "bz" res
    regsub "weizi"  $res  "wanzi"  res
    regsub  "dreizig"   $res  "dreißig" res
    set res
} ;#RS

Swedish number speller edit

proc sv:num {n {optional 0}} {
    #---------------- Swedish spelling for integer numbers
    #         Based on en|fr|de:num by Richard Suchenwirth
    if {[catch {set n [expr $n]}]}  {return $n}
    if {$optional && $n==0} {return ""}
    array set dic {
        0 noll 1 ett 2 två 3 tre 4 fyra 5 fem 6 sex 7 sju
        8 åtta 9 nio 10 tio 11 elva 12 tolv 14 fjorton 18 arton
    }
    if [info exists dic($n)] {return $dic($n)}
    foreach {value word} {1000000000 miljard 1000000 miljon 1000 _tusen 100 _hundra_} {
        if {$n>=$value} {
            set s {}
            if {$word == "miljard" || $word == "miljon"} {
                set v [expr {$n/$value}]
                if {$v == 1} {
                    set s "en $word"
                } else {
                    set s "[sv:num $v] ${word}er"
                }
            } else {
                set s "[sv:num $n/$value] ${word}"
            }
            set res "$s [sv:num $n%$value 1]"
            regsub " _" $res "" res
            regsub "_ " $res "" res
            regsub " $" $res "" res
            regsub "ttt" $res "tt" res
            return $res
        }
    } ;#--------------- composing between 13 and 99...
    if {$n>=20} {
        set res $dic([expr $n/10])tio
        if  $n%10 {set res $res$dic([expr $n%10])}
    } else {
        set res $dic([expr $n-10])ton
    } ;#----------- fix over-regular compositions
    regsub "ret" $res "rett" res
    regsub "jut" $res "jutt" res
    regsub "niot" $res "nitt" res
    regsub "tvåti" $res "tjug" res
    regsub "fyrat" $res "fyrt" res
    regsub "ttat" $res "tt" res
    set res
}

Italian number speller edit

proc it:num {n {optional 0}} {
    #---------------- Italian spelling for integer numbers
    #---------------- by Stefano Taschini 2002-05-15
    #---------------- based on a template by Richard Suchenwirth
    if {[catch {set n [expr $n]}]}  {return $n}
    if {$optional && $n==0} {return ""}
    array set dic {
        0 zero 1 uno 2 due 3 tre 4 quattro 5 cinque 6 sei 7 sette
        8 otto 9 nove 10 dieci 11 undici 12 dodici 13 tredici 14 quattordici
        15 quindici 16 sedici 17 diciassette 18 diciotto 19 diciannove
        20 venti 30 trenta 40 quaranta 50 cinquanta 60 sessanta 70 settanta
        80 ottanta 90 novanta
    }
    if [info exists dic($n)] {return $dic($n)}
    #--------------- recursive for numbers greater than 99
    foreach {value sing plur} {1000000000 "un miliardo, " " miliardi, " 
        1000000 "un milione, " " milioni, " 
        1000 "mille " "mila " 
        100 cento cento
    } {
        if {$n>=$value} {
            if {$n >= 2*$value} {set res [it:num $n/$value]$plur} {set res $sing}
            append res [it:num $n%$value 1]
            regsub ",? *$" $res "" res
            regsub {ém} $res {em} res    
            regsub {oo} $res {o} res    
            return $res
        }
    }
    #--------------- composing between 21 and 99...
    set dic(3) "tré"
    regsub {nt[ia]([uo])} $dic([expr $n/10]0)$dic([expr $n%10]) {nt\1} res
    set res
}

Turkish number speller edit

tr:num 99 => doksan-dokuz

proc tr:num {n {optional 0}} {
    #---------------- Turkish spelling for integer numbers
    #         Based on en|fr|de:num by Richard Suchenwirth
    #       Adapted to Turkish by Sedat Serper [SeS] 2012-05-09
    if {[catch {set n [expr $n]}]}  {return $n}
    if {$optional && $n==0} {return ""}
    array set dic {
        0 SIFIR 1 bir 2 iki 3 üç 4 dört 5 bes 6 alti 7 yedi 8 sekiz 9 dokuz 
    }
    if [info exists dic($n)] {return $dic($n)}
    foreach {value word} {1000000000 milyar 1000000 milyon 1000 bin 100 yüz} {
        if {$n>=$value} {
            regsub "bir y" "[tr:num $n/$value] $word [tr:num $n%$value 1]" "y" res
            regsub "bir b" $res "b" res
            return $res
        }
    } ;#--------------- composing between 10 and 99...
    if $n>=10 {
        set res $dic([expr $n/10])t
        if  $n%10 {append res -$dic([expr $n%10])}
    } else {
        set res $dic([expr $n-10])
    } ;#----------- fix over-regular compositions
    regsub "birt"   $res "on"     res
    regsub "ikit"   $res "yirmi"  res
    regsub "üçt"    $res "otuz"   res
    regsub "dörtt"  $res "k?rk"   res
    regsub "best"   $res "elli"   res
    regsub "altit"  $res "altmis" res
    regsub "yedit"  $res "yetmis" res
    regsub "sekizt" $res "seksen" res
    regsub "dokuzt" $res "doksan" res
    set res
} ;#SeS

Dutch number speller edit

nl:num 99 => negen-en-negentig

proc nl:num {n {optional 0}} {
    #---------------- Dutch spelling for integer numbers
    #         Based on en|fr|de:num by Richard Suchenwirth
    #       Adapted to Dutch by Sedat Serper [SeS] 2012-05-09
    if {[catch {set n [expr $n]}]}  {return $n}
    if {$optional && $n==0} {return ""}
    array set dic {
        0 nul 1 een 2 twee 3 drie 4 vier 5 vijf 6 zes 
        7 zeven 8 acht 9 negen 10 tien 11 elf 12 twaalf
    }
    if [info exists dic($n)] {return $dic($n)}
    foreach {value word} {1000000000 miljard 1000000 miljoen 1000 _duizend 100 _honderd_} {
        if {$n>=$value} {
            set res "[nl:num $n/$value] $word [nl:num $n%$value 1]"
            regsub " _" $res "" res
            regsub "_ " $res "" res
            return $res
        }
    } ;#--------------- composing between 13 and 99...
    if $n>=20 {
        set res $dic([expr $n/10])tig
        if  $n%10 {set res $dic([expr $n%10])-en-$res}
    } else {
        set res $dic([expr $n-10])tien
    } ;#----------- fix over-regular compositions
    regsub "tweetig"  $res  "twintig"  res
    regsub "drietien" $res  "dertien"  res
    regsub "drietig"  $res  "dertig"   res
    regsub "viertien" $res  "veertien" res
    regsub "viertig"  $res  "veertig"  res
    regsub "achttig"  $res  "tachtig" res    
    set res
} ;#SeS

German time speller edit

Converts exact HH:MM times to fuzzy colloquial wording, optional Northern (viertel vor vier) or Southern style (dreiviertel vier) ;-) Requires de:num (see above)

proc de:time {{t now} {region n}} {
    # format HH:MM time to spoken German (north or south)
    set u "usage: de:time HH:MM|now ?n|s?"
    if {$t=="now"} {set t [clock format [clock seconds] -format %H:%M]}
    if [scan $t %d:%d h min]!=2 {error $u}
    array set dic {
        5 fünf z zehn V Viertel h halb d dreiviertel v vor n nach + ""
    };  # dictionary of words used (digits also from de:num)
    switch -- $region {
        n {set cdic {{} 5n zn Vn zvh 5vh h 5nh znh Vv zv 5v +}}
        s {set cdic {{} 5n zn V zvh 5vh h 5nh znh d zv 5v +}}
        default {error $u}
    };  # byte-coded names for 5-minute increments (see dic)
    set m5 [expr int(round($min/5.))]
    set phr [lindex $cdic $m5]
    if {[regexp {[vVhd+]} $phr] && ![regexp Vn $phr]} {incr h}
    if ![set h [expr $h%12]] {set h 12}
    if ![regexp {[vn]} $phr] {
        set d [expr $min-$m5*5]
        if $d<0 {
            set res "kurz vor "
        } elseif $d>0 {
            set res "kurz nach "
        } else {
            set res "genau "
        }
    }
    foreach i [split $phr ""] {if {$i!="+"} {lappend res $dic($i)}}
    lappend res [de:num $h]
    regsub ein$ $res eins res
    set res
} ;#RS

English ordinal suffix edit

proc en_ordinal n {
    set suffix th
    if {($n%100)<10 || ($n%100)>20} {
        switch -- [expr abs($n)%10] {
            1 {set suffix st}
            2 {set suffix nd}
            3 {set suffix rd}
        }
    }
    append n $suffix
} ;# RS

% en_ordinal 1
1st
% en_ordinal 2
2nd
% en_ordinal 3
3rd
% en_ordinal 4
4th

MG 2004-04-02: And another, which only requires that the string ends in a number, and is also forgiving of strings which don't.

proc en_ordinal2 {num} {
    regexp {^[^0-9]*([0-9]+)$} $num -> tnum
    if {$tnum == "11" || $tnum == "12" || $tnum == "13"} {
        return ${num}th;
    }
    switch [string range $num end end] {
        1 {append num st}
        2 {append num nd}
        3 {append num rd}
        4 -
        5 -
        6 -
        7 -
        8 -
        9 -
        0 {append num th}
    }
    return $num;
}

RS: Note however that this version reacts wrong in the range 11..13:

% en_ordinal2 11
11st
% en_ordinal2 12
12nd
% en_ordinal2 13
13rd

MG: So it does. Fixed now so that it works properly. Also fixed so that numbers ending in '0' get 'th' rather than nothing added.

MG: Second Formatter moved to Formatting durations, now that I've stumbled across the right page.

Category Mathematics

Category Package

Category Human Language

Category Date and Time

Category Local

Arts and crafts of Tcl-Tk programming