Updated 2016-07-10 11:54:18 by dkf

Keith Vetter 2003-11-12 : Babelfish is sorely missing a key language, one known to many school kids, and that's Pig Latin. Luckily Tcl can come to the rescue. See Tclay - Ooltay Ommandcay Anguagelay for a famous example text.

Everything you type into the left text box will be automatically translated into Pig Latin in the right text box. It only works with ASCII characters and is not the smartest about deciding what exactly is a word but that's left as an exercise for the reader.

For the technically minded, I just learned the linguistic terms describing where the split occurs: onset is the initial consonant cluster and the rime is the remainder of the word.
##+##########################################################################
 #
 # piglatin.tcl -- translates english text into pig latin
 # by Keith Vetter, Nov 12, 2003
 #
 
 package require Tk
 package require textutil
 
 proc PigLatinizeWord {word} {
    if {! [regexp -nocase {^[a-z]} $word]} { return $word } ;# Not alpha
 
    # Tricky part is keeping case consistent
    set case tolower
    if {[regexp {^[A-Z]} $word]} {set case totitle}
    if {[regexp {^[A-Z]{2,}} $word]} {set case toupper}
 
    if {[regexp -nocase {^[aeiou]} $word]} { set word "w$word"} ;# Initial vowel
    regexp -nocase {^([^aeiou][^aeiouy]*)(.*)} $word => onset rime
    set piglatin "$rime${onset}ay"
    return [string $case $piglatin]
 }
 proc DoText {} {
    set txt [string trimright [.tplain_org get 1.0 end]]
 
    # Not the smartest way to split into words but simple and good enough
    set words [::textutil::splitx $txt {(\W)}]  ;# Split into words
    
    .tpig config -state normal
    .tpig delete 1.0 end
    foreach word $words {
        .tpig insert end [PigLatinizeWord $word]
    }
    .tpig config -state disabled
 }
 
 proc DoDisplay {} {
    wm title . "English to Pig Latin Translation"
    label .lplain -text "Plain Text"
    .lplain configure -font "[font actual [.lplain cget -font]] -weight bold"
    label .lpig -text "Pig Latin Text"
    .lpig configure -font "[font actual [.lpig cget -font]] -weight bold"    
    text .tplain -width 50 -height 30 -wrap word
    text .tpig -width 50 -height 30 -wrap word -bg #dbe1ff -state disabled
    canvas .plain2pig -width 40 -height 20 -bd 0 -highlightthickness 0 
    .plain2pig create line 5 10 35 10 -tag z -arrow last -width 5
    
    grid .lplain x .lpig -row 0
    grid .tplain x .tpig -sticky news
    grid config .plain2pig -row 1 -column 1 -padx 5
 
    grid columnconfig . {0 2} -weight 1
    grid rowconfig . 1 -weight 1
 
    focus .tplain
    rename .tplain .tplain_org
    proc .tplain {cmd args} {
        set rval [eval .tplain_org $cmd $args]
        if {$cmd == "insert" || $cmd == "delete"} {
            DoText
        }
        return $rval
    }
 }
 
 DoDisplay
 set sample {NAME
 Tcl - Tool Command Language 
 SYNOPSIS
 Summary of Tcl language syntax.
 
 DESCRIPTION
 The following rules define the syntax and semantics of the Tcl language: 
 
 
 [1] Commands.
 A Tcl script is a string containing one or more commands. Semi-colons and newlines are command separators unless quoted as described below. Close brackets are command terminators during command substitution (see below) unless quoted. 
 ...
 }
 .tplain insert end $sample

RS: Super-cute!

TV Is this transformation strictly invertable? It seems redundant, but eh, we don't ask 'compress' for that concerning normal language either..

KPV It's almost but not quite invertable due to the vowel rule, specifically any word that starts with a "w" and is also a word when that "w" is beheaded will both map to the same pig latin word. For example, wad and ad both map to adway. A quick search of a word list I have came up with about 120 such words, my favorites being witch, wassail and weighty.

KPV I just realized that there's a whole second class of non-invertable words. These usually are pronounced differently but are spelt the same. This class is most easily described by example: thin and hint both become inthay in pig latin (they differ in pronunciation due to where the syllable is split). Other fun examples include kiss/skis and quirts/squirts. In my word 95,000 entry word list I get 444 words which are non-introvertable.

Donald Arseneau That second class shows a "bug" in the pig latin rules. Since there are rules for leading sh and th, there needs to be a rule for a leading h coupled with a terminal s or t. Omitting the h would make sense (but not help ensure reversible mappings).

lexfiend 2008-02-05: Somehow, Tcl --> Tclay doesn't seem right to me. Given that it's an acronym and conventionally pronounced "Tickle", I would've thought the correct transformation should be Icltay or something similar.