Hanglish

Richard Suchenwirth 2001-02-06 -- Hangul is the Korean writing system. Each syllable is represented by an often square arrangement of its constituent letters ("jamo"), laid out left to right or top to bottom. Transliteration is the element-by-element conversion of text from one writing system to another (if the target is English/Latin, it is also called romanization). "Hanglish" is the name of the following romanization scheme from The Lish family, chosen by analogy with Greeklish, which is often used on the Net to write Greek in Latin letters.

There is an ISO agreement (ISO/TC46/SC2/WG4, 1992) on Hangul transliteration from which I slightly deviate:

let only L stand for "r,l"
let only Q stand for "('),ng" - at least it looks circular. Empty strings are ugly in transliteration. All other consonants are unchanged.
let W stand for "eu", let E stand for "eo"
let EI stand for "e" (two distinct graphemes, easily segmented)
for the bottom/right diphthongs, don't use "w-" indiscriminately for both U and O. Instead, use OA for "wa", UE for "weo", UI for "wi", WI for "yi"
the palatal vowels "ya, yeo, yo, yu" will hardly be segmented into the extra dot and the base vowel. So, use remaining letters for these graphemes: V for "ya", X for "yeo", Y for "yo", Z for "yu"
thus, the "e" above is rendered as EI, and "ye" as XI. While XI is easily segmented, the "ae/yae" diphthongs are better treated as single graphemes.

One could still express the composition with AI for "ae" and VI for "yae". For best adaptation to OCR/interpretation needs, however, I prefer to use the two left-over letters: F for "yae", R for "ae".
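The vowel assignments above can be summarized side by side. The following is only a sketch: the conventional spellings for "wae", "oe" and "we" are not named in the text and are inferred from the position of OR, OI and UEI in the vowel list of the code; the variable names are illustrative.

```tcl
# Conventional romanization paired with Hanglish, in Unicode vowel order.
# "wae", "oe" and "we" are inferred from list position (an assumption).
set conventional {a ae ya yae eo e yeo ye o wa wae oe yo u weo we wi yu eu yi i}
set hanglish     {A R  V  F   E  EI X  XI O OA OR  OI Y  U UE  UEI UI Z  W  WI I}
foreach c $conventional h $hanglish {
    puts [format "%-4s -> %s" $c $h]
}
```

Every Hanglish vowel is made up of the letters A R V F E I X O Y U Z W only, which is what lets the regular expression in the code separate vowels from consonants.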

After so much theory, here's the code:

 proc hangul2hanglish {numuc} {
    # takes a numeric Unicode code point so far (until scan works, from 8.1b1)
    set ncount [expr {21*28}] ;# 21 vowels x 28 trailing consonants per lead
    set index [expr {$numuc - 0xAC00}] ;# offset of Unicode 2.0 Hangul
    append res [lindex {G GG N D DD L M B BB S SS Q J JJ C K T P H}\
            [expr {$index/$ncount}]]
    append res [lindex {A R V F E EI X XI O OA OR OI Y U UE UEI UI Z W WI I}\
            [expr {($index%$ncount)/28}]]
    append res [lindex {"" G GG GS N NJ NH D L LG LM LB LS LT LP LH \
            M B BS S SS Q J C K T P H}\
            [expr {$index%28}]]
    return $res
 }
 proc hanglish2uc {hanglish} {
    # convert a Hanglish string to one Unicode 2.0 Hangul if possible
    set L ""; set V ""; set T "" ;# in case regexp doesn't hit
    # lead consonant - vowel - trail cons.
    regexp {^([GNDLMBSQJCKTPH]+)([ARVFEIXOYUZW]+)([GNDLMBSQJCKTPH]*)$} \
            [string toupper $hanglish] ->  L V T
    if {$L=="" || $V==""} {return $hanglish}
    set l [lsearch -exact {G GG N D DD L M B BB S SS Q J JJ C K T P H} $L]
    set v [lsearch -exact {A R V F E EI X XI O OA OR OI Y U UE UEI UI Z W WI I} $V]
    set t [lsearch -exact {"" G GG GS N NJ NH D L LG LM LB LS LT LP LH  \
            M B BS S SS Q J C K T P H} $T] ;# trailing consonants
    # any part not found in its table: give the input back unchanged
    if {$l<0 || $v<0 || $t<0} {return $hanglish}
    set uc [expr {$l*21*28 + $v*28 + $t + 0xAC00}]
    return [format %c $uc]
 }
 proc hanglish {args} {
    # tolerant converter: makes Unicode 2.0 Hangul where possible
    set res ""
    foreach i $args {
        set word ""
        # accept the compositional spellings AI (ae) and VI (yae);
        # work on a copy so unconvertible words come back unchanged
        set s $i
        foreach {from to} {
            ai r vi f
        } {regsub -all -nocase $from $s $to s}
        foreach j [split $s "-"] {
            set t [hanglish2uc $j]
            if {$j==$t} {set word $i; break} ;# all syllables must fit
            append word $t
        }
        lappend res $word
    }
    return $res
 }
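
The index arithmetic used by the procs above can be checked by hand on a single syllable. A minimal sketch, assuming the syllable U+D55C as an illustrative choice (it is not taken from the text above); the indices refer to positions in the lead, vowel and trail lists of the code:

```tcl
# Worked example: decompose U+D55C into table indices (illustrative choice)
set code  0xD55C
set index [expr {$code - 0xAC00}]            ;# 10588
set lead  [expr {$index / (21*28)}]          ;# 18 -> H
set vowel [expr {($index % (21*28)) / 28}]   ;# 0  -> A
set trail [expr {$index % 28}]               ;# 4  -> N
puts "$lead $vowel $trail"

# recompose the code point the same way hanglish2uc does
set back [expr {$lead*21*28 + $vowel*28 + $trail + 0xAC00}]
puts [format %X $back]
```

So this syllable is HAN in Hanglish, and recomposing the three indices gives back the same code point.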

Usage example: [hanglish Se-qul] produces the Hangul for South Korea's capital, Seoul. Note that the circle jamo is written as Q, although it is silent at the beginning of a syllable (at the end, it is /ng/).
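
For the opposite direction, hangul2hanglish takes a numeric code point, so a string needs a small wrapper. The following is a sketch only: the proc name hangulword2hanglish is hypothetical, the tables are copied from the code above so that the snippet runs on its own, and it assumes a Tcl version in which [scan] with %c returns the Unicode code point of a character.

```tcl
# Sketch: convert a word of Hangul syllables to hyphen-separated Hanglish.
# hangulword2hanglish is a hypothetical name; tables as in hangul2hanglish.
proc hangulword2hanglish {word} {
    set leads  {G GG N D DD L M B BB S SS Q J JJ C K T P H}
    set vowels {A R V F E EI X XI O OA OR OI Y U UE UEI UI Z W WI I}
    set trails {"" G GG GS N NJ NH D L LG LM LB LS LT LP LH
                M B BS S SS Q J C K T P H}
    set parts {}
    foreach ch [split $word ""] {
        scan $ch %c code                 ;# character -> numeric code point
        if {$code < 0xAC00 || $code > 0xD7A3} {
            lappend parts $ch            ;# not a precomposed syllable: keep
            continue
        }
        set index [expr {$code - 0xAC00}]
        set l [lindex $leads  [expr {$index / 588}]]
        set v [lindex $vowels [expr {($index % 588) / 28}]]
        set t [lindex $trails [expr {$index % 28}]]
        lappend parts $l$v$t
    }
    return [join $parts -]
}

puts [hangulword2hanglish "\uC11C\uC6B8"]   ;# SE-QUL
```

The two code points are the ones [hanglish Se-qul] yields, so the output can be fed back into hanglish for a round trip.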

These routines have been incorporated into taiku, see taiku goes multilingual, which also introduces liberalisations - for Q you can write NG, or you can omit it at syllable-initial position, so se-ul has the same effect there. Also, going both ways, and with a GUI: A little Hangul converter.

Arts and crafts of Tcl-Tk programming

A little Korean editor
