ASCII

The American Standard Code for Information Interchange, published by ANSI, specifies a set of 128 characters (control characters and graphic characters, such as letters, digits, and symbols) with their coded representation.

See Also

ISO 646: an internationalized version of ASCII.

ISO/IEC 8859: a set of 8-bit codes based on ASCII, intended to be combined with a standard set of terminal control sequences.

Reference

INCITS 4:1986, Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII), incits.org (link dead)
American Standard Code for Information Interchange, by Dennis Howe, 1995, FOLDOC
The American Standard Code for Information Interchange (alternate), Richard Botting, updated 2010-01-19
ASCII: American Standard Code for Information Infiltration (alternate), Tom Jennings, 2004-10-29
1963: ASCII Debuts, by Mary Brandel
Bob Bemer and Communication (ASCII), Bob Bemer: An article about Bob, by Bob.
That Powerful ESCAPE Character -- Key and Sequences, by Bob Bemer, 2003-10-25
US ASCII, ANSI X3.4-1986 (ISO 646 International Reference Version), The Kermit Project
ASCII Codes, Paul Bourke, 1995

Resource

thuglife (thuglife.org): an ASCII art website
chris.com: more ASCII art
ASCII dammit: written as a Python library and capable of ASCIIfying not only MS smart quotes but (with varying degrees of accuracy) most of ISO-Latin-1. For use in fits of parochialism when you want something in ASCII, dammit.

Description

ASCII specifies the numerical encoding and meaning of 128 symbols: 95 printable characters (including the space) and 33 control characters. The first letter of ASCII stands for American, so there's no use complaining that the printable characters don't include various accented letters or non-English characters.
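As a quick sanity check of those numbers, here is a small sketch that classifies code points 0 through 127 with Tcl's built-in `string is print` and `string is control` character classes. It assumes Tcl 8.5 or later; note that the `print` class counts the space character as printable, which is why the total is 95 rather than 94.

```tcl
# Count how many of the 128 ASCII code points Tcl classifies as
# printable (graphic characters plus space) and as control characters.
set printable 0
set control 0
for {set i 0} {$i < 128} {incr i} {
    set ch [format %c $i]
    if {[string is print $ch]} {
        incr printable
    } elseif {[string is control $ch]} {
        incr control
    }
}
puts "printable: $printable, control: $control"
```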

(For those extra characters, you need another character set. Luckily, there are many that do the job very nicely and are also proper supersets of ASCII, such as ISO 8859-1 for Western European languages.)
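To see the superset property in practice, the following sketch (assuming Tcl 8.5 or later and its standard `iso8859-1` encoding) converts strings to ISO 8859-1 bytes: ASCII text comes out byte-for-byte unchanged, an accented letter such as é gets a single byte above 127, and `string is ascii` reports whether every character is below 128.

```tcl
# ASCII characters keep the same byte values in ISO 8859-1.
set plain [encoding convertto iso8859-1 "Tcl"]
binary scan $plain cu* plain_codes
puts $plain_codes                     ;# 84 99 108

# Characters outside ASCII get a single byte in the 128-255 range.
set accented [encoding convertto iso8859-1 "caf\u00e9"]
binary scan $accented cu* accented_codes
puts $accented_codes                  ;# 99 97 102 233

# string is ascii is true only when every character is below 128.
puts [string is ascii "Tcl"]          ;# 1
puts [string is ascii "caf\u00e9"]    ;# 0
```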

One very frequently asked question is how to convert between a character and its numeric ASCII code. To get the code of a character, the scan command provides the usual answer.
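For example, `scan` with the `%c` conversion returns the numeric code of the first character of its input. This is a minimal sketch: when no variable names are given, `scan` returns the converted values as a list, and a loop over `split` handles a whole string.

```tcl
# Character -> numeric code.
set code [scan A %c]
puts $code                        ;# 65

# The same conversion, formatted as two hexadecimal digits.
puts [format %02X [scan ~ %c]]    ;# 7E

# Codes for every character of a string.
set codes {}
foreach ch [split "Tcl" ""] {
    lappend codes [scan $ch %c]
}
puts $codes                       ;# 84 99 108
```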

And to convert an integer to a character, the format command provides the usual answer.
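The reverse direction with `format`, again as a minimal sketch: each `%c` in the format string consumes one integer argument.

```tcl
# Numeric code -> character.
set ch [format %c 65]
puts $ch                          ;# A

# Several codes at once: one %c per argument.
puts [format %c%c%c 84 99 108]    ;# Tcl

# Turning a list of codes back into a string.
set text ""
foreach code {72 105 33} {
    append text [format %c $code]
}
puts $text                        ;# Hi!
```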

RS: Pure ASCII is a 7-bit encoding, covering byte values \x00..\x7F, so the 128 symbols mentioned above use up its entire code space. But since ASCII is at the core of the iso8859-x encodings, the Windows and Mac code pages, and even Unicode, it's easy to extract this core and look at the 94 printable characters other than space (codes 33 through 126):

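# List the printable ASCII characters 0x21..0x7E as HEX:char pairs,
# starting a new line after each code that is a multiple of 16.
proc ascii {} {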
    set res {}
    for {set i 33} {$i<127} {incr i} {
        append res "[format %2.2X:%c $i $i] "
        if {$i%16==0} {append res \n}
    }
    set res
}

21:! 22:" 23:# 24:$ 25:% 26:& 27:' 28:( 29:) 2A:* 2B:+ 2C:, 2D:- 2E:. 2F:/ 30:0 
31:1 32:2 33:3 34:4 35:5 36:6 37:7 38:8 39:9 3A:: 3B:; 3C:< 3D:= 3E:> 3F:? 40:@ 
41:A 42:B 43:C 44:D 45:E 46:F 47:G 48:H 49:I 4A:J 4B:K 4C:L 4D:M 4E:N 4F:O 50:P 
51:Q 52:R 53:S 54:T 55:U 56:V 57:W 58:X 59:Y 5A:Z 5B:[ 5C:\ 5D:] 5E:^ 5F:_ 60:` 
61:a 62:b 63:c 64:d 65:e 66:f 67:g 68:h 69:i 6A:j 6B:k 6C:l 6D:m 6E:n 6F:o 70:p 
71:q 72:r 73:s 74:t 75:u 76:v 77:w 78:x 79:y 7A:z 7B:{ 7C:| 7D:} 7E:~

CJU: Byte values above 127 are often called "Extended ASCII," IIRC. Correct me if I'm wrong, but I seem to remember that IBM was the first to implement it, when it introduced the first IBM PC. Among other fancy symbols, it contains glyphs for drawing single-line and double-line boxes on a text terminal.

AMG: IBM (or was it Microsoft? doubtful...) had the notion of code pages, which are little more than alternative fonts for characters numbered 128 through 255. (I suppose a few code pages might redefine ASCII, but I don't know if this was ever done.) The code page number used by the system would (should?) get saved somewhere in the filesystem in order to give meaning to the character numbers.

I don't know anyone who has ever used anything other than CP437, the famous one with the solid and shaded boxes and the single and double line drawing characters, plus a handful of accented vowels, international currencies, and a couple of Greek letters and math symbols. (But no multiplication sign!) So-called ASCII art (like in BitchX) is typically done using CP437 symbols.
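Tcl can decode such bytes if you tell it which code page they came from. A minimal sketch, assuming your Tcl installation includes the `cp437` encoding (check `encoding names`): the bytes 0xC9 0xCD 0xBB decode to the double-line top-left corner, horizontal line, and top-right corner, while bytes below 128 decode exactly as ASCII.

```tcl
# Three CP437 bytes: top-left double corner, double horizontal line,
# top-right double corner.
set box [encoding convertfrom cp437 "\xC9\xCD\xBB"]
foreach ch [split $box ""] {
    puts [format U+%04X [scan $ch %c]]
}
# prints U+2554, U+2550 and U+2557 on separate lines

# The lower half of CP437 is plain ASCII.
puts [encoding convertfrom cp437 "ASCII"]   ;# ASCII
```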

LV: One of my most common code page encounters has to do with special symbols for the quotation mark ("), the apostrophe ('), and the hyphen (-). In the old days (and I suspect this continues today), Microsoft Word used "smart" characters, replacing the character typed by the user with another one. For instance, one might type the quotation mark, and Word would replace it with one of two special code page characters that more closely resemble opening and closing quotation marks.

AMG: Also, Microsoft gets it wrong for contractions where the beginning of a word is replaced with an apostrophe, for example when abbreviating a year: '08. Word and friends treat that initial apostrophe as a single opening quote, which is incorrect.
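If you receive text like this and just want plain ASCII, a `string map` over the common typographic characters goes a long way. This is a minimal sketch, not what ASCII dammit does; the proc name `asciify_quotes` is made up for illustration, and it assumes the text has already been decoded to Unicode (for example with `encoding convertfrom cp1252`, which turns the Windows bytes 0x91, 0x92, 0x93 and 0x94 into U+2018, U+2019, U+201C and U+201D). It cannot undo Word's wrong choice of quote in '08; it simply flattens both curly single quotes to the plain apostrophe.

```tcl
# Hypothetical helper: replace curly quotes and dashes with ASCII look-alikes.
proc asciify_quotes {text} {
    string map [list \
        \u2018 ' \u2019 ' \
        \u201C \" \u201D \" \
        \u2013 - \u2014 --] $text
}

# Windows-1252 bytes 0x93/0x94 are curly double quotes, 0x92 a curly apostrophe.
set from_word [encoding convertfrom cp1252 "\x93hi\x94 it\x92s"]
puts [asciify_quotes $from_word]   ;# "hi" it's
```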

lordmundi: I'm hoping someone can help me. I have a string that mixes plain ASCII characters with encoded values. By that I mean the function I am calling returns a normal string for most characters, but items like spaces, parentheses, etc., are encoded as their ASCII value so that the string can be passed back and forth without worrying about special characters. For example, one string I have is:

EDGE\032on\032localhost\032\04025880\041

So, as you can see, all of the spaces are written as "\032", the parentheses as their ASCII codes, and so on. How can I pass this string to a function and have it interpret any "\###" codes it encounters and return the decoded string?

AMG: Try [subst -nocommands -novariables].

RLE 2011-03-08: Do you have control over what is returned by the other function? If so, and you can modify it to return hex-encoded values (\x20 instead of \032) or to encode the characters as octal, then subst will perform the backslash substitutions for you.

AMG: RLE, I thought that was octal, but I guess you're right: space should be \040. However, \x should be avoided due to its surprising behavior when the encoded character is followed by valid hexadecimal digits (in Tcl 8.5 and earlier, \x consumes every following hex digit; Tcl 8.6 limits it to two). Use four-digit \u instead.
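A small sketch of the pitfall being discussed (Tcl 8.5 or later assumed): `subst -nocommands -novariables` performs only backslash substitution, and Tcl reads `\ddd` as octal, so the decimal-encoded `\032` becomes character 26 (the ASCII SUB control character) instead of a space. Octal `\040` or `\u0020` give the intended space.

```tcl
# Decimal-encoded input misread as octal: \032 is octal 32, i.e. decimal 26.
set wrong [subst -nocommands -novariables {EDGE\032on}]
puts [scan [string index $wrong 4] %c]   ;# 26

# Octal and four-digit \u escapes both produce a space (code 32).
set octal   [subst -nocommands -novariables {EDGE\040on}]
set unicode [subst -nocommands -novariables {EDGE\u0020on}]
puts "$octal|$unicode"                   ;# EDGE on|EDGE on
```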

lordmundi: Unfortunately I don't have control over it. This is the way the encoded string comes back to me from the Bonjour/DNS-SD protocol (mainly so it can be sent back later in the same format). This is what I ended up writing to decode the string for printing; let me know if you see any way I could improve it:

# A proc to remove leading zeroes from a string
proc stripzeros {value} {
    set retval [string trimleft $value 0]
    if { ![string length $retval] } {
        return 0
    }
    return $retval
}

# A procedure to decode decimal ascii sequences in a string
proc get_printable_name { encoded_name } {
    set regex {\\[0-9][0-9][0-9]}
    set sub {[format %c [stripzeros [string range "\\&" 1 end]]]}
    set retval [subst [regsub -all $regex $encoded_name $sub]]
    return $retval
}

puts [get_printable_name {kramer\032\09100\05826\05818\058fc\0588f\05809\093._workstation._tcp.local.}]

prints:

kramer [00:26:18:fc:8f:09]._workstation._tcp.local.

RLE 2011-03-09: With a small tip from the discussion on the scan page:

proc get_printable_name {encoded_name} {
    return [subst [regsub -all {\\([0-9]{3})} $encoded_name {[format %c [scan \1 %d]]}]]
}

% get_printable_name {kramer\032\09100\05826\05818\058fc\0588f\05809\093._workstation._tcp.local.}
kramer [00:26:18:fc:8f:09]._workstation._tcp.local.

lordmundi 2011-03-11: Wow... that is a lot better. Thanks!
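One caveat with both versions: subst runs over the whole name, so any literal `$`, `[` or backslash in a service name would also be substituted. Here is a sketch that avoids subst by locating the `\ddd` escapes with `regexp -all -inline -indices` and copying everything else through verbatim. The proc name `decode_ddd` is hypothetical, it assumes Tcl 8.5 or later (for `lassign` and `{*}`), and it only handles the three-digit decimal form seen above; other backslash escapes are passed through unchanged.

```tcl
# Hypothetical helper: decode \ddd decimal escapes without using subst.
proc decode_ddd {encoded} {
    set out ""
    set pos 0
    # With -indices and one capture group, each match yields two index
    # pairs: the whole "\ddd" escape and the three digits inside it.
    foreach {whole digits} [regexp -all -inline -indices {\\([0-9]{3})} $encoded] {
        lassign $whole start end
        # Copy the literal text between the previous escape and this one.
        append out [string range $encoded $pos [expr {$start - 1}]]
        # scan %d reads the digits as decimal, so "032" becomes 32.
        scan [string range $encoded {*}$digits] %d code
        append out [format %c $code]
        set pos [expr {$end + 1}]
    }
    # Copy whatever follows the last escape.
    append out [string range $encoded $pos end]
    return $out
}

puts [decode_ddd {kramer\032\09100\05826\05818\058fc\0588f\05809\093._workstation._tcp.local.}]
# -> kramer [00:26:18:fc:8f:09]._workstation._tcp.local.
puts [decode_ddd {cost$5\032[x]}]
# -> cost$5 [x]
```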

Category Characters

Category Glossary
