Page contents
Commands edit
ASCII map edit
No algorithm at all, but may come in handy ;-)If you're on a UNIX box, try: man asciiproc ascii {} {return { 00 nul 01 soh 02 stx 03 etx 04 eot 05 enq 06 ack 07 bel 08 bs 09 ht 0a nl 0b vt 0c np 0d cr 0e so 0f si 10 dle 11 dc1 12 dc2 13 dc3 14 dc4 15 nak 16 syn 17 etb 18 can 19 em 1a sub 1b esc 1c fs 1d gs 1e rs 1f us 20 sp 21 ! 22 " 23 # 24 $ 25 % 26 & 27 ' 28 ( 29 ) 2a * 2b + 2c , 2d - 2e . 2f / 30 0 31 1 32 2 33 3 34 4 35 5 36 6 37 7 38 8 39 9 3a : 3b ; 3c < 3d = 3e > 3f ? 40 @ 41 A 42 B 43 C 44 D 45 E 46 F 47 G 48 H 49 I 4a J 4b K 4c L 4d M 4e N 4f O 50 P 51 Q 52 R 53 S 54 T 55 U 56 V 57 W 58 X 59 Y 5a Z 5b [ 5c \ 5d ] 5e ^ 5f _ 60 ` 61 a 62 b 63 c 64 d 65 e 66 f 67 g 68 h 69 i 6a j 6b k 6c l 6d m 6e n 6f o 70 p 71 q 72 r 73 s 74 t 75 u 76 v 77 w 78 x 79 y 7a z 7b { 7c | 7d } 7e ~ 7f del }} ;#RS
Linebreak edit
Break a string into lines of specified maximum length: join the output of this with \n for hard-wrapped text:proc linebreak {s {width 80}} { set res {} while {[string length $s]>$width} { set pos [string wordstart $s $width] lappend res [string range $s 0 [expr $pos-1]] set s [string range $s $pos end] } lappend res $s } ;# RSArjen Markus An elegant solution, but two remarks:
- The file word.tcl remarks that word boundaries are platform-dependent. Is this also used in string wordstart?
- More importantly: The above fails if a word is longer than the given width! You get into an endless loop.
word Wrap Via Tk Text Widget edit
D. McC: A problem with string wordstart, if you want hard-wrapped text that reliably looks right, is this (from the string wordstart page on the Wiki): "A word is considered to be any contiguous range of alphanumeric (Unicode letters or decimal digits) or underscore (Unicode connector punctuation) characters, or any single character other than these" (emphasis added). So, if you have a word (in the ordinary sense) preceded or followed by punctuation marks, string wordstart will treat the punctuation marks as separate "words," and they may not come out on the same line as the word they go with!The Tk text widget, with word wrap on, displays wrapped text that reliably looks right, with no punctuation marks separated from the adjoining words. So, to get hard-wrapped text that looks right, you can determine the locations where the text widget ends the wrapped lines, and replicate the text with newlines at those locations. Here's some code I wrote that does this (feel free to suggest improvements). A global variable "formawid" is set to hold the desired width in characters of the wrapped text, and the text widget (here, ".tx") is configured to that width, with word wrap:.tx configure -width $formawid -wrap $wordwrap wm geometry . {} ; # Make sure the toplevel shrinks or expands to fitThen, after a delay of one-tenth of a second (which seems to be needed for some reason), the following procedure is run. The "whattodo" arg can be either "print" (to prepare hard-wrapped text for printing) or "show" (to display hard-wrapped text in the text widget). If there is a selection, only the selected text will be hard-wrapped; if there isn't, all text in the widget will be hard-wrapped.
proc formanew {whattodo} { global formawid # Identify beginning and end of text to format: if {[.tx tag ranges sel] eq {}} { set selon 1.0 set seloff [.tx index end] } else { set selon [.tx index sel.first] set seloff [.tx index sel.last] } set texin [expr int($selon)] set texend [expr int($seloff)] # Initialize variable to hold output: set formatext {} for {set i $texin} {$i <= $texend} {incr i} { # Get text to newline: set endolin [.tx index $i.end] set endochar [lindex [split $endolin .] end] set whatline [.tx get $i.0 $endolin] # If line is blank, insert only newline into output: if {[string trim $whatline] eq {}} { append formatext \n continue } # If not, then find out where line is wrapped: for {set c 1} {$c <= $endochar} {incr c} { .tx see $i.$c set ceemin [expr {$c-1}] set boxie [.tx get $i.$ceemin] # Get y coordinates of bounding boxes for adjoining characters: set pixy [lindex [.tx bbox $i.$ceemin] 1] set nexy [lindex [.tx bbox $i.$c] 1] # If y coordinate of bounding box is greater than for # preceding character, line has been wrapped, so # insert preceding character plus newline into output: if {$nexy > $pixy} { append formatext $boxie\n .tx see $i.$c } else { # Otherwise, insert only the preceding character: append formatext $boxie } } # Replicate existing newline from text widget: if {$i < $texend} { append formatext \n } } if {$whattodo eq {print}} { return $formatext } else { .tx delete $selon $seloff .tx insert $selon $formatext .tx edit separator } } after 100 formanewhv If you want to format the lines, look into textutil::adjust. It can do left-, right-, or center-justify, adjust the text width, and do hyphenation.
Occurrence Count edit
Count number of occurrences of a substring in a string:proc scount {subs string} { regsub -all $subs $string $subs string } proc scount2 {subs string} { regexp -all $subs $string ;# 8.3 }The latter can also be defined by currying (see Custom curry):
interp alias {} scount3 {} regexp -allJH writes, in a comp.lang.tcl thread talking about this subject
proc scount4 {subs string} { regexp -all ***=$subs $string }and AM initially proposes:
proc scount5 {subs string} { set count [llength [split [string map [list $subs \uFFFF] $string] \uFFFF]] incr count -1 }but that doesn't make use of $string, so something is missing...Stephane A. writes:
proc countstrings {data search} { set l [string length $search] set count 0 while {[set i [string first $search $data]]>=0} { incr count incr i $l set data [string range $data $i end] } set count }
Reverse a String edit
proc sreverse s {join [lreverse [split $s {}]] {}} ;#RSwhere lreverse is of course on Additional list functions...
FW notes: this goes through a transitional list form. Still being a CPU speed and memory junkie, I feel obliged to correct this :P I'll call it sreverse:
proc sreverse s { set res {} set i [string length $s] while {$i >= 0} { set res "$res[string index $s [incr i -1]]" } return $res }RS agrees, except that append may be more efficient:
proc sreverse s { set l [string length $s] set res {} while {$l} {append res [string index $s [incr l -1]]} set res }LV notes that two more solutions recently appeared on comp.lang.tcl: The first is by Michael Schlenker:
proc string_reverse str { set rts {} for {set i [string length $str]; incr i -1} {$i >= 0} {incr i -1} { append rts [string index $str $i] } return $rts }and the second by [(R. T. Wurth]:
proc srev s { return [join [::struct::list reverse [split $s {}]] {}] }Am 2005-12-13: The question came up again in the chatroom yesterday, and I decided to measure the performance of various alternatives ... See: Performance of string reversing algorithms
Longest Common Prefix edit
Also found in tcllib::textutilArjen Markus At times I have had the need (not too urgent though) of a function/proc to determine the position where two strings (or lists) become different:"Arjen Markus" versus "Arjen Marcus" --> position: 9 "Arjen Markus" versus "Arjan Markus" --> position: 3(though mostly these are cases where my name and the variations that I commonly encounter are not involved) - RS: That would be the length of the common prefix which one might implement like this:
proc commonPrefix {a b} { set res {} foreach i [split $a {}] j [split $b {}] { if {$i eq $j} {append res $i} else break } set res }Here is a generalized version that takes any number of strings and returns the prefix which all have in common:
proc longestCommonPrefix strings { set res {} set i 0 foreach char [split [lindex $strings 0] {}] { foreach string [lrange $strings 1 end] { if {[string index $string $i] ne $char} { return $res } } append res $char incr i } set res } ;# RSMGS 2004-05-13: Here's another way - sort a list of strings and then compare the first and last:
proc string:common {string1 string2} { set i 1 while {[string equal -length $i $string1 $string2]} {incr i} return [string range $string1 0 [expr {$i-2}]] } proc string:common:list {args} { if { [llength $args] < 2 } { return -code error "wrong # args: must be >= 2" } set list [lsort $args] return [string:common [lindex $list 0] [lindex $list end]] }[JEC] 2010-03-18, notes the above string:common has a bug and never returns on identical strings. Even when fixed it does O(n^2) comparisonsBBH 2004-05-13: This got me thinking that RE back references would be a perfect fit, a little playing & it turns out I was right ;)
proc prefix {s1 s2} { regexp {^(.*).*\0\1} "$s1\0$s2" all pref return $pref }and easily generalized for multiple words
proc prefix {str args} { set re {^(.*).*} foreach s $args { append re {\0\1.*} append str "\0$s" } regexp $re $str all pref return $pref }NEM 2005-03-17: Another RE variation:
proc reprefix {str1 str2} { regexp -inline ^[join [split $str1 {}] ?]? $str2 }NEM 5mins later. 'Tis broken:
% reprefix iamadonut iamatoilet iamatWhoops.
Subscripts edit
The characters for which subscript versions exist in Unicode (digits, parens, some operators -see the list in code) are converted to their subscripted Unicodes. Others are left unchanged. (For superscripts the positions 1,2,3 seem not to be filled in my installation, so I didn't add the corresponding code...)proc subscript s { set res {} foreach char [split $s {}] { set pos [lsearch -exact {0 1 2 3 4 5 6 7 8 9 + - = ( )} $char] if {$pos>=0} {set char [format %c [incr pos 0x2080]]} append res $char } set res } ;# RS % puts H[subscript 2]O H?O
Strictly a Floating Point Number edit
the regular string is double term fires on integers too. The following fires on real floats only:proc isFloat x {expr {[string is double -strict $x] && ![string is int $x]}} ;#RS
Sort a String edit
AM: For some symbolic manipulations (w.r.t. group theory, oh just for the fun of it), I need to sort the characters in a string, so that for instance ababa becomes aaabb.This can be done via:proc charsort {string} { return [join [lsort [split $string {}]] {}] }
Longest String edit
Here's a little proc to find the length of the longest string in a list:proc maxlen args { # Written 2003 by Ed Suominen, hereby placed in the public domain if {[llength $args] > 1} { if { [set x [string length [lindex $args 0]]] - [set y [string length [lindex $args 1]]] < 0 } { return -1 } elseif {$x == $y} { return 0 } else { return 1 } } else { set x [lsort -decreasing -command maxlen [lindex $args 0]] return [string length [lindex $x 0]] } }Alternative:
proc maxlen args { if {[llength $args] == 1} { set args [lindex $args 0] } set res [string length [lindex $args 0]] foreach i [lrange $args 1 end] { set l2 [string length $i] if {$l2 > $res} {set res $l2} } set res } ;# RSKPV one liner using lmap:
set words {now is the time for all good men to come to the aid of their country} set how_long [::tcl::mathfunc::max {*}[lmap v $words {string length $v}]] set longest_word [lindex [lsort -index 1 -integer [lmap v $words { list $v [string length $v] }]] end 0]
String Insert edit
Insert a string into another at a given position, analogous to linsert:proc strinsert {string pos char} { set original [string index $string $pos] string replace $string $pos $pos $char$original } ;# RS % strinsert hello 1 x hxelloAMG: Why doesn't this come standard in the string command?Also, don't let the formal parameter name "char" fool you. With this code, the string to be inserted can be any length. I'd write it this way:
proc strinsert {string pos insertion} { append insertion [string index $string $pos] string replace $string $pos $pos $insertion } namespace ensemble configure string -map\ [dict replace [namespace ensemble configure string -map] insert strinsert] % string insert hello 1 xxx hxxxelloOr, even faster:
proc strinsert {string pos insertion} { string replace $string $pos $pos $insertion[string index $string $pos] }AMG: See TIP 475 [1] for a new [string insert] command.
N-th Occurrence of a Substring edit
Silas 2005-10-14: If you want to find the third or fourth occurrence of a string in another string, you'll have to use string first many times. The following proc could help:proc mystringFirst {substring mystring ocorrencia} { if {!$ocorrencia} {return -1} set index 0 for {set i 0} {$i < $ocorrencia} {incr i} { set index [string first $substring $mystring] set mystring [string range $mystring [expr $index + 1] [string length $mystring]] } return $index }MG 2017-08-17 This doesn't seem to work correctly (or at least gives very unexpected results for me):
% set str "foo bar baz boing sprocket banana" % mystringFirst ba $str 1 4 % mystringFirst ba $str 2 3A couple of alternatives that give the results I would expect:
proc mystringFirst2 {needle haystack count} { set pos -1 while { $count && [set pos [string first $needle $haystack $pos+1]] > -1 } { incr count -1 } return $pos } proc mystringFirst3 {needle haystack count} { set indices [regexp -all -inline -indices $needle $haystack] if { [llength $indices] >= $count } { return [lindex $indices $count-1 0] } else { return -1 } }mystringFirst3 obviously takes a regexp needle, rather than a plain string, but if you add
set needle [regsub -all {([^a-zA-Z0-9 ])} $needle {\\\1}]to the start it'll work on a plain string instead.
String to Proper English Title edit
D. McC 2005-10-20: The "string totitle" subcommand only, and always, capitalizes the first letter in a string, no matter how many words are in the string or what the words are. Here's some code to convert a multiple-word expression in English to a properly capitalized title, in which all initial letters are capitalized except those of articles, conjunctions, and prepositions with four or fewer letters. (Needs improvement--see revised version, farther below)proc title str { set output {} set nocaps [list a an and at but by for from in into of on or the to with] foreach word [split $str] { if {[lsearch $nocaps $word] == -1} { set word [string totitle $word] } append output "$word " } return [string trim $output] }Example:
% set bogomips "groundwork of the metaphysics of balderdash" groundwork of the metaphysics of balderdash % title $bogomips Groundwork of the Metaphysics of BalderdashDKF: Cool! It's not quite right though. The following gets closer, but isn't right yet either (testing on your paragraph above, of course!)
proc title {str} { set output {} set nocaps {a an and at but by for from in into of on or the to with} foreach word [regexp -all -inline {[\w'.]+|\W+} $str] { if {[string match {[A-Za-z]*} $word] && [lsearch $nocaps $word] == -1} { # In 8.5 should use the 'ni' operator instead of the [lsearch] set word [string totitle $word] } append output $word } return [string trim $output] }LES: You forget that you'll want those "nocaps" words to be turned to title if they begin a sentence. Here is a proc I use to rename files (usually mp3). It doesn't handle punctuation, but handles hyphens, e.g.: "Artist - A Song Title". Tweaking it to also take punctuation (periods or first word of title) in consideration should be trivial. That is not desired in file renaming because periods (dots) do not mean the same as in regular titles.
proc mp3 args { set _nocaps { a as à às ao aos de da das do dos e em na nas no nos o os ou para por que sem sob an and are at but for from if in is it's not of on or the to under vs vs. with without au aux avec dans des en et le la les ou par pour qui si con del el en la las los sin y }; # German and Italian, anyone? foreach _file [glob *] { if { $_file eq {.:} } { continue } # lowercase the whole name set _old $_file set _file [string tolower $_file] set _new {} ;# empty _new in each iteration set _c 0 ;# zero counter in each iteration foreach _word $_file { incr _c # anything right after " - " probably is the first word of a phrase. # if current word is "-", reset counter so next word is 1 and gets caps if {$_word eq {-}} {set _c 0} # if current word is 1, it gets caps if {$_c == 1} {set _word [string totitle $_word]} # if it is not in exceptions, it gets caps if {[lsearch $_nocaps $k ] < 0} { set k [string totitle $k] } # add the word to the new name lappend _new $k } # UNCOMMENT the two next lines if you want files renamed automatically #file rename $_old RENAME_TEMP #file rename RENAME_TEMP $_new puts "$_old\n$_new\n" } }D. McC: OK, I do want the "nocaps" words capitalized if they (1) begin the title or (2) come right after a colon. Also, as I belatedly noticed, quotation marks need to be stripped out for "string totitle" to work right, but then put back into the finished product. I'd go a long way to avoid a regular expression like {[\w'.]+|\W+}, though. Let's try this:
proc title str { set output {} set nocaps [list a an and at but by for from in into of on or the to with] set count 0 foreach word [split $str] { # Strip quotation marks: if {[string index $word 0] eq "\""} { set quote 1 set word [string trim $word \"] } else { set quote 0 } # Always capitalize the first word; otherwise, # don't capitalize any words in the "nocaps" list: if {$count == 0 || [lsearch $nocaps $word] == -1} { set word [string totitle $word] } # Add word plus space, with or without quotation marks, to output: if {$quote} { append output "\"$word\" " } else { append output "$word " } # Capitalize any word after a colon: if {[string index $word end] eq {:}} { set count 0 } else { incr count } } return [string trim $output] }Example:
% set wordsworth {what I say is: by gum, give me the finest "bogomips" in the universe!} what I say is: by gum, give me the finest "bogomips" in the universe! % title $wordsworth What I Say Is: By Gum, Give Me the Finest "Bogomips" in the Universe!
proc mystringFirst {substring mystring ocorrencia} { if {!$ocorrencia} {return -1} set index 0 for {set i 0} {$i < $ocorrencia} {incr i} { set index [string first $substring $mystring] set mystring [string range $mystring [expr $index + 1] [string length $mystring]] } return $index }pw: Here's an example that doesn't use foreach.
proc title-case text { for {set index 0} {$index < [string length $text]} {incr index} { if { [string wordstart $text $index] == $index } { set text [string replace $text $index $index [string toupper [string index $text $index]]] } else { set text [string replace $text $index $index [string tolower [string index $text $index]]] } } return $text }
Misc edit
LV: Take a look at Tcl-FAQ's part 5 to gather ideas on other packages with string functions.Letterspacing edit
escargo:spreading a string by inserting blanks between each two characters. KBK notes that there's a printer's proverb: Anyone who would l e t t e r s p a c e l o w e r c a s e would steal sheep. Simply functional:proc letterspace s {join [split $s {}] { }} ;# RS % letterspace {steal sheep} s t e a l s h e e pIn the book Stop Stealing Sheep & find out how type works, by Erik Spiekermann and E.M. Ginger, Adobe Press, 1993, on page 7, in the side bar the quotation attributed to Frederic Goudy was, "Anyone who would letterspace black letter would steal sheep."