- It should be a pure Tcl script that finds the answer for an arbitrary piece of text
- It should be "best" in some way
- the shortest (smallest size of the code for any artistic but consistent definition of size)
- the fastest (as measured by the [time] command)
- the least number of command executions (as measured by the [info cmdcount] command)
- the most fantastic use of regular expressions
- the most obfuscated
- the most elegant variation on the question (such as presenting a histogram of repeated substrings during the computation ...)
- the best in any other challenging category
adavis Does this fit the bill?...
# Text to be checked for repeated substrings. set testtext "well then, has this got some repeated strings? this is a test string" # Length of required substrings minus one. set findlength 3 for {set i 0} {$i <= [expr {[string length $testtext] - $findlength}]} {incr i} { set teststring [string range $testtext $i [expr {$i + $findlength}]] if {[catch {incr result($teststring)}]} {set result($teststring) 1} } parray result...I've used short text and substring for clarity.AM Yes, seems a winner sofar (23 december 10:10 GMT) :)AM I should have thought a bit longer about regexps, here is a solution with a single pattern:
set testtext "..." ;# As above regexp {(.{3,}).*\1} $text => str puts "$str"gives: " th" after thinking very hard ... Not the longest, but at least one that is repeated.(As for speed:
[AM] solution: 10387172 microseconds per iteration [adavis] solution: 4946 microseconds per iteration)
Peter Spjuth: I tried the variation of finding the longest.Inspired by Arjen's regexp I did this very slow thingy:
set minlength 4 set start 0 while {[regexp -indices -start $start "(.{$minlength,}).*\\1" $testtext => ixs]} { foreach {si ei} $ixs break set minlength [expr {$ei - $si + 2}] set start [expr {$si + 1}] } puts \"[string range $testtext $si $ei]\"A faster one:
# Find the longest repeated substring proc findLongestSubstring {str {minlen 5}} { set len [string length $str] set first 0 set last -1 for {set t 0 ; set u [expr {$minlen - 1}]} {$u < $len} {incr u} { if {[string first [string range $str $t $u] $str [expr {$u + 1}]] >= 0} { set first $t set last $u } else { incr t } } string range $str $first $last }
DKF: Hmm, why is the regexp version so slow? This experiment is a little better, but still not great...
for {set i 0;set len 0} {$i+$len < [string length $testtext]} {incr i} { regexp -start $i {(.+).*\1} $testtext -> matched if {[string length $matched] > $len} { puts >>$matched<< set len [string length $matched] } }