Command | Result |
time {perl testfind.pl} 5 | 171135507 microseconds per iteration |
time {globfind C:/ {string match -nocase *.pdf} -type f} 5 | 115472017 microseconds per iteration |
time {globtraverse C:/ -pattern *.pdf} 5 | 97552737 microseconds per iteration |
time {::fileutil::find C:/ {string match -nocase *.pdf}} 5 | 309160279 microseconds per iteration |
time {::fileutil::findByPattern C:/ -glob *.pdf} 5 | 300450509 microseconds per iteration |
time {globfind C:/ -pat *.pdf -type f} 5 | 95964709 microseconds per iteration |

Function | Timing Ratio | Notes |
globfind C:/ -pat *.pdf -type f | 1 | using prefilter only |
globtraverse C:/ -pattern *.pdf | 1.02 | core prefilter routine |
globfind C:/ {string match -nocase *.pdf} -type f | 1.20 | equivalent use of postfilter |
perl testfind.pl | 1.78 | used find2perl to generate test script |
::fileutil::findByPattern C:/ -glob *.pdf | 3.13 | |
::fileutil::find C:/ {string match -nocase *.pdf} | 3.22 | |
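
The ratio column appears to be each raw timing divided by the fastest one. A minimal sketch of that arithmetic follows, assuming Tcl 8.5 or later (for dict and {*}); the short labels and the proc name timingRatios are made up for illustration and are not part of globfind or tcllib.

```tcl
# Raw timings (microseconds per iteration) copied from the first table.
# The labels are shorthand invented for this example.
set timings {
    globfind-prefilter   95964709
    globtraverse         97552737
    globfind-postfilter  115472017
    perl-find2perl       171135507
    findByPattern        300450509
    fileutil-find        309160279
}

# Hypothetical helper: divide every timing by the smallest one and
# round to two decimal places.
proc timingRatios {timings} {
    set base [tcl::mathfunc::min {*}[dict values $timings]]
    set ratios {}
    dict for {label usec} $timings {
        dict set ratios $label [format %.2f [expr {double($usec) / $base}]]
    }
    return $ratios
}

dict for {label ratio} [timingRatios $timings] {
    puts [format "%-20s %s" $label $ratio]
}
```

The double() call matters: without it, expr would perform integer division and every ratio would be truncated.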
% perl find2perl / -name *.pdf > testfind.pl

The instructions below are excerpted from the program file:
globfind.tcl --

Written by Stephen Huntley (stephen.huntley@alum.mit.edu)
License: Tcl license
Version 1.5

The proc globfind is a replacement for tcllib's fileutil::find

Usage: globfind ?basedir ?filtercmd? ?switches??

Options:

basedir - the directory from which to start the search. Defaults to current directory.

filtercmd - Tcl command; for each file found in the basedir, the filename will be appended to filtercmd and the result will be evaluated. The evaluation should return 0 or 1; only files whose return code is 1 will be included in the final return result.

switches - The switches will "prefilter" the results before the filtercmd is applied. The available switches are:

    -depth - sets the number of levels down from the basedir into which the filesystem hierarchy will be searched. A value of zero is interpreted as infinite depth.

    -pattern - a glob-style filename-matching wildcard. ex: -pattern *.pdf

    -types - any value acceptable to the "types" switch of the glob command. ex: -types {d hidden}

Side effects:

If somewhere within the search space a directory is a link to another directory within the search space, then the variable ::globfind::REDUNDANCY will be set to 1 (otherwise it will be set to 0). The name of the redundant directory will be appended to the variable ::globfind::redundant_files. This may be used to help track down and eliminate infinitely looping links in the search space.

Unlike fileutil::find, the name of the basedir will be included in the results if it fits the prefilter and filtercmd criteria (thus emulating the behavior of the standard Unix GNU find utility).

----

globfind is designed as a fast and simple alternative to fileutil::find. It takes advantage of glob's ability to use multiple patterns to scan deeply into a directory structure in a single command, hence the name.

It reports symbolic links along with other files by default, but checks for nesting of links which might otherwise lead to infinite search loops. It reports hidden files by default unless the -types switch is used to specify exactly what is wanted.

globfind may be used with Tcl versions earlier than 8.4, but emulation of missing features of the glob command in those versions will result in slower performance.

globfind is generally two to three times faster than fileutil::find, and fractionally faster than perl's File::Find function for comparable searches.

The filtercmd may be omitted if only prefiltering is desired; in this case it may be a bit faster to use the proc globtraverse, which uses the same basedir value and command-line switches as globfind, but does not take a filtercmd value.

If one wanted to search for pdf files for example, one could use the command:

    globfind $basedir {string match -nocase *.pdf}

It would, however, in this case be much faster to use:

    globtraverse $basedir -pattern *.pdf
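
To illustrate the filtercmd convention described in the excerpt (the file name is appended to the command, the result is evaluated, and only names yielding 1 are kept), here is a minimal self-contained sketch. The proc applyFilter and the sample file names are hypothetical and are not taken from globfind.tcl; how globfind itself appends and evaluates the name is an assumption here.

```tcl
# Hypothetical helper, not part of globfind: returns the file names for
# which the filter command evaluates to true.
proc applyFilter {fileNames filtercmd} {
    set kept {}
    foreach name $fileNames {
        # Append the name as a single list element so that names
        # containing spaces reach the filter as one argument.
        if {[uplevel #0 [linsert $filtercmd end $name]]} {
            lappend kept $name
        }
    }
    return $kept
}

# Made-up sample names; no file system access is involved.
set names [list report.PDF notes.txt "my paper.pdf"]
puts [applyFilter $names {string match -nocase *.pdf}]
```

Any command prefix that accepts a file name as its last argument and returns a boolean fits this shape, for example {file isdirectory} or a user-defined proc.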
24mar2011 gavino This won't work for me. I noticed when I posted code that I had to indent all lines 1 space or I lost formatting. Perhaps this happened here? Nice job if it works. I am trying to source globfind.tcl, a file into which I copied the above code. I get errors trying to run the pdf examples.

SEH: Sorry, I shouldn't have left out-of-date code here for so long. If you have trouble with the latest version linked to above, let me know.
4apr2011 gavino I may be obtuse, but how can I load the globfind functionality into tclsh 8.6? I tried to source globfind.tcl in tclsh8.6, but neither the globtraverse command nor the globfind command was available. I then ran package require fileutil::globfind, which returned 1.5. Running the globfind or globtraverse commands shown above on this page still didn't work, nor did trying to run fileutil::globfind. This is tclsh 8.6 on Linux; please advise.
SEH: The code is in its own namespace called ::fileutil::globfind. So in order to run the command you must either run "::fileutil::globfind::globfind" or import the commands into the current namespace; i.e., "namespace import ::fileutil::globfind::*". After the import you can simply use the command "globfind".

[AQI]: 16Jul2016 Another alternative to fileutil::find, which is much faster:
    proc rglob { basedir pattern } {

        # Fix the directory name; this ensures the directory name is in the
        # native format for the platform and contains a final directory separator
        set basedir [string trimright [file join [file normalize $basedir] { }]]
        set fileList {}

        # Look in the current directory for matching files. -type {f r}
        # means only readable normal files are looked at; -nocomplain stops
        # an error being thrown if the returned list is empty
        foreach fileName [glob -nocomplain -type {f r} -path $basedir $pattern] {
            lappend fileList $fileName
        }

        # Now look for any subdirectories in the current directory
        foreach dirName [glob -nocomplain -type {d r} -path $basedir *] {
            # Recursively call the routine on the subdirectory and append any
            # new files to the results
            set subDirList [rglob $dirName $pattern]
            if { [llength $subDirList] > 0 } {
                foreach subDirFile $subDirList {
                    lappend fileList $subDirFile
                }
            }
        }
        return $fileList
    }

    % time {util::rglob [ pwd ] *.tcl} 100
    1505.3 microseconds per iteration
    % time {fileutil::findByPattern [ pwd ] *.tcl} 100
    8957.71 microseconds per iteration

Both provide the same result for basic glob matches.

It's even better on larger file structures:

    % time {fileutil::findByPattern [ pwd ] *.tcl} 10
    2696438.9 microseconds per iteration
    % time {util::rglob [ pwd ] *.tcl} 10
    277771.5 microseconds per iteration
JOB - 2016-07-12 20:07:18

See also getfiles cached, which relies on the same glob statement plus the ability to store the search result in an additional cache file.