- get tcl from cvs here: http://sourceforge.net/projects/tcl/
- get the tclbench module from cvs here: http://sourceforge.net/projects/tcllib
- mkdir tcl_bld ; cd tcl_bld
- export CC="gcc -pg"
- ../tcl/unix/configure --prefix=/tmp --disable-shared
- make
- create do_bench.sh
- sh do_bench.sh
- gprof tclsh test_gmon.out
#!/bin/sh for f in ../tclbench/tcl/*.bench do echo $f env SCRIPT="../tclbench/runbench.tcl -verbose -notk -iterations 500 $f" make shell mv gmon.out `basename $f`_gmon.out doneThe statistics are written upon the exit of the tclsh, and currently tclbench execs each test file, so we can't run a group of tests and have valid results.With NetBSD 2.0 running on a Pentium 4, 1.8 GHz system, the following was the result for gprof tclsh array.bench_gmon.out:
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 11.11 0.01 0.01 20394 0.00 0.00 malloc_bytes 11.11 0.02 0.01 19490 0.00 0.00 TclpAlloc 11.11 0.03 0.01 14686 0.00 0.00 free_bytes 11.11 0.04 0.01 10926 0.00 0.00 memcpy 11.11 0.05 0.01 1290 0.01 0.01 Tcl_ParseBraces 11.11 0.06 0.01 386 0.03 0.03 __stat13 11.11 0.07 0.01 357 0.03 0.03 Tcl_ObjGetVar2 11.11 0.08 0.01 221 0.05 0.05 InitCompiledLocals 11.11 0.09 0.01 94 0.11 0.11 TclCompileReturnCmd 0.00 0.09 0.00 35662 0.00 0.00 TclParseWhiteSpaceFor further study we may use gcov to get finer results for individual functions.
MS george, thanks for this. Note that tclbench may not be that well balanced for this particular purpose. In any case, it was not designed for it. Not that I'd know what well balanced might mean.(after talking with MS in the chat) George Peter Staplin: yep, and we will work something out. (MS and I discussed using new interps for each test file, rather than new processes.)
George Peter Staplin: I wrote a hacky patch for tclbench tonight to allow running all benchmarks in a single process. The patch for the tclbench HEAD is here: http://www.xmission.com/~georgeps/engineering/software/patches/tcl_bench_Mar_2_2005_b.diff (DEAD LINK)It isn't a complete patch, but it provided enough capability for me to find the gprof results with all benchmarks.My script to start runbench is:
#!/bin/sh env SCRIPT="../tclbench/runbench.tcl -verbose -notk -iterations 500 -singleprocess" make shellHere are the gprof tclsh gmon.out results:
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 20.89 31.34 31.34 4530233 0.00 0.00 TclExecuteByteCode 9.51 45.61 14.27 42151760 0.00 0.00 memcpy 3.59 51.00 5.39 925617 0.00 0.00 FreeListInternalRep 3.05 55.58 4.58 501918 0.00 0.00 DupListInternalRep 2.98 60.05 4.47 906974 0.00 0.00 memmove 2.97 64.50 4.45 46178718 0.00 0.00 malloc_bytes 2.23 67.85 3.35 46151153 0.00 0.00 free_bytes 1.93 70.75 2.90 3364 0.00 0.00 TclUniCharMatch 1.73 73.34 2.60 44449352 0.00 0.00 Tcl_CreateHashEntry 1.25 75.21 1.87 3252510 0.00 0.00 TableToUtfProc 1.25 77.08 1.87 935198 0.00 0.00 __syscall 1.11 78.74 1.66 17069796 0.00 0.00 TclEvalObjvInternal 1.09 80.38 1.64 8228996 0.00 0.00 miss 1.02 81.91 1.53 38764876 0.00 0.00 ResetObjResult 0.95 83.34 1.43 35481596 0.00 0.00 Tcl_ListObjAppendElement 0.90 84.69 1.35 34128887 0.00 0.00 Tcl_GetCommandFromObj 0.87 85.99 1.30 211379 0.00 0.00 __stat13 0.86 87.28 1.29 1438022 0.00 0.00 Tcl_StringObjCmd 0.82 88.50 1.23 52457749 0.00 0.00 Tcl_GetThreadData 0.80 89.70 1.20 45503044 0.00 0.00 Tcl_Alloc 0.77 90.86 1.15 58877099 0.00 0.00 Tcl_GetStringFromObj 0.77 92.00 1.15 3936217 0.00 0.00 TclObjInterpProc 0.76 93.14 1.14 98732472 0.00 0.00 Tcl_InvalidateStringRep 0.75 94.27 1.12 46605190 0.00 0.00 ifree 0.71 95.33 1.06 152642 0.00 0.00 SetListFromAny 0.70 96.38 1.05 45995249 0.00 0.00 malloc 0.69 97.41 1.03 38764718 0.00 0.00 Tcl_ResetResult 0.68 98.43 1.02 49664344 0.00 0.00 Tcl_UniCharToLower 0.68 99.45 1.02 228790 0.00 0.00 TableFromUtfProc 0.67 100.45 1.00 18099244 0.00 0.00 Tcl_SetObjResult 0.66 101.44 0.99 45453081 0.00 0.00 Tcl_SetLongObj 0.62 102.38 0.93 7886398 0.00 0.00 pickss 0.61 103.30 0.92 8598571 0.00 0.00 Tcl_UniCharToUtfDString 0.57 104.15 0.85 14440572 0.00 0.00 AppendUtfToUtfRep 0.56 105.00 0.84 47111137 0.00 0.00 Tcl_GetString 0.55 105.83 0.83 46003853 0.00 0.00 free 0.55 106.65 0.82 35201 0.00 0.00 Tcl_SplitObjCmd 0.55 107.47 0.82 34336900 0.00 0.00 Tcl_ListObjLength 0.54 108.28 0.81 24284081 0.00 0.00 HashStringKey 0.54 109.09 0.81 3073 0.00 0.00 brk 0.53 109.89 0.80 4676175 0.00 0.00 TclLookupSimpleVar 0.51 110.66 0.76 46612082 0.00 0.00 imalloc 0.49 111.39 0.74 123229 0.00 0.00 open 0.44 112.06 0.67 9766679 0.00 0.00 TclFindElement 0.44 112.72 0.67 1512790 0.00 0.00 Tcl_GetsObj 0.43 113.37 0.65 4512497 0.00 0.00 TclFormatInt 0.41 113.99 0.62 84895926 0.00 0.00 Tcl_UniCharToUtf 0.40 114.59 0.60 23306949 0.00 0.00 CompareStringKeys 0.39 115.17 0.58 13720118 0.00 0.00 TclPtrSetVar 0.39 115.75 0.58 9477816 0.00 0.00 TclLindexFlat 0.39 116.33 0.58 184250 0.00 0.00 chdir 0.38 116.91 0.57 8107886 0.00 0.00 Tcl_FindHashEntry 0.38 117.47 0.57 52545 0.00 0.00 Tcl_RegsubObjCmd 0.37 118.03 0.56 14865956 0.00 0.00 TclLookupArrayElement 0.35 118.56 0.53 45837682 0.00 0.00 TclpAlloc 0.35 119.09 0.53 1231201 0.00 0.00 shortest 0.33 119.59 0.50 26374110 0.00 0.00 SetStringFromAny 0.33 120.09 0.50 21594049 0.00 0.00 TclpCheckStackSpace 0.32 120.57 0.48 45809952 0.00 0.00 Tcl_Free 0.32 121.05 0.48 45809952 0.00 0.00 TclpFree 0.30 121.50 0.46 2977677 0.00 0.00 Tcl_BinaryObjCmd 0.30 121.95 0.45 4379643 0.00 0.00 strstr 0.30 122.41 0.45 14118 0.00 0.00 write 0.29 122.84 0.44 28352669 0.00 0.00 TclGetIntForIndex 0.27 123.25 0.40 21594049 0.00 0.00 TclInterpReady 0.27 123.64 0.40 74603 0.00 0.00 read 0.26 124.03 0.39 3884612 0.00 0.00 TclDeleteCompiledLocalVars 0.26 124.42 0.39 7886398 0.00 0.00 getvacant 0.25 124.80 0.38 5508812 0.00 0.00 TclObjLookupVar 0.25 125.18 0.38 101609 0.00 0.00 __getcwd 0.25 125.56 0.38 2000100 0.00 0.00 Tcl_SetByteArrayLength 0.25 125.92 0.37 26304535 0.00 0.00 Tcl_AsyncReady 0.25 126.30 0.37 17684600 0.00 0.00 Tcl_DStringSetLength 0.24 126.66 0.36 17779843 0.00 0.00 Tcl_GetObjResult 0.23 127.01 0.35 123481 0.00 0.00 close 0.23 127.35 0.34 6408784 0.00 0.00 ScanNumber 0.23 127.69 0.34 3096591 0.00 0.00 Tcl_ExternalToUtf 0.22 128.02 0.33 17229166 0.00 0.00 Tcl_NewLongObj 0.22 128.34 0.33 8490527 0.00 0.00 Tcl_UniCharToUpper 0.22 128.67 0.33 7897768 0.00 0.00 SortCompare 0.21 128.99 0.32 53161 0.00 0.00 access 0.21 129.31 0.32 17085127 0.00 0.00 TclCleanupCommand 0.21 129.62 0.31 11585918 0.00 0.00 Tcl_ListObjGetElements 0.20 129.92 0.30 14622665 0.00 0.00 TclFreeObj 0.20 130.22 0.30 89167 0.00 0.00 __lstat13 0.19 130.51 0.29 26299501 0.00 0.00 Tcl_LimitReady 0.19 130.80 0.29 16778728 0.00 0.00 SelectObjFromSublist 0.19 131.09 0.29 173882 0.00 0.00 malloc_make_chunks 0.19 131.37 0.28 89217 0.00 0.00 fchdir 0.18 131.65 0.28 21603229 0.00 0.00 Tcl_LimitExceeded 0.18 131.91 0.27 4524251 0.00 0.00 TclCompEvalObj 0.18 132.19 0.27 4129276 0.00 0.00 Tcl_AppendObjToObj 0.17 132.44 0.26 7877404 0.00 0.00 TclStackAlloc 0.17 132.71 0.26 1997210 0.00 0.00 AllocStringEntry 0.17 132.96 0.25 6229846 0.00 0.00 Tcl_NewStringObj 0.17 133.21 0.25 3282618 0.00 0.00 Tcl_GetAssocData 0.16 133.44 0.24 1993138 0.00 0.00 ExprRandFunc 0.16 133.68 0.23 3671835 0.00 0.00 Tcl_SetObjLength 0.15 133.90 0.22 7877402 0.00 0.00 TclStackFree 0.15 134.12 0.22 4836977 0.00 0.00 Tcl_ListObjIndex 0.15 134.34 0.22 16968 0.00 0.00 CopyAndTranslateBuffer 0.14 134.56 0.21 8598568 0.00 0.00 AppendUnicodeToUtfRep 0.14 134.77 0.21 5894322 0.00 0.00 Tcl_NewIntObj 0.14 134.98 0.21 11227377 0.00 0.00 Tcl_AppendUnicodeToObj 0.13 135.18 0.20 1617303 0.00 0.00 Tcl_ListObjReplace 0.13 135.38 0.20 24868 0.00 0.00 TranslateInputEOL 0.13 135.57 0.20 3453 0.00 0.00 TclThreadDataKeyGet 0.13 135.76 0.19 14908149 0.00 0.00 Tcl_UtfToUniChar 0.13 135.96 0.19 9478817 0.00 0.00 TclLindexList 0.13 136.15 0.19 2497064 0.00 0.00 TclLookupVar 0.13 136.34 0.19 1752393 0.00 0.00 ExprIntFunc 0.12 136.52 0.18 1567820 0.00 0.00 FilterInputBytes 0.12 136.70 0.18 3610585 0.00 0.00 Tcl_NewDoubleObj 0.12 136.88 0.18 3279889 0.00 0.00 Tcl_GetChannel 0.12 137.06 0.18 2494299 0.00 0.00 TclVarTraceExists 0.12 137.24 0.18 809092 0.00 0.00 Tcl_AttemptSetObjLength 0.12 137.41 0.17 8627161 0.00 0.00 Tcl_GetLongFromObj 0.12 137.59 0.17 4377459 0.00 0.00 UpdateStringOfInt 0.11 137.76 0.17 7256550 0.00 0.00 Tcl_GetIndexFromObj 0.11 137.93 0.17 3936217 0.00 0.00 TclProcCompileProc 0.11 138.09 0.16 1998880 0.00 0.00 UpdateStringOfByteArray 0.11 138.25 0.16 374304 0.00 0.00 Tcl_ScanObjCmd 0.10 138.41 0.15 9624841 0.00 0.00 Tcl_DStringFree 0.10 138.56 0.15 4801045 0.00 0.00 Tcl_NewWideIntObj 0.10 138.72 0.15 3678888 0.00 0.00 Tcl_GetCharLength 0.10 138.87 0.15 6441980 0.00 0.00 Tcl_GetDoubleFromObj 0.10 139.01 0.14 3936939 0.00 0.00 Tcl_PushCallFrame 0.09 139.15 0.14 716160 0.00 0.00 __qdivrem 0.09 139.29 0.14 209406 0.00 0.00 vfprintf_unlocked 0.09 139.43 0.14 16882 0.00 0.00 Tcl_UtfToUpper 0.09 139.56 0.14 9765386 0.00 0.00 TclCopyAndCollapse 0.09 139.70 0.14 3521845 0.00 0.00 Tcl_ObjSetVar2 0.09 139.84 0.14 L2 0.09 139.97 0.13 9627551 0.00 0.00 Tcl_DStringInit 0.09 140.09 0.13 8490706 0.00 0.00 UtfCount 0.09 140.22 0.13 4031242 0.00 0.00 Tcl_NewObj 0.09 140.35 0.13 23583 0.00 0.00 Tcl_NumUtfChars 0.08 140.47 0.12 4415780 0.00 0.00 Tcl_NextHashEntry 0.08 140.59 0.12 3081154 0.00 0.00 FormatNumber 0.08 140.72 0.12 2121681 0.00 0.00 memset 0.08 140.84 0.12 110080 0.00 0.00 FillUnicodeRep 0.07 140.94 0.11 5158785 0.00 0.00 GetFormatSpec 0.07 141.06 0.11 3936939 0.00 0.00 TclPushStackFrame 0.07 141.16 0.11 3936938 0.00 0.00 Tcl_PopCallFrame 0.07 141.28 0.11 2494299 0.00 0.00 InfoExistsCmd 0.07 141.38 0.11 1237399 0.00 0.00 Tcl_LreplaceObjCmd 0.07 141.50 0.11 1231201 0.00 0.00 find 0.07 141.60 0.11 1022100 0.00 0.00 Tcl_DeleteHashTable 0.07 141.72 0.11 862680 0.00 0.00 Tcl_Tell 0.07 141.82 0.11 Tcl_DbCkfree 0.07 141.93 0.10 2496778 0.00 0.00 Tcl_InfoObjCmd 0.07 142.03 0.10 873733 0.00 0.00 longest 0.07 142.13 0.10 2104934 0.00 0.00 initialize 0.07 142.24 0.10 1648122 0.00 0.00 strtoul 0.07 142.34 0.10 1527933 0.00 0.00 FileWatchProc 0.07 142.44 0.10 1306318 0.00 0.00 MergeLists 0.07 142.53 0.10 692920 0.00 0.00 Tcl_DuplicateObj 0.07 142.63 0.10 594475 0.00 0.00 Tcl_EvalObjEx 0.07 142.74 0.10 374304 0.00 0.00 ValidateFormat 0.07 142.84 0.10 295163 0.00 0.00 Tcl_FormatObjCmd 0.07 142.94 0.10 19709 0.00 0.00 RebuildTable 0.06 143.03 0.10 2290165 0.00 0.00 Tcl_ScanCountedElement 0.06 143.12 0.10 1511163 0.00 0.00 Tcl_GetsObjCmd 0.06 143.22 0.09 1231201 0.00 0.00 TclReExec 0.06 143.31 0.09 862680 0.00 0.00 Tcl_TellObjCmd 0.06 143.40 0.09 606234 0.00 0.00 free_pages(Further entries omitted)
VK 2005-03-03: It is no surprise that "malloc_bytes" is used often, and I, personally, tried and succeeded following trick. This trick takes into collaboration Perl, and I performed this trick on Win32. I compiled Perl and Tcl/Tk to use common memory allocator. I used Perl's memory allocator when building Perl, and replaced in tclAlloc.c call to malloc with Perl_malloc (and 2 other: free and realloc)After building this, I got Tcl.dll dependent on Perl.dll (because now Tcl uses Perl's allocator) and gained faster Tcl.It appears that
- Perl's memory allocator is faster than Tcl's, and I think it is quite normal to benefit from this fact.
- This trick works for building on PocketPC, and Tcl/Tk became visually much faster with this approach.
- Rebuilding Tk after this trick with Tcl was not necessary.