Updated 2004-07-16 12:42:03 by NEM

A Garbage Collector for C and C++

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

GPS: July 18, 2003 I've been impressed by how well it works. I think that it might be interesting to build a Tcl-prototype that use the Boehm-GC for Tcl_Obj management. It could possibly eliminate thousands of lines of code and improve the quality of Tcl/Tk.

There was an excellent article in C/C++ Users Journal about the Boehm-GC.

Jacob Levy July 18, 2003: Having used a commercial version of this garbage collector, I would strongly recommend against using it for Tcl. The free version is likely to be worse than the commercial version with which I have experience. Here are some points against using it: First of all, the GC does not work on all the platforms on which Tcl needs to run. Its allocator is about ten times slower than straight malloc. The Boehm-Weiser GC is very conservative -- it doesn't know anything about Tcl's data types, so it cannot determine where all the pointers are in a piece of memory. It also does not know where all the pointers are on the stack and what their real type is, so the end result is that it retains much more memory than actually needed by the application. The GC interacts very poorly with threads on several platforms, so its going to be a binary choice between threaded Tcl builds and builds that support this GC. Tcl is also used in soft-real-time situations where predictability of performance is crucial, so Tcl cannot tolerate arbitrary unpredictable pauses. The allocator violates the C++ semantics since it cannot run destructors for cleanup (it knows nothing about them) so it'd become impossible to write Tcl extensions in C++, if we used it. Some compilers, at higher levels of optimizations, create "inner pointers" that point into the middle of a data structure. However, the GC is not always able to find these "inner pointers" so objects may get collected incorrectly in some situations.

Finally, the biggest reason not to use it is that it's a cop-out: Tcl should get the refcounting mechanism (and in general, all memory management) to "just work", instead of abdicating this responsibility.

GPS July 18, 2003: I wasn't aware of these problems. I've tested it with various apps while watching top to see how it performed and I didn't see these problems. It works well for me in w3m and a small library I wrote.

The problem with reference counting in C is that if a programmer forgets to add just one Tcl_IncrRefCount, you end up with an invalid pointer to a Tcl_Obj, so of course a core-dump (if you're lucky) or corruption occurs. This could be avoided with a little more work if we sprinkled assert macros that would check if the address a pointer points to is valid. It could also expand to pass __FILE__ and __LINE__ to help track down the problem. This would of course require keeping track of memory allocated and valid ranges, and it wouldn't work with stack-allocated-memory (but many operating systems now come with some stack-protection).

DKF: Tcl already comes with some of that sort of thing when you enable memory debugging as a compile-time option.

GPS: The Tcl_Db* functions only solve part of the problem. They may catch some Tcl bugs, but there are many that only happen with Tcl-extensions. Most users of Tcl are not running debug builds and they use extensions. We need a fast solution that works with all builds.

Jacob Levy July 18, 2003: The threading problem may be fixed now and this GC is used in the implementation of GNU's Java native compiler (http://gcc.gnu.org/java/). However, the other problems remain -- extremely conservative approach to deciding whether a value in-memory represents a pointer so that the pointed-at entity should be retained, violating C++ semantics, it doesn't work on some platforms that Tcl is ported to, and problems with "inner pointers" on some platforms.

If you want to see a refcounting scheme that works without any need for user intervention (the user's code is not sprinkled with the likes of Tcl_IncrRefCount and such), look at Metakit (http://www.equi4.com/metakit), e4Graph (http://www.e4graph.com/e4graph), and ColdStore (http://coldstore.sourceforge.net/). These packages are written in C++. The reason to mention them here is to point out that refcounting does not have to be imposed on the user at all, it can be handled automatically behind the user's back. I agree with GPS that Tcl's scheme is problematic for the reasons he mentioned.

I'm wondering about the performance impact of enabling these refcounting macros that DKF mentioned. Any data appreciated.

[P.K.]: But refcounting is slow! It is much slower than any mark & sweep GC. Each time you copy a reference value you must increase/decrease a refcounter. This MUST be thread safe operaration so you must do it using a mutex or other system operation. And such mutex is a REALLY SLOW thing. Refcounting gcs are usually 10-100 slower than mark & sweep gcs. They make unpredictable pauses too and besides they force programmer to think of pointer cycles. Boehm's GC is the best conservative gc I know about. If you are frightened of pauses and memory leaks, you should probably use a soft-real-time, precise, mark & sweep garbage collector for C++ based on smart pointers (like refcounting, but much faster because it doesn't need to do any synchronization while copying or creating pointers). Of course, unlike Boehm's GC, it doesn't stop any threads during the mark&sweep phase. More info: pkolaczk@elka.pw.edu.pl.

NEM: Your assertion that refcounting must be thread safe is simply incorrect. Tcl side-steps this issue by not allowing Tcl_Objs to be shared across threads. If you check tcl.h, you will see that Tcl_{Incr,Decr}RefCount are simple macros which don't involve any mutexes. So, the refcounting operations are pretty fast. Tcl's everything is a string semantics, should make it impossible to create a structure with cyclic references (except for the empty string, possibly).