Updated 2008-10-18 18:16:47 by lars_h

There are two well-known packages for processing XML documents in Tcl: TclXML [1] (and related packages) and tDOM. One very common question is "Which package should I use for XML processing, and why?". This page summarizes the features and differences between the two packages in order to provide a (hopefully) unbiased view.

I've started this page on the Wiki so that members of both the TclXML and tDOM projects can contribute on an equal basis.

(A table would be best here, but I don't know how to get that in Wiki)

Packages:

  • TclXML is separated into three packages: TclXML for parsing, TclDOM for tree processing, TclXSLT for transformations.
  • tDOM is a single, integrated package.

Parser implementations:

  • TclXML: Pure Tcl (no extension required), C wrappers for expat and (in v3.0) libxml2.
  • tDOM: C wrapper for expat (plus the "simple" but even faster parser based on work by D. Richard Hipp, typically well suited for 'data-oriented' XML)

DOM implementations:

  • TclDOM: There are three distinct implementations - pure Tcl (no extension required) and 2 C extensions (TclDOM/C and TclDOM/libxml2 (a wrapper for libxml2)). With TclDOM version 3.0, only one of these implementations is installed and will be used by the application.
  • tDOM: C extension.

XSLT implementations:

  • TclXSLT: C wrapper for libxslt - only works with TclDOM/libxml2.
  • tDOM: C implementation.

TEA Compliance:

  • TclXML family: Yes.
  • tDOM: Yes.

Standards Support:

  • TclXML/tcl: XML 1.0 2nd Edition, XML Namespaces, XPath 1.0 (partial)
  • TclXML/expat: XML 1.0 2nd Edition, XML Namespaces
  • TclXML/libxml2: XML 1.0 2nd Edition, XML Namespaces, XPath 1.0
  • TclDOM/tcl: DOM Level 1, DOM Level 2 (partial), DOM Level 3 (partial), XPath 1.0 (partial)
  • TclDOM/libxml2 v3.0: DOM Level 1, DOM Level 2 (partial), DOM Level 3 (partial), XPath 1.0, XML Schemas, RelaxNG
  • TclXSLT: XSLT v1.0, EXSLT
  • tDOM: ? (Someone please fill this in)

Performance:

  • tDOM is reported to have superior runtime performance.
  • Both tDOM and libxml2/libxslt out-perform most other XSLT processors, though MSXML is reputed to also be a well-performing processor.

Memory demand:

  • TclDOM/libxml2 trees need roughly 1.5 to 2 times as much memory as tDOM trees. TclDOM/tcl needs much more memory still.
  • Most Java DOM implementations need notably more memory to represent a DOM tree.

Parsing XML:

  • TclXML: SAX-style callback API, including interposing on external entity resolution. Version 3.0 supports combined DOM building and SAX events during the same parsing step.
  • tDOM: SAX-style callback API, including interposing on external entity resolution. Supports more than one script per SAX event. Allows DOM building and SAX events in one parsing step.
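
As an illustration of the SAX-style callback API described above, here is a minimal sketch using tDOM's expat parser command. It assumes tDOM is installed; the handler names onStart and onText and the sample document are made up for this example. TclXML offers a comparable callback API through its own commands, which is not shown here.

```tcl
package require tdom

# Hypothetical handlers: record each element name and accumulate text.
set names {}
set text ""
proc onStart {name attlist} {
    lappend ::names $name
}
proc onText {data} {
    # Character data may arrive in several chunks, so append rather than set.
    append ::text $data
}

# Create an expat-based parser and attach the callbacks.
set parser [expat -elementstartcommand onStart -characterdatacommand onText]
$parser parse {<order id="7"><item>tea</item><item>milk</item></order>}
$parser free

puts "elements: $names"
puts "text: $text"
```

No tree is built here: the application sees the document only as a stream of events, which keeps memory use low for large inputs.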

Validating XML:

  • TclXML: No support for validation.
  • TclDOM/tcl: No support for validation.
  • TclDOM/libxml2: A posteriori validation (i.e. a DOM tree may be validated after it has been parsed). Supports the DTD, XML Schema and RelaxNG schema languages. Schema documents may be compiled for later (cached) use.
  • tDOM: DTD validation. A posteriori validation (DOM tree validation).

DOM Scripting:

  • TclDOM adheres fairly strictly to the W3C DOM API: IDL interfaces are mapped to Tcl commands, node lists are live, etc. Tree nodes are represented as "tokens" that are passed as arguments to the DOM commands (i.e. tree nodes are mutable objects). In version 3.0, node tokens are also defined as Tcl commands, allowing a more "Tclish" use of the package.
  • tDOM is somewhat more "Tclish": tree nodes are defined as Tcl commands, and tDOM additionally supports representing nodes as "tokens". Subtrees can be serialized to, and parsed from, nested Tcl lists, and new subtrees can be created in a very "Tclish" way (appendFromScript).
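
The tDOM style described above can be sketched roughly as follows. This assumes tDOM is installed; the sample document and the node-building command names item and txt are chosen only for this example.

```tcl
package require tdom

set doc  [dom parse {<order id="7"><item>tea</item></order>}]
set root [$doc documentElement]

# Each node is itself a Tcl command.
set rootName [$root nodeName]
set orderId  [$root getAttribute id]
puts "$rootName $orderId"

# Serialize the subtree to a nested Tcl list.
puts [$root asList]

# Create node-building commands (the names "item" and "txt" are our choice),
# then grow the tree from a Tcl script.
dom createNodeCmd elementNode item
dom createNodeCmd textNode txt
$root appendFromScript {
    item { txt "milk" }
}

# Walk the children and collect their text content.
set items {}
foreach node [$root childNodes] {
    lappend items [$node text]
}
puts $items

# DOM trees are not garbage collected; free them explicitly.
$doc delete
```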

XSLT Scripting:

  • TclXSLT: Allows stylesheets to be compiled, transformations to be performed, and compiled stylesheets to be reused. Also allows XSLT/XPath extensions to be implemented as Tcl callbacks. Provides a script interface to information about the stylesheet (output method, parameters).
  • tDOM: Yes (including stylesheet "compilation" and reuse).
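
For tDOM, stylesheet compilation and reuse might look like the following sketch. It assumes tDOM is installed and a version that provides the toXSLTcmd method; the stylesheet and input document are invented for this example. TclXSLT provides similar functionality through its own commands, not shown here.

```tcl
package require tdom

set xml [dom parse {<order><item>tea</item><item>milk</item></order>}]
set xsl [dom parse {<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/order">
    <ul><xsl:for-each select="item"><li><xsl:value-of select="."/></li></xsl:for-each></ul>
  </xsl:template>
</xsl:stylesheet>}]

# Turn the stylesheet tree into a reusable transformation command.
# After this call the stylesheet is accessed through $compiled, not $xsl.
set compiled [$xsl toXSLTcmd]

# The compiled command can be applied to any number of input documents.
set result [$compiled transform $xml]
puts [$result asXML]

# Collect the text of the generated <li> elements.
set lis {}
foreach li [[$result documentElement] childNodes] {
    lappend lis [$li text]
}

$result delete
$compiled delete
$xml delete
```

For a one-off transformation, the shorter form `$xml xslt $xsl` returns the result document directly without a separate compilation step.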

XPath:

  • tDOM: Supports XPath queries, and evaluates them very fast.
  • TclDOM/libxml2: Very fast, and complete, XPath support.
  • TclDOM/tcl: Partial implementation of XPath (slow).
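
A short sketch of XPath queries against a tDOM tree, assuming tDOM is installed; the catalog document is made up for this example. TclDOM exposes XPath through its own API, which is not shown here.

```tcl
package require tdom

set doc [dom parse {<catalog>
  <book id="b1" lang="en"><title>Tcl Basics</title></book>
  <book id="b2" lang="de"><title>XML mit Tcl</title></book>
  <book id="b3" lang="en"><title>DOM Scripting</title></book>
</catalog>}]
set root [$doc documentElement]

# A node-set result comes back as a Tcl list of node commands.
set titles {}
foreach node [$root selectNodes {/catalog/book[@lang='en']/title}] {
    lappend titles [$node text]
}
puts $titles

# An expression with a string result returns the string itself.
set firstId [$root selectNodes {string(/catalog/book[1]/@id)}]
puts $firstId

$doc delete
```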

Events:

  • TclDOM implements the DOM Level 2 Event module. A Tcl application may register a Tcl script as an event listener and may post events. The package itself generates mutation events when a DOM tree is modified.

Multi-Threading:

  • TclXML family: Version 3.0 is thread-safe.
  • tDOM: Thread-safe. DOM trees can be shared across threads. Can be built to run out of the box as an AOLServer module.

Deployment:

  • tDOM can be deployed as a single C library. tDOM modules are available as part of Tclkit, and tDOM is part of the ActiveState Tcl distribution.
  • TclXML and TclDOM have pure-Tcl implementations, which can be advantageous for extension-free deployment.
  • The libxml2/libxslt wrappers for TclDOM and TclXSLT are available in the ActiveState distribution.
  • tDOM, TclDOM/libxml2 and TclXSLT are all available in Mac OS X binary distributions.

Examples:

  • TclXML: Demo scripts included in distribution.
  • TclDOM: Demo scripts and sample applications provided, particularly tkxmllint.
  • TclXSLT: Sample application provided: tkxsltproc.

de: It is a bit confusing that TclDOM includes not one DOM implementation but at least three different ones with different characteristics. Not everything mentioned above about 'TclDOM' is true for each of the included implementations. For example, if you want to use XSLT with TclDOM, you must use the libxml2 wrapper; with the others you cannot use XSLT.

SRB That's true - there are a number of separate implementations of TclXML and TclDOM. However, they all present the same scripting API to the application, so developers can prototype using the Tcl implementation and then use the C implementation for speed.

MC 15 June 2003: Steve, above you've written that "TclDOM/libxml2 supports XPath queries", while the pure-Tcl version of "TclXML/TclDOM has a partial implementation of XPath". While both versions "present the same scripting API", obviously the actual implemented features available via the "same scripting API" differ depending on which flavor of TclDOM is in use, no?

SRB Different implementations have different characteristics - some parsers are non-validating, some are faster, etc. Sometimes there is no way to avoid the fact that implementations don't all behave the same way. These situations must be documented, so that the developer can be aware of them. However, where the behaviour is the same then the API should be consistent so that application code doesn't need to change to use another implementation.

Now, regarding XPath in the Tcl implementation of TclXML and TclDOM: it is incomplete and I regard that as a bug. I may or may not finish it, depending on whether I have time to do the work (or whether someone else does the work). However, accessing the XPath feature is the same for both the Tcl and libxml2 implementations. It's a similar story with validation in the XML parser.

de The note about strict adherence to the W3C DOM API applies mainly to the scripted TclDOM.

SRB Not true. All implementations adhere to the same API and data/processing model (modulo implementation details, see above).

de The notes about speed and memory usage are about the C wrapper; the scripted TclDOM is much slower and more memory-hungry. Etc.

SRB True. Tcl implementations are always slower and use more memory than C implementations.

MC 15 June 2003: Personally, I'm glad Tcl has both tDOM and TclXML. Having choices--just like we have with OO-extensions--is a Good Thing for Tcl in my opinion. I found Steve's XML tutorials at the Ninth Annual Tcl/Tk Conference to be quite educational, and when I got home and had trouble with the build dependencies of TclXML, I discovered tDOM to be very simple to compile and begin using. YMMV. I appreciate the work of both the TclXML and tDOM developers, and am glad that both projects continue to be actively developed.