Updated 2018-09-26 20:52:29 by Quigi

MC 2003-05-27: Inspired by a question on the Tcl Chatroom, I decided to start some tDOM XML Tutorials.

First we need to load the package.
package require tdom

Then, let's start with a small XML document. Note that XML allows attribute values to be surrounded by either single or double quotes. The advantage of single quotes is that they don't need to be escaped inside a double-quoted Tcl string.
set XML "
<order number='1'>
    <customer>John Doe</customer>
    <phone>555-4321</phone>
    <email>jdoe@example.com</email>
    <website/>
    <parts>
        <widget sku='XYZ123' />
        <widget sku='ABC789' />
    </parts>
</order>
"

Now let's parse it:
set doc [dom parse $XML]

and get the root node so we can start working with the DOM tree:
set root [$doc documentElement]
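Parsing raises an ordinary Tcl error when the input is not well-formed, so it can be wrapped in catch. The following is only a minimal sketch; the variable names badXML and result are made up for illustration, and the exact wording of the error message is not shown because it depends on the parser.

```tcl
package require tdom

# Deliberately malformed input: <b> is never closed.
set badXML "<a><b></a>"

if {[catch {dom parse $badXML} result]} {
    # On failure, result holds the parser's error message.
    puts "parse failed: $result"
} else {
    # On success, result holds the document command; free it when done.
    $result delete
}
```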

Now, suppose we want to print the text that the <phone> element contains. If we wanted to be really verbose, we could observe that the text node is the only child of the <phone> node, which is the second child of the <order> node (which is our $root node).
set node [$root firstChild]  ;# <customer/>
set node [$node nextSibling] ;# <phone/>
set node [$node firstChild]  ;# <phone>'s text node

We could write the above all on one line as:
# ERRATUM set node [$node firstChild [$node nextSibling [$root firstChild]]]
set node [[[$root firstChild] nextSibling] firstChild]

Or we could use the selectNodes method and an xpath expression to specify the node we want:
set node [$root selectNodes /order/phone/text()]

Personally, I prefer the latter. Now, to print out the phone number we can use either of:
puts [$node data]
puts [$node nodeValue]
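If only the string is needed and not the text node itself, there are two shortcuts that, as far as I know, behave the same way here. This is a small self-contained sketch with a cut-down document; the variable names viaXPath and viaText are made up.

```tcl
package require tdom

set doc [dom parse "<order><phone>555-4321</phone></order>"]
set root [$doc documentElement]

# The XPath string() function yields a plain Tcl string rather than a node.
set viaXPath [$root selectNodes string(/order/phone)]

# The text method concatenates the text-node children of an element.
set viaText [[$root selectNodes /order/phone] text]

puts $viaXPath
puts $viaText

$doc delete
```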

Now let's suppose we want to change the phone number (maybe to include an area code?). We can specify a new value for the text node:
$node nodeValue "(999) 555-4321"

Now, let's look at attributes a bit. We can easily get an attribute from a node:
set order_num [$root getAttribute number]

It's an error to attempt to get the value of an attribute that doesn't exist unless we provide a default:
set bogus_attrib [$root getAttribute foobar "this is a default value"]

And we can set an attribute, which will either replace any current attribute (if already present) or create a new attribute:
$root setAttribute status "Shipped to Customer"

We can also easily test for the presence of an attribute and easily remove an attribute we no longer need/want:
if {[$root hasAttribute foobar]} {
    $root removeAttribute foobar
}
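The attribute methods above can be tried together on a tiny stand-alone document. This is just a sketch; the variable names failed, value and names are made up, and it assumes attributes without namespaces.

```tcl
package require tdom

set doc [dom parse "<order number='1'/>"]
set root [$doc documentElement]

# Without a default, asking for a missing attribute raises a Tcl error.
set failed [catch {$root getAttribute foobar} msg]

# With a default, the default is returned instead.
set value [$root getAttribute foobar "n/a"]

# setAttribute creates the attribute because it is not there yet.
$root setAttribute status shipped

# attributes returns the attribute names (not the values).
set names [lsort [$root attributes]]

puts "failed=$failed value=$value names=$names"
$doc delete
```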

Now let's suppose we want to add some additional widgets to this customer's order. There are several ways to do this. The first is the appendXML method:
set node [$root selectNodes /order/parts]
set sku NEW456
$node appendXML "<widget sku='$sku'/>"

Another way is to use the appendFromList method. For this we need a 3-element list:

  1. tag name
  2. any attributes as key/value pairs (the same format returned by array get)
  3. nested contents (a list of lists of this form) or the empty string {} if this element has no children
$node appendFromList [list widget {sku OLD999} {}]
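To illustrate the third list element, here is a sketch that appends a widget with one nested child. The <note> element and its text are invented for this example; the text node uses the two-element #text form.

```tcl
package require tdom

set doc [dom parse "<parts/>"]
set parts [$doc documentElement]

# {tag attributes children}: children is itself a list of such lists.
# A text node is written as the two-element list {#text value}.
$parts appendFromList {
    widget {sku OLD999} {
        {note {} {{#text discontinued}}}
    }
}

# Read the nested text back to confirm the structure.
set note [$parts selectNodes string(widget/note)]
set sku  [[$parts firstChild] getAttribute sku]
puts "$sku: $note"

$doc delete
```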

Another way is to create new nodes and then append or insert them into the DOM tree:
set comment [$doc createComment "this is a comment"]
$root appendChild $comment

set node [$doc createElement widget]
$node setAttribute sku FOO333

[$root selectNodes /order/parts] appendChild $node

An easy way to add a text node is to use the appendFromList method. Since a text node doesn't have any child-nodes we omit the 3rd list element. We use the special tag name #text:
set node [$root selectNodes /order/website]
# check and make sure there isn't already a child text() node
if {[$node selectNodes text()] == ""} {
    $node appendFromList [list #text http://somewhere.example.com]
}

# another equivalent
if {[$node selectNodes text()] == ""} {
    $node appendChild [$doc createTextNode http://somewhere.example.com]
}

Let's delete the text node we just added:
[$root selectNodes /order/website/text()] delete

Next, for no real good reason, let's move the <widget/> whose sku is ABC789 to the top of the list of widgets (i.e., the first child of <parts>).
set node [$root selectNodes /order/parts]

# Two different (equivalent) approaches for selecting the node to move
# [BAS] added the // to search for widget anywhere in the tree
set move [$root selectNodes {//widget[@sku='ABC789']}]
set move [$root find sku ABC789]

# remove the child
$node removeChild $move

# and insert it before the current first child
$node insertBefore $move [$node firstChild]
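The same move can be checked on a stand-alone document by listing the sku values afterwards. A minimal sketch; the variable skus is made up for illustration.

```tcl
package require tdom

set doc [dom parse "<parts><widget sku='XYZ123'/><widget sku='ABC789'/></parts>"]
set parts [$doc documentElement]

# Select the widget to move, detach it, and re-insert it as first child.
set move [$parts selectNodes {widget[@sku='ABC789']}]
$parts removeChild $move
$parts insertBefore $move [$parts firstChild]

# Collect the sku attributes in document order.
set skus {}
foreach w [$parts selectNodes widget] {
    lappend skus [$w getAttribute sku]
}
puts $skus

$doc delete
```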

Let's loop over our DOM tree and print out the name and type of each node, as well as a list of attributes (if any) for that node:
proc explore {parent} {
    set type [$parent nodeType]
    set name [$parent nodeName]

    puts "$parent is a $type node named $name"

    if {$type != "ELEMENT_NODE"} then return

    if {[llength [$parent attributes]]} {
        puts "attributes: [join [$parent attributes] ", "]"
    }

    foreach child [$parent childNodes] {
        explore $child
    }
}

explore $root
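A variation on the same recursion returns its findings instead of printing them, which is often handier in a larger program. This is only a sketch; the proc name outline and its depth parameter are made up, and it needs Tcl 8.5 or later for the {*} syntax.

```tcl
package require tdom

# Hypothetical helper: returns one line per element, indented by depth.
proc outline {node {depth 0}} {
    set lines {}
    # Skip text, comment and other non-element nodes.
    if {[$node nodeType] ne "ELEMENT_NODE"} { return $lines }
    lappend lines "[string repeat "  " $depth][$node nodeName]"
    foreach child [$node childNodes] {
        lappend lines {*}[outline $child [expr {$depth + 1}]]
    }
    return $lines
}

set doc [dom parse "<order><customer>John Doe</customer><parts><widget/></parts></order>"]
set lines [outline [$doc documentElement]]
puts [join $lines \n]
$doc delete
```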

We can serialize our DOM tree back to XML using:
set XML [$root asXML]
puts $XML
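asXML also takes an -indent option, which is useful when the output is meant for a program rather than a person. A short sketch, with the variable names compact and pretty made up:

```tcl
package require tdom

set doc [dom parse "<order><phone>555-4321</phone></order>"]
set root [$doc documentElement]

# No indentation or line breaks between elements.
set compact [$root asXML -indent none]

# Indent nested elements by two spaces per level.
set pretty [$root asXML -indent 2]

puts $compact
puts $pretty

$doc delete
```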

Now let us see how to handle a situation where an XML document has several elements of the same name at the same level. Notice that there is more than one <order> element below:
set XML "
<orders>
    <order>
        <customer>John Doe</customer>
        <phone>555-4321</phone>
        <email>jdoe@example.com</email>
        <website/>
        <parts>
            <widget sku='XYZ123' />
            <widget sku='ABC789' />
        </parts>
    </order>
    <order>
        <customer>Jane Doe</customer>
        <phone>555-4321</phone>
        <email>jdoe@example.com</email>
        <website/>
        <parts>
            <widget sku='XYZ123' />
            <widget sku='ABC789' />
        </parts>
    </order>
</orders>
"

Let us parse the order elements:
set doc [dom parse $XML]
set root [$doc documentElement]

# Since there is more than one order node, selectNodes returns a Tcl list of nodes.
set nodeList [$root selectNodes /orders/order/customer/text()]

# Extract the first node from the returned list.
set node1 [lindex $nodeList 0]

# Extract the second node from the returned list.
set node2 [lindex $nodeList 1]

# Display their values
puts [$node1 nodeValue]
puts [$node2 nodeValue]
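When the number of matches isn't known in advance, a foreach loop over the returned list is more natural than lindex. A sketch on a cut-down document; the variable names is made up.

```tcl
package require tdom

set doc [dom parse "<orders><order><customer>John Doe</customer></order><order><customer>Jane Doe</customer></order></orders>"]
set root [$doc documentElement]

# Visit every matching text node, however many there are.
set names {}
foreach textNode [$root selectNodes /orders/order/customer/text()] {
    lappend names [$textNode nodeValue]
}
puts [join $names ", "]

$doc delete
```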

NEM 30 May: tDOM comes with a simple and very natural way to build up XML documents, using the createNodeCmd method. Here's an example taken from an RSS generator I'm writing (for grabbing the Recent Changes from this wiki as an RSS feed):
# First create our top-level document
set doc [dom createDocument rss]
set root [$doc documentElement]
# Set RSS version number
$root setAttribute version "0.91"

# Create the commands to build up our XML document
dom createNodeCmd elementNode channel
dom createNodeCmd elementNode title
dom createNodeCmd elementNode description
dom createNodeCmd textNode t

# Build our XML document
$root appendFromScript {
    channel {
        title { t "Tcl'ers Wiki Recent Changes" }
        description { t "A daily dose of Tcl inspiration!" }
    }
}

That's it. Each command can take a list of attributes and values, along with a script to evaluate at the end (to create further child nodes). To add attributes just use:
# Add another channel to the document, with a made-up attribute
$root appendFromScript {
    channel {foo bar} {
        t "Testing..."
    }
}

# Finally, show the resulting doc:
puts [$root asXML]

Which gives us:
<rss version="0.91">
    <channel>
        <title>Tcl'ers Wiki Recent Changes</title>
        <description>A daily dose of Tcl inspiration!</description>
    </channel>
    <channel foo="bar">Testing...</channel>
</rss>

ALX 03 Dec: Another simple way to write XML documents:
package require tdom

set doc [dom createDocument example]

set root [$doc documentElement]
$root setAttribute version 1.0

set node [$doc createElement description]
$node appendChild [$doc createTextNode "Date and Time"]
$root appendChild $node

set subnode [$doc createElement dt]
$root appendChild $subnode

set node [$doc createElement date]
$node appendChild [$doc createTextNode 2006-12-03]
$subnode appendChild $node

set node [$doc createElement time]
$node appendChild [$doc createTextNode 09:22:14]
$subnode appendChild $node

puts [$root asXML]

Or, more easily, using node commands:
package require tdom

dom createNodeCmd textNode text
foreach tag {description dt date time} {
  dom createNodeCmd -tagName $tag elementNode Tag_$tag
}

set doc [dom createDocument example]
set root [$doc documentElement]
$root setAttribute version 1.0

$root appendFromScript {
  Tag_description { text {Date and Time} }
  Tag_dt {
    Tag_date { text {2006-12-03} }
    Tag_time { text {09:22:14} }
  }
}

puts [$root asXML]

$doc delete

And the output:
<example version="1.0">
    <description>Date and Time</description>
    <dt>
        <date>2006-12-03</date>
        <time>09:22:14</time>
    </dt>
</example>

Further reading

XPath:

There isn't really anything Tcl-specific about xpath other than brackets [ ] have special meaning to both Tcl and the xpath engine (so be sure to properly quote/escape them).
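The two usual ways of quoting an XPath predicate can be sketched as follows. Braces suppress all Tcl substitution; double quotes allow a Tcl variable in the expression but then the brackets need backslashes. The variable names same and found are made up.

```tcl
package require tdom

set doc [dom parse "<parts><widget sku='XYZ123'/><widget sku='ABC789'/></parts>"]
set root [$doc documentElement]

# Braces: Tcl passes the expression through untouched.
set a [$root selectNodes {/parts/widget[@sku='ABC789']}]

# Double quotes: $sku is substituted by Tcl, so the brackets are escaped.
set sku ABC789
set b [$root selectNodes "/parts/widget\[@sku='$sku'\]"]

# Both expressions should select the same node.
set same  [expr {$a eq $b}]
set found [$a getAttribute sku]
puts "same=$same found=$found"

$doc delete
```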

  • add links to good xpath tutorials/references here

XSLT:

tDOM's xslt support is very complete (and well-tested). The only Tcl-specific aspects are any (optional) additional xpath functions used in your stylesheet that are defined in Tcl. The xslt.tcl script in the apps directory of the tDOM distribution is a good example to study.
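For orientation, a very small transformation might look like the sketch below. It assumes the xslt method of the document object, called with the stylesheet given as a parsed document and the name of a variable that receives the result document; the stylesheet and the <summary> element are invented for this example.

```tcl
package require tdom

set xmlDoc [dom parse "<order><customer>John Doe</customer></order>"]

# The stylesheet is itself parsed into a DOM document first.
set xsltDoc [dom parse {
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/order">
    <summary><xsl:value-of select="customer"/></summary>
  </xsl:template>
</xsl:stylesheet>
}]

# Apply the stylesheet; the result document is stored in resultDoc.
$xmlDoc xslt $xsltDoc resultDoc

set summary [[$resultDoc documentElement] text]
puts $summary

$resultDoc delete
$xsltDoc delete
$xmlDoc delete
```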

Wiki pages that demonstrate the use of tDOM (for further study):


MHo 2011-02-18: How to write out modified XML to the original file, preserving encoding, comments etc.?!

[Heino] - 2013-02-27 10:22:23

setResultEncoding is no longer supported. Use encoding instead.

markus - 2016-02-11 11:43:02

How can I add a textnode containing (valid) XML/HTML? Using this:
$node appendFromList [list #text "<strong>example</strong>"]

I get this as a result using $root asXML (or asHTML):
&lt;strong&gt;example&lt;/strong&gt;

I need the text to be inserted "as is".

Try:
$node appendFromList [list #cdata "<strong>example</strong>"]

result:
<![CDATA[<strong>example</strong>]]>

Text in an XML or HTML document cannot contain literal markup characters such as <, but a CDATA section can. Note that HTML does not have CDATA sections; you can output the above with asXML, but asHTML crashes tDOM and the shell it was running in.
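If the goal is for the markup to become real element nodes rather than text, one possibility is the appendXML method shown earlier on this page, which parses the string and appends the resulting nodes. A sketch, assuming the fragment is well-formed XML; the <p> container is made up.

```tcl
package require tdom

set doc [dom parse "<p/>"]
set p [$doc documentElement]

# The string is parsed, so <strong> ends up as an element, not as text.
$p appendXML "<strong>example</strong>"

set childName [[$p firstChild] nodeName]
set childText [[$p firstChild] text]
puts [$p asXML -indent none]

$doc delete
```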

markus - 2017-07-19 08:19:03

Is there a way to change a nodeName? I need to change the nodeNames selected with selectNodes. If not, what's a convenient way to create a new node before the selected node with the new name and place the content of the to-be-renamed node into it (and then delete the now empty node).

PYK 2017-07-20: Currently the only option for doing this is a combination of cloneNode -deep, replaceChild, and delete.
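The "create a new node and move the content" route from the question could be sketched roughly like this. It is a different technique from the cloneNode approach: it builds a fresh element and moves attributes and children across. The proc name renameElement is made up, and the sketch assumes attributes without namespaces and a node that has a parent element (so not the document element).

```tcl
package require tdom

# Hypothetical helper: replaces $old by a new element named $newName
# that carries the same attributes and children.
proc renameElement {old newName} {
    set doc [$old ownerDocument]
    set new [$doc createElement $newName]

    # Copy the attributes (assumes no namespaced attributes).
    foreach attr [$old attributes] {
        $new setAttribute $attr [$old getAttribute $attr]
    }

    # Move the children over, one by one, keeping their order.
    foreach child [$old childNodes] {
        $old removeChild $child
        $new appendChild $child
    }

    # Swap the nodes in the tree and discard the emptied original.
    [$old parentNode] replaceChild $new $old
    $old delete
    return $new
}

set doc [dom parse "<order><phone type='home'>555-4321</phone></order>"]
set root [$doc documentElement]

foreach node [$root selectNodes /order/phone] {
    renameElement $node telephone
}

set number [$root selectNodes string(/order/telephone)]
set type   [[$root selectNodes /order/telephone] getAttribute type]
set oldCount [llength [$root selectNodes /order/phone]]
puts [$root asXML -indent none]

$doc delete
```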