The Document Object Model (DOM) is defined by the W3C as a set of recommendations. The DOM Core recommendations define a set of objects, each of which represents some piece of information in an XML document. Well-defined relationships between these objects represent the document's organization.

A DOM parser reads the XML document and constructs a tree structure in memory that represents the original document, as the following figure shows. This tree is composed of well-defined objects. Applications can then navigate the branches of this tree and manipulate the XML. The parser implementations, including the reference implementation, use a SAX parser internally to read the XML into memory. The XML is then analyzed for the relationships between its component parts and organized into a tree structure that can be traversed.

Figure: Schematic of the DOM tree structure
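To make "navigate the branches of this tree" concrete, here is a minimal sketch that parses a small XML string with JAXP and walks the resulting tree by following each node's first-child and next-sibling links. The class name DomTreeDump, its helper methods, and the sample account XML are hypothetical illustrations rather than part of any library; the sketch assumes Java 11 or later (for String.repeat).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DomTreeDump {
    // Parse an XML string into an in-memory DOM tree; checked exceptions are
    // wrapped to keep the sketch short.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Return an indented outline of element names, starting at the root element.
    public static String outline(Document doc) {
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    // Record this element, then visit its child elements one level deeper by
    // following the firstChild / nextSibling links of the tree.
    private static void walk(Node element, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(element.getNodeName()).append('\n');
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk(child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<account><owner>Ann</owner><balance>10</balance></account>");
        System.out.print(outline(doc));
    }
}
```

Running main prints account on the first line, with owner and balance indented beneath it. Non-element children (text, comments) are deliberately skipped here to keep the outline readable.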

DOM is organized into levels. Level 1 defines the functionality for representing and navigating the content of a document. Level 2 (also referred to as DOM 2.0) builds on Level 1, adding features such as the ability to access tree members by namespace name. Level 2 consists of several specifications, including those listed below. Everything except the Core is optional for a DOM Level 2-compliant parser to implement. At the time of this writing, Level 3 is a working draft.

  1. DOM Level 2 Core: Defines the basic object model used to represent structured data
  2. DOM Level 2 Views: Allows programs to access and update a representation (view) of a document
  3. DOM Level 2 Style: Allows programs to access and update style sheets
  4. DOM Level 2 Traversal and Range: Allows programs to walk through the document tree and to identify, modify, and delete a range of content
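Because Traversal is one of the optional Level 2 modules, code that uses it should check for support first. The sketch below, offered as an illustration under that assumption, collects element names in document order with a TreeWalker from org.w3c.dom.traversal. It assumes the parser's Document object also implements DocumentTraversal (the parser bundled with current JDKs does, but other parsers may not). The class TraversalSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

public class TraversalSketch {
    // Parse an XML string into a DOM tree (checked exceptions wrapped for brevity).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Collect element names in document order using a DOM Level 2 TreeWalker.
    // Traversal is optional in DOM Level 2, so support is checked explicitly.
    public static List<String> elementNames(Document doc) {
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM Level 2 Traversal not supported");
        }
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        // The walker starts positioned on its root; nextNode() returns null at the end.
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(parse("<a><b/><c><d/></c></a>")));
    }
}
```

Compared with hand-written recursion, the TreeWalker moves the "which nodes do I visit" decision into the whatToShow mask (here SHOW_ELEMENT) and an optional NodeFilter.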

Unlike SAX, DOM is specified in a language- and implementation-independent manner: all of its constructs are defined in the Object Management Group Interface Definition Language (OMG IDL). The specification then defines Java language bindings for those constructs. The W3C packages these Java bindings as the org.w3c.dom package, which JAXP also includes.


JAXP endorses only the DOM 2.0 core, which is fully namespace aware. DOM specifications from the W3C can be found at

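Namespace support is something the application has to request: JAXP's DocumentBuilderFactory is not namespace aware by default. The following sketch, assuming a hypothetical namespace URI urn:example:po and a hypothetical class NamespaceLookup, enables namespace processing and then uses the Level 2 method getElementsByTagNameNS, which matches on namespace URI and local name regardless of the prefix used in the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NamespaceLookup {
    // Parse with namespace processing enabled; without setNamespaceAware(true)
    // the *NS methods cannot match elements by namespace URI.
    public static Document parseNamespaceAware(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Count elements with the given namespace URI and local name, whatever their prefix.
    public static int count(Document doc, String namespaceUri, String localName) {
        return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
    }

    public static void main(String[] args) {
        // p and q are two prefixes bound to the same (hypothetical) namespace URI;
        // the unprefixed <item/> is in no namespace at all.
        String xml = "<p:order xmlns:p='urn:example:po' xmlns:q='urn:example:po'>"
                + "<p:item/><q:item/><item/></p:order>";
        Document doc = parseNamespaceAware(xml);
        System.out.println(count(doc, "urn:example:po", "item"));
    }
}
```

Here main prints 2: p:item and q:item share a namespace URI, so both match, while the unprefixed item does not.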

The following figure shows the relationships between the major interfaces of the org.w3c.dom package.

Figure: The org.w3c.dom package
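Every object in the tree is a Node, and its getNodeType() value indicates which more specific org.w3c.dom interface it implements. The sketch below maps a few of those constants to interface names and lists the kinds of the root element's children in a small sample document. NodeKinds, kindOf, and childKinds are hypothetical names, and only a subset of the node types is covered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeKinds {
    // Parse an XML string into a DOM tree (checked exceptions wrapped for brevity).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Map a node to the name of the org.w3c.dom subinterface its type code denotes.
    public static String kindOf(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: return "Document";
            case Node.ELEMENT_NODE: return "Element";
            case Node.TEXT_NODE: return "Text";
            case Node.CDATA_SECTION_NODE: return "CDATASection";
            case Node.COMMENT_NODE: return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default: return "Node"; // other types are not covered in this sketch
        }
    }

    // List the kinds of the root element's direct children, in document order.
    public static List<String> childKinds(Document doc) {
        List<String> kinds = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            kinds.add(kindOf(children.item(i)));
        }
        return kinds;
    }

    public static void main(String[] args) {
        Document doc = parse("<root><el/>text<!--note--><![CDATA[x<y]]><?pi data?></root>");
        System.out.println(kindOf(doc) + " " + childKinds(doc));
    }
}
```

With the factory's default settings (no coalescing, comments kept), main prints Document [Element, Text, Comment, CDATASection, ProcessingInstruction].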

In DOM, the base interface is Node, which provides methods for working with a node's name, value, attributes, and children. Each subinterface shown in the figure represents a specific type of item in the XML. The Document interface represents the entire XML document and is the conceptual root of the tree. It has methods for creating nodes, assembling these nodes into the tree, and locating elements by name. Listing 9.3 shows the IDL definition of the Document interface as defined in the W3C specification. Notice how the IDL constructs map directly to the Java interfaces of the org.w3c.dom package shown in the figure.

Listing 9.3: IDL description of the DOM Document object
interface Document : Node {
  readonly attribute DocumentType doctype;
  readonly attribute DOMImplementation implementation;
  readonly attribute Element documentElement;
  Element createElement(in DOMString tagName) raises(DOMException);
  DocumentFragment createDocumentFragment();
  Text createTextNode(in DOMString data);
  Comment createComment(in DOMString data);
  CDATASection createCDATASection(in DOMString data) raises(DOMException);
  ProcessingInstruction createProcessingInstruction(in DOMString target,
                                                    in DOMString data)
                                                    raises(DOMException);
  Attr createAttribute(in DOMString name) raises(DOMException);
  EntityReference createEntityReference(in DOMString name) raises(DOMException);
  NodeList getElementsByTagName(in DOMString tagname);
  Node importNode(in Node importedNode, in boolean deep) raises(DOMException);
  Element createElementNS(in DOMString namespaceURI, in DOMString qualifiedName)
                                                    raises(DOMException);
  Attr createAttributeNS(in DOMString namespaceURI, in DOMString qualifiedName)
                                                    raises(DOMException);
  NodeList getElementsByTagNameNS(in DOMString namespaceURI, in DOMString localName);
  Element getElementById(in DOMString elementId);
};

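The IDL-to-Java mapping can be seen in a few lines of code: a readonly attribute such as documentElement becomes a getter (getDocumentElement()), the create* operations become ordinary methods, and raises(DOMException) becomes the unchecked org.w3c.dom.DOMException. The sketch below builds a small document with these factory methods and shows the exception path for an illegal element name. DocumentFactoryMethods, its helper methods, and the account/A1 sample content are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentFactoryMethods {
    // The org.w3c.dom interfaces have no public constructors, so an empty
    // Document is obtained through JAXP.
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build <account id="A1"><!--demo-->Ann</account> using Document's factory methods.
    public static Document buildAccount() {
        Document doc = newDocument();
        Element account = doc.createElement("account");   // createElement(in DOMString tagName)
        account.setAttribute("id", "A1");
        account.appendChild(doc.createComment("demo"));    // createComment(in DOMString data)
        account.appendChild(doc.createTextNode("Ann"));     // createTextNode(in DOMString data)
        doc.appendChild(account);
        return doc;
    }

    // IDL "raises(DOMException)" surfaces in Java as an unchecked DOMException;
    // return its code, or 0 if the name was accepted.
    public static short createElementErrorCode(Document doc, String name) {
        try {
            doc.createElement(name);
            return 0;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        Document doc = buildAccount();
        // "readonly attribute Element documentElement" maps to getDocumentElement()
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(createElementErrorCode(doc, "1bad") == DOMException.INVALID_CHARACTER_ERR);
    }
}
```

main prints account and then true, because an XML name may not start with a digit.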


JAXP includes the W3C DOM package and adds a thin layer on top of it, the DocumentBuilderFactory and DocumentBuilder classes in the javax.xml.parsers package. This mirrors the SAX portion of JAXP described earlier.

The DocumentBuilderFactory is used to obtain a reference to the underlying DOM parser, that is, an instance of the DocumentBuilder class. The factory can select among different parser implementations, although the reference implementation ships with only one DOM parser. The lookup sequence used when the factory is instantiated is similar to the one described previously for the SAXParserFactory. First, the javax.xml.parsers.DocumentBuilderFactory system property is checked; next, the lib/ file in the Java installation directory is checked for this property; then the JAR files on the classpath are searched for a META-INF/services/javax.xml.parsers.DocumentBuilderFactory file. If none of these is found, the default DocumentBuilderFactory implementation that ships with JAXP is used.

The code segment below shows the earlier SAXParsing example, adapted to show the basic structure used to create a DOM tree from an XML file. The flow is the same: a DocumentBuilderFactory is created, a DocumentBuilder instance is obtained from it, and the XML is parsed using that instance.

package com.flutebank.parsing;

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DOMParsing {
    public static void main(String[] arg) {
        try {
            String filename = arg[0];
            // Create a new factory that will create the DOM parser
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Use the factory to create a DOM parser (a DocumentBuilder)
            DocumentBuilder parser = factory.newDocumentBuilder();
            // Register an error handler; MyErrorHandler is the org.xml.sax.ErrorHandler
            // implementation from the earlier SAX example
            parser.setErrorHandler(new MyErrorHandler());
            // Parse the file into an in-memory DOM tree
            Document xml = parser.parse(new File(filename));
            // Do something useful with the XML tree represented by the Document object
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

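Usually, the in-memory DOM representation is not an exact replica of the conceptual model. The main difference is that the tree includes Text nodes for ignorable whitespace (whitespace that falls between tags, such as a carriage return and indentation). Application code therefore has to handle these whitespace-only Text nodes, either by skipping them during traversal or by removing them from the tree. Note that Node.normalize() only merges adjacent Text nodes; it does not remove whitespace-only ones.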
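One way to deal with these nodes is to strip them after parsing. The sketch below, using a hypothetical WhitespaceNodes class and sample XML, removes whitespace-only Text nodes recursively. It assumes the document has no mixed content where such whitespace is meaningful. An alternative is DocumentBuilderFactory.setIgnoringElementContentWhitespace(true), but that setting takes effect only when the parser validates against a DTD that declares element-only content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class WhitespaceNodes {
    // Parse an XML string into a DOM tree (checked exceptions wrapped for brevity).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Recursively remove Text nodes that contain only whitespace.
    // Caution: this also drops whitespace that might matter in mixed content.
    public static void stripWhitespaceText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // save before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        String xml = "<account>\n  <owner>Ann</owner>\n  <balance>10</balance>\n</account>";
        Element root = parse(xml).getDocumentElement();
        // 3 whitespace-only Text nodes + 2 Element nodes
        System.out.println(root.getChildNodes().getLength());
        stripWhitespaceText(root);
        // only the 2 Element nodes remain
        System.out.println(root.getChildNodes().getLength());
    }
}
```

main prints 5 and then 2. The text inside owner and balance survives because it is not whitespace-only.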