SAX is a low-level, event-style API for parsing XML documents. SAX originated in Java but has been implemented in many languages. We'll begin our discussion of the Java XML APIs here, at this lower level, and work our way up to higher-level (and sometimes more convenient) APIs as we go.


To use SAX, we'll be drawing on classes from the org.xml.sax package, a de facto standard developed by the XML-DEV community. This package holds mainly interfaces common to all implementations of SAX. To perform the actual parsing, we'll need the javax.xml.parsers package, which is the standard Java package for accessing XML parsers. The javax.xml.parsers package is part of the Java API for XML Processing (JAXP), which allows different parser implementations to be used with Java in a portable way.

To read an XML document with SAX, we first register an implementation of the org.xml.sax.ContentHandler interface with the parser. The ContentHandler has methods that are called in response to parts of the document. For example, the ContentHandler's startElement( ) method is called when an opening tag is encountered, and the endElement( ) method is called when the tag is closed. Attributes are provided with the startElement( ) call. Text content of elements is passed through a separate method called characters( ). The characters( ) method can be invoked repeatedly to supply more text as it is read, but it often gets the whole string in one bite. Here are the signatures of these ContentHandler methods:

 public void startElement(
 String namespace, String localname, String qname, Attributes atts );
 public void characters(
 char[] ch, int start, int len );
 public void endElement(
 String namespace, String localname, String qname );

The qname parameter is the qualified name of the element. This is the element name, including its namespace prefix if it has one. When working with namespaces, the namespace and localname parameters are also supplied, providing the namespace URI and the unqualified name separately. (Java 5.0 introduced a new javax.xml.namespace package that holds concrete classes representing namespaces and qualified names. The SAX API doesn't use them yet, however.) The ContentHandler interface also contains methods called in response to the start and end of the document, startDocument( ) and endDocument( ), as well as those for handling namespace mapping, special XML instructions, and whitespace that can be ignored. We'll confine ourselves to the three methods above for our examples. As with many other Java interfaces, a simple implementation, org.xml.sax.helpers.DefaultHandler, is provided for us that allows us to override just the methods we're interested in.
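To make the callback sequence concrete, here is a minimal sketch of a DefaultHandler subclass that overrides only the three methods above and records each event as a line of text. The EventLogger class and its log( ) helper are hypothetical names invented for this illustration, and the document is parsed from an in-memory string rather than a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of SAX): records each event as a line of text.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String namespace, String localname, String qname, Attributes atts) {
        StringBuilder sb = new StringBuilder("start " + qname);
        for (int i = 0; i < atts.getLength(); i++)
            sb.append(" ").append(atts.getQName(i)).append("=").append(atts.getValue(i));
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        // Only the slice [start, start+len) of the array belongs to this event.
        String text = new String(ch, start, len).trim();
        if (!text.isEmpty()) events.add("text " + text);
    }

    @Override
    public void endElement(String namespace, String localname, String qname) {
        events.add("end " + qname);
    }

    // Parses the given XML string and returns the recorded events.
    static List<String> log(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            EventLogger logger = new EventLogger();
            reader.setContentHandler(logger);
            reader.parse(new InputSource(new StringReader(xml)));
            return logger.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        log("<Animal animalClass=\"mammal\"><Name>Cocoa</Name></Animal>")
            .forEach(System.out::println);
    }
}
```

Running main( ) lists the events in document order: each opening tag (with its attributes), any non-blank text, and each closing tag.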


To perform the parsing, we'll need to get a parser from the javax.xml.parsers package. JAXP abstracts the process of getting a parser through a factory pattern, allowing different parser implementations to be plugged into the Java platform. The following snippet constructs a SAXParser object and then gets an XMLReader used to parse a file:

 import javax.xml.parsers.*;
 import org.xml.sax.XMLReader;

 SAXParserFactory factory = SAXParserFactory.newInstance( );
 SAXParser saxParser = factory.newSAXParser( );
 XMLReader parser = saxParser.getXMLReader( );
 parser.setContentHandler( myContentHandler );
 parser.parse( "myfile.xml" );

You might expect the SAXParser to have the parse method. The XMLReader intermediary was added to support changes in the SAX API between 1.0 and 2.0. Later we'll discuss some options that can be set to govern how XML parsers operate. These options are normally set through methods on the parser factory (e.g., SAXParserFactory) and not the parser itself. This is because the factory may wish to use different implementations to support different required features.
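As one illustration of a factory-level option, the sketch below toggles namespace awareness with SAXParserFactory.setNamespaceAware( ) and reports the three name arguments that startElement( ) receives for the root element. The NamespaceDemo class, the describeRoot( ) helper, and the example.com namespace URI are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows how a factory option changes what the handler sees.
class NamespaceDemo {
    // Returns "namespace|localname|qname" as reported for the root element.
    static String describeRoot(String xml, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);   // set on the factory, not the parser
            XMLReader reader = factory.newSAXParser().getXMLReader();
            final String[] result = new String[1];
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String namespace, String localname,
                                         String qname, Attributes atts) {
                    if (result[0] == null)   // remember only the first (root) element
                        result[0] = namespace + "|" + localname + "|" + qname;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return result[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<zoo:Animal xmlns:zoo=\"http://example.com/zoo\"/>";
        System.out.println("aware:   " + describeRoot(xml, true));
        System.out.println("unaware: " + describeRoot(xml, false));
    }
}
```

With namespace awareness on, the handler should see the namespace URI and the local name Animal alongside the qualified name zoo:Animal. With it off (the factory default), only the qualified name is reliably supplied.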

SAX's strengths and weaknesses

The primary motivation for using SAX instead of the higher-level APIs that we'll discuss later is that it is lightweight and event-driven. SAX doesn't require maintaining the entire document in memory. If, for example, you need to grab the text of just a few elements from a document, or if you need to extract elements from a large stream of XML, you can do so efficiently with SAX. The event-driven nature of SAX also allows you to take actions as the beginning and end tags are parsed. This can be useful for directly manipulating your own models without first going through another representation. The primary weakness of SAX is that you are operating on a tag-by-tag level with no help from the parser to maintain context. We'll talk about how to overcome this limitation next. Later, we'll also talk about the new XPath API, which combines many of the benefits of both SAX and DOM in a form that is easier to use.
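Both points can be seen in a small sketch: a handler that pulls out only the text it cares about, and that keeps its own context with a stack of open element names because the parser offers none. The AnimalNameCollector class and its collect( ) helper are hypothetical, and the element names are assumed for the example. It treats a Name element as an animal's name only when its parent is Animal.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of <Name> elements whose parent is <Animal>.
class AnimalNameCollector extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();   // names of currently open elements
    private final StringBuilder text = new StringBuilder();  // text of the innermost element
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String namespace, String localname, String qname, Attributes atts) {
        open.push(qname);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len);   // text may arrive in several chunks
    }

    @Override
    public void endElement(String namespace, String localname, String qname) {
        open.pop();                    // after this, peek() is the parent element
        if (qname.equals("Name") && "Animal".equals(open.peek()))
            names.add(text.toString().trim());
        text.setLength(0);
    }

    // Parses the given XML string and returns the collected animal names.
    static List<String> collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            AnimalNameCollector handler = new AnimalNameCollector();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Inventory><Animal><Name>Cocoa</Name>"
                   + "<FoodRecipe><Name>Gorilla Chow</Name></FoodRecipe></Animal></Inventory>";
        System.out.println(collect(xml));   // prints [Cocoa]
    }
}
```

Only a stack of element names and one text buffer are held in memory, regardless of document size. The sketch ignores mixed content (text interleaved with child elements), which a fuller version would need to handle.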

Building a Model Using SAX

The ContentHandler mechanism for receiving SAX events is very simple. It should be easy to see how one could use it to capture the value or attributes of a single element in a document. What may be harder to see is how one could use SAX to populate a real Java object model. Creating or pushing data into Java objects from XML is such a common activity that it's worth considering how the SAX API applies to this problem. The following example, SAXModelBuilder, does just this, reading an XML description and creating Java objects on command. This example is a bit unusual in that we resort to using reflection to do the job, but this is a case where we're trying to interact with Java objects dynamically. Later, we'll discuss more powerful, standard tools for statically generating and building models for XML documents.

In this section, we'll start by creating some XML along with corresponding Java classes that serve as the model for this XML. The final step in this example is to create the generic model builder that reads the XML and populates the model classes with their data. The idea is that the developer creates only the XML and the model classes, with no custom code to do the parsing. You might use code like this to read configuration files for an app or to implement a custom XML "language" for describing workflows. The advantage is that there is no parsing code in the app at all, only in the generic builder tool.

Creating the XML file

The first thing we'll need is a nice XML document to parse. Luckily, it's inventory time at the zoo! The following document, zooinventory.xml, describes two of the zoo's residents, including some vital information about their diets:

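 <?xml version="1.0" encoding="UTF-8"?>
 <!-- file zooinventory.xml -->
 <Inventory>
     <Animal animalClass="mammal">
         <Name>Song Fang</Name>
         <Species>Giant Panda</Species>
         <Habitat>China</Habitat>
         <Food>Bamboo</Food>
     </Animal>
     <Animal animalClass="mammal">
         <Name>Cocoa</Name>
         <Species>Gorilla</Species>
         <Habitat>Central Africa</Habitat>
         <FoodRecipe>
             <Name>Gorilla Chow</Name>
             <Ingredient>Fruit</Ingredient>
             <Ingredient>Shoots</Ingredient>
             <Ingredient>Leaves</Ingredient>
         </FoodRecipe>
     </Animal>
 </Inventory>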

The document is fairly simple. The root element, <Inventory>, contains two <Animal> elements as children. <Animal> contains several simple text elements for things like name, species, and habitat. It also contains either a simple <Food> element or a compound <FoodRecipe> element. Finally, note that the <Animal> element has one attribute (animalClass) that describes the zoological classification of the creature.

The model

Now let's make a Java object model for our zoo inventory. This part is very mechanical: easy, but tedious to do by hand. We simply create objects for each of the complex element types in our XML, using the standard JavaBeans property design patterns ("setters" and "getters") so that our builder can automatically recognize them later. (We'll prove the usefulness of these patterns later when we see that these same model objects can be understood by the Java XMLEncoder tool.) For convenience, we'll have our model objects extend a base SimpleElement class that handles text content for any element, but you could eliminate this requirement easily.

 // Each public class goes in its own .java file.
 import java.util.*;

 // Base class: accumulates text content and rejects attributes by default
 public class SimpleElement {
     StringBuffer text = new StringBuffer( );
     public void addText( String s ) { text.append( s ); }
     public String getText( ) { return text.toString( ); }
     public void setAttributeValue( String name, String value ) {
         throw new Error( getClass( )+": No attributes allowed");
     }
 }

 public class Inventory extends SimpleElement {
     List<Animal> animals = new ArrayList<Animal>( );
     public void addAnimal( Animal animal ) { animals.add( animal ); }
     public List<Animal> getAnimals( ) { return animals; }
     public void setAnimals( List<Animal> animals ) { this.animals = animals; }
 }

 public class Animal extends SimpleElement {
     public final static int MAMMAL = 1;
     int animalClass;
     String name, species, habitat, food, temperament;
     FoodRecipe foodRecipe;

     public void setName( String name ) { this.name = name; }
     public String getName( ) { return name; }
     public void setSpecies( String species ) { this.species = species; }
     public String getSpecies( ) { return species; }
     public void setHabitat( String habitat ) { this.habitat = habitat; }
     public String getHabitat( ) { return habitat; }
     public void setFood( String food ) { this.food = food; }
     public String getFood( ) { return food; }
     public void setFoodRecipe( FoodRecipe recipe ) {
         this.foodRecipe = recipe; }
     public FoodRecipe getFoodRecipe( ) { return foodRecipe; }
     public void setTemperament( String temperament ) {
         this.temperament = temperament; }
     public String getTemperament( ) { return temperament; }
     public void setAnimalClass( int animalClass ) {
         this.animalClass = animalClass; }
     public int getAnimalClass( ) { return animalClass; }

     // Map the animalClass attribute onto our own constant
     public void setAttributeValue( String name, String value ) {
         if ( name.equals("animalClass") && value.equals("mammal") )
             setAnimalClass( MAMMAL );
         else
             throw new Error("Invalid attribute: "+name);
     }
     public String toString( ) { return name +"("+species+")"; }
 }

 public class FoodRecipe extends SimpleElement {
     String name;
     List<String> ingredients = new ArrayList<String>( );

     public void setName( String name ) { this.name = name; }
     public String getName( ) { return name; }
     public void addIngredient( String ingredient ) {
         ingredients.add( ingredient ); }
     public void setIngredients( List<String> ingredients ) {
         this.ingredients = ingredients; }
     public List<String> getIngredients( ) { return ingredients; }
     public String toString( ) { return name + ": "+ ingredients.toString( ); }
 }
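To see what following the JavaBeans getter/setter patterns buys us, the sketch below asks the standard java.beans.Introspector which properties it can discover on a bean. This is only an illustration of the naming convention: the PropertyLister class and its nested Pet bean are hypothetical, and this is not the mechanism our model builder uses.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: lists the properties that the JavaBeans Introspector recognizes.
class PropertyLister {
    // A made-up bean that follows the getter/setter pattern.
    public static class Pet {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Returns the names of properties that have both a getter and a setter.
    static List<String> readWriteProperties(Class<?> beanClass) {
        try {
            // Stop at Object so the inherited "class" property is not reported.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors())
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null)
                    names.add(pd.getName());
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWriteProperties(Pet.class));
    }
}
```

The property names are derived purely from the method names (getName/setName becomes name), which is the same convention that lets a generic tool work with our model classes without knowing them in advance.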

The SAXModelBuilder

Let's get down to business and write our builder tool. The SAXModelBuilder we create in this section receives SAX events from parsing an XML file and instantiates classes corresponding to the names of the tags. Our model builder is simple, but it handles the most common structures: elements with text or simple element data. We handle attributes by passing them to the model class, allowing it to map them to its own identifiers (perhaps Animal.MAMMAL). Here is the code:

 import org.xml.sax.*;
 import org.xml.sax.helpers.*;
 import java.util.*;
 import java.lang.reflect.*;

 public class SAXModelBuilder extends DefaultHandler
 {
     Stack<SimpleElement> stack = new Stack<SimpleElement>( );
     SimpleElement element;

     public void startElement(
         String namespace, String localname, String qname, Attributes atts )
         throws SAXException
     {
         // Instantiate the class named after the tag, if there is one
         SimpleElement element = null;
         try {
             element = (SimpleElement)Class.forName(qname).newInstance( );
         } catch ( Exception e ) { }
         if ( element == null )
             element = new SimpleElement( );
         for(int i=0; i<atts.getLength( ); i++)
             element.setAttributeValue( atts.getQName(i), atts.getValue(i) );
         stack.push( element );
     }

     public void endElement( String namespace, String localname, String qname )
         throws SAXException
     {
         // Apply the finished element to its parent, the new top of the stack
         element = stack.pop( );
         if ( !stack.empty( ) )
             try {
                 setProperty( qname, stack.peek( ), element );
             } catch ( Exception e ) { throw new SAXException( "Error: "+e ); }
     }

     public void characters(char[] ch, int start, int len ) {
         String text = new String( ch, start, len );
         stack.peek( ).addText( text );
     }

     void setProperty( String name, Object target, Object value )
         throws SAXException
     {
         // Try add/set with the element type first, then with its text
         Method method = null;
         try {
             method = target.getClass( ).getMethod("add"+name, value.getClass( ));
         } catch ( NoSuchMethodException e ) { }
         if ( method == null ) try {
             method = target.getClass( ).getMethod("set"+name, value.getClass( ));
         } catch ( NoSuchMethodException e ) { }
         if ( method == null ) try {
             value = ((SimpleElement)value).getText( );
             method = target.getClass( ).getMethod( "add"+name, String.class );
         } catch ( NoSuchMethodException e ) { }
         try {
             if ( method == null )
                 method = target.getClass( ).getMethod("set"+name, String.class);
             method.invoke( target, value );
         } catch ( Exception e ) { throw new SAXException( e.toString( ) ); }
     }

     public SimpleElement getModel( ) { return element; }
 }

The SAXModelBuilder extends DefaultHandler to help us implement the ContentHandler interface. We use the startElement( ), endElement( ), and characters( ) methods to receive information from the document.

Because SAX events follow the structure of the XML document, we use a simple stack to keep track of which object we are currently parsing. At the start of each element, the model builder attempts to create an instance of a class with the same name as the element and pushes it onto the top of the stack. Each nested opening tag creates a new object on the stack until we encounter a closing tag. Upon reaching the end of an element, we pop the current object off the stack and attempt to apply its value to its parent (the enclosing element), which is the new top of the stack. The final closing tag leaves the stack empty, but we save the last value popped in the element field, which getModel( ) returns.

Our setProperty( ) method uses reflection and the standard JavaBeans naming conventions to look for the appropriate method to apply a value to its parent object. First, we check for a method named add<Property> or set<Property>, accepting an argument of the child element type (for example, the addAnimal( Animal animal ) method of our Inventory object). Failing that, we look for an "add" or "set" method accepting a String argument and use it to apply any text content of the child object. This convenience saves us from having to create trivial classes for properties containing only text.

The common base class SimpleElement helps us in two ways. First, it provides a method allowing us to pass attributes to the model class. Second, we use SimpleElement as a placeholder when no class exists for an element, allowing us to store the text of the tag.
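The reflective lookup can be tried in isolation. The following sketch, which uses a hypothetical SetterLookup class and a made-up Recipe bean, looks for add<Property> and then set<Property> with a given argument type and invokes whichever it finds. It is a simplified stand-in for the first half of setProperty( ), not a replacement for it.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the add/set naming-convention lookup.
class SetterLookup {
    // A made-up parent bean with one "set" method and one "add" method.
    public static class Recipe {
        private String name;
        private final List<String> ingredients = new ArrayList<>();
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
        public void addIngredient(String ingredient) { ingredients.add(ingredient); }
        public List<String> getIngredients() { return ingredients; }
    }

    // Tries add<Name>(type) first, then set<Name>(type); returns null if neither exists.
    static Method findMutator(Class<?> target, String name, Class<?> type) {
        for (String prefix : new String[] { "add", "set" }) {
            try {
                // getMethod() matches parameter types exactly, not by assignability.
                return target.getMethod(prefix + name, type);
            } catch (NoSuchMethodException e) {
                // try the next prefix
            }
        }
        return null;
    }

    // Applies value to target through the first matching add/set method.
    static void apply(Object target, String name, Object value) {
        Method m = findMutator(target.getClass(), name, value.getClass());
        if (m == null)
            throw new IllegalArgumentException("No add/set method for " + name);
        try {
            m.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Recipe recipe = new Recipe();
        apply(recipe, "Name", "Gorilla Chow");   // resolves to setName(String)
        apply(recipe, "Ingredient", "Fruit");    // resolves to addIngredient(String)
        System.out.println(recipe.getName() + ": " + recipe.getIngredients());
    }
}
```

Because getMethod( ) requires an exact parameter type, the lookup passes value.getClass( ), just as the model builder does. A method declared to take a superclass or interface of the value would not be found by this simple approach.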

Test drive

Finally, we can test-drive the model builder with the following class, TestModelBuilder, which calls the SAX parser, setting an instance of our SAXModelBuilder as the content handler. The test class then prints some of the information parsed from the zooinventory.xml file:

 import org.xml.sax.*;
 import org.xml.sax.helpers.*;
 import javax.xml.parsers.*;

 public class TestModelBuilder
 {
     public static void main( String [] args ) throws Exception
     {
         SAXParserFactory factory = SAXParserFactory.newInstance( );
         SAXParser saxParser = factory.newSAXParser( );
         XMLReader parser = saxParser.getXMLReader( );
         SAXModelBuilder mb = new SAXModelBuilder( );
         parser.setContentHandler( mb );

         parser.parse( "zooinventory.xml" );
         Inventory inventory = (Inventory)mb.getModel( );
         System.out.println("Animals = "+inventory.getAnimals( ));
         Animal cocoa = inventory.getAnimals( ).get(1);
         FoodRecipe recipe = cocoa.getFoodRecipe( );
         System.out.println( "Recipe = "+recipe );
     }
 }

The output should look like this:

 Animals = [Song Fang(Giant Panda), Cocoa(Gorilla)]
 Recipe = Gorilla Chow: [Fruit, Shoots, Leaves]

In the following sections, we'll generate the equivalent output using different tools.

Limitations and possibilities

To make our model builder more complete, we could use more robust naming conventions for our tags and model classes (taking into account packages and mixed capitalization, etc.). But more generally, we might not want to name our model classes strictly based on tag names. And, of course, there is the problem of taking our model and going the other way, using it to generate an XML document. Furthermore, as we've said, writing the model classes in this way is tedious and error-prone. All this is a good indication that this area is ripe for auto-generation of classes. We'll discuss tools that do that later in the chapter.


Java includes a standard tool for serializing JavaBeans classes to XML. The java.beans package XMLEncoder and XMLDecoder classes are analogous to ObjectOutputStream and ObjectInputStream. Instead of using the native Java serialization format, they store the object state in a high-level XML format. We say that they are analogous, but the XML encoder is not a general replacement for Java object serialization. Instead, it is specialized to work with objects that follow the JavaBeans design patterns, and it can only store and recover the state of the object that is expressed through a bean's public properties (using getters and setters).

When you call it, the XMLEncoder attempts to construct an in-memory copy of the graph of beans that you are serializing, using only public constructors and JavaBean properties. As it works, it writes out the steps required as "instructions" in an XML format. Later, the XMLDecoder executes these instructions and reproduces the result. The primary advantage of this process is that it is highly resilient to changes in the class implementation. While standard Java object serialization can accommodate many kinds of "compatible changes" in classes, it requires some help from the developer to get it right. Because the XMLEncoder uses only public APIs and writes instructions in simple XML, it is expected that this form of serialization will be the most robust way to store the state of JavaBeans. The process is referred to as "long-term persistence" for JavaBeans.

Give it a whirl. You can use the model-builder example to create the beans and compare the output to our original XML. You can add this bit to our TestModelBuilder class, which will populate the beans for us before we write them out:

 import java.beans.XMLEncoder;
 XMLEncoder xmle = new XMLEncoder( System.out );
 xmle.writeObject( inventory );
 xmle.close( );
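For a complete round trip that does not depend on the zoo model, here is a minimal sketch that encodes a small bean to an in-memory buffer with XMLEncoder and reads it back with XMLDecoder. The BeanRoundTrip class and its nested Pet bean are hypothetical names chosen for this example. The bean is public and has a public no-argument constructor, which this form of persistence relies on.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical demo: XMLEncoder/XMLDecoder round trip through a byte array.
class BeanRoundTrip {
    // A made-up bean: public class, public no-arg constructor, getter/setter pairs.
    public static class Pet {
        private String name;
        private int age;
        public Pet() { }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Serializes the bean's public properties as XML "instructions".
    static byte[] encode(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }   // closing the encoder flushes and completes the XML document
        return out.toByteArray();
    }

    // Executes the XML instructions to rebuild the bean.
    static Object decode(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Pet pet = new Pet();
        pet.setName("Cocoa");
        pet.setAge(7);
        byte[] xml = encode(pet);
        System.out.println(new String(xml));
        Pet copy = (Pet) decode(xml);
        System.out.println(copy.getName() + ", " + copy.getAge());
    }
}
```

Printing the encoded bytes shows the instruction-style XML that the encoder produces: an object element naming the class, with nested elements for each property that differs from a freshly constructed instance.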


Further thoughts

It might seem at first like this would obviate the need for our SAXModelBuilder example. Why not simply write our XML in the format that XMLDecoder understands and use it to build our model? Although XMLEncoder is very efficient at eliminating redundancy, you can see that its output is still very verbose (about two to three times larger than our original XML) and not very human-friendly. Although it's possible to write it by hand, this XML format wasn't designed for that. Finally, although XMLEncoder can be customized for how it handles specific object types, it suffers from the same problem that our model builder does, in that "binding" (the namespace of tags) is determined strictly by our Java class names. As we've said before, what is really needed is a more general tool to generate classes or to map our own classes to XML and back.