Parsing with SAX
Without spending any further time on the preliminaries, it's time to code. As a sample to familiarize you with SAX, this chapter details the SAXTreeViewer class. This utility uses SAX to parse an XML document, and displays the document visually as a Swing JTree.
The first thing you need to do in any SAX-based app is get an instance of a class that implements the SAX org.xml.sax.XMLReader interface; remember, this is why you downloaded a SAX-compliant parser in the first place.
Instantiating a Reader
SAX provides the org.xml.sax.XMLReader interface for all SAX-compliant XML parsers to implement. For example, the Xerces SAX parser implementation, org.apache.xerces.parsers.SAXParser, implements the XMLReader interface. If you have access to the source of your parser, you should see the same interface implemented in your parser's main SAX parser class. Each XML parser must have one class (and sometimes has more than one) that implements this interface, and that is the class you need to instantiate to allow for parsing XML:
    // Instantiate a Reader
    XMLReader reader = new org.apache.xerces.parsers.SAXParser( );

    // Do something with the parser
    reader.parse(uri);
This approach ties you tightly to your parser vendor, though; you can use SAX's org.xml.sax.helpers.XMLReaderFactory to get away from this:
XMLReader reader = XMLReaderFactory.createXMLReader( );
Just set the org.xml.sax.driver system property, and you can get your vendor's XMLReader implementation, without importing your vendor's classes:
java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser [MyClassName]
Even better, most vendors will set this property internally, meaning you don't have to worry about this system property at all; just call createXMLReader( ), and go.
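To see which implementation the factory actually hands back on your system, a small check like the following can help. It is only a sketch: the class name ReaderLookup and its helper method are made up for illustration, and the class name printed depends entirely on the parser available at runtime. Note also that recent JDKs mark XMLReaderFactory as deprecated in favor of javax.xml.parsers.SAXParserFactory, although the call still works.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class, not part of SAXTreeViewer.
public class ReaderLookup {

    /** Returns the class name of the XMLReader the factory supplies. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
    static String readerClassName() {
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            return reader.getClass().getName();
        } catch (SAXException e) {
            // Thrown when no usable SAX driver can be located
            throw new IllegalStateException("No SAX driver available", e);
        }
    }

    public static void main(String[] args) {
        // null here means the vendor (or the JDK) picked the driver internally
        System.out.println("org.xml.sax.driver = "
            + System.getProperty("org.xml.sax.driver"));
        System.out.println("XMLReader implementation: " + readerClassName());
    }
}
```

Run it once with and once without the -Dorg.xml.sax.driver option to compare the results.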
With that in mind, it's worth looking at a more realistic app. Example 3-1 is the skeleton for the SAXTreeViewer class, which allows viewing of an XML document as a graphical tree.
Example 3-1. This class sets up an XMLReader and then lists the basic parsing steps
    public class SAXTreeViewer extends JFrame {

        // Swing-related variables and methods, including
        // setting up a JTree and basic content pane

        public void buildTree(DefaultTreeModel treeModel,
                              DefaultMutableTreeNode base, String xmlURI)
            throws IOException, SAXException {

            // Create instances needed for parsing
            XMLReader reader = XMLReaderFactory.createXMLReader( );

            // Register content handler

            // Register error handler

            // Parse
        }

        public static void main(String[] args) {
            try {
                if (args.length != 1) {
                    System.out.println(
                        "Usage: java javaxml3.SAXTreeViewer " +
                        "[XML Document]");
                    return;
                }
                SAXTreeViewer viewer = new SAXTreeViewer( );
                viewer.init(args[0]);
                viewer.setVisible(true);
            } catch (Exception e) {
                e.printStackTrace( );
            }
        }
    }
The buildTree( ) method is where we'll be spending our time in this chapter; you can already see I've placed a few comments to outline the basic steps involved in parsing with SAX.
Parsing the Document
Once a reader is loaded and ready for use, use the parse( ) method to parse XML; this method accepts either an org.xml.sax.InputSource or a simple string. It's a much better idea to use the SAX InputSource class, because it can be constructed with an I/O InputStream, Reader, or a string URI.
U-R-What?
A URI is a uniform resource identifier. As the name suggests, it provides a standard means of identifying (and thereby locating, in most cases) a specific resource; this resource is almost always some sort of XML document, for the purposes of this tutorial. URIs are also related to URLs, uniform resource locators. In fact, a URL is always a URI (although the reverse is not true). So in the examples in this and other chapters, you could specify a filename or a URL, like http://www.ibiblio.org/xml/examples/shakespeare/othello.xml, and either would be accepted.
Because the code loads an XML document, either locally or remotely, a java.io.IOException may result, and must be caught or declared. In addition, an org.xml.sax.SAXException is thrown if problems occur while parsing the document. Notice that the buildTree( ) method declares both of these exceptions:
    public void buildTree(DefaultTreeModel treeModel,
                          DefaultMutableTreeNode base, String xmlURI)
        throws IOException, SAXException {

        // Create instances needed for parsing
        XMLReader reader = XMLReaderFactory.createXMLReader( );

        // Register content handler

        // Register error handler

        // Parse
        InputSource inputSource = new InputSource(xmlURI);
        reader.parse(inputSource);
    }
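If you want to try the different ways of constructing an InputSource without any Swing code in the way, something like the following sketch may be useful. The class InputSourceKinds, its helper methods, and the tiny in-memory document are all invented for illustration; only the InputSource constructors and parse( ) come from SAX itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical demo class, not part of SAXTreeViewer.
public class InputSourceKinds {

    // Made-up sample document, kept in memory so no file is needed
    static final String XML = "<PLAY><TITLE>Othello</TITLE></PLAY>";

    /** Wraps a character stream (java.io.Reader). */
    static InputSource fromCharacters() {
        return new InputSource(new StringReader(XML));
    }

    /** Wraps a byte stream (java.io.InputStream). */
    static InputSource fromBytes() {
        return new InputSource(
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns true if the source parses without an IOException or SAXException. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
    static boolean parses(InputSource source) {
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.parse(source);
            return true;
        } catch (IOException | SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Reader source parsed: " + parses(fromCharacters()));
        System.out.println("Stream source parsed: " + parses(fromBytes()));
        if (args.length == 1) {
            // Third form: a string URI, as buildTree( ) uses
            System.out.println("URI source parsed: "
                + parses(new InputSource(args[0])));
        }
    }
}
```

All three forms end up in the same parse( ) call; what differs is how much the parser knows about where the document came from.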
Using InputSource for input
The advantage to using an InputSource instead of directly supplying a URI is simple: InputSource can provide more information to the parser. An InputSource encapsulates information about a single object, the document to parse. In situations where a system identifier, public identifier, or stream may all be tied to one URI, using an InputSource for encapsulation can become very handy. The class has accessor and mutator methods for its system ID and public ID, a character encoding, a byte stream (java.io.InputStream), and a character stream (java.io.Reader). SAX also guarantees that the parser will never modify an InputSource passed as an argument to the parse( ) method. The original input to a parser is still available unchanged after its use by a parser or XML-aware app. To put this in perspective, consider parsing a document with a simple DTD reference:
<!DOCTYPE PLAY SYSTEM "play.dtd">
By using an InputSource and wrapping the supplied XML URI, you have implicitly set the system ID of the document. This effectively sets up the path to the document for the parser and allows it to resolve all relative paths within that document, like the play.dtd file. If, instead of setting this ID, you parsed an I/O stream, the parser would have no frame of reference for locating the DTD; you could simulate this by changing the code in the buildTree( ) method to what is shown here:
    // Parse
    InputSource inputSource =
        new InputSource(new java.io.FileInputStream(
            new java.io.File(xmlURI)));
    reader.parse(inputSource);
You'll now get the following exception when running the viewer:
    /usr/local/writing/javaxml3>java javaxml3.SAXTreeViewer /usr/local/contents.xml
    org.xml.sax.SAXParseException: File "file:///usr/local/writing/javaxml3/play.dtd" not found.
While this seems a little silly (wrapping a URI in a file and I/O stream), it's actually quite common to see people using I/O streams as input to parsers. You just need to set a system ID for the XML stream (using the setSystemId( ) method on InputSource). So the above code sample could be "fixed" by changing it to the following:
    // Parse
    InputSource inputSource =
        new InputSource(new java.io.FileInputStream(
            new java.io.File(xmlURI)));
    inputSource.setSystemId(xmlURI);
    reader.parse(inputSource);
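The effect of the system ID is easy to reproduce in isolation. The sketch below writes a small document and its play.dtd into a temporary directory, then parses the document from a stream twice, once without and once with a system ID. The class SystemIdDemo, its methods, and the file contents are assumptions made for this illustration. It also assumes that no play.dtd happens to sit in your working directory, and the exact exception a parser reports for the missing DTD varies by vendor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical demo class, not part of SAXTreeViewer.
public class SystemIdDemo {

    /** Writes play.xml and play.dtd side by side in a fresh temp directory. */
    static Path writeSample() {
        try {
            Path dir = Files.createTempDirectory("saxdemo");
            Files.write(dir.resolve("play.dtd"),
                "<!ELEMENT PLAY (#PCDATA)>".getBytes(StandardCharsets.UTF_8));
            Path xml = dir.resolve("play.xml");
            Files.write(xml,
                ("<?xml version=\"1.0\"?>\n"
                 + "<!DOCTYPE PLAY SYSTEM \"play.dtd\">\n"
                 + "<PLAY>Othello</PLAY>").getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses the file from a byte stream, optionally supplying a system ID. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
    static boolean parses(Path xml, boolean withSystemId) {
        try (InputStream in = Files.newInputStream(xml)) {
            InputSource source = new InputSource(in);
            if (withSystemId) {
                // Gives the parser a base for resolving the relative "play.dtd"
                source.setSystemId(xml.toUri().toString());
            }
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.parse(source);
            return true;
        } catch (IOException | SAXException e) {
            // Without a system ID the DTD lookup is expected to fail here
            return false;
        }
    }

    public static void main(String[] args) {
        Path xml = writeSample();
        System.out.println("Stream only:       " + parses(xml, false));
        System.out.println("Stream + systemId: " + parses(xml, true));
    }
}
```

The stream is identical in both runs; only the extra setSystemId( ) call tells the parser where relative references should be resolved from.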
Not much going on...
If you compile and run the program now, nothing of any real interest seems to happen. Despite appearances, though, the XML document is parsed.
However, you've provided no callbacks to take action during the parsing; without these callbacks, a document is simply parsed quietly. Parser callbacks let you insert action into the program flow, and turn the rather boring, quiet parsing of an XML document into an app that can react to the data, elements, attributes, and structure of the document being parsed, as well as interact with other programs and clients along the way.
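As a small taste of what a callback looks like, here is a minimal sketch that records each element name as the parser encounters it. It is not the handler that SAXTreeViewer ends up using: the CallbackSketch and ElementLogger names are invented for illustration, and the sketch leans on SAX's org.xml.sax.helpers.DefaultHandler convenience class so that only one callback needs to be written.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical demo class, not part of SAXTreeViewer.
public class CallbackSketch {

    /** Records the qualified name of every element the parser reports. */
    static class ElementLogger extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // Invoked by the parser each time an opening tag is read
            names.add(qName);
        }
    }

    /** Parses the XML text and returns element names in document order. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
    static List<String> elementNames(String xml) {
        try {
            ElementLogger logger = new ElementLogger();
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.setContentHandler(logger); // register the callback object
            reader.parse(new InputSource(new StringReader(xml)));
            return logger.names;
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            elementNames("<PLAY><TITLE>Othello</TITLE></PLAY>"));
    }
}
```

The parsing code is unchanged from before; the only difference is the setContentHandler( ) call, which gives the parser somewhere to report what it finds.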