Filters and Writers

At this point, I want to diverge from the beaten path. There are a lot of additional features in SAX that can really turn you into a power developer, and take you beyond the confines of "standard" SAX. In this section, I'll introduce you to two of these: SAX filters and writers. Using classes both in the standard SAX distribution and available separately from the SAX web site (http://www.saxproject.org), you can add some fairly advanced behavior to your SAX apps. This will also get you in the mindset of using SAX as a pipeline of events, rather than a single layer of processing.

XMLFilters

First on the list is the org.xml.sax.XMLFilter interface that comes in the basic SAX download, and should be included with any parser distribution supporting SAX 2. This interface extends the XMLReader interface and adds two new methods to it: setParent( ), which assigns the XMLReader that the filter receives its events from, and getParent( ), which returns that reader.

Extra methods defined by the XMLFilter interface
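To make those two methods concrete, here is a minimal sketch that stacks two pass-through filters on a reader and then walks back down the chain with getParent( ). The class name ParentDemo and its helper methods are my own illustration, and I'm assuming the JDK's built-in JAXP parser is what SAXParserFactory hands back.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class ParentDemo {
    /** Obtains a namespace-aware XMLReader from the JAXP factory. */
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    /** Stacks two pass-through filters on top of the given reader. */
    static XMLFilter stackTwoFilters(XMLReader reader) {
        XMLFilter inner = new XMLFilterImpl();
        inner.setParent(reader);   // same effect as new XMLFilterImpl(reader)
        // An XMLFilter is itself an XMLReader, so it can be another filter's parent
        return new XMLFilterImpl(inner);
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        XMLFilter outer = stackTwoFilters(reader);
        XMLFilter inner = (XMLFilter) outer.getParent();
        System.out.println("outer's parent is a filter: "
            + (outer.getParent() instanceof XMLFilter));
        System.out.println("inner's parent is the reader: "
            + (inner.getParent() == reader));
    }
}
```

Because every filter is also a reader, the same two methods are all you need to build a chain of any length.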

It might not seem like there is much to say here; what's the big deal, right? Well, by allowing a hierarchy of XMLReaders through this filtering mechanism, you can build up a processing chain, or pipeline, of events. To understand what I mean by a pipeline, you first need to understand the normal flow of a SAX parse:

Events in an XML document are passed to the SAX reader.
The SAX reader and registered handlers pass events and data to an app.

What developers started realizing, though, is that it is simple to insert one or more additional links into this chain:

Events in an XML document are passed to the SAX reader.
The SAX reader performs some processing and passes information to another SAX reader.
Repeat until all SAX processing is done.
Finally, the SAX reader and registered handlers pass events and data to an app.

It's the middle two steps that create a pipeline: one reader performs specific processing and passes its information on to another reader, repeatedly, instead of all the code being lumped into one reader. Setting up a pipeline with multiple readers lets each stage stay small and modular. And that's what the XMLFilter interface allows for: chaining of XMLReader implementations through filtering.

Enhancing this even further is the class org.xml.sax.helpers.XMLFilterImpl, which provides a simple implementation of XMLFilter. It is the convergence of an XMLFilter and the DefaultHandler class: the XMLFilterImpl class implements XMLFilter, ContentHandler, ErrorHandler, EntityResolver, and DTDHandler, providing pass-through versions of each method of each handler. In other words, it sets up a pipeline for all SAX events, allowing your code to override any methods that need to insert processing into the pipeline. Again, it's best to see these in action. Example 4-2 is a working, ready-to-use filter. You're past the basics, so I'm going to move through this rapidly.

Example 4-2. This simple filter allows for wholesale replacement of a namespace URI with a new URI

package javaxml3;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class NamespaceFilter extends XMLFilterImpl {

    /** The old URI, to replace */
    private String oldURI;

    /** The new URI, to replace the old URI with */
    private String newURI;

    public NamespaceFilter(XMLReader reader,
                           String oldURI, String newURI) {
        super(reader);
        this.oldURI = oldURI;
        this.newURI = newURI;
    }

    public void startPrefixMapping(String prefix, String uri)
        throws SAXException {

        // Change URI, if needed
        if (uri.equals(oldURI)) {
            super.startPrefixMapping(prefix, newURI);
        } else {
            super.startPrefixMapping(prefix, uri);
        }
    }

    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
        throws SAXException {

        // Change URI, if needed
        if (uri.equals(oldURI)) {
            super.startElement(newURI, localName, qName, attributes);
        } else {
            super.startElement(uri, localName, qName, attributes);
        }
    }

    public void endElement(String uri, String localName, String qName)
        throws SAXException {

        // Change URI, if needed
        if (uri.equals(oldURI)) {
            super.endElement(newURI, localName, qName);
        } else {
            super.endElement(uri, localName, qName);
        }
    }
}

Start out by extending XMLFilterImpl, so you don't have to worry about any events that you don't want to deal with. Much as DefaultHandler gives you no-op methods "for free," XMLFilterImpl gives you pass-through methods: it passes on all events unchanged unless a method is overridden. All that's left, in this example, is to change a namespace URI from an old one to a new one.

If this example seems trivial, don't underestimate its usefulness. Many times in the last several years, the URI of a namespace for a specification (such as XML Schema or XSLT) has changed. Rather than having to hand-edit all of my XML documents or write code for XML that I receive, this NamespaceFilter takes care of the problem for me.
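If you want to see the pass-through behavior for yourself, a sketch along these lines should do it. ElementCountFilter and PassThroughDemo are hypothetical names of my own; the filter overrides a single callback, counts each element, and then hands the event on. The handler registered downstream should see exactly as many elements as the filter counted.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: counts elements, then passes each event on unchanged. */
class ElementCountFilter extends XMLFilterImpl {
    private int count;

    ElementCountFilter(XMLReader parent) {
        super(parent);
    }

    int getCount() {
        return count;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts)
        throws SAXException {
        count++;
        // Without this call, the event would stop here
        super.startElement(uri, localName, qName, atts);
    }
}

class PassThroughDemo {
    /** Returns {elements counted by the filter, elements seen downstream}. */
    static int[] countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        ElementCountFilter filter = new ElementCountFilter(reader);
        final int[] seenDownstream = {0};
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seenDownstream[0]++;
            }
        });

        // Parse through the filter, not the reader
        filter.parse(new InputSource(new StringReader(xml)));
        return new int[] {filter.getCount(), seenDownstream[0]};
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countElements("<a><b/><c/></a>");
        System.out.println("filter: " + counts[0]
            + ", downstream handler: " + counts[1]);
    }
}
```

Every callback the filter does not override (characters( ), endElement( ), and so on) reaches the downstream handler through the inherited XMLFilterImpl methods.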

Passing an XMLReader instance to the NamespaceFilter constructor sets that reader as the filter's parent, so the filter receives every event the parent reader generates and passes it on to the handlers registered on the filter (unchanged, by virtue of the XMLFilterImpl class, unless the NamespaceFilter class overrides that behavior). By supplying two URIs (the URI to be replaced, and the URI to replace that old one with), your filter is ready to use. The three overridden methods handle any needed interchanging of that URI. Once you have a filter like this in place, you supply a reader to it, and then operate upon the filter, not the reader. For example, suppose that the SAXTreeViewer app is used to display XML versions of Oracle tutorials, and the Oracle namespace URI for these tutorials is universally being changed from http://www.oracle.com to http://safari.oracle.com. In that case, you could use the filter like this:

public void buildTree(DefaultTreeModel treeModel,
                      DefaultMutableTreeNode base, String xmlURI)
    throws IOException, SAXException {

    String featureURI = "";
    XMLReader reader = null;

    try {
        // Create instances needed for parsing
        reader = XMLReaderFactory.createXMLReader( );
        JTreeHandler jTreeHandler =
            new JTreeHandler(treeModel, base);

        NamespaceFilter filter =
            new NamespaceFilter(reader, "http://www.oracle.com",
                                "http://safari.oracle.com");

        // Register content handler
        filter.setContentHandler(jTreeHandler);

        // Register error handler
        filter.setErrorHandler(jTreeHandler);

        // Register entity resolver
        filter.setEntityResolver(new SimpleEntityResolver( ));

        // Register lexical handler
        filter.setProperty("http://xml.org/sax/properties/lexical-handler",
                           jTreeHandler);

        // Turn on validation
        featureURI = "http://xml.org/sax/features/validation";
        filter.setFeature(featureURI, true);

        // Turn on schema validation, as well
        featureURI = "http://apache.org/xml/features/validation/schema";
        filter.setFeature(featureURI, true);

        // Parse
        InputSource inputSource = new InputSource(xmlURI);
        filter.parse(inputSource);
    } catch (SAXNotRecognizedException e) {
        System.err.println("The parser class " + reader.getClass( ).getName( ) +
            " does not recognize the feature URI '" + featureURI + "'");
        System.exit(-1);
    } catch (SAXNotSupportedException e) {
        System.err.println("The parser class " + reader.getClass( ).getName( ) +
            " does not support the feature URI '" + featureURI + "'");
        System.exit(-1);
    }
}
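If you don't have the SAXTreeViewer classes handy, the same idea can be tried without Swing. The following is only a sketch: UriSwapFilter is a compact restatement of the namespace-replacing filter so that the listing stands alone, FilterUsageDemo and the sample document are made up for illustration, and I obtain the reader through JAXP's SAXParserFactory rather than XMLReaderFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Compact, hypothetical restatement of a namespace-replacing filter. */
class UriSwapFilter extends XMLFilterImpl {
    private final String oldURI;
    private final String newURI;

    UriSwapFilter(XMLReader parent, String oldURI, String newURI) {
        super(parent);
        this.oldURI = oldURI;
        this.newURI = newURI;
    }

    private String swap(String uri) {
        return oldURI.equals(uri) ? newURI : uri;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri)
        throws SAXException {
        super.startPrefixMapping(prefix, swap(uri));
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts)
        throws SAXException {
        super.startElement(swap(uri), localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
        super.endElement(swap(uri), localName, qName);
    }
}

class FilterUsageDemo {
    /** Parses xml through the filter; returns each element's namespace URI. */
    static List<String> elementUris(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        UriSwapFilter filter = new UriSwapFilter(reader,
            "http://www.oracle.com", "http://safari.oracle.com");

        final List<String> uris = new ArrayList<String>();
        // Handlers are registered on the filter, not on the reader
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                uris.add(uri);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return uris;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<t:tutorial xmlns:t='http://www.oracle.com'>"
                   + "<title/></t:tutorial>";
        System.out.println(elementUris(xml));
    }
}
```

The handler never sees the old URI; elements outside the old namespace (like title here, which has no namespace at all) pass through untouched.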

Of course, you can chain these filters together as well, and use them as standard libraries. When I'm dealing with older XML documents, I often create several of these with old XSL and XML Schema URIs and put them in place so I don't have to worry about incorrect URIs:

XMLReader reader = XMLReaderFactory.createXMLReader(vendorParserClass);
NamespaceFilter xslFilter =
    new NamespaceFilter(reader, "http://www.w3.org/TR/XSL",
                        "http://www.w3.org/1999/XSL/Transform");
NamespaceFilter xsdFilter =
    new NamespaceFilter(xslFilter, "http://www.w3.org/TR/XMLSchema",
                        "http://www.w3.org/2001/XMLSchema");

Here, I'm building a longer pipeline to ensure that no old namespace URIs sneak by and cause my apps any trouble.

Be careful not to build too long a pipeline; each new link in the chain adds some processing time. All the same, this is a great way to build reusable components for SAX.
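If the number of URIs to replace grows, one way to keep the chain short is to let a single filter consult a lookup table instead of adding a link per URI. This is an alternative sketch of my own, not part of the SAX distribution: MultiNamespaceFilter and MultiNamespaceDemo are hypothetical names, and the reader again comes from JAXP's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: replaces any number of namespace URIs in one link. */
class MultiNamespaceFilter extends XMLFilterImpl {
    private final Map<String, String> replacements;

    MultiNamespaceFilter(XMLReader parent, Map<String, String> replacements) {
        super(parent);
        this.replacements = new HashMap<String, String>(replacements);
    }

    /** Returns the replacement for uri, or uri itself if none is registered. */
    private String swap(String uri) {
        String replacement = replacements.get(uri);
        return replacement != null ? replacement : uri;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri)
        throws SAXException {
        super.startPrefixMapping(prefix, swap(uri));
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts)
        throws SAXException {
        super.startElement(swap(uri), localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
        super.endElement(swap(uri), localName, qName);
    }
}

class MultiNamespaceDemo {
    /** Parses xml through the filter; returns each element's namespace URI. */
    static List<String> elementUris(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        Map<String, String> replacements = new HashMap<String, String>();
        replacements.put("http://www.w3.org/TR/XSL",
                         "http://www.w3.org/1999/XSL/Transform");
        replacements.put("http://www.w3.org/TR/XMLSchema",
                         "http://www.w3.org/2001/XMLSchema");
        MultiNamespaceFilter filter =
            new MultiNamespaceFilter(reader, replacements);

        final List<String> uris = new ArrayList<String>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                uris.add(uri);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return uris;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root xmlns:xsl='http://www.w3.org/TR/XSL'"
                   + " xmlns:xsd='http://www.w3.org/TR/XMLSchema'>"
                   + "<xsl:template/><xsd:element/></root>";
        System.out.println(elementUris(xml));
    }
}
```

The trade-off is reuse: separate single-purpose filters are easier to mix and match, while the table-driven version costs one pass-through per event no matter how many URIs it handles.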

XMLWriter

Now that you understand how filters work in SAX, I want to introduce you to a specific filter, XMLWriter. This class, as well as its subclass, DataWriter, can be downloaded from David Megginson's site at http://www.megginson.com/Software.

David Megginson shepherded SAX through its early days and has now returned to the fold. David is a SAX guru, and even though he no longer actively works on XMLWriter (or DataWriter), he has created some incredibly useful classes, and still hosts them on his personal web site.

XMLWriter extends XMLFilterImpl, and DataWriter extends XMLWriter. Both of these filter classes are used to output XML, which may seem a bit at odds with what you've learned so far about SAX. However, it's not that unusual; you could easily insert statements into a startElement( ) or characters( ) callback that fires up a java.io.Writer and outputs to it. In fact, that's awfully close to what XMLWriter and DataWriter do. I'm not going to spend a lot of time on this class, because it's not really the way you want to be outputting XML in the general sense; it's much better to use DOM, JDOM, or another XML API if you want mutability.

However, the XMLWriter class offers a valuable way to inspect what's going on in a SAX pipeline. Inserted between other filters and readers in your pipeline, it outputs a snapshot of your data as it looks at that point. For example, in the case where you're changing namespace URIs, it might be that you want to actually store the XML document with the new namespace URI (be it a modified Oracle URI, an updated XSL URI, or whatever other use-case you come up with). This is a piece of cake with the XMLWriter class.

Since you've already got SAXTreeViewer using the NamespaceFilter, I'll use that as an example. First, add import statements for java.io.FileWriter (for output), and the com.megginson.sax.XMLWriter class. Once that's in place, you'll need to insert an instance of XMLWriter between the NamespaceFilter and the JTreeHandler, with the NamespaceFilter as the writer's parent. Events flow from the reader through each filter and on to your handlers, so this placement means output will occur after namespaces have been changed but before the visual events occur:

public void buildTree(DefaultTreeModel treeModel,
                      DefaultMutableTreeNode base, String xmlURI)
    throws IOException, SAXException {

    String featureURI = "";
    XMLReader reader = null;

    try {
        // Create instances needed for parsing
        reader = XMLReaderFactory.createXMLReader( );
        JTreeHandler jTreeHandler =
            new JTreeHandler(treeModel, base);

        // Pipeline: reader -> NamespaceFilter -> XMLWriter -> JTreeHandler
        NamespaceFilter filter =
            new NamespaceFilter(reader, "http://www.oracle.com",
                                "http://safari.oracle.com");
        XMLWriter writer =
            new XMLWriter(filter, new FileWriter("snapshot.xml"));

        // Register content handler
        writer.setContentHandler(jTreeHandler);

        // Register error handler
        writer.setErrorHandler(jTreeHandler);

        // Register entity resolver
        writer.setEntityResolver(new SimpleEntityResolver( ));

        // Register lexical handler
        writer.setProperty("http://xml.org/sax/properties/lexical-handler",
                           jTreeHandler);

        // Turn on validation
        featureURI = "http://xml.org/sax/features/validation";
        writer.setFeature(featureURI, true);

        // Turn on schema validation, as well
        featureURI = "http://apache.org/xml/features/validation/schema";
        writer.setFeature(featureURI, true);

        // Parse
        InputSource inputSource = new InputSource(xmlURI);
        writer.parse(inputSource);
    } catch (SAXNotRecognizedException e) {
        System.err.println("The parser class " + reader.getClass( ).getName( ) +
            " does not recognize the feature URI '" + featureURI + "'");
        System.exit(-1);
    } catch (SAXNotSupportedException e) {
        System.err.println("The parser class " + reader.getClass( ).getName( ) +
            " does not support the feature URI '" + featureURI + "'");
        System.exit(-1);
    }
}

Be sure you set the parent of the XMLWriter instance to be the NamespaceFilter, not the XMLReader, and that you register your handlers and call parse( ) on the XMLWriter, since it is now the last link in the chain. Otherwise, the snapshot will not reflect the namespace changes.

Once you've got these changes compiled in, run the example. You should get a snapshot.xml file created in the directory from which you're running the example. Both XMLWriter and DataWriter offer a lot more in terms of methods to output XML, both in full and in part, and you should check out the Javadoc included with the downloaded package.
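Where a snapshot stage sits in the pipeline determines what it records, because a filter only ever sees what its parent emits. The sketch below illustrates this without the Megginson download: SnapshotFilter is a hypothetical stand-in of my own (it is not XMLWriter and writes only a line per start tag), SwapFilter is a cut-down namespace replacer that handles elements only, and the reader comes from JAXP's SAXParserFactory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical stand-in for a writer filter: logs each start tag, then passes it on. */
class SnapshotFilter extends XMLFilterImpl {
    private final Writer out;

    SnapshotFilter(XMLReader parent, Writer out) {
        super(parent);
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts)
        throws SAXException {
        try {
            out.write("{" + uri + "}" + localName + "\n");
        } catch (IOException e) {
            throw new SAXException(e);
        }
        super.startElement(uri, localName, qName, atts);
    }
}

/** Cut-down, hypothetical namespace replacer (start tags only). */
class SwapFilter extends XMLFilterImpl {
    private final String oldURI;
    private final String newURI;

    SwapFilter(XMLReader parent, String oldURI, String newURI) {
        super(parent);
        this.oldURI = oldURI;
        this.newURI = newURI;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts)
        throws SAXException {
        super.startElement(oldURI.equals(uri) ? newURI : uri,
                           localName, qName, atts);
    }
}

class SnapshotPositionDemo {
    static final String OLD = "http://www.oracle.com";
    static final String NEW = "http://safari.oracle.com";

    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    /** Pipeline: reader -> SwapFilter -> SnapshotFilter. */
    static String snapshotAfterSwap(String xml) throws Exception {
        StringWriter out = new StringWriter();
        XMLReader pipeline =
            new SnapshotFilter(new SwapFilter(newReader(), OLD, NEW), out);
        pipeline.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    /** Pipeline: reader -> SnapshotFilter -> SwapFilter. */
    static String snapshotBeforeSwap(String xml) throws Exception {
        StringWriter out = new StringWriter();
        XMLReader pipeline =
            new SwapFilter(new SnapshotFilter(newReader(), out), OLD, NEW);
        pipeline.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<t:tutorial xmlns:t='http://www.oracle.com'/>";
        System.out.print("after swap:  " + snapshotAfterSwap(xml));
        System.out.print("before swap: " + snapshotBeforeSwap(xml));
    }
}
```

Only the first arrangement records the new URI; in the second, the snapshot stage sits upstream of the replacement and still sees what the parser reported.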