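Input and Output

XSLT processors, like other XML tools, can read their input data from many different sources. In the most basic scenario, you will load a static stylesheet and XML document from files.

System Identifiers, Files, and URLs

The simple examples presented earlier in this chapter introduced the concept of a system identifier. As mentioned before, system identifiers are nothing more than URIs, and XML tools use them frequently. For example, the javax.xml.transform.Source interface in JAXP has the following API:

    public interface Source {
        String getSystemId();
        void setSystemId(String systemId);
    }

The second method, setSystemId(), tells the XSLT processor where the input came from. The processor can then resolve relative URIs in a stylesheet against that location, which is what allows XSLT code like this to work:

    <xsl:import href="commonfooter.xslt"/>

In XSLT programming, you will use methods in java.io.File and java.net.URL to convert platform-specific file names into system identifiers:

    public static void main(String[] args) throws MalformedURLException {
        // assume that the first command-line arg contains a file name
        // - on Windows, something like "C:\home\index.xml"
        // - on Unix, something like "/usr/home/index.xml"
        String fileName = args[0];
        File fileObject = new File(fileName);
        // File.toURL() is deprecated because it does not escape characters
        // such as spaces; going through toURI() avoids that problem
        URL fileURL = fileObject.toURI().toURL();
        String systemID = fileURL.toExternalForm();
    }

This code was written on several lines for clarity. It can be consolidated as follows:

    String systemID = new File(fileName).toURI().toURL().toExternalForm();

Converting a system identifier back to a filename or a File object works the other way around:

    URL url = new URL(systemID);
    // url.getFile() would leave escapes such as %20 in place;
    // converting through a URI decodes them
    File fileObject = new File(url.toURI());
    String fileName = fileObject.getPath();

Once again, this code can be condensed into a single line:

    File fileObject = new File(new URL(systemID).toURI());

JAXP I/O Design

The Source and Result interfaces define how JAXP reads its input and writes its output. Figure 5-3 shows the implementations that JAXP provides.

Figure 5-3. Source and Result interfaces

As you can see, JAXP is not particular about where it gets its data or where it sends its results.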
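Remember that a typical transformation uses two instances of Source: one for the XML data and one for the XSLT stylesheet.

JAXP Stream I/O

As shown in Figure 5-3, StreamSource is one of the implementations of the Source interface. It has the following constructors:

    public StreamSource()
    public StreamSource(File f)
    public StreamSource(String systemId)
    public StreamSource(InputStream byteStream)
    public StreamSource(InputStream byteStream, String systemId)
    public StreamSource(Reader characterStream)
    public StreamSource(Reader characterStream, String systemId)

An InputStream or a Reader carries no information about where its data came from. With the constructors that take one of these, the processor therefore cannot resolve relative URIs on its own. If your stylesheet contains an import such as:

    <xsl:import href="commonfooter.xslt"/>

you should pass a system identifier as a parameter to the constructor, or set one afterward:

    // construct a Source that reads from an InputStream
    Source mySrc = new StreamSource(anInputStream);
    // specify a system ID (a String) so the Source can resolve relative URLs
    // that are encountered in XSLT stylesheets
    mySrc.setSystemId(aSystemId);

The documentation for StreamSource says that a byte stream is normally preferable to a Reader. A byte stream lets the XML parser honor the character encoding given in the XML declaration.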
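A small experiment shows the effect of setSystemId(). The sketch below assumes a JDK with its built-in JAXP processor. It writes two throwaway stylesheets to a temporary directory; page.xslt and commonfooter.xslt are file names and contents invented for this demo. It then reads page.xslt through an InputStream and sets the system identifier obtained from Path.toUri(). Without that call, the relative href would have no base to resolve against.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Sketch: a StreamSource built from an InputStream needs a system identifier
 * so that page.xslt can import commonfooter.xslt by a relative href.
 * Both stylesheets are hypothetical throwaway files created for this demo.
 */
final class RelativeImportDemo {

    private static final String FOOTER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template name='footer'><footer>shared footer</footer></xsl:template>"
      + "</xsl:stylesheet>";

    private static final String PAGE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:import href='commonfooter.xslt'/>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><page><xsl:call-template name='footer'/></page></xsl:template>"
      + "</xsl:stylesheet>";

    static String run() throws Exception {
        Path dir = Files.createTempDirectory("xslt-demo");
        Path footer = Files.write(dir.resolve("commonfooter.xslt"), FOOTER.getBytes(StandardCharsets.UTF_8));
        Path page = Files.write(dir.resolve("page.xslt"), PAGE.getBytes(StandardCharsets.UTF_8));
        try {
            try (InputStream in = Files.newInputStream(page)) {
                Source xslt = new StreamSource(in);
                // platform path -> system identifier (a file: URI); this is the
                // base against which href='commonfooter.xslt' is resolved
                xslt.setSystemId(page.toUri().toString());
                Transformer trans = TransformerFactory.newInstance().newTransformer(xslt);
                StringWriter out = new StringWriter();
                trans.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
                return out.toString();
            }
        } finally {
            // remove the temporary files again
            Files.deleteIfExists(page);
            Files.deleteIfExists(footer);
            Files.deleteIfExists(dir);
        }
    }
}
```

Passing the system identifier as the second constructor argument, as in new StreamSource(in, systemId), has the same effect as calling setSystemId() afterward.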
The corresponding StreamResult class has the following constructors:

    public StreamResult()
    public StreamResult(File f)
    public StreamResult(String systemId)
    public StreamResult(OutputStream byteStream)
    public StreamResult(Writer characterStream)

Example 5-4 shows some of the other options for StreamSource and StreamResult.

Example 5-4. Streams.java

    package chap5;

    import java.io.*;
    import javax.xml.transform.*;
    import javax.xml.transform.stream.*;

    /**
     * A simple demo of JAXP 1.1 StreamSource and StreamResult. This
     * program downloads the XML specification from the W3C and prints
     * it to a temporary file.
     */
    public class Streams {

        // an identity copy stylesheet
        ...

The "identity copy" stylesheet simply matches every node and copies it unchanged to the result tree. JAXP can also construct a Transformer with no stylesheet at all:

    // construct a Transformer without any XSLT stylesheet
    Transformer trans = transFact.newTransformer();

In this case, the processor provides its own identity stylesheet and does the same thing that our example does. This is useful when you need to convert a DOM tree to XML text for debugging, because the default Transformer simply copies its input to its output.

JAXP DOM I/O

In many cases, the fastest form of transformation available is to feed an org.w3c.dom.Document directly into JAXP, wrapped in a DOMSource:

    org.w3c.dom.Document domDoc = createDomDocument();
    // wrap the DOM tree in a Source implementation
    Source xmlSource = new javax.xml.transform.dom.DOMSource(domDoc);

The remainder of the transformation looks identical to the file-based transformation shown in Example 5-4. To read from DOM, JAXP needs only the alternate input Source object shown here.

JAXP SAX I/O

XSLT is designed to transform well-formed XML data into another format, typically HTML. But wouldn't it be nice if we could also use XSLT stylesheets to transform non-XML data into HTML? For example, most spreadsheets can export their data in Comma Separated Values (CSV) format, as shown here:

    Burke,Eric,M
    Burke,Jennifer,L
    Burke,Aidan,G

One approach is to parse the file into memory, use DOM to create an XML representation of the data, and then feed that information into JAXP for transformation. This approach works, but it requires an intermediate programming step to convert the CSV file into a DOM tree. A better option is to write a custom SAX parser that feeds its output directly into JAXP.
This avoids the overhead of constructing the DOM tree, offering better memory utilization and performance.

The approach

It turns out that writing a SAX parser is quite easy.[21] All a SAX parser does is read an XML file from top to bottom and fire event notifications as it encounters various elements. Our custom parser will read the CSV file from top to bottom and fire SAX events as it goes. A program listening to those SAX events will not realize that the data file is CSV rather than XML, because it sees only the events. Figure 5-4 illustrates the conceptual model.
Figure 5-4. Custom SAX parser

In this model, the XSLT processor interprets the SAX events as XML data and uses a normal stylesheet to perform the transformation. The interesting aspect of this model is that we can easily write custom SAX parsers for other file formats. That makes XSLT a useful transformation language for just about any legacy application data.

In SAX, a parser implements the org.xml.sax.XMLReader interface and delivers its events to an org.xml.sax.ContentHandler. In JAXP, the XSLT processor plays the role of that ContentHandler through the javax.xml.transform.sax.TransformerHandler interface. Obtaining an instance of TransformerHandler starts with an ordinary TransformerFactory:

    TransformerFactory transFact = TransformerFactory.newInstance();

Not every processor supports SAX input, so check for the feature before going further:

    if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {

If this returns true, you can safely downcast to SAXTransformerFactory and ask it for a TransformerHandler:

        SAXTransformerFactory saxTransFact = (SAXTransformerFactory) transFact;
        // create a ContentHandler, don't specify a stylesheet. Without
        // a stylesheet, raw XML is sent to the output.
        TransformerHandler transHand = saxTransFact.newTransformerHandler();

The code shown here does not specify a stylesheet. JAXP then defaults to the identity transformation stylesheet, which means that the SAX events are "transformed" into raw XML output. To specify a stylesheet that performs an actual transformation, pass a Source to newTransformerHandler():

    Source xsltSource = new StreamSource(myXsltSystemId);
    TransformerHandler transHand = saxTransFact.newTransformerHandler(xsltSource);

In either case, call transHand.setResult() to tell the handler where to send its output before any events arrive.

Detailed CSV to SAX design

Before delving into the complete example program, let's step back and look at a more detailed design diagram. The conceptual model is straightforward, but quite a few classes and interfaces come into play. Figure 5-5 shows the pieces necessary for SAX-based transformations.

Figure 5-5. SAX and XSLT transformations

This diagram certainly looks more complex than the previous approaches, but it is similar in many ways.
In previous approaches, we used the TransformerFactory to create a Transformer. Here, the same factory is checked for SAX support and downcast to a SAXTransformerFactory:

    TransformerFactory transFact = TransformerFactory.newInstance();
    if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
        // downcast is allowed
        SAXTransformerFactory saxTransFact = (SAXTransformerFactory) transFact;

If the feature is supported, the SAXTransformerFactory can create a TransformerHandler for your stylesheet:

        TransformerHandler transHand = saxTransFact.newTransformerHandler(myXsltSource);

This object now represents your XSLT stylesheet. As Figure 5-5 shows, TransformerHandler extends the SAX ContentHandler interface. It can therefore receive events from any SAX parser, including our custom one.

Writing the custom parser

Writing the actual SAX parser sounds harder than it really is. The process basically involves implementing the org.xml.sax.XMLReader interface. Most of that interface consists of simple getters and setters. Example 5-5 collects them in an abstract base class, which leaves only parse() for subclasses to implement.

Example 5-5. AbstractXMLReader.java

    package com.anonymous.javaxslt.util;

    import java.io.IOException;
    import java.util.*;
    import org.xml.sax.*;

    /**
     * An abstract class that implements the SAX2 XMLReader interface. The
     * intent of this class is to make it easy for subclasses to act as
     * SAX2 XMLReader implementations. This makes it possible, for example, for
     * them to emit SAX2 events that can be fed into an XSLT processor for
     * transformation.
     */
    public abstract class AbstractXMLReader implements org.xml.sax.XMLReader {
        private Map<String, Boolean> featureMap = new HashMap<>();
        private Map<String, Object> propertyMap = new HashMap<>();
        private EntityResolver entityResolver;
        private DTDHandler dtdHandler;
        private ContentHandler contentHandler;
        private ErrorHandler errorHandler;

        /**
         * The only abstract method in this class. Derived classes can parse
         * any source of data and emit SAX2 events to the ContentHandler.
         */
        public abstract void parse(InputSource input)
                throws IOException, SAXException;

        public boolean getFeature(String name)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            Boolean featureValue = this.featureMap.get(name);
            return (featureValue == null) ? false : featureValue.booleanValue();
        }

        public void setFeature(String name, boolean value)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            // Boolean.valueOf replaces the deprecated new Boolean(...)
            this.featureMap.put(name, Boolean.valueOf(value));
        }

        public Object getProperty(String name)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            return this.propertyMap.get(name);
        }

        public void setProperty(String name, Object value)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            this.propertyMap.put(name, value);
        }

        public void setEntityResolver(EntityResolver entityResolver) {
            this.entityResolver = entityResolver;
        }

        public EntityResolver getEntityResolver() {
            return this.entityResolver;
        }

        public void setDTDHandler(DTDHandler dtdHandler) {
            this.dtdHandler = dtdHandler;
        }

        public DTDHandler getDTDHandler() {
            return this.dtdHandler;
        }

        public void setContentHandler(ContentHandler contentHandler) {
            this.contentHandler = contentHandler;
        }

        public ContentHandler getContentHandler() {
            return this.contentHandler;
        }

        public void setErrorHandler(ErrorHandler errorHandler) {
            this.errorHandler = errorHandler;
        }

        public ErrorHandler getErrorHandler() {
            return this.errorHandler;
        }

        public void parse(String systemId) throws IOException, SAXException {
            parse(new InputSource(systemId));
        }
    }

Creating the subclass, CSVXMLReader, involves overriding parse() so that it reads CSV data such as the following and emits SAX events for it:

    Burke,Eric,M
    Burke,Jennifer,L
    Burke,Aidan,G

Example 5-6 shows the XML representation of this file. The only real drawback is that CSV files are strictly positional, meaning that no names are assigned to the columns of data. As a result, the XML output merely contains a sequence of three <value> elements for each <line>.

Example 5-6. Example XML output from CSV parser

    <?xml version="1.0" encoding="UTF-8"?>
    <csvFile>
      <line>
        <value>Burke</value>
        <value>Eric</value>
        <value>M</value>
      </line>
      <line>
        <value>Burke</value>
        <value>Jennifer</value>
        <value>L</value>
      </line>
      <line>
        <value>Burke</value>
        <value>Aidan</value>
        <value>G</value>
      </line>
    </csvFile>

One enhancement would be to let the CSV parser accept a list of meaningful column names as parameters and use them in the generated XML. Another option would be to write an XSLT stylesheet that transforms this initial output into another form of XML that uses meaningful column names. These features were omitted from this implementation to keep the code example relatively manageable.

Some complexities of the CSV file format do have to be handled, however. For example, fields that contain commas must be surrounded with quotes:

    "Consultant,Author,Teacher",Burke,Eric,M
    Teacher,Burke,Jennifer,L
    None,Burke,Aidan,G

To further complicate matters, fields may also contain quote characters ("). These are doubled up, much in the same way you use a double backslash (\\) in Java to represent a single backslash. In the following example, the first column contains one quote character, so the entire field is quoted and that quote character is doubled:

    "test""quote",Teacher,Burke,Jennifer,L

This would be interpreted as:

    test"quote,Teacher,Burke,Jennifer,L

Example 5-7 shows the complete implementation of the CSV parser.

Example 5-7. CSVXMLReader.java

    package com.anonymous.javaxslt.util;

    import java.io.*;
    import java.net.URL;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    /**
     * A utility class that parses a Comma Separated Values (CSV) file
     * and outputs its contents using SAX2 events. The format of CSV that
     * this class reads is identical to the export format for Microsoft
     * Excel. For simple values, the CSV file may look like this:
     * <pre>
     * a,b,c
     * d,e,f
     * </pre>
     * Quotes are used as delimiters when the values contain commas:
     * <pre>
     * a,"b,c",d
     * e,"f,g","h,i"
     * </pre>
     * And double quotes are used when the values contain quotes. This parser
     * is smart enough to trim spaces around commas, as well.
     *
     * @author Eric M. Burke
     */
    public class CSVXMLReader extends AbstractXMLReader {

        // an empty attribute for use with SAX
        private static final Attributes EMPTY_ATTR = new AttributesImpl();

        /**
         * Parse a CSV file. SAX events are delivered to the ContentHandler
         * that was registered via <code>setContentHandler</code>.
         *
         * @param input the comma separated values file to parse.
         */
        public void parse(InputSource input) throws IOException, SAXException {
            // if no handler is registered to receive events, don't bother
            // to parse the CSV file
            ContentHandler ch = getContentHandler();
            if (ch == null) {
                return;
            }

The first thing this method does is check for the existence of a SAX ContentHandler. If nothing is listening for events, there is no reason to parse the file.

A SAX InputSource can supply its data as a character stream, a byte stream, or a system identifier. The next step converts whichever one is present into a BufferedReader:

            // convert the InputSource into a BufferedReader
            BufferedReader br = null;
            if (input.getCharacterStream() != null) {
                br = new BufferedReader(input.getCharacterStream());
            } else if (input.getByteStream() != null) {
                br = new BufferedReader(new InputStreamReader(
                        input.getByteStream()));
            } else if (input.getSystemId() != null) {
                URL url = new URL(input.getSystemId());
                br = new BufferedReader(new InputStreamReader(url.openStream()));
            } else {
                throw new SAXException("Invalid InputSource object");
            }

Once the BufferedReader has been created, we can begin firing SAX events:

            ch.startDocument();

            // emit <csvFile>
            ch.startElement("", "", "csvFile", EMPTY_ATTR);

The XSLT processor interprets this to mean the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <csvFile>

Our parser simply ignores many SAX 2 features, particularly XML namespaces. This is why many of the values passed as parameters to the various ContentHandler methods are empty strings.

The CSV file itself is very straightforward, so we merely loop over every line in the file and emit SAX events as we read each line.
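This section does not list the splitting of each line inside CSVXMLReader in full. The hypothetical CsvLineSplitter below is a rough sketch of how the quoting rules described earlier could be handled: quoted fields may contain commas, and a doubled quote stands for one quote character. It walks the line one character at a time. It is not the book's implementation. As a simplification, it trims every value, which also strips spaces just inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of the book's CSVXMLReader: splits one CSV
 * line into values. Quoted fields may contain commas, and a doubled quote
 * ("") inside a quoted field stands for one quote character. Every value is
 * trimmed, which also strips spaces just inside a quoted field.
 */
final class CsvLineSplitter {

    private CsvLineSplitter() { }

    static List<String> split(String line) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    boolean doubled = i + 1 < line.length() && line.charAt(i + 1) == '"';
                    if (doubled) {
                        current.append('"'); // "" inside quotes -> one literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                values.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        values.add(current.toString().trim()); // last value (possibly empty)
        return values;
    }
}
```

For example, CsvLineSplitter.split("a,\"b,c\",d") yields the three values "a", "b,c", and "d". Each value would then be emitted as a <value> element with startElement(), characters(), and endElement().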
The BufferedReader's readLine() method does most of the work:

            // read each line of the file until EOF is reached
            String curLine = null;
            while ((curLine = br.readLine()) != null) {
                curLine = curLine.trim();
                if (curLine.length() > 0) {
                    // create the <line> element
                    ch.startElement("", "", "line", EMPTY_ATTR);
                    parseLine(curLine, ch);
                    ch.endElement("", "", "line");
                }
            }

And finally, we must indicate that the parsing is complete:

            // emit </csvFile>
            ch.endElement("", "", "csvFile");
            ch.endDocument();
        }

The remaining methods in CSVXMLReader split each line into its individual values and emit one element per value, like this:

    <value>Some Text Here</value>

SAX parsers use the characters() method of ContentHandler to report text content:

    public void characters(char[] ch, int start, int length) throws SAXException

This method could have been designed to take a String. A char array, however, lets a SAX parser fill one large character buffer and report each run of text as a slice of it, instead of creating a new String every time.

Our parser uses a relatively straightforward approach and simply converts each String value into a char array:

            // emit the <value>text</value> element
            ch.startElement("", "", "value", EMPTY_ATTR);
            ch.characters(firstToken.toCharArray(), 0, firstToken.length());
            ch.endElement("", "", "value");

Using the parser

To wrap things up, let's look at how you actually use this CSV parser with an XSLT stylesheet. Example 5-8 is a standalone Java application that performs XSLT transformations on CSV files. As the comments indicate, it requires the name of a CSV file as its first parameter and can optionally take the name of an XSLT stylesheet as its second parameter. All output is sent to System.out.

Example 5-8. SimpleCSVProcessor.java

    package com.anonymous.javaxslt.util;

    import java.io.*;
    import javax.xml.transform.*;
    import javax.xml.transform.sax.*;
    import javax.xml.transform.stream.*;
    import org.xml.sax.*;

    /**
     * Shows how to use the CSVXMLReader class. This is a command-line
     * utility that takes a CSV file and optionally an XSLT file as
     * command line parameters. A transformation is applied and the
     * output is sent to System.out.
     */
    public class SimpleCSVProcessor {

        public static void main(String[] args) throws Exception {
            if (args.length == 0) {
                System.err.println("Usage: java " + SimpleCSVProcessor.class.getName()
                        + " <csvFile> [xsltFile]");
                System.err.println(" - csvFile is required");
                System.err.println(" - xsltFile is optional");
                System.exit(1);
            }

            String csvFileName = args[0];
            String xsltFileName = (args.length > 1) ? args[1] : null;

            TransformerFactory transFact = TransformerFactory.newInstance();
            if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
                SAXTransformerFactory saxTransFact = (SAXTransformerFactory) transFact;

As mentioned earlier in this chapter, the SAXTransformerFactory creates a TransformerHandler, which acts as a SAX ContentHandler. The handler is created with or without a stylesheet, and its output is directed to System.out:

                TransformerHandler transHand = null;
                if (xsltFileName == null) {
                    transHand = saxTransFact.newTransformerHandler();
                } else {
                    transHand = saxTransFact.newTransformerHandler(
                            new StreamSource(new File(xsltFileName)));
                }
                // the handler must know where to write before events arrive
                transHand.setResult(new StreamResult(System.out));

When the XSLT stylesheet is not specified, the transformer performs an identity transformation. This is useful when you just want to see the raw XML output without applying a stylesheet. You will probably want to do this first to see how your XSLT will need to be written. If a stylesheet is provided, however, it is used for the transformation.

The custom parser is then constructed as follows:

                CSVXMLReader csvReader = new CSVXMLReader();

Next, the location of the CSV file is converted into a SAX InputSource:

                InputSource csvInputSrc = new InputSource(
                        new FileReader(csvFileName));

And finally, the XSLT processor is attached to our custom parser by registering the TransformerHandler as the ContentHandler of the CSVXMLReader:

                // attach the XSLT processor to the CSVXMLReader
                csvReader.setContentHandler(transHand);
                csvReader.parse(csvInputSrc);
            } else {
                System.err.println("The XSLT processor does not support SAX input");
                System.exit(1);
            }
        }
    }

For a simple test, assume that a list of presidents is available in CSV format:

    Washington,George,,
    Adams,John,,
    Jefferson,Thomas,,
    Madison,James,,
    etc...
    Bush,George,Herbert,Walker
    Clinton,William,Jefferson,
    Bush,George,W,

To see what the XML looks like, invoke the program as follows:

    java com.anonymous.javaxslt.util.SimpleCSVProcessor presidents.csv

This parses the CSV file, applies the identity transformation, and sends the following output to the console:

    <?xml version="1.0" encoding="UTF-8"?>
    <csvFile>
      <line>
        <value>Washington</value>
        <value>George</value>
        <value/>
        <value/>
      </line>
      <line>
        etc...
    </csvFile>

The actual output is crammed onto a single long line; it is broken up here to make it more readable. Any good XML editor should be able to pretty-print the XML as shown.

To transform this into something useful, you need a stylesheet. The XSLT stylesheet shown in Example 5-9 takes any output from this program and converts it into an HTML table.

Example 5-9. csvToHTMLTable.xslt

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>

      <xsl:template match="/">
        <table>
          <xsl:apply-templates select="csvFile/line"/>
        </table>
      </xsl:template>

      <xsl:template match="line">
        <tr>
          <xsl:apply-templates select="value"/>
        </tr>
      </xsl:template>

      <xsl:template match="value">
        <td>
          <!-- If a value is empty, print a non-breaking space
               so the HTML table looks OK -->
          <xsl:if test=".=''">
            <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
          </xsl:if>
          <xsl:value-of select="."/>
        </td>
      </xsl:template>
    </xsl:stylesheet>

To apply this stylesheet, type the following command:

    java com.anonymous.javaxslt.util.SimpleCSVProcessor presidents.csv csvToHTMLTable.xslt

As before, the results are sent to System.out.

Conclusion

Writing a SAX parser and connecting it to JAXP does involve quite a few interrelated classes. The resulting application, however, requires only two command-line arguments and works with any CSV or XSLT file.
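What makes this example interesting is that the same approach works with essentially any data source. The steps break down as follows:

1. Write a class that implements org.xml.sax.XMLReader, for example by extending AbstractXMLReader. Its parse() method reads your data source and fires SAX events to the registered ContentHandler.
2. Obtain a TransformerFactory, check SAXTransformerFactory.FEATURE, and downcast to SAXTransformerFactory.
3. Create a TransformerHandler, with or without an XSLT stylesheet, and give it a Result.
4. Register the TransformerHandler as the parser's ContentHandler and call parse().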
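The overall recipe, a custom XMLReader firing SAX events into a TransformerHandler, can be sketched end to end in one self-contained class. To stay short, the hypothetical TinyCsvReader below extends the JDK's org.xml.sax.helpers.XMLFilterImpl purely as a ready-made XMLReader implementation, standing in for AbstractXMLReader. It accepts only a character stream and splits on every comma without handling quoted fields. Treat it as an illustration of the wiring, not as a replacement for CSVXMLReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical minimal CSV "parser". XMLFilterImpl is used only as a
 * ready-made XMLReader implementation, standing in for AbstractXMLReader.
 * Quoted fields are not handled; values are split on every comma.
 */
final class TinyCsvReader extends XMLFilterImpl {

    private static final Attributes EMPTY_ATTR = new AttributesImpl();

    // reads the data source and fires SAX events to the registered handler
    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        ContentHandler ch = getContentHandler();
        if (ch == null) {
            return;
        }
        Reader reader = input.getCharacterStream();
        if (reader == null) {
            throw new SAXException("this sketch only reads character streams");
        }
        BufferedReader br = new BufferedReader(reader);
        ch.startDocument();
        ch.startElement("", "csvFile", "csvFile", EMPTY_ATTR);
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            ch.startElement("", "line", "line", EMPTY_ATTR);
            for (String raw : line.split(",", -1)) {
                String value = raw.trim();
                ch.startElement("", "value", "value", EMPTY_ATTR);
                ch.characters(value.toCharArray(), 0, value.length());
                ch.endElement("", "value", "value");
            }
            ch.endElement("", "line", "line");
        }
        ch.endElement("", "csvFile", "csvFile");
        ch.endDocument();
    }

    /** Wires an identity TransformerHandler to the reader and returns the XML text. */
    static String toXml(String csv) throws Exception {
        TransformerFactory transFact = TransformerFactory.newInstance();
        if (!transFact.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("XSLT processor does not accept SAX input");
        }
        SAXTransformerFactory saxTransFact = (SAXTransformerFactory) transFact;
        TransformerHandler transHand = saxTransFact.newTransformerHandler();
        StringWriter out = new StringWriter();
        transHand.setResult(new StreamResult(out));

        TinyCsvReader csvReader = new TinyCsvReader();
        csvReader.setContentHandler(transHand);
        csvReader.parse(new InputSource(new StringReader(csv)));
        return out.toString();
    }
}
```

Instead of calling parse() yourself, you can also hand the reader to a Transformer as new javax.xml.transform.sax.SAXSource(csvReader, inputSource). In that case the processor itself registers its handler on the reader and drives the parse.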
For example, you might want to write a custom parser that accepts a SQL statement as input rather than a CSV file. Your parser could then connect to a database, issue the query, and fire SAX events for each row in the result set.

Feeding JDOM Output into JAXP

The DOM API is tedious to use, so many Java developers opt for JDOM instead. The typical usage pattern is to generate XML dynamically using JDOM and then transform it into a web page using XSLT. This presents a problem, because JAXP does not provide any direct implementation of the Source interface for JDOM documents. There are three ways around this: feed the JDOM tree to JAXP as SAX events, convert it into a DOM tree, or write it out as XML text that JAXP then parses again.
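All three approaches end in an ordinary JAXP transformation, and in each case it pays to compile the stylesheet only once. The sketch below uses only JDK classes. It compiles a small made-up stylesheet into a Templates object and creates a fresh TransformerHandler from it for each document. The hand-written SAX events stand in for what a JDOM-to-SAX bridge would fire; JDOM itself is not assumed to be on the classpath, and the class and stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical demo: one compiled Templates, one TransformerHandler per document. */
final class TemplatesHandlerDemo {

    private static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'><h1><xsl:value-of select='title'/></h1></xsl:template>"
      + "</xsl:stylesheet>";

    private final SAXTransformerFactory stf;
    private final Templates stylesheet;

    TemplatesHandlerDemo() throws TransformerConfigurationException {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new TransformerConfigurationException("SAX input not supported");
        }
        stf = (SAXTransformerFactory) tf;
        // compile once; a Templates object is thread-safe and reusable
        stylesheet = stf.newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    String render(String title) throws TransformerConfigurationException, SAXException {
        // a TransformerHandler is single-use, so create one per document
        TransformerHandler transHand = stf.newTransformerHandler(stylesheet);
        StringWriter out = new StringWriter();
        transHand.setResult(new StreamResult(out));

        // these events stand in for the output of a JDOM-to-SAX bridge
        char[] text = title.toCharArray();
        transHand.startDocument();
        transHand.startElement("", "title", "title", new AttributesImpl());
        transHand.characters(text, 0, text.length);
        transHand.endElement("", "title", "title");
        transHand.endDocument();
        return out.toString();
    }
}
```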
JDOM to SAX approach

The SAX approach is generally preferable to the other approaches. Its primary advantage is that it needs no intermediate conversion of the JDOM tree into a DOM tree or text. This offers the lowest memory utilization and potentially the fastest performance. In support of SAX, JDOM offers the org.jdom.output.SAXOutputter class, which walks a JDOM tree and fires SAX events at a ContentHandler. On the JAXP side, a TransformerHandler receives those events:

    TransformerFactory transFact = TransformerFactory.newInstance();
    if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
        SAXTransformerFactory stf = (SAXTransformerFactory) transFact;
        // the 'stylesheet' parameter is an instance of JAXP's
        // javax.xml.transform.Templates interface
        TransformerHandler transHand = stf.newTransformerHandler(stylesheet);
        // result is a Result instance
        transHand.setResult(result);
        // fire SAX events from the JDOM tree (jdomDoc) into the XSLT processor
        org.jdom.output.SAXOutputter saxOut = new org.jdom.output.SAXOutputter(transHand);
        saxOut.output(jdomDoc);
    }

JDOM to DOM approach

The DOM approach is generally a little slower, and it will not work if JDOM uses a different DOM implementation than JAXP. Like JAXP, JDOM can use different DOM implementations behind the scenes. If JDOM refers to a different version of DOM than JAXP, you will encounter exceptions when you try to perform the transformation. The JAXP 1.1 reference implementation uses Apache's Crimson parser by default, so you can configure JDOM to use Crimson through the org.jdom.adapters.CrimsonDOMAdapter class:

    org.jdom.Document jdomDoc = createJDOMDocument();
    // add data to the JDOM Document
    ...
    // convert the JDOM Document into a DOM Document
    org.jdom.output.DOMOutputter domOut =
            new org.jdom.output.DOMOutputter("org.jdom.adapters.CrimsonDOMAdapter");
    org.w3c.dom.Document domDoc = domOut.output(jdomDoc);

The DOMOutputter constructor call is the line most likely to give you problems. When JDOM converts its internal object tree into a DOM object tree, it must use some underlying DOM implementation. In many respects, JDOM is similar to JAXP, because it delegates many tasks to underlying implementation classes. The DOMOutputter class has two constructors:

    // use the default adapter class
    public DOMOutputter()
    // use the specified adapter class
    public DOMOutputter(String adapterClass)

The first constructor shown here uses JDOM's default DOM parser, which is not necessarily the same DOM parser that JAXP uses.
The second constructor lets you specify the name of an adapter class, which must implement the org.jdom.adapters.DOMAdapter interface.

JDOM to text approach

In the final approach listed earlier, you use org.jdom.output.XMLOutputter to write the JDOM tree as XML text into a StringWriter:

    StringWriter sw = new StringWriter();
    org.jdom.output.XMLOutputter xmlOut = new org.jdom.output.XMLOutputter("", false);
    xmlOut.output(jdomDoc, sw);

The two arguments to XMLOutputter specify the indentation string and whether to add newlines, so this call produces compact output. The text is then wrapped in a StreamSource so that JAXP can read it:

    StringReader sr = new StringReader(sw.toString());
    Source xmlSource = new javax.xml.transform.stream.StreamSource(sr);

The transformation can then proceed just as it did in Example 5-4. The main drawback of this approach is that the XML, once converted to text form, must be parsed again by JAXP before the transformation can be applied.
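The cost of the text approach is the second parse. The illustration below uses only JDK classes and substitutes a W3C DOM tree for the JDOM tree; that substitution is an assumption, since JDOM is not part of the JDK. The tree is first serialized with the stylesheet-less identity Transformer. The resulting string is then parsed again through a StringReader and run through a condensed copy of csvToHTMLTable.xslt from Example 5-9. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch of the tree -> text -> JAXP hand-off. */
final class TextHandOff {

    // condensed copy of csvToHTMLTable.xslt (Example 5-9)
    private static final String CSV_TO_HTML_TABLE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'><table><xsl:apply-templates select='csvFile/line'/></table></xsl:template>"
      + "<xsl:template match='line'><tr><xsl:apply-templates select='value'/></tr></xsl:template>"
      + "<xsl:template match='value'><td>"
      + "<xsl:if test=\".=''\"><xsl:text disable-output-escaping='yes'>&amp;nbsp;</xsl:text></xsl:if>"
      + "<xsl:value-of select='.'/></td></xsl:template>"
      + "</xsl:stylesheet>";

    /** Builds <csvFile><line><value>...</value>...</line></csvFile> for one row. */
    static Document buildRow(String... values) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element csvFile = doc.createElement("csvFile");
        Element line = doc.createElement("line");
        for (String v : values) {
            Element value = doc.createElement("value");
            value.appendChild(doc.createTextNode(v));
            line.appendChild(value);
        }
        csvFile.appendChild(line);
        doc.appendChild(csvFile);
        return doc;
    }

    /** First step: serialize the tree to text with the identity Transformer. */
    static String toText(Document doc) throws TransformerException {
        StringWriter sw = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    /** Second step: parse the text again and apply the table stylesheet. */
    static String toHtml(String xmlText) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(CSV_TO_HTML_TABLE)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xmlText)), new StreamResult(out));
        return out.toString();
    }
}
```

Calling toHtml(toText(buildRow("Washington", "George", ""))) produces a one-row table whose empty cell holds &nbsp;. The XML is parsed twice along the way: once when the DOM tree is built (implicitly, here by construction) and again inside toHtml().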