XSLT Processing with Java
XSLT processors, like other XML tools, can read their input data
from many different sources. In the most basic scenario, you will load a
static stylesheet and XML document using the java.io.File class. More commonly, the XSLT stylesheet
will come from a file, but the XML data will be generated dynamically as the
result of a database query. In this case, it does not make sense to write the
database query results to an XML file and then have the XSLT processor parse
that file. Instead, it is desirable to pipe the XML data directly into the
processor using SAX or DOM. In fact, we will even see how to read non-XML data
and transform it using XSLT.
The simple examples presented earlier in this chapter introduced
the concept of a system identifier. As mentioned before, system identifiers
are nothing more than URIs and are used frequently by XML tools. For example,
javax.xml.transform.Source, one of the key
interfaces in JAXP, has the following API:
public interface Source {
String getSystemId( );
void setSystemId(String systemId);
}
The second method, setSystemId( ), is
crucial. When you provide a URI to the Source, the XSLT
processor can resolve relative URIs encountered in XSLT stylesheets. This allows XSLT
code like this to work:
<xsl:import href="commonFooter.xslt"/>
When it comes to XSLT programming, you will use methods in java.io.File and java.net.URL
to convert platform-specific file names into system IDs. These can then be
passed to any method that expects a system ID. For
example, you would write the following code to convert a platform-specific
filename into a system ID:
public static void main(String[] args) throws MalformedURLException {
    // assume that the first command-line arg
    // contains a file name
    // - on Windows, something like
    // "C:\home\index.xml"
    // - on Unix, something like
    // "/usr/home/index.xml"
    String fileName = args[0];
    File fileObject = new File(fileName);
    // go through toURI( ) so that characters such as spaces are escaped;
    // File.toURL( ) is deprecated because it does not escape them
    URL fileURL = fileObject.toURI( ).toURL( );
    String systemID = fileURL.toExternalForm( );
}
This code was written on several lines for clarity; it can be consolidated as follows:
String systemID = new File(fileName).toURI( ).toURL( ).toExternalForm( );
Converting from a system identifier back to a filename or a
File object can be accomplished with this code. Going through a URI, rather
than calling url.getFile( ), ensures that escaped characters such as %20 are
decoded again:
URL url = new URL(systemID);
File fileObject = new File(url.toURI( ));
String fileName = fileObject.getPath( );
And once again, this code can be condensed into a single line as follows:
File fileObject = new File(new URL(systemID).toURI( ));
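The round trip between a filename and a system ID can be sketched as one small runnable class. This is only an illustration: the class name SystemIdRoundTrip, its two helper methods, and the sample path "my docs/index.xml" are assumptions made for the example. It goes through File.toURI( ) rather than File.toURL( ), which has been deprecated since Java 6 because it does not escape characters such as spaces.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

public class SystemIdRoundTrip {

    /** Converts a platform-specific file name into a system identifier. */
    public static String toSystemId(String fileName) throws MalformedURLException {
        // toURI() makes the path absolute and percent-encodes spaces
        return new File(fileName).toURI().toURL().toExternalForm();
    }

    /** Converts a file: system identifier back into a File. */
    public static File toFile(String systemId) throws URISyntaxException {
        // File(URI) decodes the percent-encoded path again
        return new File(new URI(systemId));
    }

    public static void main(String[] args) throws Exception {
        // hypothetical default path, chosen because it contains a space
        String fileName = args.length > 0 ? args[0] : "my docs/index.xml";
        String systemId = toSystemId(fileName);
        System.out.println("file name:      " + fileName);
        System.out.println("system ID:      " + systemId);
        System.out.println("back to a file: " + toFile(systemId));
    }
}
```

Because File.toURI( ) resolves relative names against the current working directory, the file that comes back is always absolute, even when the original name was relative.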
The Source and Result interfaces in javax.xml.transform provide the basis for all
transformation input and output in JAXP 1.1. Regardless of whether a
stylesheet is obtained via a URI, filename, or InputStream, its data is fed into JAXP via an
implementation of the Source interface. The output
is then sent to an implementation of the Result
interface. The implementations provided by JAXP are shown in Figure 5-3.
As you can see, JAXP is not particular about where it gets its
data or sends its results. Remember that two instances of Source are always specified: one for the XML data and
another for the XSLT stylesheet.
As shown in Figure 5-3, StreamSource is one of the implementations
of the Source interface. In addition to the system
identifiers that Source provides, StreamSource allows input to be obtained from a File, an InputStream, or a
Reader. The SimpleJaxp
class in Example
5-3 showed how to use StreamSource to read from
a File object. There are also four constructors
that allow you to construct a StreamSource from
either an InputStream or Reader. The complete list of constructors is shown here:
public StreamSource( )
public StreamSource(File f)
public StreamSource(String systemId)
public StreamSource(InputStream byteStream)
public StreamSource(InputStream byteStream, String systemId)
public StreamSource(Reader characterStream)
public StreamSource(Reader characterStream, String systemId)
For the constructors that take InputStream and Reader as
arguments, the first argument provides either the XML data or the XSLT
stylesheet. The second argument, if present, is used to resolve relative URI
references in the document. As mentioned before, your XSLT stylesheet may
include the following code:
<xsl:import href="commonFooter.xslt"/>
By providing a system identifier as a parameter to the StreamSource, you are telling the XSLT processor where to
look for commonFooter.xslt. Without this parameter,
you may encounter an error when the processor cannot resolve this URI. The
simple fix is to call the setSystemId( ) method as
follows:
// construct a Source that reads from an InputStream
Source mySrc = new StreamSource(anInputStream);
// specify a system ID (a String) so the
// Source can resolve relative URLs
// that are encountered in XSLT stylesheets
mySrc.setSystemId(aSystemId);
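To see the effect of the system ID, the following sketch keeps the main stylesheet in memory and writes only the imported stylesheet to disk. Everything specific here is an assumption for illustration: the class name ImportResolution, the contents of both stylesheets, the temporary directory, and the pretend location main.xslt. The point is that the relative href in xsl:import is resolved against whatever system ID the Source reports.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ImportResolution {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // hypothetical shared stylesheet, written to disk as commonFooter.xslt
    private static final String FOOTER_XSLT =
        "<xsl:stylesheet xmlns:xsl='" + XSL_NS + "' version='1.0'>"
        + "<xsl:template name='footer'><footer>shared footer</footer></xsl:template>"
        + "</xsl:stylesheet>";

    // main stylesheet: exists only in memory, imports the footer by relative URI
    private static final String MAIN_XSLT =
        "<xsl:stylesheet xmlns:xsl='" + XSL_NS + "' version='1.0'>"
        + "<xsl:import href='commonFooter.xslt'/>"
        + "<xsl:template match='/'><page><xsl:call-template name='footer'/></page>"
        + "</xsl:template></xsl:stylesheet>";

    public static String transform() throws Exception {
        Path dir = Files.createTempDirectory("xslt");
        Path footer = dir.resolve("commonFooter.xslt");
        Files.write(footer, FOOTER_XSLT.getBytes(StandardCharsets.UTF_8));
        dir.toFile().deleteOnExit();
        footer.toFile().deleteOnExit();

        InputStream in = new ByteArrayInputStream(
            MAIN_XSLT.getBytes(StandardCharsets.UTF_8));
        Source xsltSource = new StreamSource(in);
        // pretend the in-memory stylesheet lives next to commonFooter.xslt
        xsltSource.setSystemId(dir.resolve("main.xslt").toUri().toString());

        Transformer trans = TransformerFactory.newInstance().newTransformer(xsltSource);
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader("<doc/>")),
                        new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform());
    }
}
```

If the setSystemId( ) call is removed, the processor has no base URI for commonFooter.xslt, and creating the Transformer will typically fail unless a file with that name happens to sit in the current working directory.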
The documentation for StreamSource
also advises that InputStream is preferred to Reader because this allows the processor to properly
handle the character encoding as specified in the XML declaration.
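A small sketch of why the byte stream matters: the document below declares ISO-8859-1 and is really encoded that way, so handing the raw bytes to StreamSource lets the parser decode them according to the XML declaration. The class name EncodingFromDeclaration and the sample document are assumptions made for this illustration. The result is collected in a DOMResult so that the decoded text can be inspected directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class EncodingFromDeclaration {

    public static String readName() throws Exception {
        // the document declares ISO-8859-1 and is encoded that way
        String xml = "<?xml version='1.0' encoding='ISO-8859-1'?>"
                   + "<name>caf\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        // bytes in: the parser picks the charset from the XML declaration
        StreamSource source = new StreamSource(new ByteArrayInputStream(bytes));

        // identity transformation into a DOM tree
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        trans.transform(source, result);

        Document doc = (Document) result.getNode();
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readName());
    }
}
```

With a Reader, the bytes have already been turned into characters by your own code, so the parser cannot apply the declared encoding; any mismatch between the Reader's charset and the declaration goes undetected.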
StreamResult is similar in functionality to StreamSource, although it is not necessary to resolve relative URIs. The available constructors are as follows:
public StreamResult( )
public StreamResult(File f)
public StreamResult(String systemId)
public StreamResult(OutputStream byteStream)
public StreamResult(Writer characterStream)
Let's look at a simple example to see some of the other options
for StreamSource and StreamResult. Example
5-4 is a modification of the SimpleJaxp program
that was presented earlier. It downloads the XML specification from
the W3C web site and stores it in a temporary file on your local disk. To
download the file, construct a StreamSource with a
system identifier as a parameter. The stylesheet is a simple one that merely
performs an identity transformation, copying the unmodified XML data to the
result tree. The result is then sent to a StreamResult using its File
constructor.
Example 5-4: Streams.java
package chap5;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
/**
* A simple demo of JAXP 1.1 StreamSource and
* StreamResult. This program downloads the
* XML specification from the W3C and prints
* it to a temporary file.
*/
public class Streams {
// an identity copy stylesheet
private static final String IDENTITY_XSLT =
"<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
+ " version='1.0'>"
+ "<xsl:template match='/'><xsl:copy-of select='.'/>"
+ "</xsl:template></xsl:stylesheet>";
// the XML spec in XML format
// (using an HTTP URL rather than a file URL)
private static String xmlSystemId =
"http://www.w3.org/TR/2000/REC-xml-20001006.xml";
public static void main(String[] args) throws IOException,
TransformerException {
// show how to read from a system identifier and a Reader
Source xmlSource = new StreamSource(xmlSystemId);
Source xsltSource = new StreamSource(
new StringReader(IDENTITY_XSLT));
// send the result to a file
File resultFile = File.createTempFile("Streams", ".xml");
Result result = new StreamResult(resultFile);
System.out.println("Results will go to: "
+ resultFile.getAbsolutePath( ));
// get the factory
TransformerFactory transFact = TransformerFactory.newInstance( );
// get a transformer for this particular stylesheet
Transformer trans = transFact.newTransformer(xsltSource);
// do the transformation
trans.transform(xmlSource, result);
}
}
The "identity copy" stylesheet simply matches "/", which is the document itself. It then uses <xsl:copy-of select='.'/> to select the document
and copy it to the result tree. In this case, we coded our own stylesheet. You
can also omit the XSLT stylesheet altogether as follows:
// construct a Transformer
// without any XSLT stylesheet
Transformer trans = transFact.newTransformer( );
In this case, the processor will provide its own stylesheet and
do the same thing that our example does. This is useful when you need to use
JAXP to convert a DOM tree to XML text for debugging purposes because the
default Transformer will simply copy the XML data
without any transformation.
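The debugging use mentioned above might look like the following sketch. The class name DomDebug, the helper toXmlText( ), and the tiny sample document are assumptions for illustration; the only JAXP pieces involved are the default Transformer, a DOMSource, and a StreamResult wrapped around a StringWriter.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomDebug {

    /** Serializes a DOM node to XML text with the default (identity) Transformer. */
    public static String toXmlText(Node node) throws TransformerException {
        // no stylesheet: the Transformer simply copies its input
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    /** Builds a small sample document (hypothetical data). */
    public static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element person = doc.createElement("person");
        Element firstName = doc.createElement("firstName");
        firstName.setTextContent("Eric");
        person.appendChild(firstName);
        doc.appendChild(person);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXmlText(sampleDocument()));
    }
}
```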
In many cases, the fastest form of transformation available is
to feed an instance of org.w3c.dom.Document
directly into JAXP. Although the transformation is fast, it does take time to
generate the DOM; DOM is also memory intensive, and may not be the best choice
for large documents. In most cases, the DOM data will be generated dynamically
as the result of a database query or some other operation (see Chapter 1).
Once the DOM is generated, simply wrap the Document
object in a DOMSource as follows:
org.w3c.dom.Document domDoc = createDomDocument( );
Source xmlSource = new javax.xml.transform.dom.DOMSource(domDoc);
The remainder of the transformation looks identical to the
file-based transformation shown in Example
5-4. JAXP needs only the alternate input Source
object shown here to read from DOM.
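A complete version of this idea could look like the sketch below. The body of createDomDocument( ) is not shown in the text, so the version here is a stand-in that builds a few hard-coded names in place of a real database query; the class name DomSourceDemo, the element and attribute names, and the stylesheet are likewise assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomSourceDemo {
    // hypothetical stylesheet that turns <people> into an HTML list
    private static final String LIST_XSLT =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " version='1.0'>"
        + "<xsl:template match='/'><ul><xsl:for-each select='people/person'>"
        + "<li><xsl:value-of select=\"concat(@first, ' ', @last)\"/></li>"
        + "</xsl:for-each></ul></xsl:template></xsl:stylesheet>";

    /** Stand-in for a DOM tree generated from a database query. */
    public static Document createDomDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element people = doc.createElement("people");
        doc.appendChild(people);
        String[][] names = { {"Eric", "Burke"}, {"Jennifer", "Burke"} };
        for (String[] name : names) {
            Element person = doc.createElement("person");
            person.setAttribute("first", name[0]);
            person.setAttribute("last", name[1]);
            people.appendChild(person);
        }
        return doc;
    }

    public static String transform(Document domDoc) throws Exception {
        // the only DOM-specific line: wrap the Document in a DOMSource
        Source xmlSource = new DOMSource(domDoc);
        Source xsltSource = new StreamSource(new StringReader(LIST_XSLT));

        Transformer trans = TransformerFactory.newInstance().newTransformer(xsltSource);
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        trans.transform(xmlSource, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(createDomDocument()));
    }
}
```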
XSLT is designed to transform well-formed XML data into another format, typically HTML. But wouldn't it be nice if we could also use XSLT stylesheets to transform non-XML data into HTML? For example, most spreadsheets have the ability to export their data into Comma Separated Values (CSV) format, as shown here:
Burke,Eric,M
Burke,Jennifer,L
Burke,Aidan,G
One approach is parsing the file into memory, using DOM to create an XML representation of the data, and then feeding that information into JAXP for transformation. This approach works but requires an intermediate programming step to convert the CSV file into a DOM tree. A better option is to write a custom SAX parser, feeding its output directly into JAXP. This avoids the overhead of constructing the DOM tree, offering better memory utilization and performance.
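The custom-parser idea can be sketched as a class that implements org.xml.sax.XMLReader and fires SAX events for each CSV line, wrapped in a SAXSource so that JAXP treats it like any other input. This is a minimal illustration under several assumptions: the names CsvSaxDemo and CsvReader and the csvFile, line, and value elements are invented here, the input must arrive as a character stream, and the naive split on commas ignores quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class CsvSaxDemo {

    /** Reports each CSV line as a line element containing value elements. */
    static class CsvReader implements XMLReader {
        private ContentHandler handler = new DefaultHandler();
        private ErrorHandler errorHandler;
        private DTDHandler dtdHandler;
        private EntityResolver entityResolver;

        public void parse(InputSource input) throws IOException, SAXException {
            // assumption: the InputSource carries a character stream
            BufferedReader in = new BufferedReader(input.getCharacterStream());
            Attributes none = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "csvFile", "csvFile", none);
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;   // skip blank lines
                handler.startElement("", "line", "line", none);
                for (String value : line.split(",", -1)) {   // no quoting support
                    handler.startElement("", "value", "value", none);
                    handler.characters(value.toCharArray(), 0, value.length());
                    handler.endElement("", "value", "value");
                }
                handler.endElement("", "line", "line");
            }
            handler.endElement("", "csvFile", "csvFile");
            handler.endDocument();
        }

        public void parse(String systemId) throws SAXException {
            throw new SAXException("this sketch supports only parse(InputSource)");
        }

        // features and properties are accepted but ignored in this sketch
        public boolean getFeature(String name) { return false; }
        public void setFeature(String name, boolean value) { }
        public Object getProperty(String name) { return null; }
        public void setProperty(String name, Object value) { }

        public void setContentHandler(ContentHandler h) { handler = h; }
        public ContentHandler getContentHandler() { return handler; }
        public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
        public ErrorHandler getErrorHandler() { return errorHandler; }
        public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
        public DTDHandler getDTDHandler() { return dtdHandler; }
        public void setEntityResolver(EntityResolver r) { entityResolver = r; }
        public EntityResolver getEntityResolver() { return entityResolver; }
    }

    /** Feeds CSV text through the custom reader and an identity transformation. */
    public static String csvToXml(String csv) throws Exception {
        SAXSource source = new SAXSource(new CsvReader(),
                                         new InputSource(new StringReader(csv)));
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        trans.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(csvToXml("Burke,Eric,M\nBurke,Jennifer,L\nBurke,Aidan,G\n"));
    }
}
```

Replacing the identity Transformer with one built from a stylesheet would turn the same SAX events into HTML, with no XML file or DOM tree created along the way.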