Book HomeJava and XSLTSearch this book

5.4. Stylesheet Compilation

XSLT is a programming language, expressed using XML syntax. This is not for the benefit of the computer, but rather for human interpretation. Before the stylesheet can be processed, it must be converted into some internal machine-readable format. This process should sound familiar, because it is the same process used for every high-level programming language. You, the programmer, work in terms of the high-level language, and an interpreter or compiler converts this language into some machine format that can be executed by the computer.

Interpreters analyze source code and translate it into machine code with each execution. In this case of XSLT, this requires that the stylesheet be read into memory using an XML parser, translated into machine format, and then applied to your XML data. Performance is the obvious problem, particularly when you consider that stylesheets rarely change. Typically, the stylesheets are defined early on in the development process and remain static, while XML data is generated dynamically with each client request.

A better approach is to parse the XSLT stylesheet into memory once, compile it to machine-format, and then preserve that machine representation in memory for repeated use. This is called stylesheet compilation and is no different in concept than the compilation of any programming language.

5.4.1. Templates API

Different XSLT processors implement stylesheet compilation differently, so JAXP includes the javax.xml.transform.Templates interface to provide consistency. This is a relatively simple interface with the following API:

public interface Templates {
    java.util.Properties getOutputProperties( );
    javax.xml.transform.Transformer newTransformer( )
            throws TransformerConfigurationException;
}

The getOutputProperties( ) method returns a clone of the properties associated with the <xsl:output> element, such as method="xml", indent="yes", and encoding="UTF-8". You might recall that java.util.Properties (a subclass of java.util.Hashtable) provides key/value mappings from property names to property values. Since a clone, or deep copy, is returned, you can safely modify the Properties instance and apply it to a future transformation without affecting the compiled stylesheet that the instance of Templates represents.

The newTransformer( ) method is more commonly used and allows you to obtain a new instance of a class that implements the Transformer interface. It is this Transformer object that actually allows you to perform XSLT transformations. Since the implementation of the Templates interface is hidden by JAXP, it must be created by the following method on javax.xml.transform.TransformerFactory:

public Templates newTemplates(Source source)
        throws TransformerConfigurationException

As in earlier examples, the Source may obtain the XSLT stylesheet from one of many locations, including a filename, a system identifier, or even a DOM tree. Regardless of the original location, the XSLT processor is supposed to compile the stylesheet into an optimized internal representation.

Whether the stylesheet is actually compiled is up to the implementation, but a safe bet is that performance will continually improve over the next several years as these tools stabilize and vendors have time to apply optimizations.

Figure 5-6 illustrates the relationship between Templates and Transformer instances.

Figure 5-6

Figure 5-6. Relationship between Templates and Transformer

Thread safety is an important issue in any Java application, particularly in a web context where many users share the same stylesheet. As Figure 5-6 illustrates, an instance of Templates is thread-safe and represents a single stylesheet. During the transformation process, however, the XSLT processor must maintain state information and output properties specific to the current client. For this reason, a separate Transformer instance must be used for each concurrent transformation.

Transformer is an abstract class in JAXP, and implementations should be lightweight. This is an important goal because you will typically create many copies of Transformer, while the number of Templates is relatively small. Transformer instances are not thread-safe, primarily because they hold state information about the current transformation. Once the transformation is complete, however, these objects can be reused.

5.4.2. A Stylesheet Cache

XSLT transformations commonly occur on a shared web server with a large number of concurrent users, so it makes sense to use Templates whenever possible to optimize performance. Since each instance of Templates is thread-safe, it is desirable to maintain a single copy shared by many clients. This reduces the number of times your stylesheets have to be parsed into memory and compiled, as well as the overall memory footprint of your application.

The code shown in Example 5-10 illustrates a custom XSLT stylesheet cache that automates the mundane tasks associated with creating Templates instances and storing them in memory. This cache has the added benefit of checking the lastModified flag on the underlying file, so it will reload itself whenever the XSLT stylesheet is modified. This is highly useful in a web-application development environment because you can make changes to the stylesheet and simply click on Reload on your web browser to see the results of the latest edits.

Example 5-10. StylesheetCache.java

package com.oreilly.javaxslt.util;

import java.io.*;
import java.util.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

/**
 * A utility class that caches XSLT stylesheets in memory.
 *
 */
public class StylesheetCache {
    // map xslt file names to MapEntry instances
    // (MapEntry is defined below)
    private static Map cache = new HashMap( );

    /**
     * Flush all cached stylesheets from memory, emptying the cache.
     */
    public static synchronized void flushAll( ) {
        cache.clear( );
    }

    /**
     * Flush a specific cached stylesheet from memory.
     *
     * @param xsltFileName the file name of the stylesheet to remove.
     */
    public static synchronized void flush(String xsltFileName) {
        cache.remove(xsltFileName);
    }

    /**
     * Obtain a new Transformer instance for the specified XSLT file name.
     * A new entry will be added to the cache if this is the first request
     * for the specified file name.
     *
     * @param xsltFileName the file name of an XSLT stylesheet.
     * @return a transformation context for the given stylesheet.
     */
    public static synchronized Transformer newTransformer(String xsltFileName)
            throws TransformerConfigurationException {
        File xsltFile = new File(xsltFileName);

        // determine when the file was last modified on disk
        long xslLastModified = xsltFile.lastModified( );
        MapEntry entry = (MapEntry) cache.get(xsltFileName);

        if (entry != null) {
            // if the file has been modified more recently than the
            // cached stylesheet, remove the entry reference
            if (xslLastModified > entry.lastModified) {
                entry = null;
            }
        }

        // create a new entry in the cache if necessary
        if (entry == null) {
            Source xslSource = new StreamSource(xsltFile);

            TransformerFactory transFact = TransformerFactory.newInstance( );
            Templates templates = transFact.newTemplates(xslSource);

            entry = new MapEntry(xslLastModified, templates);
            cache.put(xsltFileName, entry);
        }

        return entry.templates.newTransformer( );
    }

    // prevent instantiation of this class
    private StylesheetCache( ) {
    }

    /**
     * This class represents a value in the cache Map.
     */
    static class MapEntry {
        long lastModified;  // when the file was modified
        Templates templates;

        MapEntry(long lastModified, Templates templates) {
            this.lastModified = lastModified;
            this.templates = templates;
        }
    }
}

Because this class is a singleton, it has a private constructor and uses only static methods. Furthermore, each method is declared as synchronized in an effort to avoid potential threading problems.

The heart of this class is the cache itself, which is implemented using java.util.Map:

private static Map cache = new HashMap( );

Although HashMap is not thread-safe, the fact that all of our methods are synchronized basically eliminates any concurrency issues. Each entry in the map contains a key/value pair, mapping from an XSLT stylesheet filename to an instance of the MapEntry class. MapEntry is a nested class that keeps track of the compiled stylesheet along with when its file was last modified:

static class MapEntry {
    long lastModified;  // when the file was modified
    Templates templates;

    MapEntry(long lastModified, Templates templates) {
        this.lastModified = lastModified;
        this.templates = templates;
    }
}

Removing entries from the cache is accomplished by one of two methods:

public static synchronized void flushAll( ) {
    cache.clear( );
}

public static synchronized void flush(String xsltFileName) {
    cache.remove(xsltFileName);
}

The first method merely removes everything from the Map, while the second removes a single stylesheet. Whether you use these methods is up to you. The flushAll method, for instance, should probably be called from a servlet's destroy( ) method to ensure proper cleanup. If you have many servlets in a web application, each servlet may wish to flush specific stylesheets it uses via the flush(...) method. If the xsltFileName parameter is not found, the Map implementation silently ignores this request.

The majority of interaction with this class occurs via the newTransformer method, which has the following signature:

public static synchronized Transformer newTransformer(String xsltFileName)
        throws TransformerConfigurationException {

The parameter, an XSLT stylesheet filename, was chosen to facilitate the "last accessed" feature. We use the java.io.File class to determine when the file was last modified, which allows the cache to automatically reload itself as edits are made to the stylesheets. Had we used a system identifier or InputStream instead of a filename, the auto-reload feature could not have been implemented. Next, the File object is created and its lastModified flag is checked:

File xsltFile = new File(xsltFileName);

// determine when the file was last modified on disk
long xslLastModified = xsltFile.lastModified( );

The compiled stylesheet, represented by an instance of MapEntry, is then retrieved from the Map. If the entry is found, its timestamp is compared against the current file's timestamp, thus allowing auto-reload:

MapEntry entry = (MapEntry) cache.get(xsltFileName);

if (entry != null) {
    // if the file has been modified more recently than the
    // cached stylesheet, remove the entry reference
    if (xslLastModified > entry.lastModified) {
        entry = null;
    }
}

Next, we create a new entry in the cache if the entry object reference is still null. This is accomplished by wrapping a StreamSource around the File object, instantiating a TransformerFactory instance, and using that factory to create our Templates object. The Templates object is then stored in the cache so it can be reused by the next client of the cache:

// create a new entry in the cache if necessary
if (entry == null) {
    Source xslSource = new StreamSource(xsltFile);

    TransformerFactory transFact = TransformerFactory.newInstance( );
    Templates templates = transFact.newTemplates(xslSource);

    entry = new MapEntry(xslLastModified, templates);
    cache.put(xsltFileName, entry);
}

Finally, a brand new Transformer is created and returned to the caller:

return entry.templates.newTransformer( );

Returning a new Transformer is critical because, although the Templates object is thread-safe, the Transformer implementation is not. Each caller gets its own copy of Transformer so multiple clients do not collide with one another.

One potential improvement on this design could be to add a lastAccessed timestamp to each MapEntry object. Another thread could then execute every couple of hours to flush map entries from memory if they have not been accessed for a period of time. In most web applications, this will not be an issue, but if you have a large number of pages and some are seldom accessed, this could be a way to reduce the memory usage of the cache.

Another potential modification is to allow javax.xml.transform.Source objects to be passed as a parameter to the newTransformer method instead of as a filename. However, this would make the auto-reload feature impossible to implement for all Source types.



Library Navigation Links

Copyright © 2002 O'Reilly & Associates. All rights reserved.