
3.2. Bootstrapping an XMLReader

There are several ways to obtain an XMLReader. Here we'll look at a few of them, focusing first on the most commonly available ones. These are the "pure SAX" solutions.

It's good policy to reuse parsers, rather than constantly discard and recreate them. Some parsers are more expensive to create than others, so such reuse can improve performance if you parse many documents. Similarly, factory approaches add some fixed costs to achieve vendor neutrality, and those costs can add up. In contexts like servlets, where any number of threads may need to parse XML concurrently, parsers are often pooled so those bootstrapping costs won't increase per-request service times.
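The pooling idea can be sketched roughly as follows. This is only an illustration under stated assumptions: XMLReaderPool and its method names are hypothetical, not part of SAX, and the pool simply fills itself with default parsers up front.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper, not part of SAX: a fixed-size pool of reusable parsers.
class XMLReaderPool {
    private final BlockingQueue<XMLReader> idle;

    // Pays the bootstrapping cost once, when the pool is created.
    XMLReaderPool (int size) {
        idle = new ArrayBlockingQueue<XMLReader> (size);
        try {
            for (int i = 0; i < size; i++)
                idle.add (XMLReaderFactory.createXMLReader ());
        } catch (SAXException e) {
            throw new IllegalStateException ("Can't get default parser", e);
        }
    }

    // Blocks until some thread has returned a parser.
    XMLReader borrow () {
        try {
            return idle.take ();
        } catch (InterruptedException e) {
            Thread.currentThread ().interrupt ();
            throw new IllegalStateException ("interrupted waiting for a parser", e);
        }
    }

    // Returns a previously borrowed parser to the pool.
    void giveBack (XMLReader parser) {
        idle.add (parser);
    }

    // Number of parsers currently idle.
    int available () {
        return idle.size ();
    }
}
```

A caller would typically borrow a parser, set its handlers, parse, and call giveBack() in a finally clause. Since an XMLReader keeps its handlers between parses, each borrower should set the handlers it needs rather than rely on what the previous user left behind.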

3.2.1. The XMLReaderFactory Class

The simplest way to get a parser is to use the default parser for your environment, as we saw earlier:

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

...

XMLReader       parser = null;

try {
    parser = XMLReaderFactory.createXMLReader ();
    // success!

} catch (SAXException e) {
    System.err.println ("Can't get default parser: " + e.getMessage ());
}
	    

Normally, the default parser is defined by setting the org.xml.sax.driver system property. Application startup should set that property, normally using JVM invocation flags. (In a very few cases System.setProperty() may be appropriate.)

$ java -Dorg.xml.sax.driver=gnu.xml.aelfred2.XmlReader
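For the few cases where System.setProperty() is appropriate, a minimal sketch might look like the following. The DriverProperty class and ensureDriver() method are hypothetical names for this illustration; the sketch only sets the property when startup flags haven't already chosen a parser, and it tolerates environments where the property can't be touched.

```java
// Hypothetical helper: choose a SAX2 driver only if none was set at startup.
class DriverProperty {
    static final String KEY = "org.xml.sax.driver";

    // Returns the driver class name now in effect, or null if security
    // permissions prevent reading or setting the system property.
    static String ensureDriver (String fallbackClassName) {
        try {
            String current = System.getProperty (KEY);

            if (current == null) {
                System.setProperty (KEY, fallbackClassName);
                current = fallbackClassName;
            }
            return current;
        } catch (SecurityException e) {
            return null;
        }
    }
}
```

Calling DriverProperty.ensureDriver ("gnu.xml.aelfred2.XmlReader") early in application startup leaves any -D flag from the command line in charge, which keeps the JVM invocation as the primary configuration point.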

Unfortunately, in many cases the original reference implementation of createXMLReader() is used. This is problematic in two situations: when the system property isn't set, and when security permissions prevent access to that system property, as is common for applets. Good SAX2 distributions will ensure that this factory method succeeds in the face of such errors. The current release of the SAX2 helper classes makes this easy to do.[15]

[15]The current version of XMLReaderFactory has more intelligence and supports additional configuration mechanisms. For example, your application or parser distribution can configure a META-INF/services/org.xml.sax.driver resource into its class path, holding a single string to be used if the system property hasn't been set. SAX2 parser distributions are expected to work even if the system property or class path resource hasn't been set.

Because of that problem, you may choose to code your application so parser choice is a configuration option encoded through some mechanism other than system properties. You can't keep it in your application's XML-format configuration file, since you would need a parser to read that file in the first place. Once you get that configuration data you'll probably use a different XMLReaderFactory call:

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

...

XMLReader       parser = null;
String          className = ...;

try {
    parser = XMLReaderFactory.createXMLReader (className);
    // success!

} catch (SAXException e) {
    System.err.println ("Can't get parser " + className + ": " + e.getMessage ());
}
	    

Using this factory call, the class name identifies the SAX parser you want to use. It may well be one of the entries in Table 3-1, though some frameworks bundle other parsers.

Table 3-1. SAX2 XMLReader implementation classes

    Parser (and type)                  Class name
    Ælfred (nonvalidating)             gnu.xml.aelfred2.SAXDriver
    Ælfred (optionally validating)     gnu.xml.aelfred2.XmlReader
    Crimson (optionally validating)    org.apache.crimson.parser.XMLReaderImpl
    Xerces (optionally validating)     org.apache.xerces.parsers.SAXParser

If you're using a parser without a settable option for validation, you may want to let distinct parsers be configured for validating and nonvalidating usage, assuming that your application needs both. Parsers with validation support are significantly larger than ones without it, which is partly why Ælfred still has a nonvalidating class.
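One way to hold such application-level configuration is a plain java.util.Properties object, which needs no XML parser to load. The sketch below is an assumption-laden illustration, not a recommended design: the ParserConfig class and the two property keys are invented for this example, and it falls back to the environment's default parser when the configured class is missing or unusable.

```java
import java.util.Properties;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical application helper; the property keys are made up for this sketch.
class ParserConfig {
    static final String VALIDATING_KEY = "parser.validating";
    static final String NONVALIDATING_KEY = "parser.nonvalidating";

    // Returns the parser configured for the requested usage, or the
    // default parser if no usable class name is configured.
    static XMLReader create (Properties config, boolean validating) {
        String key = validating ? VALIDATING_KEY : NONVALIDATING_KEY;
        String className = config.getProperty (key);

        try {
            if (className != null) {
                try {
                    return XMLReaderFactory.createXMLReader (className);
                } catch (SAXException e) {
                    System.err.println ("Can't get parser " + className
                        + ": " + e.getMessage ());
                }
            }
            return XMLReaderFactory.createXMLReader ();
        } catch (SAXException e) {
            throw new IllegalStateException ("Can't get default parser", e);
        }
    }
}
```

The values stored under those keys would be class names such as the ones in Table 3-1. Because the fallback parser may not validate, code that depends on validation should still check or set the validation feature on whatever parser it receives.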

3.2.2. Calling Parser Constructors

If you need to force the use of some particular parser, you can invoke its constructor directly. Every SAX2 XMLReader must have a default constructor in order to work with the XMLReaderFactory class. Since it exists, you can invoke it directly using the same class names you may have passed to the XMLReaderFactory, if you used application-level configuration:

import org.xml.sax.XMLReader;
import gnu.xml.aelfred2.XmlReader;

...

XMLReader       parser = new XmlReader ();
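The default-constructor requirement is easier to see in a rough sketch of what any class-name-based factory has to do. This is not the XMLReaderFactory source; ReflectiveBootstrap and instantiate() are hypothetical names, and the sketch ignores the class loader subtleties a real factory must handle.

```java
import org.xml.sax.XMLReader;

// Hypothetical illustration of instantiating a parser from its class name.
class ReflectiveBootstrap {
    // Loads the named class and invokes its public default constructor.
    static XMLReader instantiate (String className) {
        try {
            Class<?> type = Class.forName (className);

            return (XMLReader) type.getConstructor ().newInstance ();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException (
                className + " can't be created with a default constructor", e);
        } catch (ClassCastException e) {
            throw new IllegalArgumentException (
                className + " is not an XMLReader", e);
        }
    }
}
```

Given only a string, reflection has no arguments to pass, so a class without a public no-argument constructor can't be created this way. For experimenting without a third-party parser on the class path, org.xml.sax.helpers.XMLFilterImpl is an XMLReader with such a constructor, though it is a filter and not a parser.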
	    

In some cases you may actually prefer to force use of some particular parser. In other cases, you may have no option, maybe because of class loader or security configuration. If you run into trouble with those mechanisms, you may not be able to use factory APIs to access parsers unless they are visible through the system class loader.

In general, avoid such nonportable coding decisions; use a factory API wherever you can.

3.2.3. Using JAXP

Sun's JAXP 1.1 supports yet another way to bootstrap SAX parsers. It's a more complex process, taking several steps instead of just one:

  1. First, get a javax.xml.parsers.SAXParserFactory.

  2. Tell it to return parsers that will do the kind of processing needed by your application.

  3. Ask it to give you a JAXP parser of type javax.xml.parsers.SAXParser.

  4. Finally, ask the JAXP parser to give you the XMLReader that is normally lurking inside of it.

Conceptually this is like the no-parameters XMLReaderFactory.createXMLReader() method, except it's complicated by expecting the factory to return preconfigured parsers.[16] Configuring the parser using the SAX2 flags and properties directly is preferable; the API "surface area" is smaller. Other than having different default namespace-processing modes, the practical difference is primarily availability: many implementations ensure that a JAXP system default is always accessible, but they haven't paid the same attention to providing the default SAX2 parser. (Current versions of the SAX2 classes make that easier, but you might not be using such versions.)

[16]You can also look at this as choosing between parsers. For example, JAXP 1.2 will probably say how to request that schema validation be done. That's most naturally done as a layer on top of SAX, with a parser filter postprocessing the output of some other SAX parser.

The code to use the JAXP bootstrap API to get a SAX2 parser looks like this:

import org.xml.sax.*;
import javax.xml.parsers.*;

XMLReader        parser;

try {
    SAXParserFactory factory;

    factory = SAXParserFactory.newInstance ();
    factory.setNamespaceAware (true);
    parser = factory.newSAXParser ().getXMLReader ();
    // success!

} catch (FactoryConfigurationError err) {
    System.err.println ("can't create JAXP SAXParserFactory, "
        + err.getMessage ());
} catch (ParserConfigurationException err) {
    System.err.println ("can't create XMLReader with namespaces, "
        + err.getMessage ());
} catch (SAXException err) {
    System.err.println ("Hmm, SAXException, " + err.getMessage ());
}

Rather than calling newInstance(), you can hardcode the constructor for a particular factory, probably using one of the classes listed in Table 3-2. It's better to keep implementation preferences as configuration issues though, and not hardwire them into source code. For situations where you may have several parsers in your class path (or a tree of class loaders, as found in many recent servlet engines), JAXP offers several methods to configure such preferences. You can associate the factory class name value with the key javax.xml.parsers.SAXParserFactory by using the key to name a system property (which sets the default parser for your JVM instance) or by putting it in the $JAVA_HOME/jre/lib/jaxp.properties property file (which sets the default policy for that JVM implementation). I prefer the jaxp.properties solution; with the other method the default parser is a function of your class path settings and even the names assigned to various JAR files. You can also embed this preference in your application's JAR files as a META-INF/services/... file, but that solution is similarly sensitive to class loader configuration issues.

Table 3-2. JAXP SAXParserFactory implementation classes

    JAXP factory    Class name
    Ælfred          gnu.xml.aelfred2.JAXPFactory
    Crimson         org.apache.crimson.jaxp.SAXParserFactoryImpl
    Xerces          org.apache.xerces.jaxp.SAXParserFactoryImpl
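With several configuration mechanisms in play, it can help to check which factory JAXP actually chose. The following diagnostic is a small sketch; JaxpFactoryInfo and its method names are hypothetical, and the class name it reports depends entirely on your JVM and class path.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic: compare the requested JAXP factory with the one in use.
class JaxpFactoryInfo {
    static final String KEY = "javax.xml.parsers.SAXParserFactory";

    // Class name requested through the system property, or null if the
    // property is unset or can't be read.
    static String requested () {
        try {
            return System.getProperty (KEY);
        } catch (SecurityException e) {
            return null;
        }
    }

    // Class name of the factory that newInstance() really returns.
    static String inUse () {
        return SAXParserFactory.newInstance ().getClass ().getName ();
    }

    public static void main (String[] args) {
        System.out.println ("requested: " + requested ());
        System.out.println ("in use:    " + inUse ());
    }
}
```

If the two lines disagree, or the first is null, the choice came from jaxp.properties, a META-INF/services entry, or the implementation's built-in default.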

If you're using JAXP to bootstrap a SAX2 parser, rather than the SAX2 APIs, the default setting for namespace processing is different: JAXP parsers don't process namespaces by default, while SAX2 parsers do. SAX2 normally removes all xmlns* attributes, reports namespace scope events, and may hide the namespace prefixes actually used by element and attribute names. JAXP does none of that unless you make it; in fact, the default parser mode for some current implementations is the illegal SAX2 mode described in the previous chapter. The example code in this section made the JAXP factory follow SAX2 defaults.
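The difference in defaults can be observed by asking each kind of parser for the standard SAX2 namespaces feature flag. This is a sketch for experimentation; the NamespaceDefaults class and its method names are hypothetical, and the values reported depend on the parser implementation behind each factory.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical probe for the namespace-processing mode of bootstrapped parsers.
class NamespaceDefaults {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Namespace mode of a parser obtained through the SAX2 factory.
    static boolean viaSax2 () {
        try {
            return XMLReaderFactory.createXMLReader ().getFeature (NAMESPACES);
        } catch (SAXException e) {
            throw new IllegalStateException ("Can't query default parser", e);
        }
    }

    // Namespace mode of a parser obtained through JAXP, optionally after
    // asking the factory for namespace-aware parsers.
    static boolean viaJaxp (boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance ();

            if (namespaceAware)
                factory.setNamespaceAware (true);
            return factory.newSAXParser ().getXMLReader ().getFeature (NAMESPACES);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException ("Can't query JAXP parser", e);
        }
    }
}
```

Comparing viaSax2 () with viaJaxp (false) shows the mismatch described above, while viaJaxp (true) corresponds to the setNamespaceAware (true) call in this section's example code.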

This book encourages you to use SAX2 directly, rather than through the JAXP factory mechanism. Even if JAXP is available, it's more complex to use. Also, the resulting parser is configured differently, so many of the examples in this book would break.




Copyright © 2002 O'Reilly & Associates. All rights reserved.