
2.5. Exception Handling

Exceptions are the primary way that SAX event consumers communicate to event producers; this is the reverse of the typical communication pattern (from producer to consumer). Before we delve more deeply into either producers or consumers, we'll look at SAX exceptions: the several types of exceptions that might be thrown, the error handler interface that lets your code decide how to handle errors, and how these normally fit together.

Keep this rule of thumb in mind: when a SAX handler throws any exception (including a java.lang.RuntimeException or a java.lang.Error), parsing stops immediately. The exception passes through the parser and is thrown by XMLReader.parse(). Beyond some possible additional error reports, the only additional event callback should be ContentHandler.endDocument(). This method is intended to be called before parsing finishes, even after errors, so that it can be used for cleaning up; however, later SAX documentation (SAX 2.0.2) acknowledges that the specification is ambiguous here, so robust code shouldn't rely on endDocument() being called after a fatal error or a thrown exception. (That callback is presented in Chapter 4, "Consuming SAX2 Events", in Section 4.1.1, "Other ContentHandler Methods".)
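A minimal sketch of that rule of thumb, assuming the JDK's bundled JAXP parser: a hypothetical handler throws an unchecked exception (the made-up class Boom) when it sees a <bad> element. The same exception comes out of parse(), and no further startElement() callbacks arrive. The sketch deliberately doesn't check endDocument(), since parsers differ on that point.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StopOnThrow {
    // Hypothetical unchecked application failure, thrown from inside a callback.
    static class Boom extends RuntimeException {
        Boom(String msg) { super(msg); }
    }

    // Outcome of one parse: how many elements were seen, and what parse() threw.
    static final class Result {
        final int elementsSeen;
        final Throwable thrown;
        Result(int elementsSeen, Throwable thrown) {
            this.elementsSeen = elementsSeen;
            this.thrown = thrown;
        }
    }

    static Result run(String xml) {
        final int[] seen = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen[0]++;
                if ("bad".equals(qName))
                    throw new Boom("unexpected <" + qName + "> element");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return new Result(seen[0], null);
        } catch (Throwable t) {   // also catches RuntimeException and Error
            return new Result(seen[0], t);
        }
    }
}
```

Catching Throwable is only for the demonstration; production code would normally let Errors propagate.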

2.5.1. SAX2 Exception Classes

There are four standard exception classes; their common base class, SAXException, appears in the signature of every handler method. The parse() methods, as well as the EntityResolver interface presented in Section 3.4, "The EntityResolver Interface" in Chapter 3, "Producing SAX2 Events", can also throw java.io.IOException to indicate problems unrelated to XML text content. You will find that many XML APIs are declared the same way; for example, JAXP parser methods may throw such exceptions even if they don't expose SAX events directly. See Appendix A, "SAX2 API Summary" for method summaries for these exception classes.

org.xml.sax.SAXException

This is the base exception class. Typically you will see its subclasses. These exceptions have messages and may wrap other exceptions for diagnostic purposes. When an application's event callback catches an exception it's not permitted to throw, it can wrap it in one of these exceptions and then throw that exception. Every SAX2 event callback can throw a SAXException, although most callback examples in this book won't demonstrate this.

org.xml.sax.SAXNotRecognizedException

This exception is thrown when the parser does not understand the URI identifying a feature or property you tried to access. Most processors recognize the standard IDs, so if you're trying to use those and you get this exception, make sure you're using the correct URI.

org.xml.sax.SAXNotSupportedException

These exceptions are typically used to indicate that an XMLReader property or feature value you tried to change was recognized, but the value you requested isn't supported. Reasons this might be reported include setting a property to an illegal value (such as the wrong type of handler) and trying to set a feature or property that is read-only in a given implementation (or at the time the request is made). For instance, it's not possible to ask a parser to stop validating in mid-parse, but for some parsers it's reasonable to do so before starting to parse a document.
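To see the difference between these two configuration exceptions, here is a small probe, a sketch assuming the JDK's bundled JAXP parser. The helper name trySetFeature and the URI http://example.com/features/no-such-feature are invented for illustration; http://xml.org/sax/features/namespaces is a standard SAX2 feature ID.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureProbe {
    /** Reports what happens when a feature is set on a fresh JAXP-supplied XMLReader. */
    static String trySetFeature(String featureUri, boolean value) {
        XMLReader reader;
        try {
            reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
        try {
            reader.setFeature(featureUri, value);
            return "ok";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";   // the parser doesn't know this URI at all
        } catch (SAXNotSupportedException e) {
            return "not supported";    // URI known, but not this value (or not now)
        }
    }
}
```

A "not supported" result is harder to provoke on demand, since it depends on the parser and on when the request is made (for example, during a parse).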

org.xml.sax.SAXParseException

This is the most commonly seen exception class; instances provide detailed diagnostic information, such as the base URI of a file with bad XML content, and the line and column number of such content. XML parsers supply such exceptions when they report errors to ErrorHandler implementations.

Applications can also construct such exceptions when reporting application-level errors through SAX callbacks. In fact, they probably should do so, providing a Locator object to the constructor (and perhaps wrapping an exception to identify a root cause) in order to provide good diagnostics. (See Section 4.1.2, "The Locator Interface" in Chapter 4, "Consuming SAX2 Events" for information about Locator objects.)

The "wrapped" exception is a powerful tool. You might be familiar with this mechanism from the new JDK 1.4 "Chained Exception" facility or the older java.lang.reflect.InvocationTargetException exception mechanism. (The JDK 1.4 getCause() method exposes essentially the same functionality as the SAX getException(), though it builds on new JVM features to add intelligence to exception printing.) While parsers may use it internally, you'll likely want to use it to ensure higher-level software will see the root cause of some SAXException your handler reported:

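// in some SAX event handler
// (MyApplicationException must extend java.lang.Exception):
try {
    // ... application-specific work that may throw MyApplicationException ...
} catch (MyApplicationException cause) {
    throw new SAXException ("it broke!", cause);
    // or better yet, include the document location too:
    // throw new SAXParseException ("broke", locator, cause);
}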

If you print the stack backtrace of such a SAXException, you'll see two stacks, starting with the root cause. Being able to see that root cause information can be a real lifesaver when debugging. And some application error recovery strategies will use the SAXException.getException() method to find the root cause and then determine how to recover from it. For example, if the application exception identified some resource that was unavailable, higher levels in the application might be able to use that information to choose an alternative resource and restart processing.
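Here's a self-contained sketch of that recovery strategy, assuming the JDK's bundled JAXP parser. ResourceUnavailableException, load(), and the src attribute convention are hypothetical stand-ins for application code. The handler wraps the application exception in a SAXParseException, and the caller uses getException() to recover the root cause after parse() fails.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class RootCauseDemo {
    // Hypothetical checked application exception naming a missing resource.
    static class ResourceUnavailableException extends Exception {
        final String resource;
        ResourceUnavailableException(String resource) {
            super("unavailable: " + resource);
            this.resource = resource;
        }
    }

    // Hypothetical application call that can fail with a checked exception.
    static void load(String src) throws ResourceUnavailableException {
        if (src.startsWith("missing:"))
            throw new ResourceUnavailableException(src);
    }

    static class Handler extends DefaultHandler {
        private Locator locator;

        @Override
        public void setDocumentLocator(Locator locator) { this.locator = locator; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            String src = atts.getValue("src");
            if (src == null)
                return;
            try {
                load(src);
            } catch (ResourceUnavailableException cause) {
                // Wrap the root cause; the locator adds line/column information.
                throw new SAXParseException("can't load " + src, locator, cause);
            }
        }
    }

    /** Returns the unavailable resource's name, or null if parsing succeeded. */
    static String findUnavailableResource(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new Handler());
            return null;
        } catch (SAXException e) {
            Exception root = e.getException();   // the wrapped root cause, if any
            if (root instanceof ResourceUnavailableException)
                return ((ResourceUnavailableException) root).resource;
            throw new IllegalStateException("unexpected SAX failure", e);
        } catch (Exception e) {                  // I/O or parser configuration
            throw new IllegalStateException(e);
        }
    }
}
```

A real application would then pick an alternative resource and restart processing, as described above.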

2.5.2. ErrorHandler Interface

Normally, you will configure SAX event-processing code to use a specialized implementation of ErrorHandler to process faults that are uncovered during parsing. This is done with the XMLReader.setErrorHandler() call. This interface has three methods; you saw one of them in an earlier example. The interface is used to encapsulate an error-handling strategy. The primary choices you have to make are whether to ignore an error or to abort parsing, and whether to emit diagnostics. Those strategies are driven by the severity of the problem, as exposed by which method is used to report it, though sometimes exception-typing may give programs information about exactly what error was detected.

void error (SAXParseException e)

This method is used to report errors that aren't expected to be fatal. The best-known example is violation of XML validity constraints, but some other XML errors are nonfatal too. Many kinds of application-level errors (as reported by event-consumer logic, not XML parsers) will fall into this category, and most parsers use this callback to report violations of namespace constraints (such as referring to an undeclared namespace prefix).

When validating, applications often adopt a policy of treating these errors as if they were fatal, or of generating a diagnostic for every such error. By default, all nonfatal errors are ignored. That default will be a big surprise if you expect a validating parser to stop parsing when it sees validation errors. You have to override the default error-handling policy if you want such behavior.
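A minimal sketch of overriding that default, assuming JAXP's SAXParserFactory with DTD validation turned on. The internal DTD declares <doc> as EMPTY, so its <child/> is a validity error rather than a well-formedness error. The CountingHandler class and the countErrors helper are hypothetical: in permissive mode the handler only counts errors, while in strict mode it throws the first one so that parse() stops.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidityPolicy {
    // Well-formed but invalid: <doc> is declared EMPTY, yet contains <child/>.
    static final String INVALID =
        "<!DOCTYPE doc [<!ELEMENT doc EMPTY>]><doc><child/></doc>";

    /** Counts nonfatal errors; in strict mode, rethrows the first one. */
    static class CountingHandler extends DefaultHandler {
        final boolean strict;
        int errors;
        CountingHandler(boolean strict) { this.strict = strict; }

        @Override
        public void error(SAXParseException e) throws SAXParseException {
            errors++;
            if (strict)
                throw e;   // treat validity errors as fatal
        }
    }

    /** Returns the number of nonfatal errors reported, or -1 if parse() threw. */
    static int countErrors(String xml, boolean strict) {
        CountingHandler handler = new CountingHandler(strict);
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);   // DTD validity errors go to error()
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.errors;
        } catch (SAXParseException e) {
            return -1;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In permissive mode the parse runs to completion even though the document is invalid, which is exactly the surprising default described above.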

void fatalError (SAXParseException e)

This method is used to report errors, typically violations of well-formedness, that are fatal. Some XML parsers may be able to continue processing after reporting such errors, but only to report additional errors. The XML specification itself requires that no more data be reported after a fatal error.

By default, fatal errors cause parsing to stop, and the parse() method returns (typically by throwing the exception). This method is often used to provide a diagnostic or to log the exception. After it does that, it has two main choices: throw the parameter to terminate processing, or return. Most parsers will treat a return as equivalent to throwing the parameter to terminate parsing. Some XML parsers continue checking for errors; in such cases, they aren't allowed to call any handlers other than the ErrorHandler.
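The usual "log, then throw the parameter" pattern might look like this sketch. LoggingHandler and fatalLine are hypothetical names, and the parser is whatever JAXP supplies. The malformed input has a mismatched end tag on its second line.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class FatalLogging {
    static class LoggingHandler extends DefaultHandler {
        final List<String> log = new ArrayList<>();

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            // Record a diagnostic, then throw the parameter to end parsing.
            log.add("line " + e.getLineNumber() + ": " + e.getMessage());
            throw e;
        }
    }

    /** Parses xml; returns the fatal error's line number, or 0 if there was none. */
    static int fatalLine(String xml, LoggingHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return 0;
        } catch (SAXParseException e) {
            return e.getLineNumber();   // propagated out of parse() after fatalError()
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```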

void warning (SAXParseException e)

This method is used to report problems that aren't errors. Such situations are specific to the software that reports the warning; unlike fatal and nonfatal errors, the XML specification doesn't place requirements on reporting such situations. XML infrastructure software may generate warnings for any reason at all (much like many pet dogs I have known) and yet be fully compliant with the XML specification.

By default, warnings are ignored. Applications typically ignore them, or print low-priority diagnostics. Because there is such variability in what generates a warning, it is probably not useful to put a "no warnings allowed" policy into software (by treating this like a fatal error); users have to decide on a warning-by-warning basis whether to ignore it or treat it as significant.

Event consumers can also use this API to provide a standard way to report faults uncovered in layers above pure XML, for instance, when data in element content or an attribute value is invalid or corrupt. When both the application and the SAX-related components use the same ErrorHandler instance to handle error-reporting policy issues, maintaining that policy is easier. For example, developers like being able to collect lots of error reports with one test run rather than getting only one error per run; it can be more effective to resolve problems in groups, with shorter test cycles. You can do that with SAX by saving the exceptions (or their associated diagnostics) as they're reported. The same flexibility can be important in production systems.
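As a sketch of that approach, assume JAXP and an invented rule that each <item> must carry an integer qty attribute. One CollectingErrorHandler instance serves both the parser and the application's own check, so every problem found in a single run is saved rather than only the first. The class names and the qty rule are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class CollectErrors {
    /** Saves warnings and nonfatal errors so one run can report all of them. */
    static class CollectingErrorHandler implements ErrorHandler {
        final List<SAXParseException> reports = new ArrayList<>();
        public void warning(SAXParseException e) { reports.add(e); }
        public void error(SAXParseException e) { reports.add(e); }
        public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    /** Hypothetical application check: every <item> needs an integer qty attribute. */
    static class QtyChecker extends DefaultHandler {
        private final ErrorHandler errors;
        private Locator locator;
        QtyChecker(ErrorHandler errors) { this.errors = errors; }

        @Override
        public void setDocumentLocator(Locator locator) { this.locator = locator; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (!"item".equals(qName))
                return;
            String qty = atts.getValue("qty");
            try {
                Integer.parseInt(qty);   // also throws for a missing (null) value
            } catch (NumberFormatException e) {
                // Report through the shared ErrorHandler as a nonfatal error.
                errors.error(new SAXParseException("bad qty: " + qty, locator, e));
            }
        }
    }

    static List<SAXParseException> check(String xml) {
        CollectingErrorHandler collector = new CollectingErrorHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(collector);
            reader.setContentHandler(new QtyChecker(collector));
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return collector.reports;
    }
}
```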

An ErrorHandler can throw any SAXException it wants; it doesn't have to be the SAXParseException passed as its argument. Don't throw a different exception unless you find a certifiably excellent reason to do so; discarding the original exception just makes problems harder to troubleshoot. One such reason might be to report a "double fault," in which you triggered another exception while handling the first one. (Operating systems sometimes panic in such cases, so there's no reason applications shouldn't do so too!)
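One hedged illustration of a legitimate double fault: a hypothetical LoggingErrorHandler writes diagnostics to an Appendable. If that write fails, it throws a new SAXException that wraps the I/O failure while still quoting the original message, so neither problem is silently lost. The policy shown (log warnings and errors, rethrow fatal errors) is just one choice.

```java
import java.io.IOException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Writes diagnostics to a log; reports a "double fault" if logging itself fails. */
class LoggingErrorHandler implements ErrorHandler {
    private final Appendable log;   // e.g. a Writer or StringBuilder

    LoggingErrorHandler(Appendable log) { this.log = log; }

    private void record(String level, SAXParseException e) throws SAXException {
        try {
            log.append(level).append(": ").append(e.getMessage()).append('\n');
        } catch (IOException logFailure) {
            // Double fault: quote the original message, wrap the logging failure.
            throw new SAXException(
                "double fault while reporting \"" + e.getMessage() + "\"", logFailure);
        }
    }

    public void warning(SAXParseException e) throws SAXException { record("warning", e); }
    public void error(SAXParseException e) throws SAXException { record("error", e); }
    public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e;   // the normal case: rethrow the original exception
    }
}
```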

JAXP also uses this handler to report errors when building DOM documents; SAXException objects may be thrown to terminate parsing after a DOM parser finds a problem, if the application chooses to handle those errors. Most DOM implementations in Java use SAX parsers to populate their DOM tree, so this is natural behavior. (JAXP only specifies a SAX-compatible way to present and report such errors. They might be reported from a non-SAX parser.)
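Under JAXP the same interface is installed on a DocumentBuilder with setErrorHandler(). Here is a minimal sketch, where tryBuild is a hypothetical helper; installing your own handler may also suppress the default diagnostics that some DOM implementations print to System.err.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DomErrors {
    /** Builds a DOM; returns null on success or a one-line diagnostic on failure. */
    static String tryBuild(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The SAX ErrorHandler interface governs DOM construction under JAXP.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```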

2.5.3. Errors and Diagnostics

When you see a SAXException, it'll normally have a message you'll use for diagnostics, like any exception. It'll also have a stack backtrace, which will help when you're debugging; in some cases you might even see a nested "root cause" exception. At this time, standard methods only tell an error's severity; there's no way to distinguish different validity errors from each other, for example.

When the exception is really a SAXParseException, you can get better diagnostics, with accurate information about exactly where the error appeared. SAX parsers normally provide such data when reporting parsing errors, and applications can do the same thing by avoiding the more generic SAXException. With non-GUI applications, I often use code like that shown in Example 2-6 to present the most important diagnostic data.

Example 2-6. Getting diagnostics from a SAXParseException

static private String printParseException (
    String              label,
    SAXParseException   e
) {
    StringBuffer        buf = new StringBuffer ();
    int                 temp;

    buf.append ("** ");
    buf.append (label);
    buf.append (": ");
    buf.append (e.getMessage ());
    buf.append ('\n');
    // most such exceptions include the (absolute) URI for the text
    if (e.getSystemId () != null) {
        buf.append ("   URI:  ");
        buf.append (e.getSystemId ());
        buf.append ('\n');
    }
    // many include approximate line and column numbers
    if ((temp = e.getLineNumber ()) != -1) {
        buf.append ("   line: ");
        buf.append (temp);
        buf.append ('\n');
    }
    if ((temp = e.getColumnNumber ()) != -1) {
        buf.append ("   char: ");
        buf.append (temp);
        buf.append ('\n');
    }
    // public ID might be available, but is seldom useful

    return buf.toString ();
}

It's natural to call such code in two places. One place is in a catch clause, after you've caught an exception of this type. That's a bit awkward and error prone; you'll need to have two different catch clauses, first for SAXParseException and then for SAXException, or else use a cast. The more natural place is centralized in an ErrorHandler that can treat generating diagnostics as one of several options for processing errors, as shown in Example 2-7. In fact, an ErrorHandler is the only way to generate diagnostics for nonfatal errors or warnings without treating them as fatal errors, and the only way to centralize your error-handling policy so it's easy to configure.
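Here's what the "two catch clauses" approach looks like in a sketch, where diagnose is a hypothetical helper and the JAXP parser is assumed. Because SAXParseException extends SAXException, its clause has to come first; Java rejects the reverse order as unreachable code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class CatchOrder {
    /** Returns a one-line diagnostic; the more specific catch clause must come first. */
    static String diagnose(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return "ok";
        } catch (SAXParseException e) {   // has location information
            return "parse error at line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {        // generic: a message (and maybe a root cause)
            return "SAX error: " + e.getMessage();
        } catch (Exception e) {           // I/O or parser-configuration problems
            return "other: " + e;
        }
    }
}
```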

Example 2-7. Customizable diagnostic error handler

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class MyErrorHandler implements ErrorHandler
{
    int         flags;

    // bit mask values for flags
    public static final int ERR_PRINT = 1;
    public static final int ERR_IGNORE = 2;
    public static final int WARN_PRINT = 4;
    public static final int FATAL_PRINT = 8;
    public static final int FATAL_IGNORE = 16;

    // default: all bits set, so print everything and never throw
    public MyErrorHandler () { flags = ~0; }
    public MyErrorHandler (int flags) { this.flags = flags; }

    public void error (SAXParseException e)
    throws SAXParseException
    {
        if ((flags & ERR_PRINT) != 0)
            System.err.print (printParseException ("Error", e));
        if ((flags & ERR_IGNORE) == 0)
            throw e;
    }

    public void fatalError (SAXParseException e)
    throws SAXParseException
    {
        if ((flags & FATAL_PRINT) != 0)
            System.err.print (printParseException ("FATAL", e));
        if ((flags & FATAL_IGNORE) == 0)
            throw e;
    }

    public void warning (SAXParseException e)
    throws SAXParseException
    {
        if ((flags & WARN_PRINT) != 0)
            System.err.print (printParseException ("Warning", e));
        // always ignored
    }

    // printParseException() method (Example 2-6, above) is part of this class
}

Such an error handler gives you flexibility about which errors to report and how to handle the various types that show up. A silent mode of operation might never print diagnostics, a verbose one might print all of them, and a different default could be somewhere in between. A defensive operational mode might terminate XML processing when it sees any error; a permissive one might try to continue after every error. The default shown is verbose and permissive.

To use such an error handler for handling application-specific SAXExceptions, you'll need to adopt the same classifications that SAX derives from XML: fatal errors, nonfatal errors, and warnings. That's usually pretty natural, particularly if application configuration flags control which potential error cases are tested.
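A sketch of adopting those classifications for application checks: a hypothetical AppProblemReporter maps an application-defined Severity onto the three ErrorHandler methods, so the same policy object (such as Example 2-7's handler) decides what happens. Stopping after a fatal report even when the handler returns mirrors what most parsers do, but it's a design decision here, not a SAX requirement.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Routes application-level problems into SAX's three severity levels. */
class AppProblemReporter {
    enum Severity { WARNING, ERROR, FATAL }

    private final ErrorHandler handler;
    private final Locator locator;   // may be null if no location is available

    AppProblemReporter(ErrorHandler handler, Locator locator) {
        this.handler = handler;
        this.locator = locator;
    }

    void report(Severity severity, String message) throws SAXException {
        SAXParseException e = new SAXParseException(message, locator);
        switch (severity) {
            case WARNING:
                handler.warning(e);
                break;
            case ERROR:
                handler.error(e);
                break;
            case FATAL:
                handler.fatalError(e);
                throw e;   // stop even if the handler's policy returned normally
        }
    }
}
```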




Copyright © 2002 O'Reilly & Associates. All rights reserved.