
6.2. XML and Messaging

Most technologies that fueled the "Internet Revolution" of the past few years have been around in one form or another for decades; they were just inaccessible to the large numbers of people who became able to use them through mass-market web browsers. Some of those technologies are now being re-created: they are updated to work better in today's Internet, which is a larger and more varied world than the one their earlier versions were born into. In this section we will look at why XML is an important part of the re-creation of messaging technologies and at some of the roles Java plays in this process. We also look at how lightweight SAX2-based infrastructure supports XML messaging over the Web without requiring developers to master new toolkits.

6.2.1. XML/Internet Versus Older Technologies

Many more developers work with web servers than have ever worked with Remote Procedure Call (RPC) or message-queuing technologies. However, the problem is largely unchanged: the core issue is still how to exchange messages reliably and securely with services operated by other organizations. The combination of XML and web-based messaging has several basic technical benefits compared to those earlier technology generations, especially most forms of RPC:

HTTP-based protocols have truly global reach

HTTP is in essence a text-based RPC protocol: clients issue requests to objects identified by web server URIs, and those servers dynamically compute the responses. Because it's text-based, HTTP can be (and is) easily supported by almost all programming languages. Because of HTTPS (HTTP over SSL, a security protocol), HTTP security has been at least as good as any available with commercial RPC services. HTTP/HTTPS is now the most ubiquitous and functional RPC transport in the world.

XML is a more accessible and extensible message-encoding technology

Previous technologies generally focused on binary-oriented encodings, which often rigidly defined the set of possible messages. In practice, most of them were restricted to particular programming environments because developers needed an API toolkit to generate the correct binary data. XML has a clear win here since essentially every such environment supports text input and output. And unlike other encodings, XML doesn't impose any inherent policy on what such text means, which makes it more flexible. SAX is able to leverage that flexibility because it is data-structure agnostic. Much of the work in XML messaging is to establish and promote particular policies; SAX can support all the important ones.

The Internet biases toward larger, coarse-grained messages

Before the Internet, applications were optimized for private local area networks (LANs) or for low-speed, application-specific wide area networks (WANs). Neither optimization point is a good match for today's typical Internet link (56 kbps modem, or megabit links for some home use and most enterprises). Two key Internet issues are network latency and reliability. Using HTTP with XML provides an opportunity to develop newer systems using a design policy that works with the Internet rather than against it: use bigger messages, less often. This is the antithesis of many RPC systems, which bias toward a constant exchange of small messages, as if they were local procedure calls.

XML favors loose coupling

RPC-based systems were often developed to assume that clients and servers are in the same organization. Some even assumed only one vendor's product would be used. That is, developers often aimed for a monoculture and tended to characterize diversity as either a commercial threat, an inefficiency, a security problem, or just a support headache. Actually, diversity is a source of strength: human groups that are diverse are more adaptable and more resilient because they have more resources to draw on. Because XML messaging focuses on protocols and message formats, rather than vendor-specific implementations or APIs, it promotes diversity. That reduces inappropriate coupling and makes systems less vulnerable to the problems of any particular implementation.

In short, as the limitations of earlier messaging infrastructures became well known, organizations of all sizes were investing in new, web-based technology. Internet-savvy applications were developed with HTTP technology, and the flexibility of XML, as well as its introduction to the web developer community, made it the inevitable choice for the most widely deployed messaging technologies.

While much of the current work is focused on business applications, notably business-to-business integration, that's hardly the only type of application it benefits. There's also interest in peer-to-peer (P2P) protocols built with XML. P2P is usefully viewed as just messaging policies for applications that have finally escaped from the "client or server" straitjacket. Now, essentially anyone can run a server and act as a publisher for information they have produced. These new publishing systems are most naturally built with the same XML and HTTP technologies adopted elsewhere.

Another interesting way to compare these models is that while the RPC model moves computations to where the data lives, the Web model moves the data to where the computation lives. That has been called the "representational state transfer" (REST) model. When code is downloaded, a third model can be said to come into effect. The design of distributed systems needs to balance among all these alternatives and not focus exclusively on any single model.

6.2.2. Roles for Java in XML Messaging

Since Java was the first true "Internet-integrated" programming environment and had XML support very early, it's no surprise that a huge amount of XML messaging work is done in Java. There are a variety of higher-level XML APIs and tools, all of which define particular messaging policies and frameworks. This book may seem somewhat iconoclastic in its perspective on such tools: many of them are overkill. Most applications will be fine without any of the heavier-weight items on the API smorgasbord (for any language!); a lighter meal will often be the healthier solution, even on an expense account budget. There's plenty of scope for innovative applications written without such toolkits, and it's easier to spread them if they don't depend on first deploying lots of complex new infrastructure.

From an interoperability perspective, the most interesting work is language-neutral development of protocols. Some such initiatives, such as XML-RPC, hide or limit use of XML. Others, notably BEEP and SOAP, let applications provide their own payloads, although SOAP is usually coupled with synchronous RPC-style messaging and with payloads described by W3C XML Schema, and it precludes full use of XML features such as DTDs. BEEP is a standards-track peer-to-peer Internet protocol, building on decades of community experience and supporting both synchronous and asynchronous messaging models. http://www.beepcore.org has a wealth of relevant information, including protocol specifications and toolkits in many languages, Java among them. And as presented in various parts of this book, it's easy to use HTTP/HTTPS directly with SAX; that approach is very lightweight. Many applications can define XML messages and pass them using HTTP without needing additional policies or APIs; it's only a small stretch to use SMTP and email queues if you need asynchronous queuing.

To develop lightweight XML-based applications, get a JDK, an HTTP/HTTPS servlet engine, an XML toolset with SAX2 support, and probably a relational database that you can access through JDBC. That's enough for quite a lot of web services. When you need to get beyond HTTP-centric models, look at protocol frameworks like BEEP, which has long had Java support. Remember to carefully document and review your XML messages and protocols and to keep that documentation current. That is important for maintaining your software, and such good practices will help uncover design bugs early in system life cycles, when they're easy to fix.

6.2.3. XML Messaging over HTTP with SAX2

HTTP is a request/response protocol, loosely called an "RPC transport." Strictly speaking, RPC touches on APIs in some programming language and makes them location transparent, but here we use the term in a broader request/response sense. HTTP has several operations, sent to a particular server port (typically 80 for nonencrypted HTTP) and directed to a particular URI. For the purposes of XML messaging, the most important HTTP operations are GET and POST.

HTTP's GET request asks the server to return the data associated with the request's URI, as modified by various header fields. Other than the request itself, this is a one-way data transfer, from server to client; the data is returned using MIME as a typed envelope. For the purposes of this book, that data is most interesting when it's XML text. Web browsers normally issue GET to retrieve documents, and in Java when you read data from a java.net.URL you are normally issuing a GET. In particular, when a client passes an http: or https: URI to the SAX XMLReader.parse(uri) call, the call uses GET underneath. It's easy to dynamically generate XML content from Java servlets, as shown in Section 6.1.3, "Building Applications with RSS", earlier in this chapter.
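As a minimal sketch of that last point, the following class hands a URI straight to XMLReader.parse(uri) and counts the elements that come back. The ElementCounter class and its count() method are illustrative names, not part of SAX or of this book's other examples, and the sketch assumes the parser bundled with the JDK.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not from the book): counts the elements in a
// document identified by a URI.  For http: and https: URIs the parser
// fetches the document with a GET; file: URIs are read locally.
class ElementCounter extends DefaultHandler
{
    private int count;

    @Override
    public void startElement (String uri, String local, String qName,
            Attributes atts)
    {
        count++;
    }

    static int count (String uri) throws Exception
    {
        SAXParserFactory factory = SAXParserFactory.newInstance ();
        factory.setNamespaceAware (true);

        XMLReader reader = factory.newSAXParser ().getXMLReader ();
        ElementCounter counter = new ElementCounter ();

        reader.setContentHandler (counter);
        reader.parse (uri);
        return counter.count;
    }

    // argv [0] == URI of an XML document
    public static void main (String argv []) throws Exception
    {
        System.out.println (count (argv [0]) + " elements");
    }
}
```

Nothing in the handler knows or cares that the bytes arrived over HTTP; the same code works unchanged on a local file URI, which is convenient for testing.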

HTTP's POST request is more interesting. POST is very similar in structure to GET, but it provides something that GET doesn't have: the request includes MIME-encoded data. (Again, that's most interesting when it's XML text.) That is, unlike GET, POST is a two-way data transfer; XML can be sent to the server as part of the request, as well as returned in the response. Another key difference is that GET is idempotent: clients may reissue GET calls, which must not change significant server state. If you wanted to transfer money between bank accounts, POST is the call to use, since each request is expected to execute exactly once.

It's a bit messy to issue XML-in/XML-out POST requests from Java clients. We'll discuss how to do this in relatively pure SAX, but first let's look at this process using a SAX-friendly API library. No matter how you actually transfer this data, the real work of your application will be to turn the SAX events into application work. You'll likely connect code resembling this example to application-specific code that marshals and unmarshals custom data structures needed to do its work.

The gnu.xml.pipeline.CallFilter class packages the entire process as a pipeline stage, sending its input events as a POST request and parsing the POST response to produce output events. That makes it easy to use POST as a generic processing component. For example, in a batch processing scenario you might want to POST an XML file to a server and print its response. Such a server might schedule work as described in the particular document, and it could easily have access to resources or privileges unavailable to your client. This request can be issued programmatically or in some cases using a standard command-line tool.

Example 6-7 shows one way to send an XML file to a server and save its XML response as another file. As mentioned, the NSFilter class can be (and in this case, is) optimized away. It's just making sure the namespace prefix information in the event stream isn't missing anything important.

Example 6-7. Exchanging XML with a server (GNU pipeline version)

import gnu.xml.pipeline.*;
import gnu.xml.util.Resolver;
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;

public class CallFile
{
    // argv [0] == in.xml (filename)
    // argv [1] == url for posting service
    public static void main (String argv [])
    {
	try {
	    EventConsumer	out;
	    XMLReader		in;

	    out = new TextConsumer (System.out);
	    out = new NSFilter (out);
	    out = new CallFilter (argv [1], out);
	    out = new NSFilter (out);

	    in = XMLReaderFactory.createXMLReader ();
	    EventFilter.bind (in, out);

	    in.parse (Resolver.fileNameToURL (argv [0]));

	} catch (Exception  e) {
	    e.printStackTrace ();
	    System.exit (1);
	}
    }
}

If you want to do the same thing without using that pipeline framework, you have more work to do. You'll be driving the java.net.URLConnection directly, ensuring the text encodings are correct. And you won't have a generic way to group all SAX handlers together; you'd need to create an analogue of gnu.xml.pipeline.EventConsumer or, as shown in Example 6-8, write code that knows the specific output class it's talking to.

Example 6-8. Exchanging XML with a server (SAX-only version)

import java.io.*;
import java.net.*;
import gnu.xml.util.Resolver;
import gnu.xml.util.XMLWriter;
import org.xml.sax.*;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLReaderFactory;

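public class CallFile
{
    // prefix shared by the standard SAX2 feature identifiers
    private static final String	featurePrefix =
	"http://xml.org/sax/features/";
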
    // argv [0] == in.xml (filename)
    // argv [1] == url for posting service
    public static void main (String argv [])
    {
	try {
	    XMLReader		in;
	    CallWriter		caller;
	    XMLWriter		out;

	    out = new XMLWriter (System.out);
	    caller = new CallWriter (new URL (argv [1]), out);

	    in = XMLReaderFactory.createXMLReader ();
	    in.setFeature (featurePrefix + "namespace-prefixes", true);
	    bindAll (in, caller);

	    in.parse (Resolver.fileNameToURL (argv [0]));

	} catch (Exception  e) {
	    e.printStackTrace ();
	    System.exit (1);
	}
    }

    private static void bindAll (XMLReader in, Object out)
    throws SAXException
    {
	if (out instanceof ContentHandler)
	    in.setContentHandler ((ContentHandler) out);
	if (out instanceof DTDHandler)
	    in.setDTDHandler ((DTDHandler) out);
	// the two extension handlers are optional; parsers that don't
	// recognize these properties just won't report those events
	try {
	    if (out instanceof DeclHandler)
		in.setProperty (
		    "http://xml.org/sax/properties/declaration-handler",
		    out);
	} catch (SAXNotRecognizedException e) { /* IGNORE */ }
	try {
	    if (out instanceof LexicalHandler)
		in.setProperty (
		    "http://xml.org/sax/properties/lexical-handler",
		    out);
	} catch (SAXNotRecognizedException e) { /* IGNORE */ }
    }
    
    // print input to server
    // block till response
    // print output as XML text to stdout
    private static class CallWriter extends XMLWriter
    {
	private URL		target;
	private URLConnection	conn;
	private XMLWriter	next;

	CallWriter (URL url, XMLWriter out)
	{
	    super ((Writer)null);
	    target = url;
	    next = out;
	}

	// Connect to remote object and set up to send it XML text
	public synchronized void startDocument () throws SAXException
	{
	    try {
		conn = target.openConnection ();
		conn.setDoOutput (true);

		// "text/*" expects DOS-style EOL
		setEOL ("\r\n");
		conn.setRequestProperty ("Content-Type",
			    "text/xml;charset=UTF-8");

		setWriter (new OutputStreamWriter (
			conn.getOutputStream (),
			"UTF8"), "UTF-8");

	    } catch (IOException e) {
		fatal ("can't write (POST) to URI: " + target, e);
	    }
	    super.startDocument ();
	}

	// finish sending request
	// receive the POST response
	public void endDocument () throws SAXException
	{
	    super.endDocument ();

	    try {
		InputSource source = new InputSource 
			(conn.getInputStream ());
		XMLReader   producer = XMLReaderFactory.createXMLReader ();
		String      encoding;

		producer.setFeature (featurePrefix + 
			"namespace-prefixes", true);
		encoding = Resolver.getEncoding (conn.getContentType ());
		if (encoding != null)
		    source.setEncoding (encoding);
		bindAll (producer, next);
		producer.parse (source);

	    } catch (IOException e) {
		fatal ("I/O Exception reading response, " 
			+ e.getMessage (), e);
	    }
	}
    }
}

In that example scenario, you might also be able to just use binary file I/O and trust that the inputs and outputs are actually XML. But in general, inputs won't be sitting in files, and output processing will involve more than creating a new file. Both the CallFilter and CallWriter classes shown here are structured to be reusable.
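The "just use binary file I/O" shortcut might look something like the sketch below, which uses only the JDK. The RawPost class and its method names are assumptions made for illustration. It copies bytes in both directions without parsing them, so it really does trust that both the request and the response are XML.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper (not from the book): POSTs a stream of bytes and
// copies the response bytes to an output stream.  No XML parsing is done.
class RawPost
{
    // Returns the HTTP status code.  As with any HttpURLConnection,
    // getInputStream() throws an IOException on 4xx/5xx responses.
    static int post (URL target, InputStream in, OutputStream out)
    throws IOException
    {
        HttpURLConnection conn;

        conn = (HttpURLConnection) target.openConnection ();
        conn.setRequestMethod ("POST");
        conn.setDoOutput (true);
        // No charset parameter:  the receiver has to work out the
        // encoding from the XML text itself.
        conn.setRequestProperty ("Content-Type", "application/xml");

        try (OutputStream body = conn.getOutputStream ()) {
            copy (in, body);
        }

        int status = conn.getResponseCode ();

        try (InputStream response = conn.getInputStream ()) {
            copy (response, out);
        }
        return status;
    }

    private static void copy (InputStream from, OutputStream to)
    throws IOException
    {
        byte buffer [] = new byte [8192];
        int length;

        while ((length = from.read (buffer)) != -1)
            to.write (buffer, 0, length);
    }
}
```

A caller would pass a FileInputStream for the request and a FileOutputStream (or System.out) for the response. The price of the shortcut is that malformed XML goes undetected in either direction.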

On the server side, it's also easy to handle POST. In fact, you've seen all you need to know already! We saw how to pull XML data out of the POST request using the XMLReader.parse(InputSource) method in Chapter 3, "Producing SAX2 Events", in Section 3.1.2.2, "Providing entity text". Writing XML data in the response works exactly like it does for a GET, as shown earlier in this chapter in Section 6.1.3, "Building Applications with RSS". The main XML-specific issue is to handle the character encoding correctly, as shown in both of those examples. (UTF-8 is the safest over-the-wire encoding.) It's safest to use the application/xml MIME content type whenever you pass XML using HTTP, since there are fewer things that can (and will!) go wrong. You should also make sure to use CRLF-style line ends whenever you use a text/* MIME content type. You might want to pay attention to some servlet-specific issues, such as structuring your code to support connection keepalives or (less commonly) on-the-fly compression of response data.
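To keep the encoding issue in view, here is a minimal sketch of the servlet-independent core of such a handler. The XmlPostHandler class, its handle() method, and the "received" response element are all invented for illustration. In a servlet, doPost() would presumably call it with request.getInputStream(), request.getCharacterEncoding(), and response.getOutputStream(), after setting the response content type to application/xml.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch (not from the book): parses a POSTed XML body and
// answers with a tiny XML document naming the root element it saw.
class XmlPostHandler extends DefaultHandler
{
    private String root;

    @Override
    public void startElement (String uri, String local, String qName,
            Attributes atts)
    {
        if (root == null)
            root = qName;
    }

    // "charset" comes from the request's Content-Type and may be null,
    // in which case the parser autodetects the encoding.
    static void handle (InputStream body, String charset,
            OutputStream response)
    throws Exception
    {
        InputSource source = new InputSource (body);

        if (charset != null)
            source.setEncoding (charset);

        SAXParserFactory factory = SAXParserFactory.newInstance ();
        factory.setNamespaceAware (true);

        XMLReader reader = factory.newSAXParser ().getXMLReader ();
        XmlPostHandler handler = new XmlPostHandler ();

        reader.setContentHandler (handler);
        reader.parse (source);

        // Always answer in UTF-8; CRLF line ends are harmless here and
        // keep the text usable under a text/* content type.  An XML name
        // can't contain quotes, '<', or '&', so no escaping is needed.
        Writer out = new OutputStreamWriter (response,
                StandardCharsets.UTF_8);

        out.write ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n");
        out.write ("<received root=\"" + handler.root + "\"/>\r\n");
        out.flush ();
    }
}
```

A real application would replace the root-name bookkeeping with handlers that unmarshal the message into its own data structures; the encoding handling at both ends would stay the same.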

In many cases it's probably good to have servlets' doPost() methods save input to persistent storage, so that some other thread can pick it up as a work item, and then just use the response data to acknowledge the request. The client would collect any additional requests later, either when it polled or when the server called back to the client (with another POST). That approach avoids tying up connections for a long time and creates a framework in which many component failures can be recovered from transparently. Using such an atomic transaction model correctly can let you avoid the need for transactional roll-back mechanisms to recover from common system failure modes.
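One way to sketch that save-then-acknowledge pattern, using a spool directory as the persistent storage, is shown below. The WorkSpool class, the ticket scheme, and the "accepted" acknowledgment element are assumptions made for illustration, not anything this book or the servlet API defines. The body is written under a temporary name and then renamed, so a worker scanning the directory for .xml files should never see a half-written request (assuming the file system supports atomic renames within a directory).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical sketch (not from the book): stores each POSTed body as a
// work item on disk and produces a short XML acknowledgment.
class WorkSpool
{
    private final Path dir;

    WorkSpool (Path dir)
    {
        this.dir = dir;
    }

    // Saves the request body and returns a ticket identifying it.
    // A doPost() method would pass request.getInputStream() here.
    String accept (InputStream body) throws IOException
    {
        String ticket = UUID.randomUUID ().toString ();
        Path tmp = Files.createTempFile (dir, "incoming", ".tmp");

        // Write under a temporary name first ...
        Files.copy (body, tmp, StandardCopyOption.REPLACE_EXISTING);

        // ... then publish the complete file in one step.
        Files.move (tmp, dir.resolve (ticket + ".xml"),
                StandardCopyOption.ATOMIC_MOVE);
        return ticket;
    }

    // The response only acknowledges receipt; results are collected
    // later, by polling or by a callback.
    static String acknowledgment (String ticket)
    {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n"
            + "<accepted ticket=\"" + ticket + "\"/>\r\n";
    }
}
```

Because the request is either fully spooled or not there at all, a crash between the two steps leaves only a stray .tmp file to clean up, and the client, having received no acknowledgment, knows it should resubmit.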




Copyright © 2002 O'Reilly & Associates. All rights reserved.