XML in a Nutshell

15.2. Developing Data-Oriented XML Formats

Despite the mature status of most of XML's core technologies, XML application development is only now being recognized as a distinct discipline. Many architects and XML developers are attempting to turn existing design methodologies (like UML) and design patterns to the problem of constructing markup languages, but a widely accepted design process for creating XML applications still does not exist.

TIP: The term "XML application" is often used in XML contexts to describe an XML vocabulary for a particular domain rather than the software used to process it. This may seem a little strange to developers used to creating software applications, but it makes sense if you think about integrating a software application with an XML application, for instance.

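XML applications can range in scope from a proprietary vocabulary used to store a single computer program's configuration settings to an industry-wide standard for storing consumer loan applications. Although the specifics and sometimes the sequence will vary, the basic steps involved in creating a new XML application are as follows:

1. Determine the basic requirements of the application.
2. Investigate existing applications that might already meet those requirements.
3. Plan for growth and extension.
4. Choose a validation method.
5. Decide on the level of namespace support.
6. Plan how compatibility will be maintained between versions.

The following sections explore each of these steps in greater depth.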

15.2.1. Basic Application Requirements

The first step in designing a new XML application is like the first step in many design methodologies. Before the application can be designed, it is important to determine exactly what needs the application will fulfill. Some basic questions must be answered before proceeding.

15.2.1.1. Where and how will new documents be created?

Documents that will be created automatically by a software application or database server will be structured differently than those that need to be created by humans using an XML editor. While software wouldn't have a problem generating 100 elements with intricate attributes and cross-references, a human being probably would.

If you already have an application or a legacy format to which you're adding XML, you may already have data structures you need to map to the XML. Depending on the other requirements for the application, you may be able to base your XML format on the existing structures. If you're starting from scratch or need to share the information with other programs that don't share those structures, you probably need to look at the data itself and build the application creating the XML around the information.
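
As a minimal sketch of basing an XML format on an existing data structure, the Python below maps each field of a record to a child element of the same name. The LoanApplication class, its fields, and the to_xml helper are hypothetical names chosen for illustration; only the standard library's xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class LoanApplication:
    """Hypothetical in-memory structure that already exists in the program."""
    applicant: str
    amount: int
    term_months: int


def to_xml(record):
    """Map each field of the record to a child element of the same name."""
    root = ET.Element(type(record).__name__)
    for field in fields(record):
        child = ET.SubElement(root, field.name)
        child.text = str(getattr(record, field.name))
    return root


application = LoanApplication("A. Borrower", 25000, 60)
xml_text = ET.tostring(to_xml(application), encoding="unicode")
```

A format derived this mechanically is easy to generate but mirrors the program's internals, which is the trade-off discussed in the questions that follow.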

15.2.1.2. How complex will the document be?

The complexity of the data to be modeled by the XML document affects how the application will be designed. A document containing a few simple element types is much easier to describe than one that contains dozens of different elements with complex data type requirements. Complexity also affects what type of validation should be used and how documents will be created and processed.

15.2.1.3. How will documents be consumed?

If the XML documents using this vocabulary will pass only between similar programs, it may make sense to model them directly on the internal structures of those programs, without much concern for how easy or difficult that makes the documents for other programs or for humans to use. If there's a substantial chance that the information will be reused by other applications, read by humans (for debugging purposes or for direct access to information), or stored for unknown future use, it probably makes sense to ensure that the document is easy to read and process, even if that makes creating the document slightly more difficult.

15.2.1.4. How widely will the resulting documents be distributed?

Generally, the audience of a new XML application is known in advance. Some documents are created and read by the same application without ever leaving a single system. Other documents will be used to transmit important business information between the IT systems of different organizations. Some documents are created for publication on the Web to be viewed by hundreds or even thousands of people around the world. XML formats that will be shared widely typically need comprehensive documentation made readily available to potential users. Formal validation models may also be more important for documents that are shared outside of a small community of trusted participants.

15.2.1.5. Will others need to incorporate this document structure into their own applications?

Some XML applications are never intended for use on their own and are only useful when incorporated into other XML applications. Others are useful standards on their own but are also suitable for inclusion in other applications. A few different methods can be used to incorporate markup from one application into another:

Simple inclusion
Markup from one application is included within a container element of another application. Embedding XHTML content in another document is a common example of this.

Mixed element inclusion
Markup from one application is mixed inline with content from another application. This can complicate validation and makes the including application sensitive to changes in the included application. The Global Document Annotation (GDA) Initiative application provides an example of this type of application (http://www.oasis-open.org/cover/gda.html).

Mixed attribute inclusion
Some XML applications consist of attributes that may be attached to elements from the host application. XML Linking (XLink) is a prime example of this type of application, defining only attributes that may be used in other vocabularies.
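
The sketch below illustrates two of these methods in one small document: XHTML placed inside a host container element (simple inclusion) and XLink attributes attached to a host element (mixed attribute inclusion). The XHTML and XLink namespace URIs are the real ones; the host vocabulary (report, note, source) and its namespace URI are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XLINK_NS = "http://www.w3.org/1999/xlink"
HOST_NS = "http://example.com/ns/report"  # hypothetical host vocabulary

DOCUMENT = """\
<report xmlns="http://example.com/ns/report"
        xmlns:xlink="http://www.w3.org/1999/xlink">
  <note>
    <p xmlns="http://www.w3.org/1999/xhtml">See the <em>attached</em> figures.</p>
  </note>
  <source xlink:type="simple" xlink:href="http://example.com/data.xml"/>
</report>
"""


def namespace_of(name):
    """Return the namespace URI of an expanded name such as '{uri}local', or None."""
    if isinstance(name, str) and name.startswith("{"):
        return name[1:].split("}", 1)[0]
    return None


def namespaces_used(root):
    """Collect the namespace URIs of every element and attribute in the tree."""
    found = set()
    for elem in root.iter():
        for name in [elem.tag] + list(elem.attrib):
            uri = namespace_of(name)
            if uri is not None:
                found.add(uri)
    return found


root = ET.fromstring(DOCUMENT)
used = namespaces_used(root)

# Simple inclusion: the XHTML elements live inside the host's <note> container.
xhtml_elements = [e.tag.split("}", 1)[1] for e in root.iter()
                  if namespace_of(e.tag) == XHTML_NS]

# Mixed attribute inclusion: an XLink attribute on a host element.
source = root.find("{%s}source" % HOST_NS)
href = source.get("{%s}href" % XLINK_NS)
```

Because each vocabulary keeps its own namespace, a processor can pick out the markup it understands and ignore the rest.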

Answering these questions will provide a basic set of requirements to keep in mind when deciding whether to build a new application, acquire an existing application, or use some combination of the two.

15.2.2. Investigating Available Options

Before committing to designing and implementing a new XML application, it is a good idea to take a few minutes to search the Internet for prior art. Since the first version of the XML recommendation was released in 1998, thousands of new XML applications have been developed and released around the world. Although the quality and completeness of these applications vary greatly, it is often more efficient to start with an existing DTD or schema (however imperfect) rather than starting from scratch. In some cases supporting software is already available, potentially saving software development work as well.

15.2.2.1. XML vocabulary development

It is also possible that the work your application needs to do may fit into an existing generic framework, such as XML-RPC or SOAP. If this is the case, you may not need to create your own XML vocabulary at all. XML-RPC uses only its own vocabulary, while different styles of SOAP may reduce the amount of work your vocabulary needs to perform.
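
To show what "uses only its own vocabulary" means in practice, the sketch below encodes a call with Python's standard xmlrpc.client module and inspects the result as ordinary XML. The method name add and its two parameters are hypothetical; no server is contacted.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

# Encode a call to a hypothetical remote method named "add".
payload = xmlrpc.client.dumps((2, 3), methodname="add")

# The payload is plain XML written entirely in the XML-RPC vocabulary.
root = ET.fromstring(payload)
method_name = root.findtext("methodName")

# Decoding recovers the original parameters and method name.
params, decoded_name = xmlrpc.client.loads(payload)
```

The application data travels inside generic elements supplied by XML-RPC, so no application-specific element names have to be designed.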

There are several XML application registries available on the Internet, and a good "metadirectory" of DTD and schema directories can be found on O'Reilly's XML site, http://www.xml.com. These repositories list applications for various disciplines and topics with varying licensing requirements. The XML Cover Pages, at http://xml.coverpages.org, also provide information about a wide variety of XML-related vocabularies, software, and projects. The search for existing applications may also turn up potential collaborators, which is helpful if the XML format is intended for use across multiple organizations.

15.2.3. Planning for Growth

Some applications may not need to evolve over time (a vocabulary describing basic DNA strands, for instance), but some thought should be given as to how users of the application would be able to extend it to meet their own needs. In DTD-based applications, this is done by providing parameter entity "hooks" into the document type definition, which could either be referenced or redefined by an instance document. Take the simple DTD shown in Example 15-1.

Example 15-1. extensible.dtd

<!-- Default content model; an instance document may override this entity -->
<!ENTITY % varContent "EMPTY">
<!ELEMENT variable %varContent;>

This fragment is not a very interesting application by itself, but since it provides the capability for extension, the document author can make it more useful by providing an alternative entity declaration for the content of the variable element, as shown in Example 15-2.

Example 15-2. Document extending extensible.dtd

<?xml version="1.0"?>
<!DOCTYPE variable SYSTEM "extensible.dtd"
[
<!ENTITY % varContent "(#PCDATA)">
]>
<variable>Useful content.</variable>

The XML schema language provides more comprehensive and controlled support for extending markup using the extension, include, redefine, and import elements. These two mechanisms can be used in conjunction to create very powerful, customizable application frameworks.

15.2.4. Choosing a Validation Method

The first major implementation decision of designing a new XML application is what type of validation (if any) will be performed on instance documents. In many cases, prototyping a set of instance documents is the best way to determine what level of validation must be performed.

If your application is simply saving some internal program state between invocations (such as window positions or menu configurations within a GUI application), going to the trouble of building a schema and validating documents may not be necessary. Since these configuration documents will always be written and read by the same program, the structure is fixed by the program logic itself. The only conceivable purpose for validating a document like this would be to detect file corruption, which would be likely to generate a well-formedness error in any case.
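
As a rough illustration of relying on well-formedness alone, the sketch below loads a settings document without any schema and treats a parse failure as a sign of corruption. The settings and window element names and the load_settings helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_settings(text):
    """Parse a configuration document; return None if it is not well formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


GOOD = '<settings><window x="10" y="20" width="800" height="600"/></settings>'
TRUNCATED = GOOD[:40]  # simulate a file cut short by an interrupted write

settings = load_settings(GOOD)
window = settings.find("window")
position = (int(window.get("x")), int(window.get("y")))

corrupted = load_settings(TRUNCATED)  # the parser rejects the truncated text
```

A program written this way would typically fall back to default settings when load_settings returns None.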

An example of an application that would require some level of validation is where XML documents are exchanged between different related systems that are not maintained by the same development organization. In this case, a DTD or schema can serve as a definitive blueprint to ensure that all systems are sending and receiving information in the expected formats.

The most rigorous type of validation is required when developing a new XML standard that will be implemented independently by many different vendors without any explicit control or restrictions. For example, the XHTML 1.1 standard is enforced by a very explicit and well-documented DTD that is hosted by the W3C. This well-known public DTD allows tool and application vendors to ensure that their systems will interoperate as long as instance documents conform to the standard.

After determining the level of validation for a particular application, it must be decided what validation language will be used. The DTD mechanism of XML 1.0 is still the most widely supported standard, although it lacks the expressive power that is required by sophisticated data-oriented applications. The W3C XML schema recommendation provides very rich type and content model expression, but brings with it a commensurate level of complexity.

Developers can also provide both DTDs and XML schemas, or even combine them with other vocabularies for describing XML structures, notably RELAX NG (http://www.oasis-open.org/committees/relax-ng/) and Schematron (http://www.ascc.net/xml/resource/schematron/schematron.html). RDDL, described in Chapter 14, provides a set of tools for supporting and explaining such combinations for formats that use namespaces.

15.2.5. Namespace Support

Virtually every XML application that will be shared with the public should include at least a basic level of namespace support. Even if there are no current plans to release a particular document application to the outside world, it is much simpler to implement namespaces from the ground up than it is to retrofit an existing application with a namespace.

Namespaces affect everything from how the document is validated to how it is transformed (using a stylesheet language such as XSLT). Here are a few namespace issues to consider before selecting a URI and starting work.

15.2.5.1. Will instance documents need to be validated using a DTD?

If so, some planning of how namespace prefixes will be assigned and incorporated into the DTD is necessary. DTDs are not namespace aware, so strategic use of parameter entities can make modification of prefixes much simpler down the road.

15.2.5.2. Will markup from this application need to be embedded in other applications?

If so, some thought needs to be given to potential tag-name collisions. The safest approach is to force every element from your application to be explicitly qualified with a namespace. This can be done within an XML schema by setting the elementFormDefault and attributeFormDefault attributes of the schema element to qualified.
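
The sketch below is not schema validation; it is a simple stand-in check, written with the standard library, that reports any element not qualified with a namespace. The invoice vocabulary, its namespace URI, and the unqualified_elements helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def unqualified_elements(root):
    """Return the names of elements that are not in any namespace."""
    return [elem.tag for elem in root.iter()
            if isinstance(elem.tag, str) and not elem.tag.startswith("{")]


# Every element and attribute carries the (hypothetical) invoice namespace.
QUALIFIED = """\
<inv:invoice xmlns:inv="http://example.com/ns/invoice">
  <inv:total inv:currency="USD">42.00</inv:total>
</inv:invoice>
"""

# The <total> element has no namespace and could collide with a host's <total>.
MIXED = """\
<inv:invoice xmlns:inv="http://example.com/ns/invoice">
  <total>42.00</total>
</inv:invoice>
"""

clean = unqualified_elements(ET.fromstring(QUALIFIED))
problems = unqualified_elements(ET.fromstring(MIXED))
```

A schema-aware validator would enforce the same rule declaratively once the form defaults are set to qualified.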

15.2.5.3. Are there legacy documents to support?

If an application will be used to validate existing XML documents, some thought should be given to the effort involved in migrating them. In most cases, simply adding a default namespace declaration will be sufficient. If the new application includes markup from different namespaces, however, some thought must be given to how to update old documents.
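
For the simple case, where adding a default namespace is enough, a migration might look like the sketch below. It moves every unqualified element of a legacy document into a new namespace and serializes the result with that namespace as the default. The inventory vocabulary, the namespace URI, and the add_namespace helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NEW_NS = "http://example.com/ns/inventory"  # hypothetical namespace URI


def add_namespace(root, uri):
    """Place every element that has no namespace into the given namespace."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = "{%s}%s" % (uri, elem.tag)
    return root


LEGACY = "<inventory><item sku='A1'>Widget</item></inventory>"

root = add_namespace(ET.fromstring(LEGACY), NEW_NS)

# Register the URI with an empty prefix so it is written as a default namespace.
ET.register_namespace("", NEW_NS)
migrated = ET.tostring(root, encoding="unicode")
```

Attributes are left unqualified here, which matches the usual convention that a default namespace declaration applies to elements only.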

15.2.6. Maintaining Compatibility

Maintaining backward compatibility with existing documents is a primary concern for XML applications that are widely used by diverse audiences. The difficulties faced by standards organizations when dealing with the task of updating a popular application (such as HTML) are formidable. While most applications may not become as widespread as HTML, some thought should be given in advance as to how new versions of a schema or DTD will interact with existing documents.

One possible approach to maintaining backward compatibility is to create a new, distinct namespace that will be used to mark new element declarations or perhaps to change the namespace of the entire document to reflect a substantially changed version. Another possible strategy is only to extend existing applications without removing prior functionality. The most important thing is to ensure that each instance document for an application has some readily identifiable marker that associates it with a particular version of a DTD or schema. The good news is that the highly transformable nature of XML makes it very easy to migrate old documents to new document formats.
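
One way to picture the version marker is to use the namespace URI itself, as in the sketch below, which identifies a document's version from its root element and then migrates an old document to the new namespace. The catalog vocabulary, both URIs, and the helper names are hypothetical, and the upgrade assumes that version 2 only adds to version 1.

```python
import xml.etree.ElementTree as ET

V1_NS = "http://example.com/ns/catalog/1"  # hypothetical version 1 namespace
V2_NS = "http://example.com/ns/catalog/2"  # hypothetical version 2 namespace
KNOWN_VERSIONS = {V1_NS: 1, V2_NS: 2}


def detect_version(root):
    """Return the version implied by the root element's namespace, or None."""
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return KNOWN_VERSIONS.get(uri)
    return None


def upgrade_to_v2(root):
    """Move every version 1 element into the version 2 namespace."""
    old_prefix = "{" + V1_NS + "}"
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            elem.tag = "{" + V2_NS + "}" + elem.tag[len(old_prefix):]
    return root


OLD_DOC = ('<catalog xmlns="http://example.com/ns/catalog/1">'
           '<item>Widget</item></catalog>')

root = ET.fromstring(OLD_DOC)
version_before = detect_version(root)
version_after = detect_version(upgrade_to_v2(root))
unknown = detect_version(ET.fromstring("<catalog/>"))
```

A version attribute on the root element would serve the same purpose; the important point is that the marker can be read before any other processing begins.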

Removing functionality is possible, but frequently difficult, once a format is widely used. One approach is to deprecate functionality, marking it as a likely target for removal a version or several before it is actually removed. While deprecated features often linger in implementations long after they've been targeted for removal, they change the expectations of developers building new applications and make it possible, if slow, to remove functionality.
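
Processing software can reinforce a deprecation policy by warning when a document still uses deprecated markup. The sketch below shows one possible shape for such a check; the element names, the suggested replacements, and the check_deprecated helper are all hypothetical.

```python
import warnings
import xml.etree.ElementTree as ET

# Hypothetical deprecated element names mapped to their suggested replacements.
DEPRECATED = {"font": "style", "center": "align"}


def check_deprecated(root):
    """Warn about deprecated elements and return their names in document order."""
    found = []
    for elem in root.iter():
        if elem.tag in DEPRECATED:
            found.append(elem.tag)
            warnings.warn(
                "<%s> is deprecated; use <%s> instead"
                % (elem.tag, DEPRECATED[elem.tag]),
                DeprecationWarning,
                stacklevel=2,
            )
    return found


DOC = "<page><center><font>Hello</font></center><para>World</para></page>"

# Record the warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    found = check_deprecated(ET.fromstring(DOC))
```

The document is still processed normally; the warnings only signal to authors that the markup is scheduled for removal.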




Copyright © 2002 O'Reilly & Associates. All rights reserved.