Java and XSLT

3.5. Schema Evolution

Looking beyond HTML generation, a key use for XSLT is transforming one form of XML into another. In many cases, these are not radical transformations but minor enhancements, such as adding new attributes, changing the order of elements, or removing unused data. If you have only a handful of XML files to transform, it is much easier to edit the XML directly than to write a stylesheet. But when a large collection of XML documents exists, a single XSLT stylesheet can be applied to every file in the library, transforming all of them in exactly the same way. For B2B applications, schema evolution is useful when different customers require the same data, but in different formats.

3.5.1. An Example XML File

Let's suppose that you wrote a logging API for your Java programs. Log files are written in XML and are formatted as shown in Example 3-10.

Example 3-10. Log file before transformation

<?xml version="1.0" encoding="UTF-8"?>
<log>
  <message text="input parameter was null">
    <type>ERROR</type>
    <when>
      <year>2000</year>
      <month>01</month>
      <day>15</day>
      <hour>03</hour>
      <minute>12</minute>
      <second>18</second>
    </when>
    <where>
      <class>com.foobar.util.StringUtil</class>
      <method>reverse(String)</method>
    </where>
  </message>
  <message text="cannot read config file">
    <type>WARNING</type>
    <when>
      <year>2000</year>
      <month>01</month>
      <day>15</day>
      <hour>06</hour>
      <minute>35</minute>
      <second>44</second>
    </when>
    <where>
      <class>com.foobar.servlet.MainServlet</class>
      <method>init()</method>
    </where>
  </message>
  <!-- more messages ... -->
</log>

As you can see from this example, the file format is quite verbose. Of particular concern is how the date and time are written. Since log files can be quite large, it would be a good idea to select a more concise format for this information. Additionally, the text is stored as an attribute on the <message> element, and the type is stored as a child element. It would make more sense to store the type as an attribute and the message text as a child element. For example:

<message type="WARNING">
  <text>This is the text of a message.
        Multi-line messages are easier when an
        element is used instead of an attribute.</text>
  ...remainder omitted

3.5.2. The Identity Transformation

Whenever you write a schema evolution stylesheet, it is a good idea to start with an identity transformation. This is a very simple template that takes the original XML document and "transforms" it into a new document with the same elements and attributes as the original. Example 3-11 shows a stylesheet that contains an identity transformation template.

Example 3-11. identityTransformation.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Amazingly, it takes only a single template to perform the identity transformation, regardless of the complexity of the XML data. Because of the <xsl:output> element, the result is encoded as UTF-8 and indented, regardless of how the original XML was formatted. In an XPath pattern, node() is a node test that, using the default child axis, matches any node that can be the child of another node: elements, text, comments, and processing instructions. Attributes are not children in the XPath data model, so node() alone never matches them. For this reason, @* must be unioned with node() as follows:

<xsl:template match="@*|node()">

Translated into English, this means that the template matches any attribute or any node that can be a child of another node. Since node() includes elements, comments, processing instructions, and even text, this template matches essentially everything in the XML document. The root node, which has no parent, is handled by XSLT's built-in template rule, which simply applies templates to its children.
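To see why the @* alternative matters, the following minimal sketch runs two one-template stylesheets through the JDK's built-in JAXP processor (javax.xml.transform): one that uses node() alone and one that uses @*|node(). The class name AttributeLossDemo and the trimmed input are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: compares a node()-only copy template with @*|node().
class AttributeLossDemo {
    static final String INPUT =
        "<log><message text=\"input parameter was null\"><type>ERROR</type></message></log>";

    // Builds a one-template stylesheet that uses the same pattern for match and select.
    static String stylesheet(String pattern) {
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"" + pattern + "\">"
            + "<xsl:copy><xsl:apply-templates select=\"" + pattern + "\"/></xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // Runs the stylesheet with the JDK's JAXP processor and returns the serialized result.
    static String transform(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // node() alone: <message> is copied, but its text attribute is silently lost.
        System.out.println(transform(stylesheet("node()"), INPUT));
        // @*|node(): the text attribute is copied as well.
        System.out.println(transform(stylesheet("@*|node()"), INPUT));
    }
}
```

The first result contains <message><type>ERROR</type></message> with no text attribute, because neither the match nor the select expression ever reaches an attribute node.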

Inside our template, we use <xsl:copy>. As you can probably guess, this instructs the XSLT processor to copy the current node to the result tree. The copy is shallow: for an element, neither its attributes nor its children are copied automatically. To continue processing, <xsl:apply-templates> therefore selects all attributes and children of the current node using the following code:

<xsl:apply-templates select="@*|node()"/>
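A quick way to test Example 3-11 before adding any other templates is to run it from Java. This is a sketch assuming the standard JAXP API that ships with the JDK; the class name IdentityDemo and the abbreviated log are illustrative, not part of the original example. The output should contain the same elements, attributes, text, and even the comment as the input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: runs the identity transformation of Example 3-11.
class IdentityDemo {
    // Example 3-11, embedded as a string so the program is self-contained.
    static final String IDENTITY_XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
        + "  <xsl:output method=\"xml\" version=\"1.0\" encoding=\"UTF-8\" indent=\"yes\"/>\n"
        + "  <xsl:template match=\"@*|node()\">\n"
        + "    <xsl:copy>\n"
        + "      <xsl:apply-templates select=\"@*|node()\"/>\n"
        + "    </xsl:copy>\n"
        + "  </xsl:template>\n"
        + "</xsl:stylesheet>\n";

    // An abbreviated version of Example 3-10: one message plus the trailing comment.
    static final String LOG =
          "<log>\n"
        + "  <message text=\"input parameter was null\">\n"
        + "    <type>ERROR</type>\n"
        + "    <where><class>com.foobar.util.StringUtil</class></where>\n"
        + "  </message>\n"
        + "  <!-- more messages ... -->\n"
        + "</log>\n";

    // Applies the stylesheet to the document and returns the serialized result.
    static String transform(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Elements, attributes, text, and the comment all come through unchanged.
        System.out.println(transform(IDENTITY_XSLT, LOG));
    }
}
```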

3.5.3. Transforming Elements and Attributes

Once you have typed in the identity transformation and tested it, it is time to add templates that actually perform the schema evolution. In XSLT, two or more templates can match the same node in the XML data. In these cases, the more specific (higher-priority) template is instantiated. Without going into a great deal of technical detail, an explicit match such as <xsl:template match="when"> has a default priority of 0, while each alternative of the identity transformation's wildcard pattern, @* and node(), has a default priority of -0.5. The explicit template therefore takes precedence. To modify specific elements and attributes, simply add more specific templates to the existing identity transformation stylesheet.
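The precedence rule can be checked directly. In this sketch (hypothetical class PriorityDemo, with the JDK's JAXP processor assumed), a template for <when> competes with the identity template. With default priorities the <when> rule wins; giving the identity template an explicit priority="1" reverses the outcome. The priority attribute is standard XSLT 1.0, but overriding it here only makes the rule visible; it is not a recommended practice for schema evolution stylesheets.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: shows which of two matching templates is instantiated.
class PriorityDemo {
    static final String INPUT = "<message><when><year>2000</year></when></message>";

    // Identity template (optionally with an explicit priority) plus a rule for <when>.
    static String stylesheet(String identityPriority) {
        String priorityAttr = (identityPriority == null)
            ? "" : " priority=\"" + identityPriority + "\"";
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"@*|node()\"" + priorityAttr + ">"
            + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
            + "</xsl:template>"
            // A plain element name has default priority 0; node() and @* have -0.5.
            + "<xsl:template match=\"when\"><timestamp/></xsl:template>"
            + "</xsl:stylesheet>";
    }

    static String transform(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Default priorities: the when rule wins, so <timestamp/> replaces <when>.
        System.out.println(transform(stylesheet(null), INPUT));
        // priority="1" lets the identity rule outrank the when rule, so <when> is copied.
        System.out.println(transform(stylesheet("1"), INPUT));
    }
}
```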

In the log file example, a key problem is the quantity of XML data written for each <when> element. Instead of representing the date and time using a series of child elements, it would be much more concise to use the following syntax:

<timestamp time="06:35:44" day="15" month="01" year="2000"/>

The following template will perform the necessary transformation:

<xsl:template match="when">
  <!-- change 'when' into 'timestamp', and change its
       child elements into attributes -->
  <timestamp time="{hour}:{minute}:{second}"
    year="{year}" month="{month}" day="{day}"/>
</xsl:template>

This template can be added to the identity transformation stylesheet and will take precedence whenever a <when> element is encountered. Instead of using <xsl:copy>, this template produces a new <timestamp> element. Attribute value templates (AVTs) then supply the values of its attributes, effectively converting element values into attribute values. The AVT {hour} evaluates the XPath expression hour relative to the current <when> element and inserts the string value of its <hour> child. You may notice that XSLT processors do not necessarily preserve the order of attributes. This does not matter, because XML assigns no significance to the relative order of attributes, and a stylesheet cannot force a particular order.
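Here is a small sketch that applies just the identity template and the <when> template to the first timestamp from Example 3-10, assuming the JDK's JAXP processor; TimestampDemo is a hypothetical name. Because attribute order is not guaranteed, the check looks at each attribute on its own rather than at the order in which they appear.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: converts <when> child elements into <timestamp> attributes.
class TimestampDemo {
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        // Each {...} is an attribute value template evaluated relative to <when>.
        + "<xsl:template match=\"when\">"
        + "<timestamp time=\"{hour}:{minute}:{second}\""
        + " year=\"{year}\" month=\"{month}\" day=\"{day}\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // The <when> element of the first message in Example 3-10.
    static final String INPUT =
          "<when><year>2000</year><month>01</month><day>15</day>"
        + "<hour>03</hour><minute>12</minute><second>18</second></when>";

    static String transform(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One empty <timestamp> element; attribute order may vary by processor.
        System.out.println(transform(XSLT, INPUT));
    }
}
```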

The next thing to tackle is the <message> element. As mentioned earlier, we would like to convert the text attribute to an element, and the <type> element to an attribute. Just like before, add a new template that matches the <message> element, which will take precedence over the identity transformation. Comments in the code explain what happens at each step.

<!-- locate <message> elements -->
<xsl:template match="message">
  <!-- copy the current node, but not its attributes -->
  <xsl:copy>
    <!-- change the <type> element to an attribute -->
    <xsl:attribute name="type">
      <xsl:value-of select="type"/>
    </xsl:attribute>
    
    <!-- change the text attribute to a child node -->
    <xsl:element name="text">
      <xsl:value-of select="@text"/>
    </xsl:element>
      
    <!-- since the select attribute is not present,
         xsl:apply-templates processes all child nodes
         of the current node: elements, text, comments,
         and processing instructions, but not attributes! -->
    <xsl:apply-templates/>
      
  </xsl:copy>
</xsl:template>

This almost completes the stylesheet. <xsl:copy> copies the <message> element to the result tree but does not copy any of its attributes or children. We then explicitly add a new attribute using <xsl:attribute> and a new child element using <xsl:element>. <xsl:apply-templates> tells the processor to continue the transformation with the children of <message>. One problem remains: the value of the <type> element has been copied into an attribute, but the element itself has not been removed from the document. The identity transformation still copies <type> to the result tree without modification. To fix this, add an empty template, which matches <type> and produces no output:

<xsl:template match="type"/>
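The effect of the empty template is easiest to see side by side. This hypothetical MessageDemo (again assuming the JDK's JAXP processor) builds the stylesheet with and without <xsl:template match="type"/> and runs both against a trimmed message from Example 3-10. Without it, the output contains both type="ERROR" and a leftover <type> element; with it, only the attribute remains.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: the <message> template with and without <type> suppression.
class MessageDemo {
    static String stylesheet(boolean suppressType) {
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"@*|node()\">"
            + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match=\"message\">"
            + "<xsl:copy>"
            + "<xsl:attribute name=\"type\"><xsl:value-of select=\"type\"/></xsl:attribute>"
            + "<xsl:element name=\"text\"><xsl:value-of select=\"@text\"/></xsl:element>"
            + "<xsl:apply-templates/>"
            + "</xsl:copy>"
            + "</xsl:template>"
            // The empty template matches <type> and writes nothing, removing the element.
            + (suppressType ? "<xsl:template match=\"type\"/>" : "")
            + "</xsl:stylesheet>";
    }

    static final String INPUT =
          "<log><message text=\"input parameter was null\">"
        + "<type>ERROR</type><where><class>com.foobar.util.StringUtil</class></where>"
        + "</message></log>";

    static String transform(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(stylesheet(false), INPUT)); // <type> still present
        System.out.println(transform(stylesheet(true), INPUT));  // <type> removed
    }
}
```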

The complete schema evolution stylesheet simply contains the previous templates. Without duplicating all of the code, here is its overall structure:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  
  <!-- the identity transformation -->
  <xsl:template match="@*|node()">
    ...
  </xsl:template>
  
  <!-- locate <message> elements -->
  <xsl:template match="message">
    ...
  </xsl:template>

  <!-- locate <when> elements -->
  <xsl:template match="when">
    ...
  </xsl:template>
  
  <!-- suppress the <type> element -->
  <xsl:template match="type"/>
</xsl:stylesheet>
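For reference, here is one way to assemble those pieces into a single runnable stylesheet, embedded in a small Java program so it can be tried without separate files. It simply combines the templates shown earlier in this section. The class name SchemaChange and the one-message input (the second message of Example 3-10, preceded by an xml-stylesheet processing instruction) are assumptions for illustration, and the JDK's built-in JAXP processor is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper around the complete schema evolution stylesheet.
class SchemaChange {
    // The identity transformation plus the message, when, and type templates.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
        + "  <xsl:output method=\"xml\" version=\"1.0\" encoding=\"UTF-8\" indent=\"yes\"/>\n"
        + "  <xsl:template match=\"@*|node()\">\n"
        + "    <xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>\n"
        + "  </xsl:template>\n"
        + "  <xsl:template match=\"message\">\n"
        + "    <xsl:copy>\n"
        + "      <xsl:attribute name=\"type\"><xsl:value-of select=\"type\"/></xsl:attribute>\n"
        + "      <xsl:element name=\"text\"><xsl:value-of select=\"@text\"/></xsl:element>\n"
        + "      <xsl:apply-templates/>\n"
        + "    </xsl:copy>\n"
        + "  </xsl:template>\n"
        + "  <xsl:template match=\"when\">\n"
        + "    <timestamp time=\"{hour}:{minute}:{second}\"\n"
        + "      year=\"{year}\" month=\"{month}\" day=\"{day}\"/>\n"
        + "  </xsl:template>\n"
        + "  <xsl:template match=\"type\"/>\n"
        + "</xsl:stylesheet>\n";

    static final String LOG =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"schemaChange.xslt\"?>\n"
        + "<log>\n"
        + "  <message text=\"cannot read config file\">\n"
        + "    <type>WARNING</type>\n"
        + "    <when>\n"
        + "      <year>2000</year><month>01</month><day>15</day>\n"
        + "      <hour>06</hour><minute>35</minute><second>44</second>\n"
        + "    </when>\n"
        + "    <where>\n"
        + "      <class>com.foobar.servlet.MainServlet</class>\n"
        + "      <method>init()</method>\n"
        + "    </where>\n"
        + "  </message>\n"
        + "</log>\n";

    static String transform(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The processing instruction is copied by the identity template.
        System.out.println(transform(XSLT, LOG));
    }
}
```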

3.5.4. The Result File

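Now that the stylesheet is complete, it can be applied to all of the existing XML log files using a simple shell script or batch file. The resulting XML file is shown in Example 3-12. Note that the input file for this run contained a third message that Example 3-10 elides, as well as an <?xml-stylesheet?> processing instruction; because the identity template matches processing instructions, that instruction is copied to the result unchanged.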

Example 3-12. Result of the transformation

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="schemaChange.xslt"?>
<log>
  <message type="ERROR">
        <text>input parameter was null</text>
    
    <timestamp time="03:12:18" day="15" month="01" year="2000"/>
    <where>
      <class>com.foobar.util.StringUtil</class>
      <method>reverse(String)</method>
    </where>
  </message>
  <message type="WARNING">
        <text>cannot read config file</text>
    
    <timestamp time="06:35:44" day="15" month="01" year="2000"/>
    <where>
      <class>com.foobar.servlet.MainServlet</class>
      <method>init()</method>
    </where>
  </message>
  <message type="ERROR">
        <text>negative duration is not allowed</text>
    
    <timestamp time="10:01:49" day="17" month="01" year="2000"/>
    <where>
      <class>com.foobar.util.DateUtil</class>
      <method>getWeek(int)</method>
    </where>
  </message>
</log>
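Instead of a shell script or batch file, the batch run can also be done in Java by compiling the stylesheet once into a javax.xml.transform.Templates object and creating a new Transformer for each file. This is a minimal sketch; the class name BatchEvolve, the *.xml file filter, and the separate output directory are assumptions, not part of the original example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical batch driver: applies one stylesheet to every *.xml file in a directory.
class BatchEvolve {
    static int evolveAll(Path stylesheet, Path inDir, Path outDir)
            throws IOException, TransformerException {
        // Compile the stylesheet once; a Templates object can be reused for many files.
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(stylesheet.toFile()));
        Files.createDirectories(outDir);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inDir, "*.xml")) {
            for (Path in : files) {
                Path out = outDir.resolve(in.getFileName());
                try (InputStream is = Files.newInputStream(in);
                     OutputStream os = Files.newOutputStream(out)) {
                    // A Transformer is not thread-safe, so create a fresh one per file.
                    templates.newTransformer().transform(
                        new StreamSource(is, in.toUri().toString()), new StreamResult(os));
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException, TransformerException {
        if (args.length != 3) {
            System.err.println("usage: java BatchEvolve <stylesheet> <inputDir> <outputDir>");
            System.exit(1);
        }
        int n = evolveAll(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println("Transformed " + n + " file(s)");
    }
}
```

For example, java BatchEvolve schemaChange.xslt logs evolved-logs (hypothetical directory names) writes one result per input file. Writing into a separate directory rather than overwriting the originals keeps the old logs intact in case the stylesheet needs another revision.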



Copyright © 2002 O'Reilly & Associates. All rights reserved.