There's nothing particularly difficult about generating XML. You know about elements with start and end tags, their attributes, and so on. It's just tedious to write an XML output method that remembers to cross all the t's and dot all the i's. Does it put a space between every attribute? Does it close open elements? Does it put that slash at the end of empty elements? You don't want to have to think about these things when you're writing more important code. Others have written modules to take care of these serialization details for you.
David Megginson's XML::Writer is a fine example of an abstract XML generation interface. It comes with a handful of very simple methods for building any XML document. Just create a writer object and call its methods to crank out a stream of XML. Table 3-1 lists some of these methods.
Using these routines, we can build a complete XML document. The program in Example 3-10 creates a basic HTML file.
    use IO::File;
    use XML::Writer;

    # Open the output file and hand it to the writer
    my $output = IO::File->new( ">output.xml" )
        or die "Can't open output.xml: $!";
    my $writer = XML::Writer->new( OUTPUT => $output );

    $writer->xmlDecl( 'UTF-8' );
    $writer->doctype( 'html' );
    $writer->comment( 'My happy little HTML page' );
    $writer->pi( 'foo', 'bar' );
    $writer->startTag( 'html' );
    $writer->startTag( 'body' );
    $writer->startTag( 'h1' );
    $writer->startTag( 'font', 'color' => 'green' );
    $writer->characters( "<Hello World!>" );
    $writer->endTag( );    # closes font
    $writer->endTag( );    # closes h1
    $writer->dataElement( "p", "Nice to see you." );
    $writer->endTag( );    # closes body
    $writer->endTag( );    # closes html
    $writer->end( );
    $output->close( );
This example outputs the following:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html>
    <!-- My happy little HTML page -->
    <?foo bar?>
    <html><body><h1><font color="green">&lt;Hello World!&gt;</font></h1><p>Nice to see you.</p></body></html>
Some nice conveniences are built into this module. For example, it automatically takes care of reserved characters such as the ampersand (&) and the left angle bracket (<) by turning them into the appropriate entity references. Quoting of attribute values is automatic, too. At any time during the document-building process, you can check the context you're in with predicate methods like within_element('foo'), which tells you if an element named 'foo' is open.
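As a minimal sketch of these conveniences, the following program writes one element whose attribute and text both contain reserved characters, and checks its context along the way. The element name, attribute, and text are made up for illustration, and we assume a version of XML::Writer that accepts a reference to a string as its OUTPUT.

```perl
use strict;
use warnings;
use XML::Writer;

# Capture the generated XML in a string instead of a file.
# (Assumes an XML::Writer release that accepts a scalar reference as OUTPUT.)
my $xml    = '';
my $writer = XML::Writer->new( OUTPUT => \$xml );

# The attribute value and the text both contain reserved characters;
# the writer escapes them for us.
$writer->startTag( 'note', title => 'Q&A' );
$writer->characters( 'Fish & chips < caviar' );

# within_element() is true while an element with that name is still open
print "still inside <note>\n" if $writer->within_element( 'note' );

$writer->endTag( 'note' );
$writer->end( );

print $xml;
```

The printed document carries entity references such as &amp; and &lt; in place of the raw characters we passed in.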
By default, the module outputs a document with all the tags run together. You might prefer to insert whitespace in some places to make the XML more readable. If you set the option NEWLINES to true, the module inserts a newline inside each tag, just before its closing angle bracket, so the document does not end up as one long line. If you set DATA_MODE instead, newlines are inserted around elements automatically, and you can combine DATA_MODE with DATA_INDENT to indent each line in proportion to its depth in the document for a nicely formatted result.
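To see the formatting options at work, here is a small sketch that turns on DATA_MODE and DATA_INDENT together. The inventory and item element names are invented for the example, and we again assume that OUTPUT may be a reference to a string.

```perl
use strict;
use warnings;
use XML::Writer;

my $xml    = '';
my $writer = XML::Writer->new(
    OUTPUT      => \$xml,    # assumes scalar-reference output is supported
    DATA_MODE   => 1,        # put newlines around elements
    DATA_INDENT => 2,        # indent two spaces per level of depth
);

$writer->startTag( 'inventory' );
$writer->dataElement( 'item', 'widget' );
$writer->dataElement( 'item', 'gadget' );
$writer->endTag( 'inventory' );
$writer->end( );

print $xml;
```

Each item element should land on its own indented line. Note that data mode is meant for data-style documents: an element should hold either text or child elements, not a mixture of both.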
The nice thing about XML is that it can be used to organize just about any kind of textual data. With XML::Writer, you can quickly turn a pile of information into a tightly regimented document. For example, you can turn a directory listing into a hierarchical database, as the program in Example 3-11 does.
    use XML::Writer;

    my $wr = XML::Writer->new( DATA_MODE => 1, DATA_INDENT => 2 );
    as_xml( shift @ARGV );
    $wr->end;

    # Recursively map directory information into XML
    sub as_xml {
        my $path = shift;
        return unless( -e $path );

        # If this is a directory, create an element and
        # stuff it full of items
        if( -d $path ) {
            $wr->startTag( 'directory', name => $path );

            # Load the names of all things in this
            # directory into an array
            my @contents = ( );
            opendir( my $dir, $path ) or die "Can't open $path: $!";
            while( defined( my $item = readdir( $dir ) ) ) {
                next if( $item eq '.' or $item eq '..' );
                push( @contents, $item );
            }
            closedir( $dir );

            # Recurse on items in the directory
            foreach my $item ( @contents ) {
                as_xml( "$path/$item" );
            }
            $wr->endTag( 'directory' );

        # We'll lazily call anything that's not a directory a file.
        } else {
            $wr->emptyTag( 'file', name => $path );
        }
    }
Here's how the example looks when run on a directory (note the use of DATA_MODE and DATA_INDENT to improve readability):
    $ ~/bin/dir /home/eray/xtools/XML-DOM-1.25
    <directory name="/home/eray/xtools/XML-DOM-1.25">
      <directory name="/home/eray/xtools/XML-DOM-1.25/t">
        <file name="/home/eray/xtools/XML-DOM-1.25/t/attr.t" />
        <file name="/home/eray/xtools/XML-DOM-1.25/t/minus.t" />
        <file name="/home/eray/xtools/XML-DOM-1.25/t/example.t" />
        <file name="/home/eray/xtools/XML-DOM-1.25/t/print.t" />
        <file name="/home/eray/xtools/XML-DOM-1.25/t/cdata.t" />
        <file name="/home/eray/xtools/XML-DOM-1.25/t/astress.t" />
        <file name="/home/eray/xtools/XML-DOM-1.25/t/modify.t" />
      </directory>
      <file name="/home/eray/xtools/XML-DOM-1.25/DOM.gif" />
      <directory name="/home/eray/xtools/XML-DOM-1.25/samples">
        <file name="/home/eray/xtools/XML-DOM-1.25/samples/REC-xml-19980210.xml" />
      </directory>
      <file name="/home/eray/xtools/XML-DOM-1.25/MANIFEST" />
      <file name="/home/eray/xtools/XML-DOM-1.25/Makefile.PL" />
      <file name="/home/eray/xtools/XML-DOM-1.25/Changes" />
      <file name="/home/eray/xtools/XML-DOM-1.25/CheckAncestors.pm" />
      <file name="/home/eray/xtools/XML-DOM-1.25/CmpDOM.pm" />
We've seen XML::Writer used step by step and in a recursive context. You could also use it conveniently inside an object tree structure, where each XML object type has its own "to-string" method making the appropriate calls to the writer object. XML::Writer is extremely flexible and useful.
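The object-tree idea might be sketched as follows. The Folder and Leaf classes and their to_xml method are hypothetical names invented for this illustration; the only XML::Writer calls used are ones we have already seen, and we assume OUTPUT may be a reference to a string.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical container node: writes itself, then asks each child to do the same
package Folder;
sub new {
    my( $class, $name, @kids ) = @_;
    return bless { name => $name, kids => \@kids }, $class;
}
sub to_xml {
    my( $self, $writer ) = @_;
    $writer->startTag( 'directory', name => $self->{name} );
    $_->to_xml( $writer ) for @{ $self->{kids} };
    $writer->endTag( 'directory' );
}

# Hypothetical leaf node: writes a single empty element
package Leaf;
sub new {
    my( $class, $name ) = @_;
    return bless { name => $name }, $class;
}
sub to_xml {
    my( $self, $writer ) = @_;
    $writer->emptyTag( 'file', name => $self->{name} );
}

package main;

my $xml    = '';
my $writer = XML::Writer->new( OUTPUT => \$xml );

# Build a tiny tree and let every node serialize itself
my $tree = Folder->new( 'src',
    Leaf->new( 'main.pl' ),
    Folder->new( 'lib', Leaf->new( 'Util.pm' ) ),
);
$tree->to_xml( $writer );
$writer->end( );

print $xml;
```

Because every node receives the same writer object, the writer keeps track of which elements are open no matter how deep the tree goes.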
Remember that many parser modules have their own ways to turn their current content into simple, pretty strings of XML. XML::LibXML, for example, lets you call a toString( ) method on the document or any element object within it. Consequently, more specific processor classes that subclass this module, or otherwise use it internally, often offer the same method in their own APIs and simply pass the call along to the underlying parser object. Consult the documentation of your favorite processor to see if it supports this or a similar feature.
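Here is a brief sketch of toString( ) in use, assuming a version of XML::LibXML recent enough to provide the load_xml constructor. The sample document is made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse a small in-memory document (sample data invented for this sketch)
my $doc = XML::LibXML->load_xml(
    string => '<shop><item id="1">tea</item></shop>' );

# Serialize the whole document
print $doc->toString( );

# Serialize just one element found inside it
my( $item ) = $doc->findnodes( '//item' );
print $item->toString( ), "\n";
```

Calling the method on the document gives you the full text, XML declaration included, while calling it on an element gives you only that element and its contents.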
Finally, sometimes all you really need is Perl's print function. While it lives at a lower level than tools like XML::Writer, ignorant of XML-specific rules and regulations, it gives you a finer degree of control over the process of turning memory structures into text worthy of throwing at filehandles. If you're doing especially tricky work, falling back to print may be a relief, and indeed some of the stunts we pull in Chapter 10, "Coding Strategies," use print. Just don't forget to escape those naughty < and & characters with their respective entity references, as shown in Table 2-1, or be generous with CDATA sections.
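If you do go the print route, a small escaping routine keeps you out of trouble. The following is a minimal sketch; xml_escape is a hypothetical helper written for this example, not part of any module, and the menu data is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that are unsafe in XML text
# and in double-quoted attribute values with entity references.
sub xml_escape {
    my $text = shift;
    $text =~ s/&/&amp;/g;     # must come first, or we'd re-escape our own output
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Made-up data containing characters that need escaping
my %price = ( 'Fish & Chips' => '< 5.00' );

print "<menu>\n";
foreach my $dish ( sort keys %price ) {
    printf qq{  <dish name="%s">%s</dish>\n},
        xml_escape( $dish ), xml_escape( $price{$dish} );
}
print "</menu>\n";
```

The ampersand substitution has to run before the others; otherwise the ampersands introduced by &lt; and its friends would themselves be escaped a second time.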
Copyright © 2002 O'Reilly & Associates. All rights reserved.