4.2. Whitespace Processing
The handling of whitespace characters (tabs, linefeeds, carriage
returns, and spaces, which are often used only to "pretty print" XML
documents) has always been very controversial. W3C XML Schema imposes
a generic two-step algorithm, which is applied in full to all of the
predefined datatypes except two: xs:string, which undergoes no
whitespace processing, and xs:normalizedString, which undergoes only
the first step.
- Whitespace replacement
-
This is the first step of whitespace processing applied to the parsed
value. During whitespace replacement, every occurrence of a tab (#x9),
linefeed (#xA), or carriage return (#xD) is replaced with a space
(#x20). This step does not change the number of characters. It is
applied to all the predefined datatypes except xs:string, whose parsed
value is left untouched.
- Whitespace collapse
-
The second step removes leading and trailing spaces and replaces each
run of contiguous spaces with a single space character. It is applied
to all the predefined datatypes except xs:string, for which no
whitespace processing is performed at all, and xs:normalizedString,
for which only whitespace replacement is performed.
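The two steps above can be sketched in a few lines of Python. This is
only an illustration of the behavior described in this section, not
code taken from any schema processor; the function names
replace_whitespace and collapse_whitespace and the sample value are
hypothetical.

```python
import re

# Tab, linefeed, and carriage return each map to a single space.
_REPLACEMENTS = str.maketrans({"\t": " ", "\n": " ", "\r": " "})


def replace_whitespace(value: str) -> str:
    """Step 1: whitespace replacement (the string length is unchanged)."""
    return value.translate(_REPLACEMENTS)


def collapse_whitespace(value: str) -> str:
    """Step 2: whitespace collapse, applied after replacement."""
    replaced = replace_whitespace(value)
    # Squeeze runs of spaces, then trim leading and trailing spaces.
    return re.sub(r" +", " ", replaced).strip(" ")


raw = "  hello\t\n world \r"
replaced = replace_whitespace(raw)    # what xs:normalizedString keeps
collapsed = collapse_whitespace(raw)  # what the other datatypes keep
```

With this sample, replaced has the same length as raw and contains no
tabs, linefeeds, or carriage returns, while collapsed is the string
"hello world". An xs:string value would simply keep raw as it is.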
TIP:
This notion of "normalized string" does not match the XPath function
normalize-space(), which corresponds to what W3C XML Schema calls
whitespace collapsing. It also differs from the DOM normalize()
method, which merges adjacent text nodes.
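To make the difference with the DOM concrete, here is a small sketch
using Python's standard xml.dom.minidom module. The element name and
the text values are made up for the illustration; the point is only
that normalize() merges neighboring text nodes and leaves their
whitespace alone.

```python
from xml.dom import minidom

doc = minidom.parseString("<note/>")
root = doc.documentElement

# Build two adjacent text nodes by hand.
root.appendChild(doc.createTextNode("  first\t"))
root.appendChild(doc.createTextNode("\nsecond  "))
nodes_before = len(root.childNodes)

# DOM normalize() merges adjacent text nodes into one.
root.normalize()
nodes_after = len(root.childNodes)
merged_text = root.firstChild.data
```

After the call, the two text nodes have become one, and merged_text
still contains the original tab, linefeed, and surrounding spaces:
no replacement or collapsing has taken place.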
Copyright © 2002 O'Reilly & Associates. All rights reserved.