Simple Content (XML in a Nutshell, 2nd Edition)

16.6.2. Facets

In schema-speak, a facet is an aspect of a possible value for a simple data type. Depending on the base type, some facets make more sense than others. For example, a numeric data type can be restricted by the minimum and maximum possible values it could contain. But these types of restrictions wouldn't make sense for a boolean value. The following list covers the different facet types that are supported by a schema processor:

length (or minLength and maxLength)
pattern
enumeration
whiteSpace
maxInclusive and maxExclusive
minInclusive and minExclusive
totalDigits
fractionDigits

Facets are applied to simple types using the xs:restriction element. Each facet is expressed as a distinct element within the restriction block, and multiple facets can be combined to further restrict potential values of the simple type.
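
As a minimal sketch of combining facets, the following hypothetical productCode type (the name and the limits are illustrative assumptions, not part of the address book schema) applies three facets in one restriction. A value must satisfy all of them to be valid:

<!-- Hypothetical type: 4 to 10 uppercase letters or digits -->
<xs:simpleType name="productCode">
  <xs:restriction base="xs:string">
    <xs:minLength value="4"/>
    <xs:maxLength value="10"/>
    <xs:pattern value="[A-Z0-9]+"/>
  </xs:restriction>
</xs:simpleType>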

16.6.2.1. Handling whitespace

The whiteSpace facet controls how the schema processor will deal with any whitespace within the target data. Whitespace normalization takes place before any of the other facets are processed. There are three possible values for the whiteSpace facet:

preserve: Keep all whitespace exactly as it was in the source document (basic XML 1.0 whitespace handling for content within elements).
replace: Replace occurrences of #x9 (tab), #xA (line feed), and #xD (carriage return) characters with #x20 (space) characters.
collapse: Perform the replace step first, then collapse each run of consecutive spaces into a single space and remove any leading and trailing spaces.
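
As an illustration, the sketch below derives a hypothetical collapsedName type (an assumed name, not part of the original schema) from xs:string and sets whiteSpace to collapse. Because normalization happens first, the maxLength facet is checked against the normalized value:

<!-- Hypothetical type: whitespace is collapsed before maxLength is checked -->
<xs:simpleType name="collapsedName">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="collapse"/>
    <xs:maxLength value="50"/>
  </xs:restriction>
</xs:simpleType>

With this type, a value written as "  Mary   Ann " would be treated as "Mary Ann" (8 characters) for validation purposes.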

16.6.2.2. Restricting length

The length-restriction facets are fairly easy to understand. The length facet forces a value to be exactly the length given. The minLength and maxLength facets can be used to set a definite range for the lengths of values of the type given. For example, take the nameComponent type from the schema. What if a name component could not exceed 50 characters (because of a database limitation, for instance)? This rule can be enforced by using the maxLength facet. Incorporating this facet requires a new simple type to reference from within the nameComponent complex type definition:

<xs:complexType name="nameComponent">
  <xs:simpleContent>
    <xs:extension base="addr:nameString"/>
  </xs:simpleContent>
</xs:complexType>

<xs:simpleType name="nameString">
  <xs:restriction base="xs:string">
    <xs:maxLength value="50"/>
  </xs:restriction>
</xs:simpleType>

The new nameString simple type is derived from the built-in xs:string type, but can contain no more than 50 characters (the default is unlimited). The same approach can be used with the length and minLength facets.

16.6.2.3. Enumerations

One of the more useful types of restriction is the simple enumeration. In many cases, it is sufficient to restrict possible values for an element or attribute to a member of a predefined list. For example, values of the new locationType simple type defined earlier could be restricted to a list of valid options like so:

<xs:simpleType name="locationType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="work"/>
    <xs:enumeration value="home"/>
    <xs:enumeration value="mobile"/>
  </xs:restriction>      
</xs:simpleType>

Then, if the location attribute in any instance document contained a value not found in the list of enumeration values, the schema processor would generate a validity error.
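
For instance, assuming a phone element that carries the location attribute and holds the number as text (the numbers below are made up, and any namespace prefix is omitted for brevity), the first fragment would validate against locationType and the second would not:

<!-- Valid: "home" is one of the enumerated values -->
<phone location="home">555-0100</phone>

<!-- Invalid: "office" does not appear in the enumeration -->
<phone location="office">555-0101</phone>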

16.6.2.4. Numeric facets

Almost half of the built-in data types defined by the schema specification represent numeric data of one type or another. Still more could be called numeric, since the date/time and duration types are treated as scalar quantities as well. The following two sections cover all of the numeric facets available, but for a comprehensive list of which of these facets are applicable to which data types, see Chapter 21.

16.6.2.4.1. Minimum and maximum values

Four facets control the minimum and maximum values of items:

minInclusive
minExclusive
maxInclusive
maxExclusive

The primary difference between the inclusive and exclusive flavors of the min and max facets is whether the value given is considered part of the set of allowable values. For example, when applied to an integer type, either of the following two facet declarations allows the same set of values:

<xs:maxInclusive value="1"/>
<xs:maxExclusive value="2"/>

The difference between inclusive and exclusive becomes more significant when dealing with decimal or floating point values. For example, if maxExclusive were set to 5.0, no finite maxInclusive value would be equivalent: it would require an infinite number of nines to the right of the decimal point (4.99999...). These facets can also be applied to date and time values.
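
A brief sketch of how an inclusive and an exclusive bound might be combined, using a hypothetical temperature type (the name and the bounds are assumptions chosen for illustration):

<!-- Hypothetical type: -50 <= value < 150 -->
<xs:simpleType name="temperature">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="-50"/>
    <xs:maxExclusive value="150"/>
  </xs:restriction>
</xs:simpleType>

Here -50 and 149.999 are valid, while 150 is not.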

16.6.2.4.2. Length and precision

Two facets control the length and precision of decimal numeric values: totalDigits and fractionDigits. The totalDigits facet sets the maximum number of digits (only digits are counted, not signs or decimal points) allowed in a complete number. The fractionDigits facet sets the maximum number of those digits that may appear to the right of the decimal point.
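
To make this concrete, here is a sketch of a hypothetical price type (the name and the limits are assumptions, not taken from the address book schema) that allows at most seven digits overall, of which at most two may follow the decimal point:

<!-- Hypothetical type: up to 7 digits in total, up to 2 after the decimal point -->
<xs:simpleType name="price">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="7"/>
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>

Under these assumptions, 12345.67 and 19.5 are valid; 123456.78 is rejected (eight digits in total) and 1.234 is rejected (three fraction digits).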

16.6.2.5. Enforcing format

The xs:pattern facet can place very sophisticated restrictions on the format of string values. The pattern facet compares the value in question against a regular expression, and if the value doesn't conform to the expression, the schema processor reports a validation error. For example, this xs:simpleType element declares a social security number simple type using the pattern facet:

<xs:simpleType name="ssn">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d\d\d-\d\d-\d\d\d\d"/>
  </xs:restriction>
</xs:simpleType>

This new simple type enforces the rule that a social security number consists of three digits, a dash, two digits, another dash, and finally four more digits. The actual regular-expression language is very similar to that of the Perl programming language, but it also supports a wide range of Unicode characters. See Chapter 21 for more information on the full pattern-matching language.
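
The same format can be written more compactly with counted quantifiers, which the schema regular-expression language supports. The type name ssnCompact below is hypothetical, and the sketch is meant as an equivalent spelling of the ssn pattern rather than a replacement for it. A pattern must match the entire value, so no start or end anchors are needed:

<xs:simpleType name="ssnCompact">
  <xs:restriction base="xs:string">
    <!-- \d{3} means exactly three digits, \d{2} exactly two, \d{4} exactly four -->
    <xs:pattern value="\d{3}-\d{2}-\d{4}"/>
  </xs:restriction>
</xs:simpleType>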

16.6.2.6. Lists

XML 1.0 provided a few very simple list types that could be declared as possible attribute values: IDREFS, ENTITIES, and NMTOKENS. Schemas have generalized the concept of lists and provide the ability to declare lists of arbitrary types.

These list types are themselves simple types and may be used in the same places other simple types are used. For example, if the fullName element were to be expanded to accommodate multiple middle names, one approach would be to declare the middle element to contain a list of nameString values:

<xs:element name="middle" type="addr:nameList" minOccurs="0"/>
. . .
<xs:complexType name="nameList">
  <xs:simpleContent>
    <xs:extension base="addr:nameListType"/>
  </xs:simpleContent>
</xs:complexType>

<xs:simpleType name="nameListType">
  <xs:list itemType="addr:nameString"/>
</xs:simpleType>

After this change has been made, the middle element of an instance document can contain an unlimited list of names separated by whitespace, each of which can contain up to 50 characters. Because whitespace separates the items, an individual name in the list cannot itself contain whitespace. The use of xs:complexType here will greatly simplify adding attributes later.
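
List types can themselves be restricted. For a list, the length facets count items rather than characters, so the following sketch (the shortNameList name and the limit of three are assumptions for illustration) would cap the number of middle names:

<!-- Hypothetical type: at most three names in the list -->
<xs:simpleType name="shortNameList">
  <xs:restriction base="addr:nameListType">
    <xs:maxLength value="3"/>
  </xs:restriction>
</xs:simpleType>

A value such as "Anne Marie Louise" contains three items and would pass; adding a fourth name would make it invalid.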

16.6.2.7. Unions

In some cases, it is useful to allow potential values for elements and attributes to have any of several types. The xs:union element allows a type to be declared that can draw from multiple type spaces. For example, it might be useful to allow users to enter their own one-word descriptions into the location attribute of the phone element, as well as to choose from a list. The location attribute declaration could be modified to include a union that incorporates the locationType type and the xs:NMTOKEN type:

<xs:attribute name="location">
  <xs:simpleType>
    <xs:union memberTypes="addr:locationType xs:NMTOKEN"/>
  </xs:simpleType>
</xs:attribute>

Now the location attribute can contain either addr:locationType or xs:NMTOKEN content.
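
The fragments below sketch how this might play out in an instance document. They assume the phone element holds the number as text; the numbers are made up and any namespace prefix is omitted for brevity:

<!-- Valid: matches addr:locationType -->
<phone location="mobile">555-0102</phone>

<!-- Valid: not in the enumeration, but a legal xs:NMTOKEN -->
<phone location="cottage">555-0103</phone>

<!-- Invalid: contains a space, so it matches neither member type -->
<phone location="summer house">555-0104</phone>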