
XML Schema

Screenshot A.3 shows the XML schema for the Flute employee list XML from Listing A.1. The first thing you notice is that, compared with the DTD in Screenshot A.2, the XML schema is much longer. The reason is twofold: First, XML Schema, being XML, is more verbose. Second, XML Schema defines the business rules for a Flute employee much more comprehensively than the DTD. Although it is long, it is easy to understand when broken down into smaller parts. We will introduce the different parts one by one and, where appropriate, map the schema structure to the elements in the employeeList DTD.

Screenshot A.3: XML Schema for the employeeList document

Namespaces

In the schema for the employeeList XML, we used a vocabulary with a particular meaning in that context. The words used to construct the schema (element, attribute, restriction, simpleType, complexType, and so on) have specific meanings in an XML Schema document. This vocabulary is defined in a context, or namespace: http://www.w3.org/2001/XMLSchema. By associating these words with a namespace, we qualify the names and, in the process, ensure that no clashes arise between the same words used in different contexts. A Java programmer can think of a namespace as analogous to a package name: the same class name can appear in more than one package, and the package name tells the uses apart. Before a qualified name can be used in a document, its namespace must be declared:

xmlns:xsd="http://www.w3.org/2001/XMLSchema"


Read this as "XML namespace qualifier 'xsd' represents the namespace 'http://www.w3.org/2001/XMLSchema'". In declaring this namespace, we are saying that all elements and attributes qualified with "xsd" are defined in the namespace http://www.w3.org/2001/XMLSchema. A name is qualified by using the declared qualifier as a prefix. For example, the word schema is qualified as xsd:schema, which means that the element schema is defined in the namespace represented by the xsd qualifier. The XML Schema namespace is also called the "schema of schemas," because it defines all schema definition elements and attributes. The employeeList schema also declares a targetNamespace and a default namespace. The default namespace is in effect when elements are referred to without a qualifier, and it is declared without a namespace qualifier (unlike xsd in the previous example):

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://www.flute.com"
 xmlns="http://www.flute.com">
...
<xsd:element ref="employee" minOccurs="1" maxOccurs="1"/>



In Screenshot A.3, the default namespace applies to elements such as employee, employeeList, and dept. In the code above, ref="employee" refers to an employee element declared in the default namespace (http://www.flute.com). The targetNamespace declaration signifies that the vocabulary defined in this schema document (employee, first_name, email, dept, extn, etc.) belongs to the http://www.flute.com namespace (Screenshot A.4). The target namespace is the one in which the employeeList schema elements are defined. In the example, we have elected to make the target namespace the default namespace as well, so that target namespace elements can be referred to without a qualifier. The target namespace value is significant because, when an instance document uses the elements declared in the schema document, those elements must be associated with the target namespace value. (Screenshot A.5, in the "Bringing It All Together" section, illustrates this.)
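As a rough illustration of why the target namespace matters, the sketch below uses the JDK's javax.xml.validation API to validate two tiny instance documents. The class name NamespaceDemo, the cut-down schema that declares only dept, and the sample documents are assumptions made for illustration; this is not the full employeeList schema from Screenshot A.3.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NamespaceDemo {
    // Hypothetical cut-down schema: one global element in the Flute target namespace.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://www.flute.com' xmlns='http://www.flute.com'>"
      + "<xsd:element name='dept' type='xsd:string'/>"
      + "</xsd:schema>";

    // Returns true when the instance document validates against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error, malformed XML, or a schema that fails to compile
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // dept placed in the target namespace through a default namespace declaration
        System.out.println(isValid("<dept xmlns='http://www.flute.com'>IT</dept>"));
        // dept in no namespace: the schema has no declaration for this name
        System.out.println(isValid("<dept>IT</dept>"));
    }
}
```

The second document is rejected because an unqualified dept is a different name from the dept declared in http://www.flute.com.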

Screenshot A.4
Screenshot A.5
A schema need not define elements in a namespace (i.e., it is fine to have a schema with no targetNamespace attribute). Instance documents that use elements defined without a namespace may use the noNamespaceSchemaLocation attribute to tell the processor where the schema is located:

xsi:noNamespaceSchemaLocation="employee.xsd"

Simple Types

Elements such as first_name and email have simple datatype values (e.g., strings). These types (string, int, etc.), which are prefixed with the xsd qualifier, are called built-in types, because they are defined in the schema of schemas. Declaring an element of a built-in simple type in the employeeList XML schema is straightforward:

<xsd:element name="name" type="type" minOccurs="int" maxOccurs="int"/>


For example:

<xsd:element name="first_name" type="xsd:string"/>


In a schema, elements and attributes are declared, and types are defined.

minOccurs and maxOccurs constraints determine how many times that particular element may be repeated in the document. The default value of minOccurs and maxOccurs is "1". A special value of unbounded is used to indicate that a particular element may repeat any number of times.
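The occurrence constraints can be tried out with a small sketch using the JDK's javax.xml.validation API. The class name OccursDemo and the reduced, namespace-free schema (an employeeList holding string-valued employee elements) are assumptions made to keep the example short; they are not the schema from Screenshot A.3.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OccursDemo {
    // Hypothetical schema: employeeList must hold at least one employee, with no upper limit.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='employeeList'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='employee' type='xsd:string' minOccurs='1' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Returns true when the instance document validates against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<employeeList><employee>Ann</employee></employeeList>"));
        System.out.println(isValid(
            "<employeeList><employee>Ann</employee><employee>Bob</employee></employeeList>"));
        // No employee at all violates minOccurs="1".
        System.out.println(isValid("<employeeList/>"));
    }
}
```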

provides mapping between common built-in datatypes and Java types.

Extending Simple Types

The Flute Bank business rules state that all employee_id values must be between 1 and 100,000. An element declaration such as <xsd:element name="employee_id" type="xsd:int"/> enforces the rule only partially: it guarantees that all employee IDs are integer values, but not that they fall within the range. To add further constraints to the declaration, XML Schema allows new types to be derived from built-in types by restriction, using the simpleType element:

<xsd:element name="employee_id">
 <xsd:simpleType>
 <xsd:restriction base="xsd:int">
 <xsd:minInclusive value="1"/>
 <xsd:maxInclusive value="100000"/>
 </xsd:restriction>
 </xsd:simpleType>
</xsd:element>


In the above XML fragment, the employee_id element is declared with a new simple type that is a restriction on the base built-in type int. The restrictions are added to the employee_id element on top of the built-in type restriction of integers and are declared using facets. In this example, the facets added to the int type are minInclusive and maxInclusive. The general syntax for a simpleType is

<xsd:simpleType>
 <xsd:restriction base="simple-type">
 <xsd:facet value="value"/>
 <xsd:facet value="value"/>
 ...
 </xsd:restriction>
 </xsd:simpleType>
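To see the employee_id restriction shown earlier in action, the following sketch feeds a few values to a validator built with the JDK's javax.xml.validation API. The class name RangeDemo and the choice to declare employee_id as a stand-alone root element in no namespace are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class RangeDemo {
    // The employee_id declaration from the text, wrapped in a minimal schema.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='employee_id'><xsd:simpleType>"
      + "<xsd:restriction base='xsd:int'>"
      + "<xsd:minInclusive value='1'/><xsd:maxInclusive value='100000'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:element>"
      + "</xsd:schema>";

    // Returns true when value is accepted as the content of employee_id.
    static boolean accepts(String value) {
        String xml = "<employee_id>" + value + "</employee_id>";
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // value rejected by the base type or by a facet
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"1", "100000", "0", "100001", "abc"}) {
            System.out.println(value + " -> " + accepts(value));
        }
    }
}
```

Both facets must hold at once, so 0 and 100001 are rejected, while abc already fails the base type int.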


When a restriction element contains multiple facets, they are ORed if they are enumeration or pattern facets. All other facets are ANDed. The example below shows how a simple type is created by applying a pattern facet to a string datatype. The pattern expression is a regular expression.

<xsd:element >
 <xsd:simpleType>
 <xsd:restriction base="xsd:string">
 <xsd:pattern value="[0-9]{3}-[0-9]{3}-[0-9]{4}"/>
 </xsd:restriction>
 </xsd:simpleType>
</xsd:element>
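The ORing of pattern facets can be sketched in the same way. In the example below, the element name phone, the second pattern (ten digits with no dashes), and the class name PatternDemo are all hypothetical additions; only the first pattern comes from the fragment above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PatternDemo {
    // Two pattern facets in one restriction: a value needs to match only one of them.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='phone'><xsd:simpleType>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:pattern value='[0-9]{3}-[0-9]{3}-[0-9]{4}'/>"
      + "<xsd:pattern value='[0-9]{10}'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:element>"
      + "</xsd:schema>";

    // Returns true when value is accepted as the content of the hypothetical phone element.
    static boolean accepts(String value) {
        String xml = "<phone>" + value + "</phone>";
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // value matched neither pattern
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("555-123-4567")); // matches the first pattern
        System.out.println(accepts("5551234567"));   // matches the second pattern
        System.out.println(accepts("555-1234"));     // matches neither
    }
}
```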


Table A.1 shows a few facets that can be used with different built-in types to create new simpleTypes.

Table A.1: Facets

Built-in type: string, all number types
Facet: enumeration
Example:

<xsd:element name="stateCode">
 <xsd:simpleType>
  <xsd:restriction base="xsd:string">
   <xsd:enumeration value="CA"/>
   <xsd:enumeration value="MA"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:element>

stateCode can have only the values "CA" or "MA".

Built-in type: string, token, normalizedString
Facet: length, minLength, maxLength, pattern
Example:

<xsd:element name="stateCode">
 <xsd:simpleType>
  <xsd:restriction base="xsd:string">
   <xsd:minLength value="2"/>
   <xsd:maxLength value="2"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:element>

The length of stateCode is exactly two characters.

<xsd:element name="extn">
 <xsd:simpleType>
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="[0-9]{5}"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:element>

extn is a sequence of five digits. (Any regular expression can be used.)

Table A.2: Facets

Built-in type: most numeric types
Facet: maxInclusive, minInclusive, maxExclusive, minExclusive
Example:

<xsd:element name="employee_id">
 <xsd:simpleType>
  <xsd:restriction base="xsd:int">
   <xsd:minInclusive value="1"/>
   <xsd:maxInclusive value="100000"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:element>

employee_id is an integer between 1 and 100,000.

Built-in type: decimal
Facet: totalDigits, fractionDigits
Example:

<xsd:element >
 <xsd:simpleType>
  <xsd:restriction base="xsd:decimal">
   <xsd:totalDigits value="10"/>
   <xsd:fractionDigits value="2"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:element>

Complex Types

A complex data structure is modeled with a complexType element. A type created with complexType maps naturally to a Java bean. A complex type can contain subelements and can have attributes (simpleTypes can have neither). In the following code, employee is a complex type consisting of several elements and one attribute group:

<xsd:element name="employee">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
  <xsd:attributeGroup ref="employeeAttribute"/>
 </xsd:complexType>
</xsd:element>



This example ensures that an employee instance XML will contain elements for employee ID, name, extension, department, and email. Note that the datatypes of these subelements are not defined inside the complexType element. Instead, we have chosen to declare the subelements elsewhere in the document and only refer to those declarations here. For example, <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/> uses a reference to the employee_id element, whose simpleType is defined later in the schema. This is not the only way in which a complex type can be defined. It is also possible to define a simpleType inline:

<xsd:element name="employee">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element name="employee_id" minOccurs="1" maxOccurs="1">
    <xsd:simpleType>
     ...
    </xsd:simpleType>
   </xsd:element>
   <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
  <xsd:attributeGroup ref="employeeAttribute"/>
 </xsd:complexType>
</xsd:element>


However, defining a simpleType with a name and then referring to it wherever it is used lends itself to reuse of types.

sequence and all

sequence signifies that the order of elements declared in it is important. If the order of the subelements within a complexType is not important, the all element can be used to convey this:

<xsd:complexType>
 <xsd:all>
 <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
 <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
 <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
 </xsd:all>
</xsd:complexType>
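The difference between the two compositors can be checked with a short sketch built on the JDK's javax.xml.validation API. The class name CompositorDemo, the helper schemaWith, and the reduced employee element with only name and extn declared locally are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class CompositorDemo {
    // Builds a hypothetical employee schema around the given compositor ("sequence" or "all").
    static String schemaWith(String compositor) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
             + "<xsd:element name='employee'><xsd:complexType><xsd:" + compositor + ">"
             + "<xsd:element name='name' type='xsd:string'/>"
             + "<xsd:element name='extn' type='xsd:string'/>"
             + "</xsd:" + compositor + "></xsd:complexType></xsd:element>"
             + "</xsd:schema>";
    }

    // Returns true when xml validates against the schema text xsd.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String reversed = "<employee><extn>12345</extn><name>Ann</name></employee>";
        System.out.println(isValid(schemaWith("sequence"), reversed)); // order matters
        System.out.println(isValid(schemaWith("all"), reversed));      // order is free
    }
}
```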


choice

What if we want to express that a Flute employee has either a manager_id or an employee_id? (This is not good design, but the point is to illustrate how XML Schema handles choices.) A choice can appear inside a sequence:

<xsd:element name="employee">
 <xsd:complexType>
 <xsd:sequence>
 <xsd:choice>
 <xsd:element ref="employee_id" />
 <xsd:element ref="manager_id" />
 </xsd:choice>
 <xsd:element ref="name" />
 <xsd:element ref="extn" />
 <xsd:element ref="email" />
 <xsd:element ref="dept"/>
 </xsd:sequence>
 </xsd:complexType>
</xsd:element>


In the above fragment, employee must have name, extn, email, and dept and either a manager_id or an employee_id.
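A quick way to confirm that exactly one of the two alternatives is accepted is to run a few documents through a validator. The sketch below assumes a reduced employee element (the choice followed by name only, all declared locally with built-in types) and a hypothetical class name ChoiceDemo.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ChoiceDemo {
    // Hypothetical reduced schema: (employee_id | manager_id) followed by name.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='employee'><xsd:complexType><xsd:sequence>"
      + "<xsd:choice>"
      + "<xsd:element name='employee_id' type='xsd:int'/>"
      + "<xsd:element name='manager_id' type='xsd:int'/>"
      + "</xsd:choice>"
      + "<xsd:element name='name' type='xsd:string'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Returns true when the given child elements form a valid employee.
    static boolean isValid(String children) {
        String xml = "<employee>" + children + "</employee>";
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<employee_id>7</employee_id><name>Ann</name>"));
        System.out.println(isValid("<manager_id>3</manager_id><name>Ann</name>"));
        // Both identifiers, or neither, break the choice.
        System.out.println(isValid("<employee_id>7</employee_id><manager_id>3</manager_id><name>Ann</name>"));
        System.out.println(isValid("<name>Ann</name>"));
    }
}
```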

attributes

In our Flute Bank example, employees can be described by an employee_type attribute indicating whether they are permanent or contract. In an XML Schema document, an element with one or more attributes can be defined only as a complexType. The example below shows how the attribute declaration is made within the employee complexType. In the example, the attribute is allowed only two values (an enumeration), so a datatype of xsd:string is insufficient to define the type of the employee_type attribute. Just as we created a new simpleType for employee_id, we define a new type for this attribute by restricting the base xsd:string type with enumeration facets:

<xsd:element name="employee">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
  <xsd:attribute name="employee_type" use="required">
   <xsd:simpleType>
    <xsd:restriction base="xsd:string">
     <xsd:enumeration value="contract"/>
     <xsd:enumeration value="perm"/>
    </xsd:restriction>
   </xsd:simpleType>
  </xsd:attribute>
 </xsd:complexType>
</xsd:element>
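The effect of a required, enumerated attribute can be sketched with the JDK's javax.xml.validation API. The class name AttributeDemo and the reduced employee element, which keeps only a locally declared name subelement alongside the employee_type attribute, are assumptions made to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AttributeDemo {
    // Hypothetical reduced schema: employee has one subelement and a required enumerated attribute.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='employee'><xsd:complexType>"
      + "<xsd:sequence><xsd:element name='name' type='xsd:string'/></xsd:sequence>"
      + "<xsd:attribute name='employee_type' use='required'><xsd:simpleType>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:enumeration value='contract'/><xsd:enumeration value='perm'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:attribute>"
      + "</xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Returns true when the instance document validates against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<employee employee_type='perm'><name>Ann</name></employee>"));
        // 'temp' is not one of the enumerated values.
        System.out.println(isValid("<employee employee_type='temp'><name>Ann</name></employee>"));
        // Omitting the attribute violates use="required".
        System.out.println(isValid("<employee><name>Ann</name></employee>"));
    }
}
```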


The general syntax for declaring attributes locally is

<xsd:attribute name="name" use="required|optional|prohibited" default/fixed="value">
 <xsd:simpleType>
 <xsd:restriction base="built-in type">
 <xsd:facet value="value"/>
 ...
 </xsd:restriction>
 </xsd:simpleType>
</xsd:attribute>


or

<xsd:attribute name="name" type="built-in type" use="required|optional|prohibited"
default/fixed="value"/>



An attribute can also be declared globally (i.e., not inline within the complexType element definition) and then referred to within the element. When an attribute is declared globally, the use attribute cannot appear in the global declaration; instead, it must be specified in the complexType where the attribute is referenced. The code below shows a global declaration:

<xsd:element name="employee">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
   ...
   <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
  <xsd:attribute ref="empType" use="required"/>
 </xsd:complexType>
</xsd:element>
...
<xsd:attribute name="empType">
 <xsd:simpleType>
  <xsd:restriction base="xsd:string">
   <xsd:enumeration value="contract"/>
   <xsd:enumeration value="perm"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:attribute>


attributeGroup

If an element has several attributes, the attribute declarations can be grouped and a single reference made to the attribute group:

<xsd:element name="employee">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
   <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
  <xsd:attributeGroup ref="employeeAttribute"/>
 </xsd:complexType>
</xsd:element>
...
<xsd:attributeGroup name="employeeAttribute">
 <xsd:attribute type="xsd:string" use="optional" />
 <xsd:attribute name="employee_type" use="required">
  <xsd:simpleType>
   <xsd:restriction base="xsd:string">
    <xsd:enumeration value="contract"/>
    <xsd:enumeration value="perm"/>
   </xsd:restriction>
  </xsd:simpleType>
 </xsd:attribute>
</xsd:attributeGroup>


Comments in XML Schema

XML Schema provides the annotation element to document the schema. An annotation element can contain two elements: the documentation element, meant for human consumption, and the appinfo element, for machine consumption:

<xsd:annotation>
 <xsd:documentation xml:lang="en">
  The next appinfo element provides a custom instruction to the processor
 </xsd:documentation>
 <xsd:appinfo>
  <instruction>some instruction</instruction>
 </xsd:appinfo>
</xsd:annotation>


XML Schema cannot handle all types of validation. In particular, it cannot express rules that depend on the values of several elements or attributes at once (e.g., "If the employee_type attribute value is 'contract,' then the employee_id value must be between 20,000 and 40,000"). For these complex validations, the appinfo element can carry instructions for another tool (e.g., an XSLT engine or Schematron) that enforces the constraints. An XML schema validator would validate the instance document against the annotated schema, and Schematron would validate the instance document based on the instructions extracted from the appinfo elements.
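To make the cross-field rule concrete, here is a minimal sketch that checks it in plain Java with the JDK's DOM parser, after schema validation has already succeeded. This is a stand-in for the Schematron or XSLT approach described above, not an implementation of it; the class name ContractRuleDemo and the method satisfiesContractRule are hypothetical, and the code assumes an employee root element with an employee_type attribute and an employee_id child.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ContractRuleDemo {
    // Checks: if employee_type is "contract", employee_id must lie in 20,000..40,000.
    // Assumes the document has already passed schema validation, so employee_id exists
    // and holds an integer.
    static boolean satisfiesContractRule(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element employee = doc.getDocumentElement();
            if (!"contract".equals(employee.getAttribute("employee_type"))) {
                return true; // the rule constrains contract employees only
            }
            String idText = employee.getElementsByTagName("employee_id").item(0).getTextContent();
            int id = Integer.parseInt(idText.trim());
            return id >= 20000 && id <= 40000;
        } catch (Exception e) {
            throw new IllegalStateException("could not read employee document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(satisfiesContractRule(
            "<employee employee_type='contract'><employee_id>25000</employee_id></employee>"));
        System.out.println(satisfiesContractRule(
            "<employee employee_type='contract'><employee_id>50</employee_id></employee>"));
        System.out.println(satisfiesContractRule(
            "<employee employee_type='perm'><employee_id>50</employee_id></employee>"));
    }
}
```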
