Constraints - XML - Java Programming Language

It's rare that you'll be able to author XML without worrying about anyone else modifying your document, or anyone having to interpret the meaning of the document. The majority of the time, someone (or something) will have to figure out what your tags mean, what data is allowed within those tags, and how your document is structured. This is where constraint models come into play in the XML world. A constraint model defines the structure of your document and, to some degree, the data allowed within that structure. In fact, if you take XML as being a data representation, you really can't divorce a document (often called an instance) from its constraints (the schema). The instance document contains the data, and the schema gives form to that data. You can't have one without the other; at least, not without introducing tremendous room for error. An instance document without a schema must be interpreted by the recipient, and do you really want the recipient deciding what your elements and attributes mean?
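As a rough illustration of an instance being checked against its constraints, here is a minimal sketch using the JDK's built-in JAXP parser with DTD validation turned on. The class name DtdCheck, the book/title vocabulary, and the inline DTD are all assumptions made up for this example; they are not taken from any particular application.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Hypothetical constraints: a <book> holds exactly one <title> of text.
    static final String DTD =
          "<!DOCTYPE book [\n"
        + "  <!ELEMENT book (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n";

    /** Parses the instance with validation on and returns any validity errors. */
    static List<String> validate(String instanceBody) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask the parser to enforce the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored here */ }
                // Validity violations arrive as recoverable errors; collect them.
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness problems are fatal; let them propagate.
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + instanceBody)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // An instance that follows the constraints, then one that does not.
        System.out.println(validate("<book><title>Java and XML</title></book>").isEmpty());
        System.out.println(validate("<book><author>Someone</author></book>").isEmpty());
    }
}
```

With the constraints in place, the second instance is rejected by the parser itself instead of being left to the recipient's interpretation; without them, both documents would simply parse as well-formed XML.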

There's an argument that essentially goes like this: "Good XML should be structured so that it's self-documenting." That's a good goal, but practically impossible to reach. As a programmer, I often think my code is well documented and easily understood, but I'm assuming a certain level of expertise and a certain approach to coding. Change just a few bits here and there, and someone else might reasonably interpret my "well-documented" code (or XML) completely differently than I intended. Taking the time to write a schema solves this problem much more definitively.

There are three basic models for constraints in use today:

DTDs
Introduced as part of the XML 1.0 specification, DTDs are the oldest constraint model around in the XML world. They're simple to use, but this simplicity comes at a price: DTDs are inflexible, and offer you little in the way of data type validation.

XML Schema (XSD)
XML Schema is the W3C's anointed successor to DTDs. XML Schemas are far more flexible than DTDs, and offer an almost dizzying array of support for various data types. However, just as DTDs are simple and limited, XML Schemas are flexible, complex, and (some would argue) bloated. It takes a lot of work to write a good schema, even for 50- or 100-line XML documents. For this reason, there's been a lot of dissatisfaction with XML Schema, even though it is widely used.

RELAX NG
RELAX NG is largely a result of the backlash against the complexity of XML Schema. An alternate schema language, RELAX NG attempts to merge the flexibility of XML Schema with the simplicity of DTDs. While it's not as fully featured as XML Schema, it serves most of the common use cases, making it a great tool for the "everyday" XML developer.
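To make the data type support of XML Schema a little more concrete, the following sketch validates two small instances with the standard javax.xml.validation API. The class name XsdCheck, the book vocabulary, and the choice of xs:positiveInteger for a page count are assumptions for illustration only. A DTD could say that a pages element contains text, but it could not require that text to be a positive whole number.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdCheck {
    // Hypothetical XML Schema: <pages> must hold a positive whole number.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='book'>"
        + "    <xs:complexType>"
        + "      <xs:sequence>"
        + "        <xs:element name='title' type='xs:string'/>"
        + "        <xs:element name='pages' type='xs:positiveInteger'/>"
        + "      </xs:sequence>"
        + "    </xs:complexType>"
        + "  </xs:element>"
        + "</xs:schema>";

    /** Returns true if the instance satisfies the schema above. */
    static boolean isValid(String instance) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
        Validator validator = schema.newValidator();
        try {
            // With no error handler set, the first violation is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the instance", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<book><title>Java and XML</title><pages>480</pages></book>"));
        System.out.println(isValid(
            "<book><title>Java and XML</title><pages>lots</pages></book>"));
    }
}
```

The second instance has exactly the right structure and fails only because "lots" is not a positive integer. Note that the JDK's SchemaFactory handles XML Schema out of the box; validating against RELAX NG generally requires a third-party library.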

There's some confusion around terminology in constraint models. To clarify, the term schema is used in this chapter to refer to any constraint model, whether it be a DTD, XML Schema, or RELAX NG schema. In cases where the XSD specification is specifically referenced, "schema" will be capitalized and preceded by "XML," as in XML Schema.