The XML Declaration

XML documents should (but do not have to) begin with an XML declaration. The XML declaration looks like a processing instruction with the name xml, as Example 2-7 shows.

Example 2-7. A very simple XML document with an XML declaration

<?xml version="1.0" encoding="ASCII" standalone="yes"?>
<person>
  Alan Turing
</person>

If an XML document does have an XML declaration, then that declaration must be the first thing in the document. It must not be preceded by any comments, whitespace, processing instructions, and so forth. The reason is that an XML parser uses the first five characters (<?xml) to make an initial guess about the document's encoding.

encoding

So far we've been a little cavalier about encodings. We've said that XML documents are composed of pure text, but we haven't said what encoding that text uses. Is it ASCII? Latin-1? Unicode? Something else?

The short answer to this question is "Yes." The long answer is that by default XML documents are assumed to be encoded in the UTF-8 variable-length encoding of the Unicode character set. This is a strict superset of ASCII, so pure ASCII text files are also UTF-8 documents. However, most XML processors, especially those written in Java, can handle a much broader range of character sets. All you have to do is tell the parser which character encoding the document uses. Preferably this is done through metainformation, stored in the filesystem or provided by the server. However, not all systems provide character-set metadata, so XML also allows documents to specify their own character set with an encoding declaration inside the XML declaration. Example 2-8 shows how you'd indicate that a document was written in the ISO-8859-1 (Latin-1) character set, which includes letters like ö and ç needed for many non-English Western European languages.

Example 2-8. An XML document encoded in Latin-1

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<person>
  Erwin Schrödinger
</person>

The different encodings and the proper handling of non-English XML documents will be discussed in greater detail elsewhere in this book.

standalone

Documents that do not have DTDs, like all the documents in this chapter, can have the value yes for the standalone attribute.
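The rules above about where the XML declaration may appear, and about what the encoding declaration buys you, are easy to check with a real parser. The following is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which is built on the expat parser) as the XML processor. The helper names parse_name and is_well_formed are ours, invented for illustration. Other parsers should reject the same documents, though their error messages will differ.

```python
import xml.etree.ElementTree as ET

# Example 2-8 as raw bytes, encoded in Latin-1 to match the encoding
# declaration. "\u00f6" is the letter o with a diaeresis.
body = "<person>\n  Erwin Schr\u00f6dinger\n</person>\n"
declaration = '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>\n'
latin1_doc = (declaration + body).encode("iso-8859-1")


def parse_name(data: bytes) -> str:
    """Parse document bytes and return the root element's trimmed text."""
    return ET.fromstring(data).text.strip()


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the bytes, False if it rejects them."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# With the declaration in first position, the parser decodes Latin-1 correctly.
name = parse_name(latin1_doc)

# Whitespace or a comment in front of the declaration makes the document
# malformed, because the declaration is no longer the first thing in it.
leading_space_ok = is_well_formed(b" " + latin1_doc)
leading_comment_ok = is_well_formed(b"<!-- note -->" + latin1_doc)

# Without any declaration the parser assumes UTF-8, and the single Latin-1
# byte for the accented letter is not a valid UTF-8 sequence.
undeclared_ok = is_well_formed(body.encode("iso-8859-1"))

print(name, leading_space_ok, leading_comment_ok, undeclared_ok)
```

Passing bytes rather than an already decoded string is deliberate: the encoding declaration only matters when the parser itself has to turn bytes into characters.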