XML for the uninitiated
You may have heard of Extensible Markup Language (XML), and you may have heard many reasons why your organization should use it. But what is XML, exactly? This article explains the basics of XML - what it is and how it works.
In this article
A brief look at mark up, markup, and tags
A peek at XML in the Microsoft Office System
A brief look at mark up, markup, and tags
To understand XML, it helps to understand the idea of marking up data. People have created documents for centuries, and for just as long they have marked up those documents. For example, school teachers mark up student papers all of the time. They tell students to move paragraphs, clarify sentences, correct misspellings, and so on. Marking up a document is how we define the structure, meaning, and visual appearance of the information in the document. If you have ever used the Track Changes feature in Microsoft Office Word, you have used a computerized form of mark up.
In computing, "mark up" has also evolved into "markup." Markup is the process of using codes called tags (or sometimes tokens) to define the structure, the visual appearance, and - in the case of XML - the meaning of any data.
The HTML code for this article is a good example of computer markup at work. If you browse through it (in Microsoft Internet Explorer, right-click the page, and then click View Source), you will see a mix of readable text and Hypertext Markup Language (HTML) tags, such as <p> and <h2>. Tags in HTML and XML documents are easy to recognize because they are surrounded by angle brackets. In the source code for this article, the HTML tags do a variety of jobs, such as defining the beginning and end of each paragraph (<p> ... </p>) and marking the location of each image.
So what makes it XML?
HTML and XML documents contain data that is surrounded with tags, but that is where the similarities between the two languages end. In HTML, the tags define the look and feel of your data - the headlines go here, the paragraph starts there, and so on. In XML the tags define the structure and meaning of your data - what the data is.
When you describe the structure and meaning of your data, you make it possible to reuse that data in any number of ways. For example, if you have a block of sales data and each item in the block is clearly identified, you can load just the items that you need into a sales report and load other items into an accounting database. Put another way, you can use one system to generate your data and mark it up with XML tags, and then process that data in any number of other systems, regardless of the hardware platform or operating system. That portability is why XML has become one of the most popular technologies for exchanging data.
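To make the idea of reuse concrete, here is a minimal Python sketch that reads one block of sales data and feeds two different consumers from it. The tag names (SALES, SALE, REGION, PRODUCT, UNITS, REVENUE) and the figures are invented for illustration; they do not come from any real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical sales data; every tag name and value here is invented.
SALES_XML = """<?xml version="1.0"?>
<SALES>
  <SALE>
    <REGION>West</REGION>
    <PRODUCT>Widget</PRODUCT>
    <UNITS>12</UNITS>
    <REVENUE>240.00</REVENUE>
  </SALE>
  <SALE>
    <REGION>East</REGION>
    <PRODUCT>Gadget</PRODUCT>
    <UNITS>5</UNITS>
    <REVENUE>150.00</REVENUE>
  </SALE>
</SALES>"""

root = ET.fromstring(SALES_XML)

# A sales report only needs the region and the number of units sold.
report_rows = [
    (sale.findtext("REGION"), int(sale.findtext("UNITS")))
    for sale in root.iter("SALE")
]

# An accounting system only needs the revenue figures.
total_revenue = sum(float(sale.findtext("REVENUE")) for sale in root.iter("SALE"))

print(report_rows)
print(total_revenue)
```

Because each item is labeled by its tag, each consumer can pick out just the items it needs and ignore the rest.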
Remember these facts as you proceed:
- You cannot use HTML in place of XML. You can, however, wrap your XML data in HTML tags and display it in a Web page.
- HTML is limited to a predefined set of tags that all users share.
- XML allows you to create any tag that you need to describe your data and the structure of that data. For instance, say that you need to store and share information about pets. You can create the following XML code:
<?xml version="1.0"?>
<CAT>
  <NAME>Izzy</NAME>
  <BREED>Siamese</BREED>
  <AGE>6</AGE>
  <ALTERED>true</ALTERED>
  <DECLAWED>false</DECLAWED>
  <LICENSE>Izz138bod</LICENSE>
  <OWNER>Colin Wilcox</OWNER>
</CAT>
You can see that XML tags make it possible to know exactly what kind of data you are looking at. For example, you know this is data about a cat, and you can easily find the cat's name, age, and so on. The ability to create tags that define almost any data structure is what makes XML "extensible."
But don't confuse the tags in that code sample with tags in an HTML file. For instance, if you paste that XML structure into an HTML file and view the file in your browser, the results will look something like this:
Izzy Siamese 6 true false Izz138bod Colin Wilcox
The browser ignores your XML tags and displays just the data.
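A program that understands XML, by contrast, can use the tags to find each piece of data. The following Python sketch shows one way this might look, using the standard library parser and a version of the cat record trimmed to a few fields. It illustrates the idea only and is not how any Office program reads XML.

```python
import xml.etree.ElementTree as ET

# The cat record from the sample, trimmed to a few fields.
CAT_XML = """<?xml version="1.0"?>
<CAT>
  <NAME>Izzy</NAME>
  <BREED>Siamese</BREED>
  <AGE>6</AGE>
  <OWNER>Colin Wilcox</OWNER>
</CAT>"""

cat = ET.fromstring(CAT_XML)

# The root tag says what kind of record this is.
record_type = cat.tag

# Each child tag names the piece of data that it holds.
fields = {child.tag: child.text for child in cat}

print(record_type)
print(fields["NAME"], fields["AGE"])
```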
A word about well-formed data
You may hear someone from your IT department mention "well-formed" XML. A well-formed XML file conforms to a set of very strict rules that govern XML. If a file doesn't conform to those rules, programs that read XML reject the file rather than guess at what was meant. For example, in the previous code sample, every opening tag has a closing tag, so the sample adheres to one of the rules for being well-formed. If you remove a closing tag and try to open that file in one of the Office programs, you will see an error message, and the program will stop you from using the file.
You don't necessarily need to know the rules for creating well-formed XML (though they are easy to understand), but you do need to remember that you can share XML data among programs and systems only if that data is well-formed. If you can't open an XML file, chances are that file isn't well-formed.
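If you are curious what a well-formedness check looks like in practice, the sketch below uses the XML parser in Python's standard library. The helper name is_well_formed is made up for this example. The second record has its closing </NAME> tag removed, so the parser refuses it.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<CAT><NAME>Izzy</NAME><AGE>6</AGE></CAT>"
bad = "<CAT><NAME>Izzy<AGE>6</AGE></CAT>"  # closing </NAME> tag removed

print(is_well_formed(good))
print(is_well_formed(bad))
```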
XML is also platform-independent, meaning that any program built to use XML can read and process your XML data, regardless of the hardware or operating system. For example, with the right XML tags, you can use a desktop program to open and work with data from a mainframe computer. And, regardless of who creates a body of XML data, you can work with the same data in several of the Microsoft Office 2003 and Microsoft Office Professional programs, including Microsoft Office Access, Microsoft Office Word, Microsoft Office InfoPath, and Microsoft Office Excel. Because it is so portable, XML has become one of the most popular technologies for exchanging data between databases and user desktops.
In addition to tagged, well-formed data, XML systems typically use two additional components: schemas and transforms. The following sections explain how these additional components work.
A quick look at schemas
Don't let the term "schema" intimidate you. A schema is just an XML file that contains the rules for what can and cannot reside in an XML data file. Schema files typically use the .xsd file name extension, while XML data files use the .xml extension.
Schemas allow programs to validate data. They provide the framework for structuring data and ensuring that it makes sense to the creator and any other users. For example, if a user enters invalid data, such as text in a date field, the program can prompt the user to enter the correct data. As long as the data in an XML file conforms to the rules in a given schema, any program that supports XML can use that schema to read, interpret, and process the data. For example, as shown in the following illustration, Excel and Word can validate the <CAT> data against the CAT schema.
Schemas can become complex, and teaching you how to create one is beyond the scope of this article. (Besides, you probably have an IT department that knows how.) However, it helps to know what schemas look like. The following schema defines the rules for the <CAT> ... </CAT> tag set.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="CAT">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="NAME" type="xsd:string"/>
        <xsd:element name="BREED" type="xsd:string"/>
        <xsd:element name="AGE" type="xsd:positiveInteger"/>
        <xsd:element name="ALTERED" type="xsd:boolean"/>
        <xsd:element name="DECLAWED" type="xsd:boolean"/>
        <xsd:element name="LICENSE" type="xsd:string"/>
        <xsd:element name="OWNER" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Don't worry about understanding everything in the sample. Just keep these facts in mind:
- The line items in the sample schema are called declarations. If you needed additional information about an animal, such as its color or markings, chances are that your IT department would add a declaration to the schema. You can change your XML system as your business needs evolve.
- Declarations provide a tremendous amount of control over the data structure. For instance, the <xsd:sequence> declaration means that the tags, such as <NAME> and <BREED>, have to occur in the order that they are listed above. Declarations can also control the types of data that users can enter. For example, the schema above requires a positive whole number for the cat's age, and Boolean (true or false) values for the ALTERED and DECLAWED tags.
- When the data in an XML file conforms to the rules provided by a schema, that data is said to be valid. The process of checking an XML data file against a schema is called (logically enough) validation. The big advantage to using schemas is that they can help prevent corrupted data. They also make it easy to find corrupted data because validation stops and reports an error when it encounters a problem.
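To give a feel for what validation checks, the following Python sketch applies the same kinds of rules by hand: tag order, a positive whole number for the age, and Boolean values for two tags. It is a simplified stand-in and not real schema validation, which would read the .xsd file with a schema-aware tool. The function names are invented, and the sketch assumes the upper-case tag names used in the <CAT> data sample.

```python
import xml.etree.ElementTree as ET

def is_string(text):
    return True

def is_positive_integer(text):
    # Simplified check: ASCII digits only, and greater than zero.
    return text.isascii() and text.isdigit() and int(text) > 0

def is_boolean(text):
    # The literal values that the XML Schema boolean type accepts.
    return text in ("true", "false", "1", "0")

# Rules copied by hand from the sample schema: the required tag order
# and a type check for each tag.
RULES = [
    ("NAME", is_string),
    ("BREED", is_string),
    ("AGE", is_positive_integer),
    ("ALTERED", is_boolean),
    ("DECLAWED", is_boolean),
    ("LICENSE", is_string),
    ("OWNER", is_string),
]

def validate_cat(xml_text):
    """Return a list of problems; an empty list means every check passed."""
    root = ET.fromstring(xml_text)
    if root.tag != "CAT":
        return ["root tag is not CAT"]
    if [child.tag for child in root] != [name for name, _ in RULES]:
        return ["tags are missing, extra, or out of order"]
    problems = []
    for child, (name, check) in zip(root, RULES):
        text = (child.text or "").strip()
        if not check(text):
            problems.append(f"{name} has an invalid value: {text!r}")
    return problems

valid = ("<CAT><NAME>Izzy</NAME><BREED>Siamese</BREED><AGE>6</AGE>"
         "<ALTERED>true</ALTERED><DECLAWED>false</DECLAWED>"
         "<LICENSE>Izz138bod</LICENSE><OWNER>Colin Wilcox</OWNER></CAT>")
invalid = valid.replace("<AGE>6</AGE>", "<AGE>six</AGE>")

print(validate_cat(valid))
print(validate_cat(invalid))
```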
A quick look at transforms
As we mentioned earlier, XML also provides powerful ways to use or reuse data. The mechanism for reusing data is called an Extensible Stylesheet Language Transformation (XSLT), or simply, a transform. Transforms are where XML can really get interesting. For example, after you validate a data file against a schema, you can apply a transform that makes the data work as a marketing brochure in Microsoft Office Word 2003 and apply another transform to create a sales report in Office Excel.
You (okay, your IT department) can also use transforms to exchange data between back-end systems, such as databases. For instance, say that Database A stores the sales data in a table structure that works well for the sales department. Database B stores the revenue and expense data in a table structure that is tailored for the accounting department. Database B can use a transform to accept data from A and write that data to the correct tables.
The combination of data file, schema, and transform constitutes a basic XML system. The following illustration shows how such systems typically work. The data file is validated against the schema and then rendered in any number of usable ways by a transform. In this case, the transform deploys the data to a table in a Web page.
The following code sample shows one way to write a transform. It loads the <CAT> data into a table on a Web page. Again, the point of the sample isn't to show you how to write a transform, but to show you one form that a transform can take.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <TABLE>
      <TR>
        <TH>Name</TH>
        <TH>Breed</TH>
        <TH>Age</TH>
        <TH>Altered</TH>
        <TH>Declawed</TH>
        <TH>License</TH>
        <TH>Owner</TH>
      </TR>
      <xsl:for-each select="CAT">
        <TR align="left" valign="top">
          <TD><xsl:value-of select="NAME"/></TD>
          <TD><xsl:value-of select="BREED"/></TD>
          <TD><xsl:value-of select="AGE"/></TD>
          <TD><xsl:value-of select="ALTERED"/></TD>
          <TD><xsl:value-of select="DECLAWED"/></TD>
          <TD><xsl:value-of select="LICENSE"/></TD>
          <TD><xsl:value-of select="OWNER"/></TD>
        </TR>
      </xsl:for-each>
    </TABLE>
  </xsl:template>
</xsl:stylesheet>
This sample shows how one type of transform might look when it is coded, but remember that you can just describe what you need from the data in plain English. For example, you can go to your IT department and say that you need to print the sales data for particular regions for the past two years, "and I need it to look this way." Your IT department can then write (or change) a transform to do that job.
What makes all of this even more convenient is that Microsoft and a growing number of other vendors are creating transforms for jobs of all sorts. In the future, chances are that you will be able to download a transform that either meets your needs or that you can adjust to suit your purpose. That means XML will cost less to use over time.
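If it helps to see the same idea outside of XSLT, the Python sketch below performs a comparable job by hand: it reads cat records and writes them out as rows of an HTML table. It is a plain-Python stand-in for a transform, not an XSLT processor. The function name, the <CATS> wrapper tag, and the choice of four columns are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Columns to show, in order. The names match tags in the cat record.
COLUMNS = ["NAME", "BREED", "AGE", "OWNER"]

def cats_to_html_table(xml_text):
    """Render every <CAT> record in the input as one row of an HTML table."""
    root = ET.fromstring(xml_text)
    header = "".join(f"<TH>{escape(col.title())}</TH>" for col in COLUMNS)
    rows = [f"<TR>{header}</TR>"]
    for cat in root.iter("CAT"):
        cells = "".join(
            f"<TD>{escape(cat.findtext(col, default=''))}</TD>" for col in COLUMNS
        )
        rows.append(f"<TR>{cells}</TR>")
    return "<TABLE>" + "".join(rows) + "</TABLE>"

# <CATS> is a hypothetical wrapper so that the data can hold several records.
data = ("<CATS><CAT><NAME>Izzy</NAME><BREED>Siamese</BREED>"
        "<AGE>6</AGE><OWNER>Colin Wilcox</OWNER></CAT></CATS>")

table = cats_to_html_table(data)
print(table)
```

The data file itself never changes. Only the code that renders it does, which is why the same data can feed a Web page, a report, or another system.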
A peek at XML in the Microsoft Office System
The professional editions of Microsoft Office 2003 and of the newer Office release provide extensive XML support.
- Office Excel, Office Word, and Office PowerPoint use XML as their default file formats, a change that has several advantages:
  - Smaller file sizes. The new format uses ZIP and other compression technologies to reduce file size by as much as 75 percent compared to the binary formats that are used in earlier versions of Office.
  - Easier information recovery and greater security. XML is human readable, so if a file becomes damaged, you can open the file in Microsoft Notepad or another text reader and recover at least some of your information. Also, the new files are more secure because they cannot contain Visual Basic for Applications (VBA) code. If you use the new format to create templates, any ActiveX controls and VBA macros reside in a separate, more secure section of the file. In addition, you can use tools, such as Document Inspector, to remove any personal data. For more information about using Document Inspector, see the article Remove hidden data and personal information from Office documents.
  - Greater portability and flexibility. Because XML stores data in a text format instead of a proprietary binary format, your customers can define their own schemas and use your data in more ways, all without having to pay royalties. For more information about the new formats, see Introduction to Open XML File Formats.
- Each Office program furnishes a different set of tools. The user interfaces and processes that you follow in Word differ from the user interfaces and processes that you use in Excel or PowerPoint. Why? Because what works for Word doesn't necessarily work for Excel, and so on.
- The Office programs can work with schemas, transforms, and data from other suppliers as long as the XML is well-formed.
- Some of the Office programs use XML in the background, and some, such as Microsoft Office OneNote™, don't support it at all. The best way to learn how an Office program supports XML is to start the online Help for that program and search on XML.
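The points above about ZIP compression and readable XML can be sketched in a few lines of Python. The example builds a small ZIP archive in memory, puts one XML part inside it, and then reads that part back as ordinary XML. The part name data/cat.xml and its contents are invented, and the archive is not a valid Office file. It only illustrates the general idea of XML stored inside a ZIP package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a small ZIP archive in memory with one XML part inside it.
# The part name and its contents are invented; this is not a valid Office file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
    package.writestr("data/cat.xml", "<CAT><NAME>Izzy</NAME></CAT>")

# Reopen the archive, list its parts, and parse the XML part.
with zipfile.ZipFile(buffer) as package:
    part_names = package.namelist()
    cat = ET.fromstring(package.read("data/cat.xml"))

print(part_names)
print(cat.findtext("NAME"))
```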
So far so good, but what if you have XML data with no schema? The Office programs that support XML have their own approaches to helping you work with the data. For instance, if you open an XML file in Word without an attached schema, Word displays the tags and data and enables you to apply a transform if, for example, the file's creator or your IT department provides one. At the least, you can read the tags and data in the file.
In contrast, Excel infers a schema if you open an XML file that doesn't already have one. Excel then gives you the option of loading this data into a read-only file or of mapping the data into either an XML list (in Microsoft Office Excel 2003) or an XML table (in Office Excel). You can use the XML lists and tables to sort, filter, or add calculations to the data.
Office Professional and Microsoft Office 2003 provide the same sets of XML tools. In Office Professional, you must first enable XML support, and then you start the tools from different locations. However, after you start the tools, they work the same in Microsoft Office 2003 and Office Professional. The following steps explain how to start the XML tools for Office Excel and Office Word.
Note: Microsoft Office Access enables its XML tools by default, so you can skip the first steps if you use Access.
Enable the XML tools in Office Excel and Office Word
- In Excel or Word, click the Microsoft Office Button, and then click Excel Options or Word Options, depending on the program that you have open.
- Click Personalize.
- Under Top options for working with application name, select Show Developer tab in the Ribbon, and then click OK.
Start the XML tools in Office Excel and Office Word
- In either program, on the Developer tab, click any available command in the XML group.
Start the XML tools in Office Access
- Click the External Data tab.
- Do one of the following:
  - In the Import group, click XML File.
  - In the Export group, click More, and then click XML File.
More information
The links in the following sections take you to information about using XML in various Office programs and about writing XML code.
Using XML in Office release
Note: The links in this section will change as the Office team creates and publishes more content.
Introduction to Open XML File Formats
Using XML in Microsoft Office 2003
Note: Some of the links in this section go to the Microsoft Office Online Web site, and some go to the Microsoft Developer Network (MSDN).
- Online Training
- General
- Access
- Excel
- FrontPage
- InfoPath
- Visio
- Word
Writing XML code
- XML Developer's Center (MSDN)
Books about XML
For beginners
For developers and IT specialists