RSS is an application of XML that defines a sequenced list of content. RSS calls this list a channel. Within the channel are one or more items, each usually located at a URL. The feed also contains metadata about the channel and each item; for example, the feed can specify an image to be used as the logo of the channel, a description of each item, and so on.

RSS was originally created by Netscape for use on its My Netscape portal. Users needed to be able to add channels of content to their portals, and Netscape wanted a consistent way to represent those channels. Thus, the first version of RSS was born as Version 0.9 in March of 1999. In this initial specification, the letters RSS stood for RDF Site Summary. Since then, RSS has been used as an acronym for two additional terms: Rich Site Summary and Really Simple Syndication.

In addition, some people involved with the development of RSS now claim that it is not an acronym at all.

Blogs and Podcasting

Two popular uses for RSS are blogs and podcasting. A blog is generally a web site containing a series of entries. Although a site does not strictly need an RSS feed to be called a blog, the majority of blogs have one, and many blogging applications are built on RSS.

Podcasting, on the other hand, is very much tied to RSS. Podcasting is the distribution of multimedia content through a syndication feed, generally using the enclosure element within RSS 2.0. A podcatcher (an application that reads podcast feeds) downloads the media files referenced by a podcast feed. Most podcatchers are designed to put downloaded files in a specific location on a user's computer, from which they are copied to a portable audio or video player.
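To make the podcatcher's job concrete, here is a minimal Python sketch of its first step: finding the media files that a feed references through enclosure elements. The feed content, URLs, and the function name list_enclosures are all invented for illustration; a real podcatcher would also download the files and handle malformed feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical podcast feed; titles, URLs, and sizes are invented for illustration.
PODCAST_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <link>http://www.example.org/podcast/</link>
    <description>A hypothetical podcast feed</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://www.example.org/audio/episode1.mp3"
                 length="12345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only announcement</title>
    </item>
  </channel>
</rss>"""


def list_enclosures(feed_xml):
    """Return (title, url, mime_type) for every item that carries an enclosure."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # an ordinary item with nothing to download
        found.append(
            (item.findtext("title"), enclosure.get("url"), enclosure.get("type"))
        )
    return found


enclosures = list_enclosures(PODCAST_FEED)
```

Only the first item is returned here, because the second one has no enclosure.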


RSS Variants

Nine different specifications have been released under the name RSS. These can be separated into those that are based on the Resource Description Framework (RDF) and those that aren't, as seen in Table 12-1.

Table 12-1. RSS variants

Based on RDF    Not based on RDF
RSS 0.9         RSS 0.91 (both Netscape and Userland versions)
RSS 1.0         RSS 0.92 through RSS 0.94
                RSS 2.0

This chapter will focus on RSS 1.0 and RSS 2.0, the two current versions. For comparison, the two examples that follow contain excerpts from RSS 1.0 and 2.0 feeds, respectively.

Java Tip

The evolution of RSS has been fairly controversial. There are many online resources describing the events that led to a situation where "RSS" refers to two different, incompatible specifications. I suggest starting with the Wikipedia page on RSS and browsing from there. In addition, a blog entry titled "The myth of RSS compatibility" is worth a look, especially if you're writing an application that ingests RSS feeds. Its author, Mark Pilgrim, has been involved with a number of open source feed validators and parsers.


Example. Example RSS 1.0 feed

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel>
    <title>Example RSS 1.0 Feed</title>
    <link>http://www.example.org</link>
    <description>the Example Organization web site</description>
    <image rdf:about="http://www.example.org/images/logo.gif">
      <title>Example</title>
      <url>http://www.example.org/images/logo.gif</url>
      <link>http://www.example.org</link>
    </image>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://www.example.org/item1/"/>
        <rdf:li resource="http://www.example.org/item2/"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.example.org/item1/">
    <title>New Status Updates</title>
    <link>http://www.example.org/item1/</link>
    <description>News about the Example project</description>
  </item>
  <item rdf:about="http://www.example.org/item2/">
    <title>Another New Status Updates</title>
    <link>http://www.example.org/item2/</link>
    <description>More news about the Example project</description>
  </item>
  <textinput rdf:about="http://www.example.org/search/">
    <title>Search example.org</title>
    <description>Search the website www.example.org</description>
    <name>searchterm</name>
    <link>http://www.example.org/search/</link>
  </textinput>
</rdf:RDF>

Example. Example RSS 2.0 feed

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example RSS 2.0 Feed</title>
    <link>http://www.example.org</link>
    <description>The Example Organization web site</description>
    <image>
      <title>Example</title>
      <url>http://www.example.org/images/logo.gif</url>
      <link>http://www.example.org</link>
    </image>
    <textInput>
      <title>Search this site:</title>
      <description>Find:</description>
      <name>q</name>
      <link>http://example.com/search</link>
    </textInput>
    <item>
      <title>New Status Updates</title>
      <link>http://www.example.org/item1/</link>
      <guid isPermaLink="true">http://www.example.org/item1/</guid>
      <description>News about the Example project</description>
    </item>
    <item>
      <title>Another New Status Updates</title>
      <link>http://www.example.org/item2/</link>
      <guid isPermaLink="true">http://www.example.org/item2/</guid>
      <description>More news about the Example project</description>
    </item>
  </channel>
</rss>
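Because the two formats start with different root elements, a program can usually tell them apart before reading anything else. The following Python sketch shows one way to do that; the function name detect_rss_flavor and its return strings are my own invention, and the namespace URIs are the ones that appear in the RSS 1.0 example above. It is a rough classifier, not a validator.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the RSS 1.0 example feed.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"


def detect_rss_flavor(feed_xml):
    """Classify a feed by its root element (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    if root.tag == "{%s}RDF" % RDF_NS:
        # An RDF root only says the feed is RDF-based; look for an
        # RSS 1.0 channel to rule out other RDF vocabularies.
        if root.find("{%s}channel" % RSS10_NS) is not None:
            return "RSS 1.0"
        return "RDF, but not RSS 1.0"
    if root.tag == "rss":
        # The non-RDF versions carry their version in an attribute.
        return "RSS " + root.get("version", "(unknown version)")
    return "not RSS"


rss1 = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns="http://purl.org/rss/1.0/"><channel><title>t</title></channel></rdf:RDF>'
)
rss2 = '<rss version="2.0"><channel><title>t</title></channel></rss>'

flavor1 = detect_rss_flavor(rss1)
flavor2 = detect_rss_flavor(rss2)
```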

As you can see from these examples, although the vocabulary (channel, item, and so on) is the same, these documents have important syntactical differences. Most significantly, the root elements and namespaces are different. In RSS 1.0, the root element is named RDF in the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, and the RSS 1.0 elements are in the namespace http://purl.org/rss/1.0/. In RSS 2.0, the root element is named rss; it and all the other RSS 2.0 elements are in no namespace.

In RSS 2.0, the description element can contain HTML markup. The HTML elements must be either XML-escaped or within a CDATA block. The descriptions in the RSS 2.0 example could be enhanced with:

<description>News about the &lt;b&gt;Example&lt;/b&gt; project</description>

Or with CDATA:

<description>
<![CDATA[<i>More</i> news about the <b>Example</b> project]]>
</description>
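The two encodings are interchangeable as far as an XML parser is concerned. As a quick check, this Python sketch parses the same description written both ways and compares the results; the variable names are mine, and the second description is adapted so that both carry identical markup.

```python
import xml.etree.ElementTree as ET

# The same HTML fragment, once XML-escaped and once wrapped in CDATA.
ESCAPED = "<description>News about the &lt;b&gt;Example&lt;/b&gt; project</description>"
CDATA = "<description><![CDATA[News about the <b>Example</b> project]]></description>"

# After parsing, both yield the raw HTML string as the element's text.
escaped_text = ET.fromstring(ESCAPED).text
cdata_text = ET.fromstring(CDATA).text
```

Either way, the application receives a string containing HTML tags, which it must then decide how to display.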

In addition to the elements shown in the RSS 2.0 example, RSS 2.0 has many optional elements available at both the channel and item levels. Table 12-2 lists these additional elements.

Table 12-2. Additional RSS 2.0 elements

channel subelements    item subelements
language               author
copyright              category
managingEditor         comments
webMaster              enclosure
pubDate                pubDate
lastBuildDate          source
generator
docs
cloud
ttl
rating
skipHours
skipDays

Some of these will be examined later in this chapter. For full definitions of all of the RSS 2.0 elements, please refer to the RSS 2.0 specification.

What's RDF?

I mentioned the Resource Description Framework (RDF) a few times before. RDF is a set of World Wide Web Consortium (W3C) specifications for expressing various properties to describe a resource. RDF expressions are composed of three values: a subject, a predicate, and an object. If the expression "The shape of the ball is round" were expressed in RDF, the subject would be "the ball," the predicate would be "shape," and the object would be "round."

The RSS 1.0 example contains pretty much all the RDF you'll need to know to work with RSS 1.0 documents: the Seq and li elements, which define a list, and the about attribute, which associates an item with the identifier set in the resource attribute of each li element. For more information on RDF, Practical RDF by Shelley Powers (O'Reilly) is highly recommended.
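To illustrate how the about and resource attributes work together, the following Python sketch returns item titles in the order given by the Seq list rather than in document order. The function name items_in_sequence is hypothetical, and the feed is a trimmed copy of the RSS 1.0 example with the two items deliberately swapped so the effect is visible.

```python
import xml.etree.ElementTree as ET

# ElementTree spells a qualified name as "{namespace-uri}localname".
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RSS = "{http://purl.org/rss/1.0/}"

# Trimmed copy of the RSS 1.0 example; the items are swapped on purpose.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel>
    <title>Example RSS 1.0 Feed</title>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://www.example.org/item1/"/>
        <rdf:li resource="http://www.example.org/item2/"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.example.org/item2/">
    <title>Another New Status Updates</title>
  </item>
  <item rdf:about="http://www.example.org/item1/">
    <title>New Status Updates</title>
  </item>
</rdf:RDF>"""


def items_in_sequence(feed_xml):
    """Return item titles in the order listed by the channel's rdf:Seq."""
    root = ET.fromstring(feed_xml)
    # Index every item by the identifier in its rdf:about attribute.
    by_id = {item.get(RDF + "about"): item for item in root.findall(RSS + "item")}
    titles = []
    for li in root.iter(RDF + "li"):
        # The example uses an unqualified resource attribute; accepting
        # rdf:resource as well is an extra precaution of this sketch.
        ref = li.get("resource") or li.get(RDF + "resource")
        item = by_id.get(ref)
        if item is not None:
            titles.append(item.findtext(RSS + "title"))
    return titles


ordered_titles = items_in_sequence(FEED)
```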


RSS Modules

Both RSS 1.0 and 2.0 are extensible through the use of RSS modules. An RSS module is simply a set of elements in a namespace other than the namespace of the host RSS document. RSS modules are widely used in both RSS 1.0 and 2.0 documents. Although some modules are specified for a particular version of RSS, most will work with either. The RSS 1.0 specification defines three modules: Dublin Core, Syndication, and Content.

Dublin Core

The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to creating standardized metadata vocabularies. Dublin Core allows metadata to be expressed using the same terms in a variety of formats. Dublin Core elements are commonly seen in HTML/XHTML, RDF (including RSS 1.0), and XML (including RSS 2.0) documents. More information about Dublin Core is available from the DCMI web site. Dublin Core Simple, the basic set of Dublin Core metadata, contains 15 elements in its own namespace: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights.

As you can see, some of these elements can be used to bring some of the extra elements from RSS 2.0 into RSS 1.0: the Dublin Core elements date, language, and rights can hold the same data as the RSS 2.0 elements pubDate, language, and copyright.

Java Tip

However, the Dublin Core date element is in ISO 8601 format (2006-07-29T09:25:37.421+00:00), whereas the RSS 2.0 pubDate and lastBuildDate elements are in RFC 822 format (Sat, 29 Jul 2006 09:25:37 GMT).
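If you copy dates between the two vocabularies, you have to convert between the two formats. The Python sketch below shows one way to do it with the standard library; the function names are mine, and I assume that every timestamp carries a time zone and that output in GMT is acceptable.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def rfc822_to_iso8601(value):
    """Convert an RSS 2.0 pubDate string to an ISO 8601 string."""
    return parsedate_to_datetime(value).isoformat()


def iso8601_to_rfc822(value):
    """Convert a Dublin Core date string to an RFC 822 string in GMT.

    Fractional seconds are dropped, since RFC 822 has no place for them.
    """
    moment = datetime.fromisoformat(value)
    # Normalize to UTC so the result can be labeled GMT.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


iso_date = rfc822_to_iso8601("Sat, 29 Jul 2006 09:25:37 GMT")
rfc_date = iso8601_to_rfc822("2006-07-29T09:25:37.421+00:00")
```

With the two dates from the text, iso_date is 2006-07-29T09:25:37+00:00 and rfc_date is Sat, 29 Jul 2006 09:25:37 GMT.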

Syndication

The RSS 1.0 Syndication module adds elements, in a namespace of its own, that describe how often the feed is updated. Its updatePeriod and updateFrequency elements let you define pretty much any consistent update schedule. For example, to declare that a feed is updated twice hourly, you could add the following to your feed:

<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>2</sy:updateFrequency>

This same schedule could be expressed with the RSS 2.0 ttl element:

<ttl>30</ttl>
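The arithmetic behind that equivalence is simple: divide the length of the period by the number of updates within it. Here is a small Python sketch; the function name ttl_minutes is hypothetical, the period names and defaults (daily, once) reflect my reading of the Syndication module, and the month and year lengths are rough approximations of my own.

```python
# Minutes in each update period. Month and year lengths are approximations
# chosen for this sketch (30 and 365 days).
MINUTES_PER_PERIOD = {
    "hourly": 60,
    "daily": 24 * 60,
    "weekly": 7 * 24 * 60,
    "monthly": 30 * 24 * 60,
    "yearly": 365 * 24 * 60,
}


def ttl_minutes(update_period="daily", update_frequency=1):
    """Translate an updatePeriod/updateFrequency pair into minutes between updates."""
    if update_frequency < 1:
        raise ValueError("update_frequency must be a positive integer")
    return MINUTES_PER_PERIOD[update_period] // update_frequency


twice_hourly = ttl_minutes("hourly", 2)
```

For the twice-hourly schedule above, the result is 30, which matches the ttl value shown.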

Content

The RSS 1.0 Content module, which also has its own namespace, enables the embedding of HTML content as an RSS 1.0 item's description. A formatted version of a description in the RSS 1.0 example could be included as:

<content:encoded><![CDATA[<i>More</i> news about the Example project]]>
</content:encoded>

Embedding HTML in RSS 1.0 with the Content module is actually superior to embedding HTML in RSS 2.0, because there's no way to indicate whether the content of an RSS 2.0 description element should be treated as HTML. As a result, a reader cannot tell whether an escaped <b> tag inside a description is markup to be rendered or literal text to be displayed.

This may not seem like a problem, but if you have a feed about HTML markup, it can be important.
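To see why, consider a description from a hypothetical feed about HTML markup. The Python sketch below parses it and shows the two equally plausible ways a reader application could hand the text to a browser; the item and the variable names are invented for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical item from a feed that discusses HTML markup.
ITEM = "<item><description>Use the &lt;b&gt; element for bold text</description></item>"

text = ET.fromstring(ITEM).findtext("description")

# Interpretation 1: the description is HTML, so it is passed through
# unchanged and the browser treats <b> as the start of bold text.
rendered_as_html = text

# Interpretation 2: the description is plain text, so it is escaped again
# and the browser displays the characters "<b>" literally.
rendered_as_text = html.escape(text)
```

Nothing in the feed says which interpretation the author intended, so the application has to guess.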

CommentAPI

The CommentAPI defines an interface for blogs to accept comments without requiring the user to fill out a form on a web site. Instead, comments can be accepted directly from an RSS aggregator. Comments are posted as RSS 2.0 items. So that an aggregator can discover the URL to which the comment XML should be posted, the CommentAPI module defines, in its own namespace, a comment element that contains that URL. More information about the CommentAPI module is available at the module's namespace URL.

iTunes

When Apple Computer's iTunes Music Store added support for podcasting in mid-2005, it introduced an RSS module that adds elements to RSS 2.0 to support the podcast directory within the iTunes Music Store. The module's elements are in their own namespace, and Apple publishes full documentation for it. For your podcast to be listed within the iTunes podcast directory, its feed must use this module.

Atom

The Atom Syndication Format was created in an attempt to merge the simplicity of RSS 2.0 (like RSS 2.0, Atom doesn't use RDF) with the more structured aspects of RSS 1.0 (for one thing, all Atom elements are within a namespace). The feed in the RSS 1.0 and RSS 2.0 examples can be written in Atom as shown in the following example.

Example. Example Atom feed

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Atom Feed</title>
  <subtitle>The Example Organization web site</subtitle>
  <link href="http://www.example.org/"/>
  <id>urn:uuid:68063c50-1f77-11db-a98b-0800200c9a66</id>
  <entry>
    <title>New Status Updates</title>
    <link href="http://www.example.org/item1/"/>
    <id>urn:uuid:68063c51-1f77-11db-a98b-0800200c9a66</id>
    <summary>News about the Example project</summary>
  </entry>
  <entry>
    <title>More New Status Updates</title>
    <link href="http://www.example.org/item2/"/>
    <id>urn:uuid:975ceb20-1f77-11db-a98b-0800200c9a66</id>
    <summary type="html"><![CDATA[<i>More</i> news about the Example project]]></summary>
  </entry>
</feed>

In addition, there is a related Atom Publishing Protocol that defines a standard API for creating and editing entries on a blog. Both Atom specifications are developed by the AtomPub Working Group, part of the Internet Engineering Task Force (IETF). Although Atom is an interesting set of technologies, we will not be looking extensively at Atom here. For more details on Atom, see the Working Group's web site.
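Although Atom gets little attention here, reading an Atom feed follows the same pattern as reading RSS, with two differences worth noting: every element is namespace-qualified, and a summary declares its own content type. The Python sketch below reads a trimmed copy of the Atom example; the function name read_entries and the dictionary keys are my own choices, and the fallback to "text" reflects my understanding of Atom's default for the type attribute.

```python
import xml.etree.ElementTree as ET

# All Atom elements live in this namespace (taken from the example feed).
ATOM = "{http://www.w3.org/2005/Atom}"

# Trimmed copy of the Atom example feed.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Atom Feed</title>
  <entry>
    <title>New Status Updates</title>
    <link href="http://www.example.org/item1/"/>
    <summary>News about the Example project</summary>
  </entry>
  <entry>
    <title>More New Status Updates</title>
    <link href="http://www.example.org/item2/"/>
    <summary type="html"><![CDATA[<i>More</i> news about the Example project]]></summary>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return one dictionary per entry, including the declared summary type."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        summary = entry.find(ATOM + "summary")
        entries.append({
            "title": entry.findtext(ATOM + "title"),
            "link": entry.find(ATOM + "link").get("href"),
            "summary": summary.text,
            # A summary without a type attribute is treated as plain text.
            "summary_type": summary.get("type", "text"),
        })
    return entries


entries = read_entries(FEED)
```

Because the second summary is marked type="html", an application knows to render its tags, which avoids the guessing that an RSS 2.0 description requires.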