Cookies are the best current way to identify users and allow persistent sessions. They don't suffer many of the problems of the previous techniques, but they often are used in conjunction with those techniques for extra value. Cookies were first developed by Netscape but now are supported by all major browsers.

Because cookies are important, and they define new HTTP headers, we're going to explore them in more detail than we did the previous techniques. The presence of cookies also impacts caching, and most caches and browsers disallow caching of any cookied content. The following sections present more details.

Types of Cookies

You can classify cookies broadly into two types: session cookies and persistent cookies. A session cookie is a temporary cookie that keeps track of settings and preferences as a user navigates a site. A session cookie is deleted when the user exits the browser. Persistent cookies can live longer; they are stored on disk and survive browser exits and computer restarts. Persistent cookies often are used to retain a configuration profile or login name for a site that a user visits periodically.

The only difference between session cookies and persistent cookies is when they expire. As we will see later, a cookie is a session cookie if its Discard parameter is set, or if there is no Expires or Max-Age parameter indicating an extended expiration time.
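The rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary of lowercased attribute names are assumptions made for the example, not part of any cookie specification.

```python
def is_session_cookie(attributes):
    """Return True if a cookie with these attributes lasts only for the session.

    `attributes` is a hypothetical dict of lowercased attribute names,
    for example {"discard": True} or {"max-age": "3600"}.
    """
    if attributes.get("discard"):
        return True
    # No explicit lifetime means the cookie goes away when the browser exits.
    return "expires" not in attributes and "max-age" not in attributes


print(is_session_cookie({}))                                     # True
print(is_session_cookie({"max-age": "3600"}))                    # False
print(is_session_cookie({"max-age": "3600", "discard": True}))   # True
```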

How Cookies Work

Cookies are like "Hello, My Name Is" stickers stuck onto users by servers. When a user visits a web site, the web site can read all the stickers attached to the user by that server.

The first time the user visits a web site, the web server doesn't know anything about the user (Screenshot 11-3a). The web server expects that this same user will return again, so it wants to "slap" a unique cookie onto the user so it can identify this user in the future. The cookie contains an arbitrary list of name=value information, and it is attached to the user using the Set-Cookie or Set-Cookie2 HTTP response (extension) headers.

Cookies can contain any information, but they often contain just a unique identification number, generated by the server for tracking purposes. For example, in Screenshot 11-3b, the server slaps onto the user a cookie that says id="34294". The server can use this number to look up database information that the server accumulates for its visitors (purchase history, address information, etc.).

However, cookies are not restricted to just ID numbers. Many web servers choose to keep information directly in the cookies. For example:

Cookie: name="Brian Totty"; phone="555-1212"

The browser remembers the cookie contents sent back from the server in Set-Cookie or Set-Cookie2 headers, storing the set of cookies in a browser cookie database (think of it like a suitcase with stickers from various countries on it). When the user returns to the same site in the future (Screenshot 11-3c), the browser will select those cookies slapped onto the user by that server and pass them back in a Cookie request header.
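To make the round trip concrete, here is a minimal sketch using the SimpleCookie class from Python's standard http.cookies module. The variable names are ours, and a real browser keeps far more bookkeeping than this one-cookie "jar" does.

```python
from http.cookies import SimpleCookie

# Server side: slap an ID cookie onto the response.
response_cookie = SimpleCookie()
response_cookie["id"] = "34294"
set_cookie_header = response_cookie.output()

# Browser side: remember the cookie, then pass it back on the next visit.
jar = SimpleCookie()
jar.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)

print(set_cookie_header)   # Set-Cookie: id=34294
print(cookie_header)       # Cookie: id=34294
```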

Slapping a cookie onto a user
(Screenshot 11-3.)

Cookie Jar: Client-Side State

The basic idea of cookies is to let the browser accumulate a set of server-specific information, and provide this information back to the server each time you visit. Because the browser is responsible for storing the cookie information, this system is called client-side state. The official name for the cookie specification is the HTTP State Management Mechanism.

Netscape Navigator cookies

Different browsers store cookies in different ways. Netscape Navigator stores cookies in a single text file called cookies.txt. For example:

# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This is a generated file! Do not edit.
#
# domain allh path secure expires name value
 
www.fedex.com FALSE / FALSE 1136109676 cc /us/
.bankofamericaonline.com TRUE / FALSE 1009789256 state CA
.cnn.com TRUE / FALSE 1035069235 SelEdition www
secure.eepulse.net FALSE /eePulse FALSE 1007162968 cid %FE%FF%002
www.reformamt.org TRUE /forum FALSE 1033761379 LastVisit 1003520952
www.reformamt.org TRUE /forum FALSE 1033761379 UserName Guest
 ...

Each line of the text file represents a cookie. There are seven tab-separated fields:

domain

The domain of the cookie

allh

Whether all hosts in a domain get the cookie, or only the specific host named

path

The path prefix in the domain associated with the cookie

secure

Whether we should send this cookie only if we have an SSL connection

expires

The cookie expiration date in seconds since Jan 1, 1970 00:00:00 GMT

name

The name of the cookie variable

value

The value of the cookie variable
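As a rough sketch, the seven fields can be pulled apart with a few lines of Python. The CookieRecord type and parse_cookie_line function are hypothetical names chosen for this example, and the code assumes the fields really are separated by tab characters (the listing above is printed with spaces).

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CookieRecord(NamedTuple):
    domain: str
    all_hosts: bool
    path: str
    secure: bool
    expires: datetime
    name: str
    value: str


def parse_cookie_line(line):
    """Parse one tab-separated cookies.txt line; return None for comments and blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    domain, allh, path, secure, expires, name, value = line.rstrip("\n").split("\t")
    return CookieRecord(
        domain=domain,
        all_hosts=(allh == "TRUE"),
        path=path,
        secure=(secure == "TRUE"),
        # The expiration field is seconds since Jan 1, 1970 00:00:00 GMT.
        expires=datetime.fromtimestamp(int(expires), tz=timezone.utc),
        name=name,
        value=value,
    )


record = parse_cookie_line(".cnn.com\tTRUE\t/\tFALSE\t1035069235\tSelEdition\twww")
print(record.name, record.value, record.expires.year)
```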

Microsoft Internet Explorer cookies

Microsoft Internet Explorer stores cookies in individual text files in the cache directory. You can browse this directory to view the cookies, as shown in Screenshot 11-4. The format of the Internet Explorer cookie files is proprietary, but many of the fields are easily understood. Cookies are stored one after another in the file, and each cookie consists of multiple lines.

Internet Explorer cookies are stored in individual text files in the cache directory
(Screenshot 11-4.)

The first line of each cookie in the file contains the cookie variable name. The next line is the variable value. The third line contains the domain and path. The remaining lines are proprietary data, presumably including dates and other flags.

Different Cookies for Different Sites

A browser can have hundreds or thousands of cookies in its internal cookie jar, but browsers don't send every cookie to every site. In fact, they typically send only two or three cookies to each site. Here's why:

·         Moving all those cookie bytes would dramatically slow performance. Browsers would actually be moving more cookie bytes than real content bytes!

·         Most of these cookies would just be unrecognizable gibberish for most sites, because they contain server-specific name/value pairs.

·         Sending all cookies to all sites would create a potential privacy concern, with sites you don't trust getting information you intended only for another site.

In general, a browser sends to a server only those cookies that the server generated. Cookies generated by joes-hardware.com are sent to joes-hardware.com and not to bobs-books.com or marys-movies.com.

Many web sites contract with third-party vendors to manage advertisements. These advertisements are made to look like integral parts of the web site, and they do push persistent cookies. When the user goes to a different web site serviced by the same advertisement company, the persistent cookie set earlier is sent back again by the browser (because the domains match). A marketing company could use this technique, combined with the Referer header, to potentially build an exhaustive data set of user profiles and browsing habits. Modern browsers allow you to configure privacy settings to restrict third-party cookies.

Cookie Domain attribute

A server generating a cookie can control which sites get to see that cookie by adding a Domain attribute to the Set-Cookie response header. For example, the following HTTP response header tells the browser to send the cookie user="mary17" to any site in the domain .airtravelbargains.com:

Set-Cookie: user="mary17"; domain="airtravelbargains.com"

If the user visits www.airtravelbargains.com, specials.airtravelbargains.com, or any site ending in .airtravelbargains.com, the following Cookie header will be issued:

Cookie: user="mary17"
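The domain test a browser applies can be approximated with a simple suffix check. The sketch below is an assumption-laden illustration: the function name is ours, and it treats a host that exactly equals the cookie domain as a match too, which real browsers may or may not do depending on the cookie version.

```python
def domain_matches(host, cookie_domain):
    """Return True if `host` equals the cookie domain or is a subdomain of it."""
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # Require a dot boundary so "badairtravelbargains.com" does not match.
    return host == domain or host.endswith("." + domain)


print(domain_matches("www.airtravelbargains.com", "airtravelbargains.com"))       # True
print(domain_matches("specials.airtravelbargains.com", "airtravelbargains.com"))  # True
print(domain_matches("www.cnn.com", "airtravelbargains.com"))                     # False
```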

Cookie Path attribute

The cookie specification even lets you associate cookies with portions of web sites. This is done using the Path attribute, which indicates the URL path prefix where each cookie is valid.

For example, one web server might be shared between two organizations, each having separate cookies. The site www.airtravelbargains.com might devote part of its web site to auto rentals (say, http://www.airtravelbargains.com/autos/), using a separate cookie to keep track of a user's preferred car size. A special auto-rental cookie might be generated like this:

Set-Cookie: pref=compact; domain="airtravelbargains.com"; path=/autos/

If the user goes to http://www.airtravelbargains.com/specials.html, she will get only this cookie:

Cookie: user="mary17"

But if she goes to http://www.airtravelbargains.com/autos/cheapo/index.html, she will get both of these cookies:

Cookie: user="mary17"; pref=compact
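Putting the Domain and Path filters together, a browser's selection step might look roughly like the following sketch. The JAR list and the cookie_header_for function are hypothetical, and the sketch joins all matching pairs into one Cookie header, which is how Version 0 clients send them.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory jar: (name, value, domain, path) tuples.
JAR = [
    ("user", '"mary17"', "airtravelbargains.com", "/"),
    ("pref", "compact", "airtravelbargains.com", "/autos/"),
]


def cookie_header_for(url, jar=JAR):
    """Return the Cookie header for `url`, or None if no cookie matches."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    pairs = [
        f"{name}={value}"
        for name, value, domain, cookie_path in jar
        if (host == domain or host.endswith("." + domain))
        and path.startswith(cookie_path)   # Path is a URL path prefix
    ]
    return "Cookie: " + "; ".join(pairs) if pairs else None


print(cookie_header_for("http://www.airtravelbargains.com/specials.html"))
print(cookie_header_for("http://www.airtravelbargains.com/autos/cheapo/index.html"))
```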

So, cookies are pieces of state, slapped onto the client by the servers, maintained by the clients, and sent back to only those sites that are appropriate. Let's look in more detail at the cookie technology and standards.

Cookie Ingredients

There are two different versions of cookie specifications in use: Version 0 cookies (sometimes called "Netscape cookies"), and Version 1 ("RFC 2965") cookies. Version 1 cookies are a less widely used extension of Version 0 cookies.

Neither the Version 0 nor the Version 1 cookie specification is documented as part of the HTTP/1.1 specification. There are two primary adjunct documents that best describe the use of cookies, summarized in Table 11-2.

Table 11-2. Cookie specifications

Title | Description | Location
Persistent Client State: HTTP Cookies | Original Netscape cookie standard | http://home.netscape.com/newsref/std/cookie_spec.html
RFC 2965: HTTP State Management Mechanism | October 2000 cookie standard, obsoletes RFC 2109 | http://www.ietf.org/rfc/rfc2965.txt

Version 0 (Netscape) Cookies

The initial cookie specification was defined by Netscape. These "Version 0" cookies defined the Set-Cookie response header, the Cookie request header, and the fields available for controlling cookies. Version 0 cookies look like this:

Set-Cookie: name=value [; expires=date] [; path=path] [; domain=domain] [; secure]
 
Cookie: name1=value1 [; name2=value2] ...
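A server-generated header in this form can be split into its pieces with very little code. The following is a minimal sketch, not a complete parser: the function name is ours, and it assumes no quoted value contains a semicolon.

```python
def parse_set_cookie(header_value):
    """Split a Version 0 Set-Cookie value into (name, value, attributes)."""
    first, *rest = [part.strip() for part in header_value.split(";")]
    name, _, value = first.partition("=")
    attributes = {}
    for part in rest:
        if not part:
            continue
        key, _, attr_value = part.partition("=")
        # Flag attributes such as "secure" carry no value; store True for them.
        attributes[key.strip().lower()] = (
            attr_value.strip().strip('"') if attr_value else True
        )
    return name.strip(), value.strip(), attributes


print(parse_set_cookie("lastorder=00183; path=/orders; secure"))
# ('lastorder', '00183', {'path': '/orders', 'secure': True})
```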

Version 0 Set-Cookie header

The Set-Cookie header has a mandatory cookie name and cookie value. It can be followed by optional cookie attributes, separated by semicolons. The Set-Cookie fields are described in Table 11-3.

Table 11-3. Version 0 (Netscape) Set-Cookie attributes

Set-Cookie attribute | Description and examples
NAME=VALUE | Mandatory. Both NAME and VALUE are sequences of characters, excluding the semicolon, comma, equals sign, and whitespace, unless quoted in double quotes. The web server can create any NAME=VALUE association, which will be sent back to the web server on subsequent visits to the site.
Set-Cookie: customer=Mary
Expires | Optional. This attribute specifies a date string that defines the valid lifetime of that cookie. Once the expiration date has been reached, the cookie will no longer be stored or given out. The date is formatted as:
Weekday, DD-Mon-YY HH:MM:SS GMT

The only legal time zone is GMT, and the separators between the elements of the date must be dashes. If Expires is not specified, the cookie will expire when the user's session ends.

Set-Cookie: foo=bar; expires=Tuesday, 09-Nov-99 23:12:40 GMT
Domain | Optional. A browser sends the cookie only to server hostnames in the specified domain. This lets servers restrict cookies to only certain domains. A domain of "acme.com" would match hostnames "anvil.acme.com" and "shipping.crate.acme.com", but not "www.cnn.com".

Only hosts within the specified domain can set a cookie for a domain, and domains must have at least two or three periods in them to prevent domains of the form ".com", ".edu", and "va.us". Any domain that falls within the fixed set of special top-level domains listed here requires only two periods. Any other domain requires at least three. The special top-level domains are: .com, .edu, .net, .org, .gov, .mil, .int, .biz, .info, .name, .museum, .coop, .aero, and .pro.

If the domain is not specified, it defaults to the hostname of the server that generated the Set-Cookie response.

Set-Cookie: SHIPPING=FEDEX; domain="joes-hardware.com"
Path | Optional. This attribute lets you assign cookies to particular documents on a server. If the Path attribute is a prefix of a URL path, a cookie can be attached. The path "/foo" matches "/foobar" and "/foo/bar.html". The path "/" matches everything in the domain.

If the path is not specified, it is set to the path of the URL that generated the Set-Cookie response.

Set-Cookie: lastorder=00183; path=/orders
Secure | Optional. If this attribute is included, a cookie will be sent only if HTTP is using an SSL secure connection.
Set-Cookie: private_id=519; secure
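The Expires date format maps directly onto a strptime pattern. Here is a small sketch, assuming the full weekday name and two-digit year shown in the format above; the constant and function names are our own, and real servers may emit variations this pattern would reject.

```python
from datetime import datetime, timezone

# Weekday, DD-Mon-YY HH:MM:SS GMT
EXPIRES_FORMAT = "%A, %d-%b-%y %H:%M:%S GMT"


def parse_expires(text):
    """Parse a Netscape-style Expires date into an aware UTC datetime."""
    return datetime.strptime(text, EXPIRES_FORMAT).replace(tzinfo=timezone.utc)


def is_expired(expires_text, now):
    """Return True once `now` (an aware datetime) has reached the expiration date."""
    return now >= parse_expires(expires_text)


when = parse_expires("Tuesday, 09-Nov-99 23:12:40 GMT")
print(when.isoformat())   # 1999-11-09T23:12:40+00:00
```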

Version 0 Cookie header

When a client sends requests, it includes all the unexpired cookies that match the domain, path, and secure filters for the site. All the cookies are combined into a Cookie header:

Cookie: session-id=002-1145265-8016838; session-id-time=1007884800
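On the server side, reading such a header back is a matter of splitting on semicolons. The sketch below uses a hypothetical helper name and ignores quoting rules, so treat it as an illustration rather than a robust parser.

```python
def parse_cookie_header(header_value):
    """Return the name/value pairs of a Version 0 Cookie header as a dict."""
    cookies = {}
    for pair in header_value.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments that are not name=value pairs
            cookies[name] = value
    return cookies


print(parse_cookie_header("session-id=002-1145265-8016838; session-id-time=1007884800"))
```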

Version 1 (RFC 2965) Cookies

An extended version of cookies is defined in RFC 2965 (previously RFC 2109). This Version 1 standard introduces the Set-Cookie2 and Cookie2 headers, but it also interoperates with the Version 0 system.

The RFC 2965 cookie standard is a bit more complicated than the original Netscape standard and is not yet completely supported. The major changes of RFC 2965 cookies are:

·         Associate descriptive text with each cookie to explain its purpose

·         Support forced destruction of cookies on browser exit, regardless of expiration

·         Max-Age aging of cookies in relative seconds, instead of absolute dates

·         Ability to control cookies by the URL port number, not just domain and path

·         The Cookie header carries back the domain, port, and path filters (if any)

·         Version number for interoperability

·         $ prefix in Cookie header to distinguish additional keywords from cookie names

The Version 1 cookie syntax is as follows:

set-cookie = "Set-Cookie2:" cookies
cookies = 1#cookie
cookie = NAME "=" VALUE *(";" set-cookie-av)
NAME = attr
VALUE = value
set-cookie-av = "Comment" "=" value
 | "CommentURL" "=" <"> http_URL <">
 | "Discard"
 | "Domain" "=" value
 | "Max-Age" "=" value
 | "Path" "=" value
 | "Port" [ "=" <"> portlist <"> ]
 | "Secure"
 | "Version" "=" 1*DIGIT
portlist = 1#portnum
portnum = 1*DIGIT
 
cookie = "Cookie:" cookie-version 1*((";" | ",") cookie-value)
cookie-value = NAME "=" VALUE [";" path] [";" domain] [";" port]
cookie-version = "$Version" "=" value
NAME = attr
VALUE = value
path = "$Path" "=" value
domain = "$Domain" "=" value
port = "$Port" [ "=" <"> value <"> ]
 
cookie2 = "Cookie2:" cookie-version

Version 1 Set-Cookie2 header

More attributes are available in the Version 1 cookie standard than in the Netscape standard. Table 11-4 provides a quick summary of the attributes. Refer to RFC 2965 for more detailed explanation.

Table 11-4. Version 1 (RFC 2965) Set-Cookie2 attributes

Set-Cookie2 attribute | Description and examples
NAME=VALUE | Mandatory. The web server can create any NAME=VALUE association, which will be sent back to the web server on subsequent visits to the site. The name must not begin with "$", because that character is reserved.
Version | Mandatory. The value of this attribute is an integer, corresponding to the version of the cookie specification. RFC 2965 is Version 1.
 Set-Cookie2: Part="Rocket_Launcher_0001"; Version="1"
Comment | Optional. This attribute documents how a server intends to use the cookie. The user can inspect this policy to decide whether to permit a session with this cookie. The value must be in UTF-8 encoding.
CommentURL | Optional. This attribute provides a URL pointer to detailed documentation about the purpose and policy for a cookie. The user can inspect this policy to decide whether to permit a session with this cookie.
Discard | Optional. If this attribute is present, it instructs the client to discard the cookie when the client program terminates.
Domain | Optional. A browser sends the cookie only to server hostnames in the specified domain. This lets servers restrict cookies to only certain domains. A domain of "acme.com" would match hostnames "anvil.acme.com" and "shipping.crate.acme.com", but not "www.cnn.com". The rules for domain matching are basically the same as in Netscape cookies, but there are a few additional rules. Refer to RFC 2965 for details.
Max-Age | Optional. The value of this attribute is an integer that sets the lifetime of the cookie in seconds. Clients should calculate the age of the cookie according to the HTTP/1.1 age-calculation rules. When a cookie's age becomes greater than the Max-Age, the client should discard the cookie. A value of zero means the cookie with that name should be discarded immediately.
Path | Optional. This attribute lets you assign cookies to particular documents on a server. If the Path attribute is a prefix of a URL path, a cookie can be attached. The path "/foo" would match "/foobar" and "/foo/bar.html". The path "/" matches everything in the domain. If the path is not specified, it is set to the path of the URL that generated the Set-Cookie2 response.
Port | Optional. This attribute can stand alone as a keyword, or it can include a comma-separated list of ports to which a cookie may be applied. If there is a port list, the cookie can be served only to servers whose ports match a port in the list. If the Port keyword is provided in isolation, the cookie can be served only to the port number of the current responding server.
 Set-Cookie2: foo="bar"; Version="1"; Port="80,81,8080"
 Set-Cookie2: foo="bar"; Version="1"; Port
Secure | Optional. If this attribute is included, a cookie will be sent only if HTTP is using an SSL secure connection.
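Two of the new filters, Max-Age and Port, are easy to express as small predicates. The sketch below is our own simplification: the function names and the convention of passing None, an empty string, or a port list for the Port attribute are assumptions for illustration, and the age value is taken as already computed.

```python
def is_stale(age_seconds, max_age):
    """Return True when the cookie's age exceeds Max-Age; Max-Age=0 discards at once."""
    return max_age == 0 or age_seconds > max_age


def port_allowed(port_attribute, request_port, setting_port):
    """Apply the Port filter.

    `port_attribute` is None when Port was absent, "" when the bare keyword
    was sent, or a comma-separated list such as "80,81,8080".
    `setting_port` is the port of the server that set the cookie.
    """
    if port_attribute is None:
        return True                               # no Port filter at all
    if port_attribute == "":
        return request_port == setting_port       # bare keyword: same port only
    allowed = {int(p) for p in port_attribute.split(",")}
    return request_port in allowed


print(is_stale(3601, 3600))                       # True
print(port_allowed("80,81,8080", 8080, 80))       # True
print(port_allowed("", 81, 80))                   # False
```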

Version 1 Cookie header

Version 1 cookies carry back additional information about each delivered cookie, describing the filters each cookie passed. Each matching cookie must include any Domain, Port, or Path attributes from the corresponding Set-Cookie2 headers.

For example, assume the client has received these five Set-Cookie2 responses in the past from the www.joes-hardware.com web site:

Set-Cookie2: ID="29046"; Domain=".joes-hardware.com"
Set-Cookie2: color=blue
Set-Cookie2: support-pref="L2"; Domain="customer-care.joes-hardware.com"
Set-Cookie2: Coupon="hammer027"; Version="1"; Path="/tools"
Set-Cookie2: Coupon="handvac103"; Version="1"; Path="/tools/cordless"

If the client makes another request for path /tools/cordless/specials.html, it will pass along a long Cookie header like this:

Cookie: $Version="1";
 ID="29046"; $Domain=".joes-hardware.com";
 color="blue";
 Coupon="hammer027"; $Path="/tools";
 Coupon="handvac103"; $Path="/tools/cordless"

Notice that all the matching cookies are delivered with their Set-Cookie2 filters, and the reserved keywords begin with a dollar sign ($).
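A client could assemble such a header along the following lines. This is a sketch under our own assumptions: the function names and the tuple layout are hypothetical, every value is quoted for simplicity, and the filters are emitted in the $Path, $Domain, $Port order given by the grammar.

```python
def quote(value):
    """Wrap a value in double quotes unless it already is quoted."""
    return value if value.startswith('"') else f'"{value}"'


def v1_cookie_header(cookies, version="1"):
    """Build a Version 1 Cookie header from (name, value, filters) tuples.

    `filters` is a dict that may hold "Domain", "Path", or "Port" exactly as
    given in the original Set-Cookie2 header.
    """
    parts = [f"$Version={quote(version)}"]
    for name, value, filters in cookies:
        parts.append(f"{name}={quote(value)}")
        for key in ("Path", "Domain", "Port"):   # order taken from the grammar
            if key in filters:
                parts.append(f"${key}={quote(filters[key])}")
    return "Cookie: " + "; ".join(parts)


print(v1_cookie_header([
    ("ID", "29046", {"Domain": ".joes-hardware.com"}),
    ("color", "blue", {}),
    ("Coupon", "hammer027", {"Path": "/tools"}),
]))
```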

Version 1 Cookie2 header and version negotiation

The Cookie2 request header is used to negotiate interoperability between clients and servers that understand different versions of the cookie specification. The Cookie2 header advises the server that the user agent understands new-style cookies and provides the version of the cookie standard supported (it would have made more sense to call it Cookie-Version):

Cookie2: $Version="1"

If the server understands new-style cookies, it recognizes the Cookie2 header and should send Set-Cookie2 (rather than Set-Cookie) response headers. If a client gets both a Set-Cookie and a Set-Cookie2 header for the same cookie, it ignores the old Set-Cookie header.

If a client supports both Version 0 and Version 1 cookies but gets a Version 0 Set-Cookie header from the server, it should send cookies with the Version 0 Cookie header. However, the client also should send Cookie2: $Version="1" to indicate to the server that it can upgrade.
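The client's side of this negotiation can be summarized in a short sketch. The function name and its inputs are hypothetical; it only covers a client that supports both versions and already holds cookies for the site.

```python
def request_cookie_headers(server_sent_set_cookie2, v0_pairs, v1_header_value):
    """Return the cookie-related request headers for a dual-version client.

    `v0_pairs` is a list of "name=value" strings and `v1_header_value` a
    prebuilt Version 1 Cookie header value; both are illustrative inputs.
    """
    if server_sent_set_cookie2:
        # The server speaks Version 1, so answer in kind.
        return [("Cookie", v1_header_value)]
    return [
        ("Cookie", "; ".join(v0_pairs)),
        ("Cookie2", '$Version="1"'),   # hint that the server may upgrade
    ]


print(request_cookie_headers(False, ["customer=Mary", "lastorder=00183"], ""))
```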

Cookies and Session Tracking

Cookies can be used to track users as they make multiple transactions to a web site. E-commerce web sites use session cookies to keep track of users' shopping carts as they browse. Let's take the example of the popular shopping site Amazon.com. When you type http://www.amazon.com into your browser, you start a chain of transactions where the web server attaches identification information through a series of redirects, URL rewrites, and cookie setting.

Screenshot 11-5 shows a transaction sequence captured from an Amazon.com visit:

·         Screenshot 11-5a: Browser requests Amazon.com root page for the first time.

·         Screenshot 11-5b: Server redirects the client to a URL for the e-commerce software.

·         Screenshot 11-5c: Client makes a request to the redirected URL.

·         Screenshot 11-5d: Server slaps two session cookies on the response and redirects the user to another URL, so the client will request again with these cookies attached. This new URL is a fat URL, meaning that some state is embedded into the URL. If the client has cookies disabled, some basic identification can still be done as long as the user follows the Amazon.com-generated fat URL links and doesn't leave the site.

·         Screenshot 11-5e: Client requests the new URL, but now passes the two attached cookies.

·         Screenshot 11-5f: Server redirects to the home.html page and attaches two more cookies.

·         Screenshot 11-5g: Client fetches the home.html page and passes all four cookies.

·         Screenshot 11-5h: Server serves back the content.

The Amazon.com web site uses session cookies to track users
(Screenshot 11-5.)

Cookies and Caching

You have to be careful when caching documents that are involved with cookie transactions. You don't want to assign one user some past user's cookie or, worse, show one user the contents of someone else's personalized document.

The rules for cookies and caching are not well established. Here are some guiding principles for dealing with caches:

Mark documents uncacheable if they are

The document owner knows best if a document is uncacheable. Explicitly mark documents uncacheable if they are. Specifically, use Cache-Control: no-cache="Set-Cookie" if the document is cacheable except for the Set-Cookie header. The other, more general practice of using Cache-Control: public for documents that are cacheable promotes bandwidth savings in the Web.

Be cautious about caching Set-Cookie headers

If a response has a Set-Cookie header, you can cache the body (unless told otherwise), but you should be extra cautious about caching the Set-Cookie header. If you send the same Set-Cookie header to multiple users, you may be defeating user targeting.

Some caches delete the Set-Cookie header before storing a response in the cache, but that also can cause problems, because clients served from the cache will no longer get cookies slapped on them that they normally would without the cache. This situation can be improved by forcing the cache to revalidate every request with the origin server and merging any returned Set-Cookie headers with the client response. The origin server can dictate such revalidations by adding this header to the cached copy:

Cache-Control: must-revalidate, max-age=0

More conservative caches may refuse to cache any response that has a Set-Cookie header, even though the content may actually be cacheable. Some caches offer modes in which Set-Cookied images are cached, but not text.

Be cautious about requests with Cookie headers

When a request arrives with a Cookie header, it provides a hint that the resulting content might be personalized. Personalized content must be flagged uncacheable, but some servers may erroneously fail to mark this content as uncacheable.

Conservative caches may choose not to cache any document that comes in response to a request with a Cookie header. And again, some caches offer modes in which Cookied images are cached, but not text. The more accepted policy is to cache images with Cookie headers, with the expiration time set to zero, thus forcing a revalidation every time.
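The guiding principles above might translate into a cache's storage decision roughly as follows. This is a deliberately cautious sketch, not a complete caching policy: the function name and the use of plain dicts with lowercase header names are assumptions, and revalidation is left out entirely.

```python
def cache_decision(request_headers, response_headers):
    """Return (store_response, headers_to_store) for a cautious shared cache.

    Header names are expected as lowercase keys in plain dicts.
    """
    cache_control = response_headers.get("cache-control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return False, {}
    # A Cookie request header hints at personalized content: don't store
    # unless the server explicitly marked the response public.
    if "cookie" in request_headers and "public" not in cache_control:
        return False, {}
    # Store the body but never replay one user's Set-Cookie to another user.
    stored = {k: v for k, v in response_headers.items() if k != "set-cookie"}
    return True, stored


print(cache_decision({"cookie": "id=1"}, {"content-type": "text/html"}))
print(cache_decision({}, {"set-cookie": "id=1", "content-type": "image/png"}))
```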

Cookies, Security, and Privacy

Cookies themselves are not believed to be a tremendous security risk, because they can be disabled and because much of the tracking can be done through log analysis or other means. In fact, by providing a standardized, scrutinized method for retaining personal information in remote databases and using anonymous cookies as keys, the frequency of communication of sensitive data from client to server can be reduced.

Still, it is good to be cautious when dealing with privacy and user tracking, because there is always potential for abuse. The biggest misuse comes from third-party web sites using persistent cookies to track users. This practice, combined with IP addresses and information from the Referer header, has enabled these marketing companies to build fairly accurate user profiles and browsing patterns.

In spite of all the negative publicity, the conventional wisdom is that the session handling and transactional convenience of cookies outweighs most risks, if you use caution about who you provide personal information to and review sites' privacy policies.

The Computer Incident Advisory Capability (part of the U.S. Department of Energy) wrote an assessment of the overrepresented dangers of cookies in 1998. Here's an excerpt from that report:

CIAC I-034: Internet Cookies
(http://www.ciac.org/ciac/bulletins/i-034.shtml)
 
PROBLEM:
 
Cookies are short pieces of data used by web servers to help identify web users. The 
popular concepts and rumors about what a cookie can do has reached almost mystical 
proportions, frightening users and worrying their managers.
 
VULNERABILITY ASSESSMENT:
 
The vulnerability of systems to damage or snooping by using web browser cookies is
essentially nonexistent. Cookies can only tell a web server if you have been there
before and can pass short bits of information (such as a user number) from the web
server back to itself the next time you visit. Most cookies last only until you quit
your browser and then are destroyed. A second type of cookie known as a persistent 
cookie has an expiration date and is stored on your disk until that date. A 
persistent cookie can be used to track a user's browsing habits by identifying him
whenever he returns to a site. Information about where you come from and what web
pages you visit already exists in a web server's log files and could also be used to
track users browsing habits, cookies just make it easier.

 

