Look back at Screenshot 15-8. The client does not initially have a copy of the resource, so it sends a request to the server asking for it. The server responds with Version 1 of the resource. The client can now cache this copy, but for how long?

Once the document has "expired" at the client (i.e., once the client can no longer consider its copy a valid copy), it must request a fresh copy from the server. If the document has not changed at the server, however, the client does not need to receive it again; it can just continue to use its cached copy.

This special request, called a conditional request, requires that the client tell the server which version it currently has, using a validator, and ask for a copy to be sent only if its current copy is no longer valid. Let's look at the three key concepts (freshness, validators, and conditionals) in more detail.

Freshness

Servers are expected to give clients information about how long clients can cache their content and consider it fresh. Servers can provide this information using one of two headers: Expires and Cache-Control.

The Expires header specifies the exact date and time when the document "expires," that is, when it can no longer be considered fresh. The syntax for the Expires header is:

Expires: Sun, 18 Mar 2001 23:59:59 GMT

For a client and server to use the Expires header correctly, their clocks must be synchronized. This cannot be taken for granted, because neither may run a clock synchronization protocol such as the Network Time Protocol (NTP). A mechanism that defines expiration using relative time is more useful. The Cache-Control header can be used to specify the maximum age for a document in seconds, where age is the total amount of time since the document left the server. Age is not dependent on clock synchronization and therefore is likely to yield more accurate results.
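
The difference between the two mechanisms can be sketched in a few lines of Python. This is a minimal illustration, not a complete cache: the function name `is_fresh`, the plain-dict header representation, and the `age_seconds` argument are assumptions made for the example, and a relative max-age is simply given priority over an absolute Expires date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, age_seconds, now):
    """Return True if a cached response may be served without revalidation.

    headers:     dict of response header names to values (hypothetical shape).
    age_seconds: seconds elapsed since the response left the server.
    now:         the cache's current time as a timezone-aware datetime.
    """
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # Relative lifetime: no comparison against the server's clock.
            return age_seconds < int(value)
    expires = headers.get("Expires")
    if expires is not None:
        # Absolute lifetime: only as accurate as the agreement of both clocks.
        return now < parsedate_to_datetime(expires)
    return False  # No explicit freshness information: treat as stale.


now = datetime(2001, 3, 18, 12, 0, 0, tzinfo=timezone.utc)
print(is_fresh({"Cache-Control": "max-age=3600"}, 100, now))
print(is_fresh({"Expires": "Sun, 18 Mar 2001 23:59:59 GMT"}, 0, now))
```

Note that the Expires branch depends on `now`, which comes from the cache's own clock, while the max-age branch depends only on elapsed time.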

The Cache-Control header actually is very powerful. It can be used by both servers and clients to describe freshness using more directives than just specifying an age or expiration time. Table 15-3 lists some of the directives that can accompany the Cache-Control header.

Table 15-3. Cache-Control header directives

| Directive | Message type | Description |
|---|---|---|
| no-cache | Request | Do not return a cached copy of the document without first revalidating it with the server. |
| no-store | Request | Do not return a cached copy of the document. Do not store the response from the server. |
| max-age | Request | The document in the cache must not be older than the specified age. |
| max-stale | Request | The document may be stale based on the server-specified expiration information, but it must not have been expired for longer than the value in this directive. |
| min-fresh | Request | The document's freshness lifetime must be no less than its current age plus the specified amount. In other words, the response must remain fresh for at least the specified amount of time. |
| no-transform | Request | The document must not be transformed before being sent. |
| only-if-cached | Request | Send the document only if it is in the cache, without contacting the origin server. |
| public | Response | Response may be cached by any cache. |
| private | Response | Response may be cached such that it can be accessed only by a single client. |
| no-cache | Response | If the directive is accompanied by a list of header fields, the content may be cached and served to clients, but the listed header fields must first be removed. If no header fields are specified, the cached copy must not be served without revalidation with the server. |
| no-store | Response | Response must not be cached. |
| no-transform | Response | Response must not be modified in any way before being served. |
| must-revalidate | Response | Once the response becomes stale, it must be revalidated with the origin server before being served. |
| proxy-revalidate | Response | Once the response becomes stale, shared caches must revalidate it with the origin server before serving. This directive can be ignored by private caches. |
| max-age | Response | Specifies the maximum length of time the document can be cached and still considered fresh. |
| s-maxage | Response | Specifies the maximum age of the document as it applies to shared caches (overriding the max-age directive, if one is present). This directive can be ignored by private caches. |
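
Because several directives can travel in a single Cache-Control header, a cache first has to split the header value apart. The following Python sketch shows one simple way to do that and to pick the lifetime that applies to a shared or a private cache. The function names `parse_cache_control` and `freshness_lifetime` are hypothetical, and the parser deliberately ignores the case of quoted arguments that contain commas. On the wire, the shared-cache directive is spelled `s-maxage`.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument map to None. Simplification: quoted
    arguments that themselves contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(directives, shared_cache):
    """Return the lifetime in seconds that applies to this cache, or None."""
    if shared_cache and "s-maxage" in directives:
        # For shared caches, s-maxage overrides max-age.
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None


parsed = parse_cache_control("public, max-age=3600, s-maxage=600")
print(freshness_lifetime(parsed, shared_cache=True))   # 600
print(freshness_lifetime(parsed, shared_cache=False))  # 3600
```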

Caching and freshness were discussed in more detail in Chapter 7.

Conditionals and Validators

When a cache's copy is requested, and it is no longer fresh, the cache needs to make sure it has a fresh copy. The cache can fetch the current copy from the origin server, but in many cases, the document on the server is still the same as the stale copy in the cache. We saw this in Screenshot 15-8b; the cached copy may have expired, but the server content still is the same as the cache content. If a cache always fetches a server's document, even if it's the same as the expired cache copy, the cache wastes network bandwidth, places unnecessary load on the cache and server, and slows everything down.

To fix this, HTTP provides a way for clients to request a copy only if the resource has changed, using special requests called conditional requests. Conditional requests are normal HTTP request messages, but they are performed only if a particular condition is true. For example, a cache might send the following conditional GET message to a server, asking it to send the file /announce.html only if the file has been modified since June 29, 2002 (the date the cached document was last changed by the author):

GET /announce.html HTTP/1.0
If-Modified-Since: Sat, 29 Jun 2002 14:30:00 GMT

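Conditional requests are implemented by conditional headers that start with "If-". In the example above, the conditional header is If-Modified-Since. A conditional header allows a method to execute only if the condition is true. If the condition is not true, the server does not perform the method and instead returns a status code saying so: 304 Not Modified when an If-Modified-Since or If-None-Match condition fails on a GET, or 412 Precondition Failed when an If-Match or If-Unmodified-Since condition fails.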
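
Seen from the server, handling an If-Modified-Since request comes down to one date comparison. The Python sketch below is an illustration under simplifying assumptions: the function name `respond_to_conditional_get` and the plain-dict header representation are invented for the example, and only the status code is returned. A server that finds the document unchanged answers 304 Not Modified and sends no body.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def respond_to_conditional_get(request_headers, last_modified):
    """Return the status code a server might send for a GET request.

    last_modified: timezone-aware datetime of the resource's last change,
                   truncated to whole seconds as HTTP dates are.
    """
    header = request_headers.get("If-Modified-Since")
    if header is None:
        return 200  # Unconditional request: send the full document.
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return 200  # Unparseable date: ignore the condition.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT.
    # Send the body only if the resource changed after the client's date.
    return 200 if last_modified > since else 304


changed_at = datetime(2002, 6, 29, 14, 30, 0, tzinfo=timezone.utc)
request = {"If-Modified-Since": "Sat, 29 Jun 2002 14:30:00 GMT"}
print(respond_to_conditional_get(request, changed_at))
print(respond_to_conditional_get(request, changed_at + timedelta(hours=1)))
```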

Each conditional works on a particular validator. A validator is a particular attribute of the document instance that is tested. Conceptually, you can think of the validator like the serial number, version number, or last change date of a document. A wise client in Screenshot 15-8b would send a conditional validation request to the server saying, "send me the resource only if it is no longer Version 1; I have Version 1." We discussed conditional cache revalidation in Chapter 7, but we'll study the details of entity validators more carefully in this chapter.

The If-Modified-Since conditional header tests the last-modified date of a document instance, so we say that the last-modified date is the validator. The If-None-Match conditional header tests the ETag value of a document, which is a special keyword or version-identifying tag associated with the entity. Last-Modified and ETag are the two primary validators used by HTTP. Table 15-4 lists four of the HTTP headers used for conditional requests. Next to each conditional header is the type of validator used with the header.
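
On the client side, the pairing of validator and conditional header can be sketched as a small helper that turns the validators saved with a cached response into the headers of a revalidation request. The function name `revalidation_headers` and the dict-based header representation are assumptions made for this illustration.

```python
def revalidation_headers(cached_headers):
    """Build conditional request headers from a cached response's validators.

    cached_headers: dict of the response headers stored with the cached copy.
    """
    conditional = {}
    if "ETag" in cached_headers:
        # The entity tag is echoed back in If-None-Match.
        conditional["If-None-Match"] = cached_headers["ETag"]
    if "Last-Modified" in cached_headers:
        # The last-modified date is echoed back in If-Modified-Since.
        conditional["If-Modified-Since"] = cached_headers["Last-Modified"]
    return conditional


cached = {
    "ETag": '"v1"',
    "Last-Modified": "Sat, 29 Jun 2002 14:30:00 GMT",
    "Content-Type": "text/html",
}
print(revalidation_headers(cached))
```

If the cached response carried neither validator, the helper returns an empty dict and the request goes out unconditionally.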

Table 15-4. Conditional request types

| Request type | Validator | Description |
|---|---|---|
| If-Modified-Since | Last-Modified | Send a copy of the resource if the version that was last modified at the time in your previous Last-Modified response header is no longer the latest one. |
| If-Unmodified-Since | Last-Modified | Send a copy of the resource only if it is the same as the version that was last modified at the time in your previous Last-Modified response header. |
| If-Match | ETag | Send a copy of the resource if its entity tag is the same as that of the one in your previous ETag response header. |
| If-None-Match | ETag | Send a copy of the resource if its entity tag is different from that of the one in your previous ETag response header. |
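
The four conditionals in the table can be evaluated together, as the following Python sketch suggests. It is a simplified illustration, not a full implementation of the HTTP rules: the function name `evaluate_preconditions` is hypothetical, header values are assumed to be already parsed (single entity-tag strings and timezone-aware datetimes), and lists of tags, the `*` wildcard, and weak tags are not handled. The entity-tag conditionals are checked in preference to the date conditionals when both are present.

```python
from datetime import datetime, timedelta, timezone


def evaluate_preconditions(headers, etag, last_modified):
    """Return 200, 304, or 412 for a GET carrying conditional headers.

    headers:       dict of conditional header names to parsed values.
    etag:          the server's current entity tag for the resource.
    last_modified: the server's current last-modified time (aware datetime).
    """
    # "Only if it is still the version I have": failure is 412.
    if "If-Match" in headers:
        if headers["If-Match"] != etag:
            return 412
    elif "If-Unmodified-Since" in headers:
        if last_modified > headers["If-Unmodified-Since"]:
            return 412
    # "Only if it is no longer the version I have": failure is 304.
    if "If-None-Match" in headers:
        if headers["If-None-Match"] == etag:
            return 304
    elif "If-Modified-Since" in headers:
        if last_modified <= headers["If-Modified-Since"]:
            return 304
    return 200


modified = datetime(2002, 6, 29, 14, 30, 0, tzinfo=timezone.utc)
print(evaluate_preconditions({"If-None-Match": '"v1"'}, '"v1"', modified))
print(evaluate_preconditions({"If-Match": '"v1"'}, '"v2"', modified))
```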

HTTP groups validators into two classes: weak validators and strong validators. Weak validators may not always uniquely identify an instance of a resource; strong validators must. An example of a weak validator is the size of the object in bytes. The resource content might change even though the size remains the same, so a hypothetical byte-count validator only weakly indicates a change. A cryptographic checksum of the contents of the resource (such as MD5), however, is a strong validator; it changes when the document changes.
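
The byte-count example is easy to reproduce. In the Python sketch below, two versions of a document have the same length but different content, so a size-based validator misses the change while an MD5 digest detects it. The function names and the sample text are invented for the illustration.

```python
import hashlib


def size_validator(body):
    """A weak validator: the length of the body in bytes."""
    return str(len(body))


def digest_validator(body):
    """A strong validator: the MD5 digest of the body."""
    return hashlib.md5(body).hexdigest()


version_1 = b"Sale ends on June 29."
version_2 = b"Sale ends on June 30."

# Same size, so the weak validator cannot tell the versions apart.
print(size_validator(version_1) == size_validator(version_2))
# Different digests, so the strong validator detects the change.
print(digest_validator(version_1) == digest_validator(version_2))
```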

The last-modified time is considered a weak validator because, although it specifies the time at which the resource was last modified, it does so with a resolution of only one second. Because a resource can change multiple times in a second, and because servers can serve thousands of requests per second, the last-modified date might not always reflect changes. The ETag header is considered a strong validator, because the server can place a distinct value in the ETag header every time the entity changes. Version numbers and digest checksums are good candidates for the ETag value, but entity tags are flexible: they take arbitrary text values ("tags") and can be used to devise a variety of client and server validation strategies.

Clients and servers may sometimes want to adopt a looser version of entity-tag validation. For example, a server may want to make cosmetic changes to a large, popular cached document without triggering a mass transfer when caches revalidate. In this case, the server might advertise a "weak" entity tag by prefixing the tag with "W/". A weak entity tag should change only when the associated entity changes in a semantically significant way. A strong entity tag must change whenever the associated entity value changes in any way.

The following example shows how a client might revalidate with a server using a weak entity tag. The server would return a body only if the content changed in a meaningful way from Version 4.0 of the document:

GET /announce.html HTTP/1.1 
If-None-Match: W/"v4.0"
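
How a server compares such tags can be sketched as follows. The HTTP specification distinguishes a weak comparison, which ignores the "W/" prefix, from a strong comparison, which succeeds only if neither tag is weak. The Python below is a minimal illustration of those two rules; the function names are hypothetical and no validation of the quoted tag syntax is attempted.

```python
def parse_etag(tag):
    """Split an entity tag into (is_weak, opaque_value)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def weak_match(a, b):
    """Weak comparison: the opaque values match, weak or not."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def strong_match(a, b):
    """Strong comparison: neither tag is weak and the values match."""
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


print(weak_match('W/"v4.0"', '"v4.0"'))    # True
print(strong_match('W/"v4.0"', '"v4.0"'))  # False
```

A cache revalidating a GET with If-None-Match is a case where the weak comparison is appropriate, which is why the request above can succeed with a weak tag.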

In summary, when clients access the same resource more than once, they first need to determine whether their current copy still is fresh. If it is not, they must get the latest version from the server. To avoid receiving an identical copy in the event that the resource has not changed, clients can send conditional requests to the server, specifying validators that uniquely identify their current copies. Servers will then send a copy of the resource only if it is different from the client's copy. For more details on cache revalidation, please refer back to Section 7.7.

 

