Detailed Algorithms - Hypertext Transfer Protocol (HTTP)

The HTTP specification provides a detailed, but slightly obscure and often confusing, algorithm for computing document aging and cache freshness. In this section, we'll discuss the HTTP freshness computation algorithms in detail (the "Fresh enough?" diamond in Screenshot 7-12) and explain the motivation behind them.

This section will be most useful to readers working with cache internals. To help illustrate the wording in the HTTP specification, we will make use of Perl pseudocode. If you aren't interested in the gory details of cache expiration formulas, feel free to skip this section.

Age and Freshness Lifetime

To tell whether a cached document is fresh enough to serve, a cache needs to compute only two values: the cached copy's age and the cached copy's freshness lifetime. If the age of a cached copy is less than the freshness lifetime, the copy is fresh enough to serve. In Perl:

$is_fresh_enough = ($age < $freshness_lifetime);

The age of the document is the total time the document has "aged" since it was sent from the server (or was last revalidated by the server). Because a cache might not know if a document response is coming from an upstream cache or a server, it can't assume that the document is brand new. It must determine the document's age, either from an explicit Age header (preferred) or by processing the server-generated Date header.

Remember that the server always has the most up-to-date version of any document.

The freshness lifetime of a document tells how old a cached copy can get before it is no longer fresh enough to serve to clients. The freshness lifetime takes into account the expiration date of the document and any freshness overrides the client might request.

Some clients may be willing to accept slightly stale documents (using the Cache-Control: max-stale header). Other clients may not accept documents that will become stale in the near future (using the Cache-Control: min-fresh header). The cache combines the server expiration information with the client freshness requirements to determine the maximum freshness lifetime.

Age Computation

The age of the response is the total time since the response was issued from the server (or revalidated from the server). The age includes the time the response has floated around in the routers and gateways of the Internet, the time stored in intermediate caches, and the time the response has been resident in your cache. Example 7-1 provides pseudocode for the age calculation.

Example 7-1. HTTP/1.1 age-calculation algorithm calculates the overall age of a cached document

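use List::Util qw(max);   # max() is not a Perl built-in

# Age of the response when it reached our cache, judged from its headers
$apparent_age = max(0, $time_got_response - $Date_header_value);
$corrected_apparent_age = max($apparent_age, $Age_header_value);

# Add the full round-trip time as a conservative estimate of transit delay
$response_delay_estimate = ($time_got_response - $time_issued_request);
$age_when_document_arrived_at_our_cache =
    $corrected_apparent_age + $response_delay_estimate;

# Add the time the copy has been resident in our cache
$how_long_copy_has_been_in_our_cache = $current_time - $time_got_response;
$age = $age_when_document_arrived_at_our_cache +
    $how_long_copy_has_been_in_our_cache;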

The particulars of HTTP age calculation are a bit tricky, but the basic concept is simple. Caches can tell how old the response was when it arrived at the cache by examining the Date or Age headers. Caches also can note how long the document has been sitting in the local cache. Summed together, these values are the entire age of the response. HTTP throws in some magic to attempt to compensate for clock skew and network delays, but the basic computation is simple enough:

$age = $age_when_document_arrived_at_our_cache +
    $how_long_copy_has_been_in_our_cache;

A cache can pretty easily determine how long a cached copy has been cached locally (a matter of simple bookkeeping), but it is harder to determine the age of a response when it arrives at the cache, because not all servers have synchronized clocks and because we don't know where the response has been. The complete age-calculation algorithm tries to remedy this.

Apparent age is based on the Date header

If all computers shared the same, exactly correct clock, the age of a cached document would simply be the "apparent age" of the document: the current time minus the time when the server sent the document. The server send time is simply the value of the Date header. The simplest initial age calculation would just use the apparent age:

$apparent_age = $time_got_response - $Date_header_value;

$age_when_document_arrived_at_our_cache = $apparent_age;

Unfortunately, not all clocks are well synchronized. The client and server clocks may differ by many minutes, or even by hours or days when clocks are set improperly.

The HTTP specification recommends that clients, servers, and proxies use a time synchronization protocol such as NTP to enforce a consistent time base.

Web applications, especially caching proxies, have to be prepared to interact with servers with wildly differing clock values. The problem is called clock skew: the difference between two computers' clock settings. Because of clock skew, the apparent age sometimes is inaccurate and occasionally is negative.

If the apparent age is ever negative, we just set it to zero. We also could sanity check that the apparent age isn't ridiculously large, but large apparent ages might actually be correct. We might be talking to a parent cache that has cached the document for a long time (the cache also stores the original Date header). With the negative values clamped, the calculation becomes:

$apparent_age = max(0, $time_got_response - $Date_header_value);

$age_when_document_arrived_at_our_cache = $apparent_age;
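The clamping step can also be tried out directly. The following is a minimal Python sketch (Python rather than the Perl pseudocode used in this section, so that it runs as written). The function name, the sample dates, and the choice to pass the receipt time as a timezone-aware datetime are assumptions made for illustration, not part of the HTTP specification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def apparent_age(date_header: str, time_got_response: datetime) -> float:
    """Seconds between the Date header and local receipt, clamped at zero."""
    # parsedate_to_datetime understands the HTTP date format
    # ("Mon, 01 Jan 2024 11:58:30 GMT") and returns an aware datetime.
    server_time = parsedate_to_datetime(date_header)
    raw_difference = (time_got_response - server_time).total_seconds()
    # Clock skew can make the raw difference negative; treat that as zero.
    return max(0.0, raw_difference)


# Hypothetical receipt time on the cache's own clock.
received = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Date header 90 seconds before receipt.
print(apparent_age("Mon, 01 Jan 2024 11:58:30 GMT", received))  # 90.0

# Server clock five minutes ahead of ours: raw difference is -300 seconds.
print(apparent_age("Mon, 01 Jan 2024 12:05:00 GMT", received))  # 0.0
```

The second call shows why the clamp matters: without it, the cache would record a document that is "minus five minutes" old.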

Be aware that the Date header describes the original origin server date. Proxies and caches must not change this date!

Hop-by-hop age calculations

So, we can eliminate negative ages caused by clock skew, but we can't do much about overall loss of accuracy due to clock skew. HTTP/1.1 attempts to work around the lack of universal synchronized clocks by asking each device to accumulate relative aging into an Age header, as a document passes through proxies and caches. This way, no cross-server, end-to-end clock comparisons are needed.

The Age header value increases as the document passes through proxies. HTTP/1.1-aware applications should augment the Age header value by the time the document sat in each application and in network transit. Each intermediate application can easily compute the document's resident time by using its local clock.

However, any non-HTTP/1.1 device in the response chain will not recognize the Age header and will either proxy the header unchanged or remove it. So, until HTTP/1.1 is universally adopted, the Age header may underestimate the relative age whenever such a device sits in the chain.

The relative age values are used in addition to the Date-based age calculation, and the more conservative of the two age estimates is chosen, because either the cross-server Date value or the Age-computed value may be an underestimate (the more conservative estimate is the older age). This way, HTTP tolerates errors in Age headers as well, while erring on the side of fresher content:

$apparent_age = max(0, $time_got_response - $Date_header_value);

$corrected_apparent_age = max($apparent_age, $Age_header_value);

$age_when_document_arrived_at_our_cache = $corrected_apparent_age;
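To see why both estimates are kept, it can help to follow an Age value through a chain of hops. This Python sketch is illustrative only: the helper names, the hop durations, and the Date-based estimates are hypothetical, and a device that does not understand Age is modeled as forwarding the header unchanged (it might equally remove it).

```python
def age_after_hop(age_in: int, resident_seconds: int,
                  understands_age: bool = True) -> int:
    """Age value forwarded by one hop: incoming Age plus time spent there."""
    if not understands_age:
        # Assumption: a non-HTTP/1.1 device passes the header through untouched.
        return age_in
    return age_in + resident_seconds


def corrected_apparent_age(apparent_age: int, age_header_value: int) -> int:
    """Take the older (more conservative) of the two age estimates."""
    return max(apparent_age, age_header_value)


# The response leaves the origin server with no accumulated age.
age = 0
age = age_after_hop(age, 120)                         # HTTP/1.1 proxy, held 120 s
age = age_after_hop(age, 600, understands_age=False)  # older proxy, held 600 s
print(age)  # 120: the 600 seconds at the older proxy were never counted

# Clocks roughly agree, so the Date-based estimate (say 715 s) catches the gap.
print(corrected_apparent_age(715, age))  # 715

# Clock skew has clamped the Date-based estimate to 0; the Age header wins.
print(corrected_apparent_age(0, age))  # 120
```

Neither estimate is reliable on its own, but each covers a failure mode of the other, which is why the algorithm takes the maximum.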

Compensating for network delays

Transactions can be slow. This is the major motivation for caching. But for very slow networks, or overloaded servers, the relative age calculation may significantly underestimate the age of documents if the documents spend a long time stuck in network or server traffic jams.

The Date header indicates when the document left the origin server, but it doesn't say how long the document spent in transit on the way to the cache. If the document came through a long chain of proxies and parent caches, the network delay might be significant.

Note that if the document came from a parent cache and not from an origin server, the Date header will reflect the date of the origin server, not of the parent cache.

In practice, this delay shouldn't be more than a few tens of seconds (or users will abort), but the HTTP designers wanted to support accurate expiration even of short-lifetime objects.

There is no easy way to measure one-way network delay from server to cache, but it is easier to measure the round-trip delay. A cache knows when it requested the document and when it arrived. HTTP/1.1 conservatively corrects for these network delays by adding the entire round-trip delay. This cache-to-server-to-cache delay is an overestimate of the server-to-cache delay, but it is conservative. If it is in error, it will only make the documents appear older than they really are and cause unnecessary revalidations. Here's how the calculation is made:

$apparent_age = max(0, $time_got_response - $Date_header_value);

$corrected_apparent_age = max($apparent_age, $Age_header_value);

$response_delay_estimate = ($time_got_response - $time_issued_request);

$age_when_document_arrived_at_our_cache =
    $corrected_apparent_age + $response_delay_estimate;

Complete Age-Calculation Algorithm

The last section showed how to compute the age of an HTTP-carried document when it arrives at a cache. Once this response is stored in the cache, it ages further. When a request arrives for the document in the cache, we need to know how long the document has been resident in the cache, so we can compute the current document age:

$age = $age_when_document_arrived_at_our_cache +
    $how_long_copy_has_been_in_our_cache;

Ta-da! This gives us the complete HTTP/1.1 age-calculation algorithm we presented in Example 7-1. Computing the resident time is a matter of simple bookkeeping: we know when the document arrived at the cache ($time_got_response) and we know when the current request arrived (right now), so the resident time is just the difference. This is all shown graphically in Screenshot 7-18.

Screenshot 7-18. The age of a cached document includes resident time in the network and cache
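For readers who want to plug in numbers, here is one way the whole age calculation might look as runnable Python. It mirrors the steps of Example 7-1; the CachedResponse record, its field names, and the sample timestamps (plain seconds on the cache's clock, with the Date header already converted) are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    """Bookkeeping a cache might keep per entry (all values in seconds)."""
    date_header_value: float    # Date header, converted to seconds
    age_header_value: float     # Age header, or 0 if absent
    time_issued_request: float  # local clock when the request was sent
    time_got_response: float    # local clock when the response arrived


def current_age(entry: CachedResponse, current_time: float) -> float:
    # Date-based estimate, clamped so clock skew cannot make it negative.
    apparent_age = max(0.0, entry.time_got_response - entry.date_header_value)
    # Prefer whichever of the Date-based and Age-based estimates is older.
    corrected_apparent_age = max(apparent_age, entry.age_header_value)
    # Charge the full round trip as transit delay (a deliberate overestimate).
    response_delay_estimate = entry.time_got_response - entry.time_issued_request
    age_on_arrival = corrected_apparent_age + response_delay_estimate
    # Add the time the copy has been sitting in our cache.
    resident_time = current_time - entry.time_got_response
    return age_on_arrival + resident_time


entry = CachedResponse(date_header_value=1000.0, age_header_value=30.0,
                       time_issued_request=1018.0, time_got_response=1020.0)
print(current_age(entry, current_time=1080.0))  # 92.0
```

In this made-up case the Date header suggests 20 seconds, the Age header says 30, the round trip took 2, and the copy has been cached for 60, giving 30 + 2 + 60 = 92 seconds.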

Freshness Lifetime Computation

Recall that we're trying to figure out whether a cached document is fresh enough to serve to a client. To answer this question, we must determine the age of the cached document and compute the freshness lifetime based on server and client constraints. We just explained how to compute the age; now let's move on to freshness lifetimes.

The freshness lifetime of a document tells how old a document is allowed to get before it is no longer fresh enough to serve to a particular client. The freshness lifetime depends on server and client constraints. The server may have information about how often the document changes. Very stable, filed reports may stay fresh for years. Periodicals may be up to date only until the next scheduled publication (next week, or 6:00 am tomorrow).

Clients may have certain other guidelines. They may be willing to accept slightly stale content, if it is faster, or they might need the most up-to-date content possible. Caches serve the users. We must adhere to their requests.

Complete Server-Freshness Algorithm

Example 7-2 shows a Perl algorithm to compute server freshness limits. It returns the maximum age that a document can reach and still be served under the server's constraints.

Example 7-2. Server freshness constraint calculation

use List::Util qw(max);   # max() is not a Perl built-in

sub server_freshness_limit
{
    my ($heuristic, $server_freshness_limit, $time_since_last_modify);

    $heuristic = 0;

    if ($Max_Age_value_set)
    {
        # An explicit Cache-Control: max-age takes priority
        $server_freshness_limit = $Max_Age_value;
    }
    elsif ($Expires_value_set)
    {
        # Otherwise use the Expires date, relative to the Date header
        $server_freshness_limit = $Expires_value - $Date_value;
    }
    elsif ($Last_Modified_value_set)
    {
        # No explicit expiration: guess from the time since last modification
        $time_since_last_modify = max(0, $Date_value - $Last_Modified_value);
        $server_freshness_limit = int($time_since_last_modify * $lm_factor);
        $heuristic = 1;
    }
    else
    {
        $server_freshness_limit = $default_cache_min_lifetime;
        $heuristic = 1;
    }

    if ($heuristic)
    {
        # Keep heuristic guesses within the cache's configured bounds
        if ($server_freshness_limit > $default_cache_max_lifetime)
        { $server_freshness_limit = $default_cache_max_lifetime; }
        if ($server_freshness_limit < $default_cache_min_lifetime)
        { $server_freshness_limit = $default_cache_min_lifetime; }
    }

    return($server_freshness_limit);
}
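The same decision order can be exercised with concrete numbers. The Python sketch below follows the structure of Example 7-2, but passes the header values as optional arguments instead of globals. The default values for lm_factor (0.1), the minimum lifetime (0 seconds), and the maximum lifetime (one day) are assumptions chosen for illustration; real caches make these configurable.

```python
from typing import Optional


def server_freshness_limit(
    max_age: Optional[int] = None,        # Cache-Control: max-age, seconds
    expires: Optional[int] = None,        # Expires header, as seconds
    date: Optional[int] = None,           # Date header, as seconds
    last_modified: Optional[int] = None,  # Last-Modified header, as seconds
    lm_factor: float = 0.1,               # assumed heuristic fraction
    min_lifetime: int = 0,                # assumed cache default
    max_lifetime: int = 86400,            # assumed cache default (one day)
) -> int:
    heuristic = False
    if max_age is not None:
        limit = max_age
    elif expires is not None and date is not None:
        limit = expires - date
    elif last_modified is not None and date is not None:
        # No explicit expiration: guess from how long the document has
        # gone unmodified.
        limit = int(max(0, date - last_modified) * lm_factor)
        heuristic = True
    else:
        limit = min_lifetime
        heuristic = True
    if heuristic:
        # Only heuristic guesses are clamped to the cache's bounds.
        limit = max(min_lifetime, min(limit, max_lifetime))
    return limit


DAY = 86400
now = 1_000_000  # hypothetical Date header value

print(server_freshness_limit(max_age=300))                               # 300
print(server_freshness_limit(expires=now + 3600, date=now))              # 3600
print(server_freshness_limit(date=now, last_modified=now - 5 * DAY))     # 43200
print(server_freshness_limit(date=now, last_modified=now - 20 * DAY))    # 86400
```

The last two calls show the heuristic at work: a document unmodified for 5 days gets a 12-hour lifetime, while one unmodified for 20 days would get 2 days but is capped at the assumed one-day maximum.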

Now let's look at how the client can override the document's server-specified age limit. Example 7-3 shows a Perl algorithm to take a server freshness limit and modify it by the client constraints. It returns the maximum age that a document can reach and still be served by the cache without revalidation.

Example 7-3. Client freshness constraint calculation

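use List::Util qw(min);   # min() is not a Perl built-in

sub client_modified_freshness_limit
{
    # The directives tested here come from the client's request; a real
    # cache would keep them separate from the server values in Example 7-2.
    my $age_limit = server_freshness_limit( ); ## From Example 7-2

    if ($Max_Stale_value_set)
    {
        if ($Max_Stale_value == $INT_MAX)
        { $age_limit = $INT_MAX; }
        else
        { $age_limit = server_freshness_limit( ) + $Max_Stale_value; }
    }

    if ($Min_Fresh_value_set)
    {
        $age_limit = min($age_limit, server_freshness_limit( ) - $Min_Fresh_value);
    }

    if ($Max_Age_value_set)
    {
        $age_limit = min($age_limit, $Max_Age_value);
    }

    return($age_limit);
}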

The whole process involves two variables: the document's age and its freshness limit. The document is "fresh enough" if the age is less than the freshness limit. The algorithm in Example 7-3 just takes the server freshness limit and slides it around based on additional client constraints. We hope this section made the subtle expiration algorithms described in the HTTP specifications a bit clearer.
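As a closing illustration, the client adjustments and the final "fresh enough" test might be sketched in Python as follows. The function and parameter names are hypothetical, the server limit is passed in as a number rather than recomputed, and INT_MAX stands in for a max-stale directive with no value (any staleness accepted), as in Example 7-3.

```python
import sys
from typing import Optional

INT_MAX = sys.maxsize  # stand-in for "max-stale with no limit"


def client_modified_freshness_limit(
    server_limit: int,
    max_stale: Optional[int] = None,  # request Cache-Control: max-stale
    min_fresh: Optional[int] = None,  # request Cache-Control: min-fresh
    max_age: Optional[int] = None,    # request Cache-Control: max-age
) -> int:
    age_limit = server_limit
    if max_stale is not None:
        # The client tolerates staleness: extend the limit.
        age_limit = INT_MAX if max_stale == INT_MAX else server_limit + max_stale
    if min_fresh is not None:
        # The client wants some freshness left over: shrink the limit.
        age_limit = min(age_limit, server_limit - min_fresh)
    if max_age is not None:
        # The client caps the acceptable age outright.
        age_limit = min(age_limit, max_age)
    return age_limit


def is_fresh_enough(age: int, age_limit: int) -> bool:
    return age < age_limit


server_limit = 3600  # hypothetical: the server allows one hour

# A 3,700-second-old copy is stale by default...
print(is_fresh_enough(3700, client_modified_freshness_limit(server_limit)))  # False
# ...but acceptable to a client sending max-stale=300.
print(is_fresh_enough(3700, client_modified_freshness_limit(server_limit, max_stale=300)))  # True
# A 3,100-second-old copy fails a min-fresh=600 request (limit becomes 3000).
print(is_fresh_enough(3100, client_modified_freshness_limit(server_limit, min_fresh=600)))  # False
```

Each directive only moves the limit; the comparison against the document's age stays the same in every case.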
