Cache Topologies - Hypertext Transfer Protocol (HTTP)

Caches can be dedicated to a single user or shared between thousands of users. Dedicated caches are called private caches. Private caches are personal caches, containing popular pages for a single user (Screenshot 7-7a). Shared caches are called public caches. Public caches contain the pages popular in the user community (Screenshot 7-7b).

Private Caches

Private caches don't need much horsepower or storage space, so they can be made small and cheap. Web browsers have private caches built right in: most browsers cache popular documents on the disk and in the memory of your personal computer and allow you to configure the cache size and settings. You can also peek inside the browser caches to see what they contain. For example, with Microsoft Internet Explorer, you can get the cache contents from the Tools > Internet Options... dialog box. MSIE calls the cached documents "Temporary Files" and lists them in a file display, along with the associated URLs and document expiration times. You can view Netscape Navigator's cache contents through the special URL about:cache, which gives you a "Disk Cache statistics" page showing the cache contents.

Public Proxy Caches

Public caches are special, shared proxy servers called caching proxy servers or, more commonly, proxy caches (proxies were discussed in Chapter 6). Proxy caches serve documents from the local cache or contact the server on the user's behalf. Because a public cache receives accesses from multiple users, it has more opportunity to eliminate redundant traffic.

Because a public cache caches the diverse interests of the user community, it needs to be large enough to hold a set of popular documents, without being swept clean by individual user interests.

In Screenshot 7-8a, each client redundantly accesses a new, "hot" document (not yet in the private cache). Each private cache fetches the same document, crossing the network multiple times. With a shared, public cache, as in Screenshot 7-8b, the cache needs to fetch the popular object only once, and it uses the shared copy to service all requests, reducing network traffic.

**Shared, public caches can decrease network traffic**
(Screenshot 7-8.)
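
The traffic savings described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `CountingOrigin` and `Cache` classes are hypothetical stand-ins, the caches are unbounded, and entries never expire. It counts how many times the origin server is contacted when several clients request the same "hot" document, first with private caches only and then with a shared cache in front of the origin.

```python
from typing import Callable, Dict


class CountingOrigin:
    """Hypothetical origin server that counts the fetches it receives."""

    def __init__(self) -> None:
        self.fetches = 0

    def fetch(self, url: str) -> str:
        self.fetches += 1
        return f"body of {url}"


class Cache:
    """Toy cache: unbounded, no expiration, forwards misses upstream."""

    def __init__(self, upstream_fetch: Callable[[str], str]) -> None:
        self._store: Dict[str, str] = {}
        self._upstream_fetch = upstream_fetch

    def get(self, url: str) -> str:
        if url not in self._store:  # cache miss: go upstream once
            self._store[url] = self._upstream_fetch(url)
        return self._store[url]


def origin_fetches_private_only(num_clients: int, url: str) -> int:
    """Each client has its own private cache that talks to the origin."""
    origin = CountingOrigin()
    private_caches = [Cache(origin.fetch) for _ in range(num_clients)]
    for cache in private_caches:
        cache.get(url)
    return origin.fetches


def origin_fetches_with_shared(num_clients: int, url: str) -> int:
    """Private caches forward their misses to one shared, public cache."""
    origin = CountingOrigin()
    shared = Cache(origin.fetch)
    private_caches = [Cache(shared.get) for _ in range(num_clients)]
    for cache in private_caches:
        cache.get(url)
    return origin.fetches


HOT_URL = "http://www.example.com/hot.html"  # illustrative URL
private_only = origin_fetches_private_only(4, HOT_URL)
with_shared = origin_fetches_with_shared(4, HOT_URL)
```

With four clients, the private-only setup contacts the origin four times, while the shared cache contacts it once and serves the other three requests from its own copy.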

Proxy caches follow the rules for proxies described in Chapter 6. You can configure your browser to use a proxy cache by specifying a manual proxy or by configuring a proxy auto-configuration file (see Section 6.4.1). You also can force HTTP requests through caches without configuring your browser by using intercepting proxies (see Chapter 20).
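
The same manual-proxy idea applies to programmatic HTTP clients, not just browsers. As a rough sketch, Python's standard `urllib.request` module can be told to send HTTP requests through a proxy cache. The proxy address below is a made-up placeholder, not a real server, and no request is sent in this example.

```python
import urllib.request

# Hypothetical proxy cache address; substitute your own proxy's host and port.
PROXY_URL = "http://proxy.example.com:8080"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes plain HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# opener.open("http://www.example.com/") would now be sent via the proxy.
```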

Proxy Cache Hierarchies

In practice, it often makes sense to deploy hierarchies of caches, where cache misses in smaller caches are funneled to larger parent caches that service the leftover "distilled" traffic. Screenshot 7-9 shows a two-level cache hierarchy. The idea is to use small, inexpensive caches near the clients and progressively larger, more powerful caches up the hierarchy to hold documents shared by many users.

If the clients are browsers with browser caches, Screenshot 7-9 technically shows a three-level cache hierarchy.

Parent caches may need to be larger, to hold the documents popular across more users, and higher-performance, because they receive the aggregate traffic of many children, whose interests may be diverse.

**Accessing documents in a two-level cache hierarchy**
(Screenshot 7-9.)

Hopefully, most users will get cache hits on the nearby, level-1 caches (as shown in Screenshot 7-9a). If not, larger parent caches may be able to handle their requests (Screenshot 7-9b). For deep cache hierarchies it's possible to go through long chains of caches, but each intervening proxy does impose some performance penalty that can become noticeable as the proxy chain becomes long.
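
The lookup path through a two-level hierarchy can be sketched as follows. This is a simplified model, not how any particular proxy is implemented: `OriginServer` and `CacheNode` are hypothetical names, the caches never evict or expire anything, and "hops" simply counts how many upstream nodes a request had to pass through beyond the cache that received it.

```python
from typing import Dict, Tuple

Result = Tuple[str, str, int]  # (body, name of node that had it, upstream hops)


class OriginServer:
    """Hypothetical origin server at the top of the hierarchy."""

    name = "origin"

    def get(self, url: str) -> Result:
        return f"body of {url}", self.name, 0


class CacheNode:
    """Toy cache that forwards misses to its parent and stores the reply."""

    def __init__(self, name: str, parent) -> None:
        self.name = name
        self.parent = parent  # another CacheNode or the OriginServer
        self._store: Dict[str, str] = {}

    def get(self, url: str) -> Result:
        if url in self._store:  # hit: served locally, no upstream hops
            return self._store[url], self.name, 0
        body, served_by, hops = self.parent.get(url)  # miss: ask the parent
        self._store[url] = body  # keep a copy for later requests
        return body, served_by, hops + 1


origin = OriginServer()
level2 = CacheNode("level-2", origin)
level1_a = CacheNode("level-1a", level2)
level1_b = CacheNode("level-1b", level2)

URL = "http://www.example.com/index.html"  # illustrative URL
first = level1_a.get(URL)   # misses at both levels, reaches the origin
second = level1_b.get(URL)  # misses at level 1, hits in the parent
third = level1_a.get(URL)   # hits in the nearby level-1 cache
```

Here the first request travels two hops to the origin, the second is satisfied by the parent after one hop, and the third is a local hit. Each extra hop stands in for the per-proxy penalty mentioned above.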

In practice, network architects try to limit themselves to two or three proxies in a row. However, a new generation of high-performance proxy servers may make proxy-chain length less of an issue.

Cache Meshes, Content Routing, and Peering

Some network architects build complex cache meshes instead of simple cache hierarchies. Proxy caches in cache meshes talk to each other in more sophisticated ways and make dynamic cache communication decisions, deciding which parent caches to talk to, or deciding to bypass caches entirely and go directly to the origin server. Such proxy caches can be described as content routers, because they make routing decisions about how to access, manage, and deliver content.

Caches designed for content routing within cache meshes may do all of the following (among other things):

· Choose dynamically between a parent cache and the origin server, based on the URL.

· Select a particular parent cache dynamically, based on the URL.

· Search caches in the local area for a cached copy before going to a parent cache.

· Allow other caches to access portions of their cached content, but do not permit Internet transit through their cache.
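
The first three decisions in the list above can be sketched as a small routing function. This is an illustrative sketch only: the `ContentRouter` class, the host names, and the cache names are all hypothetical, and the sibling check is modeled as a plain dictionary lookup, whereas a real mesh would query neighboring caches over the network.

```python
from typing import Dict, Set
from urllib.parse import urlsplit


class ContentRouter:
    """Toy content router: decides where to send a request for a URL."""

    def __init__(
        self,
        parents_by_suffix: Dict[str, str],  # host suffix -> parent cache name
        siblings: Dict[str, Set[str]],      # sibling name -> URLs it holds
        direct_hosts: Set[str],             # hosts that always bypass caches
    ) -> None:
        self.parents_by_suffix = parents_by_suffix
        self.siblings = siblings
        self.direct_hosts = direct_hosts

    def route(self, url: str) -> str:
        # 1. Search nearby sibling caches for a cached copy first.
        for name, cached_urls in self.siblings.items():
            if url in cached_urls:
                return name
        host = urlsplit(url).hostname or ""
        # 2. Some URLs bypass the caches and go straight to the origin.
        if host in self.direct_hosts:
            return "origin"
        # 3. Otherwise pick a particular parent cache based on the URL's host.
        for suffix, parent in self.parents_by_suffix.items():
            if host.endswith(suffix):
                return parent
        # 4. No matching parent: fall back to the origin server.
        return "origin"


router = ContentRouter(
    parents_by_suffix={".example.com": "parent-a", ".example.org": "parent-b"},
    siblings={"sibling-1": {"http://www.example.org/logo.gif"}},
    direct_hosts={"intranet.example.com"},
)
```

For instance, `router.route("http://www.example.org/logo.gif")` selects the sibling that already holds the object, while other `example.org` URLs go to `parent-b`.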

These more complex relationships between caches allow different organizations to peer with each other, connecting their caches for mutual benefit. Caches that provide selective peering support are called sibling caches (Screenshot 7-10). Because HTTP doesn't provide sibling cache support, people have extended HTTP with protocols, such as the Internet Cache Protocol (ICP) and the HyperText Caching Protocol (HTCP). We'll talk about these protocols in Chapter 20.
