The Memory Hierarchy

mem-o-ry: a device in which information can be inserted and stored and from which it may be extracted when wanted
hi-er-ar-chy: a graded or ranked series
Webster's Dictionary

OVERVIEW

An idealized random access memory (RAM) has N words indexed by integers such that any word can be fetched or stored - using its integer address - equally quickly. Hardware designers can make a big slow memory, or a small fast memory, but a big fast memory is prohibitively expensive. Also, one thing that speeds up access to memory is its nearness to the processor, and a big memory must have some parts far from the processor no matter how much money might be thrown at the problem. Almost as good as a big fast memory is the combination of a small fast cache memory and a big slow main memory; the program keeps its frequently used data in cache and the rarely used data in main memory, and when it enters a phase in which datum x will be frequently used it may move x from the slow memory to the fast memory.

It's inconvenient for the programmer to manage multiple memories, so the hardware does it automatically. Whenever the processor wants the datum at address x, it looks first in the cache, and - we hope - usually finds it there. If there is a cache miss - x is not in the cache - then the processor fetches x from main memory and places a copy of x in the cache so that the next reference to x will be a cache hit. Placing x in the cache may mean removing some other datum y from the cache to make room for it, so that some future access to y will be a cache miss.