INCREMENTAL COLLECTION

Even if the overall garbage collection time is only a few percent of the computation time, the collector will occasionally interrupt the program for long periods. For interactive or real-time programs this is undesirable. Incremental or concurrent algorithms interleave garbage collection work with program execution to avoid long interruptions. Terminology The collector tries to collect the garbage; meanwhile, the compiled program keeps changing (mutating) the graph of reachable data, so it is called the mutator. An incremental algorithm is one in which the collector operates only when the mutator requests it; in a concurrent algorithm the collector can operate between or during any instructions executed by the mutator. Tricolor marking In a mark-sweep or copying garbage collection, there are three classes of records:

The collection starts with all objects white; the collector executes , blackening grey objects and greying their white children. Implicit in changing an object from grey to black is removing it from the stack or queue; implicit in greying an object is putting it into the stack or queue. When there are no grey objects, then all white objects must be garbage.ALGORITHM 13.13: Basic tricolor marking

while there are any grey objects
 select a grey record p
 for each field fi of p
 if record p. fi is white
 color record p. fi grey
 color record p black

generalizes all of the mark-sweep and copying algorithms shown so far: , , , , and . All these algorithms preserve two natural invariants:

  1. No black object points to a white object.
  2. Every grey object is on the collector's (stack or queue) data structure (which we will call the grey-set).

While the collector operates, the mutator creates new objects (of what color?) and updates pointer fields of existing objects. If the mutator breaks one of the invariants, then the collection algorithm will not work. Most incremental and concurrent collection algorithms are based on techniques to allow the mutator to get work done while preserving the invariants. For example:

The first three of these are write-barrier algorithms, meaning that each write (store) by the mutator must be checked to make sure an invariant is preserved. The last two are read-barrier algorithms, meaning that read (fetch) instructions are the ones that must be checked. We have seen write barriers before, for generational collection: Remembered lists, remembered sets, card marking, and page marking are all different implementations of the write barrier. Similarly, the read barrier can be implemented in software (as in Baker's algorithm) or using the virtual-memory hardware. Any implementation of a write or read barrier must synchronize with the collector. For example, a Dijkstra-style collector might try to change a white node to grey (and put it into the grey-set) at the same time the mutator is also greying the node (and putting it into the grey-set). Thus, software implementations of the read or write barrier will need to use explicit synchronization instructions, which can be expensive.

But implementations using virtual-memory hardware can take advantage of the synchronization implicit in a page fault: If the mutator faults on a page, the operating system will ensure that no other process has access to that page before processing the fault.