Previous    Next

GENERATIONAL COLLECTION

In many programs, newly created objects are likely to die soon; but an object that is still reachable after many collections will probably survive for many collections more. Therefore the collector should concentrate its effort on the "young" data, where there is a higher proportion of garbage. We divide the heap into generations, with the youngest objects in generation G0; every object in generation G1 is older than any object in G0; everything in G2 is older than anything in G1, and so on. To collect (by mark-and-sweep or by copying) just G0, just start from the roots and do either depth-first marking or breadth-first copying (or semidepth-first copying). But now the roots are not just program variables: They include any pointer within G1, G2,… that points into G0. If there are too many of these, then processing the roots will take longer than the traversal of reachable objects within G0! Fortunately, it is rare for an older object to point to a much younger object. In many common coding styles, when an object a is created its fields are immediately initialized; for example, they might be made to point to b and c. But b and c already exist; they are older than a. So we have a newer object pointing to an older object. The only way that an older object b could point to a newer object a is if some field of b is updated long after b is created; this turns out to be rare. To avoid searching all of G1, G2,… for roots of G0, we make the compiled program remember where there are pointers from old objects to new ones. There are several ways of remembering:

Java Click To expand
Image 13.12: Generational collection. The bold arrow is one of the rare pointers from an older generation to a newer one.

When a garbage collection begins, the remembered set tells which objects (or cards, or pages) of the old generation can possibly contain pointers into G0; these are scanned for roots. Algorithm 13.3 or 13.9 can be used to collect G0: "heap" or "from-space" means G0, "to-space" means a new area big enough to hold the reachable objects in G0, and "roots" include program variables and the remembered set. Pointers to older generations are left unchanged: The marking algorithm does not mark old-generation records, and the copying algorithm copies them verbatim without forwarding them. After several collections of G0, generation G1 may have accumulated a significant amount of garbage that should be collected. Since G0 may contain many pointers into G1, it is best to collect G0 and G1 together. As before, the remembered set must be scanned for roots contained in G2, G3;…. Even more rarely, G2 will be collected, and so on. Each older generation should be exponentially bigger than the previous one. If G0 is half a megabyte, then G1 should be two megabytes, G2 should be eight megabytes, and so on. An object should be promoted from Gi to Gi+1 when it survives two or three collections of Gi. Cost of generational collection Without detailed empirical information about the distribution of object lifetimes, we cannot analyze the behavior of generational collection. In practice, however, it is common for the youngest generation to be less than 10% live data. With a copying collector, this means that H / R is 10 in this generation, so that the amortized cost per word reclaimed is c3 R / (10RR), or about 1 instruction. If the amount of reachable data in G0 is about 50 to 100 kilobytes, then the amount of space "wasted" by having H = 10R in the youngest generation is about a megabyte. In a 50-megabyte multigeneration system, this is a small space cost. Collecting the older generations can be more expensive. To avoid using too much space, a smaller H / R ratio can be used for older generations. This increases the time cost of an older-generation collection, but these are sufficiently rare that the overall amortized time cost is still good.

Maintaining the remembered set also takes time, approximately 10 instructions per pointer update to enter an object into the remembered set and then process that entry in the remembered set. If the program does many more updates than fresh allocations, then generational collection may be more expensive than nongenerational collection.


JaVaScreenshot Previous    Next
Comments