Previous    Next

## COPYING COLLECTION

The reachable part of the heap is a directed graph, with records as nodes, and pointers as edges, and program variables as roots. Copying garbage collection traverses this graph (in a part of the heap called from-space), building an isomorphic copy in a fresh area of the heap (called to-space). The to-space copy is compact, occupying contiguous memory without fragmentation (that is, without free records interspersed with the reachable data). The roots are made to point at the to-space copy; then the entire from-space (garbage, plus the previously reachable graph) is unreachable. Image 13.7 illustrates the situation before and after a copying collection. Before the collection, the from-space is full of reachable nodes and garbage; there is no place left to allocate, since next has reached limit. After the collection, the area of to-space between next and limit is available for the compiled program to allocate new records. Because the new-allocation area is contiguous, allocating a new record of size n into pointer p is very easy: Just copy next to p, and increment next by n. Copying collection does not have a fragmentation problem.

Image 13.7: Copying collection.

Eventually, the program will allocate enough that next reaches limit; then another garbage collection is needed. The roles of from-space and to-space are swapped, and the reachable data are again copied. Initiating a collection To start a new collection, the pointer next is initialized to point at the beginning of to-space; as each reachable record in from-space is found, it is copied to to-space at position next, and next incremented by the size of the record. Forwarding The basic operation of copying collection is forwarding a pointer; that is, given a pointer p that points to from-space, make p point to to-space (Algorithm 13.8).

ALGORITHM 13.8: Forwarding a pointer.
```function Forward(p)
if p points to from-space
then if p. f1 points to to-space
then return p. f1
else for each field fi of p
next. fi ← p. fi
p. f1 ← next
next ← next+ size of record p
return p. f1
else return p

```

There are three cases:

1. If p points to a from-space record that has already been copied, then p. f1 is a special forwarding pointer that indicates where the copy is. The forwarding pointer can be identified just by the fact that it points within the to-space, as no ordinary from-space field could point there.

2. If p points to a from-space record that has not yet been copied, then it is copied to location next; and the forwarding pointer is installed into p. f1. It's all right to overwrite the f1 field of the old record, because all the data have already been copied to the to-space at next.
3. If p is not a pointer at all, or if it points outside from-space (to a record outside the garbage-collected arena, or to to-space), then forwarding p does nothing.

Cheney's algorithm The simplest algorithm for copying collection uses breadth-first search to traverse the reachable data (Algorithm 13.9, illustrated in Image 13.10). First, the roots are forwarded. This copies a few records (those reachable directly from root pointers) to to-space, thereby incrementing next.

```scan ← next ← beginning of to-space
for each root r
r ← Forward(r)
while scan < next
for each field fi of record at scan
scan. fi Forward(scan. fi)
scan ← scan+ size of record at scan

```

ALGORITHM 13.11: Semi-depth-first forwarding.
```function Forward(p)
if p points to from-space
then if p. f1 points to to-space
then return p. f1
else Chase(p); return p. f1
else return p
function Chase(p)
repeat
q next
next ← next+ size of record p
r ← nil
for each field fi of record p
q. fi ← p. fi
if q. fi points to from-space and q. fi . f1 does not point to to-space
then r ← q. fi
p. f1 ← q
p ← r
until p = nil

```

Cost of garbage collection Breadth-first search (or the semi-depth-first variant) takes time proportional to the number of nodes it marks, that is, c3 R for some constant c3 (perhaps equal to 10 instructions). There is no sweep phase, so c3 R is the total cost of collection. The heap is divided into two semi-spaces, so each collection reclaims H = 2 − R words that can be allocated before the next collection. The amortized cost of collection is thus

instructions per word allocated.

As H grows much larger than R, this cost approaches zero. That is, there is no inherent lower bound to the cost of garbage collection. In a more realistic setting, where H = 4R, the cost would be about 10 instructions per word allocated. This is rather costly in space and time: It requires four times as much memory as reachable data, and requires 40 instructions of overhead for every 4-word object allocated. To reduce both space and time costs significantly, we use generational collection.

 JaVa Previous    Next