MARK-AND-SWEEP COLLECTION
Program variables and heap-allocated records form a directed graph. The variables are roots of this graph. A node n is reachable if there is a path of directed edges r → … → n starting at some root r. A graph-search algorithm such as depth-first search (Algorithm 13.2) can mark all the reachable nodes.

ALGORITHM 13.2: Depth-first search.
function DFS(x)
   if x is a pointer into the heap
      if record x is not marked
         mark x
         for each field fi of record x
            DFS(x.fi)
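Algorithm 13.2 can be tried out on a toy heap model. The model below is my own illustration, not the book's code: the heap is a Python list, a record is a dict with a mark bit and a field list, and an integer field counts as a pointer (an index into the heap).

```python
def make_record(*fields):
    # A record: one mark bit plus a list of fields (ints are pointers).
    return {"marked": False, "fields": list(fields)}

def dfs(heap, x):
    """Mark every record reachable from x, as in Algorithm 13.2."""
    if isinstance(x, int):               # x is a pointer into the heap
        rec = heap[x]
        if not rec["marked"]:            # record x is not marked
            rec["marked"] = True         # mark x
            for f in rec["fields"]:      # for each field fi of record x
                dfs(heap, f)             # DFS(x.fi)

# Record 0 points to 1, record 1 points back to 0 (a cycle is handled
# by the mark bit); record 2 is unreachable garbage.
heap = [make_record(1, "a"), make_record(0), make_record("junk")]
dfs(heap, 0)                             # 0 is the only root
print([r["marked"] for r in heap])       # → [True, True, False]
```

The mark bit is what terminates the search on cyclic data: the second visit to record 0 returns immediately.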
Any node not marked must be garbage, and should be reclaimed. This can be done by a sweep of the entire heap, from its first address to its last, looking for nodes that are not marked (Algorithm 13.3). These are garbage and can be linked together in a linked list (the freelist). The sweep phase should also unmark all the marked nodes, in preparation for the next garbage collection.

ALGORITHM 13.3: Mark-and-sweep garbage collection.
Mark phase:
   for each root v
      DFS(v)

Sweep phase:
   p ← first address in heap
   while p < last address in heap
      if record p is marked
         unmark p
      else
         let f1 be the first field in p
         p.f1 ← freelist
         freelist ← p
      p ← p + (size of record p)
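The two phases fit together as follows in a toy Python model (a sketch of mine, not the book's code; the heap is a list, integer fields are pointers): mark traverses from the roots, and sweep threads every unmarked record onto the freelist through its first field while unmarking the survivors.

```python
def make_record(*fields):
    return {"marked": False, "fields": list(fields)}

def mark(heap, roots):
    # Iterative DFS over all roots (any marking algorithm works here).
    stack = list(roots)
    while stack:
        x = stack.pop()
        if isinstance(x, int) and not heap[x]["marked"]:
            heap[x]["marked"] = True
            stack.extend(heap[x]["fields"])

def sweep(heap):
    """Sweep phase of Algorithm 13.3: p walks first address → last."""
    freelist = None
    for p in range(len(heap)):
        if heap[p]["marked"]:
            heap[p]["marked"] = False        # unmark, ready for next GC
        else:
            heap[p]["fields"][0] = freelist  # p.f1 ← freelist
            freelist = p                     # freelist ← p
    return freelist

heap = [make_record(2, None), make_record(None),
        make_record(None), make_record(None)]
mark(heap, roots=[0])        # reachable: 0 and 2; garbage: 1 and 3
free = sweep(heap)
print(free)                  # → 3; the freelist chain is 3 → 1
```

Note how the freelist costs no extra space: the first field of each garbage record, whose contents are dead anyway, holds the link.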
After the garbage collection, the compiled program resumes execution. Whenever it wants to heap-allocate a new record, it gets a record from the freelist. When the freelist becomes empty, that is a good time to do another garbage collection to replenish the freelist.

Cost of garbage collection

Depth-first search takes time proportional to the number of nodes it marks, that is, time proportional to the amount of reachable data. The sweep phase takes time proportional to the size of the heap. Suppose there are R words of reachable data in a heap of size H. Then the cost of one garbage collection is c1R + c2H for some constants c1 and c2; for example, c1 might be 10 instructions and c2 might be 3 instructions. The "good" that collection does is to replenish the freelist with H − R words of usable memory. Therefore, we can compute the amortized cost of collection by dividing the time spent collecting by the amount of garbage reclaimed. That is, for every word that the compiled program allocates, there is an eventual garbage-collection cost of

   (c1R + c2H) / (H − R).
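The amortized cost works out to (c1R + c2H) / (H − R) per allocated word. A quick numeric check, using the example constants c1 = 10 and c2 = 3 from above:

```python
def cost_per_word(R, H, c1=10, c2=3):
    """Amortized GC cost per allocated word: (c1*R + c2*H) / (H - R)."""
    return (c1 * R + c2 * H) / (H - R)

R = 1000
print(cost_per_word(R, H=2 * R))     # H = 2R: c1 + 2*c2 = 16.0
print(cost_per_word(R, H=100 * R))   # H >> R: approaches c2 = 3
print(cost_per_word(R, H=1010))      # R close to H: blows up (1303.0)
```

These three cases are exactly the regimes discussed next: collection is cheap per word when the heap is roomy, and ruinous when it is nearly full of live data.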
If R is close to H, this cost becomes very large: each garbage collection reclaims only a few words of garbage. If H is much larger than R, then the cost per allocated word is approximately c2, or about 3 instructions of garbage-collection cost per word allocated. The garbage collector can measure H (the heap size) and H − R (the freelist size) directly. After a collection, if R/H is larger than 0.5 (or some other criterion), the collector should increase H by asking the operating system for more memory. Then the cost per allocated word will be approximately c1 + 2c2, or perhaps 16 instructions per word.

Using an explicit stack

The DFS algorithm is recursive, and the maximum depth of its recursion is as long as the longest path in the graph of reachable data. There could be a path of length H in the worst case, meaning that the stack of activation records would be larger than the entire heap!
Image 13.4: Mark-and-sweep collection.
To attack this problem, we use an explicit stack (instead of recursion), as in Algorithm 13.5. Now the stack could still grow to size H, but at least this is H words and not H activation records. Still, it is unacceptable to require auxiliary stack memory as large as the heap being collected.

ALGORITHM 13.5: Depth-first search using an explicit stack.
function DFS(x)
   if x is a pointer and record x is not marked
      mark x
      t ← 1
      stack[t] ← x
      while t > 0
         x ← stack[t]; t ← t − 1
         for each field fi of record x
            if x.fi is a pointer and record x.fi is not marked
               mark x.fi
               t ← t + 1; stack[t] ← x.fi
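Algorithm 13.5 in the same toy heap model (my own sketch; records are dicts, integer fields are pointers into a list). The important detail is that records are marked as they are pushed, not when popped, so no record ever enters the stack twice:

```python
def make_record(*fields):
    return {"marked": False, "fields": list(fields)}

def dfs(heap, x):
    """Algorithm 13.5: depth-first marking with an explicit stack."""
    if isinstance(x, int) and not heap[x]["marked"]:
        heap[x]["marked"] = True
        stack = [x]                        # t ← 1; stack[t] ← x
        while stack:                       # while t > 0
            rec = heap[stack.pop()]        # x ← stack[t]; t ← t − 1
            for f in rec["fields"]:        # for each field fi of record x
                if isinstance(f, int) and not heap[f]["marked"]:
                    heap[f]["marked"] = True
                    stack.append(f)        # t ← t + 1; stack[t] ← x.fi

heap = [make_record(1, 2), make_record(0),
        make_record(None), make_record(0)]
dfs(heap, 0)                               # record 3 is unreachable
print([r["marked"] for r in heap])         # → [True, True, True, False]
```

Marking at push time is what bounds the stack at one entry per heap record; marking at pop time could push the same record many times.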
Pointer reversal

After the contents of field x.fi have been pushed on the stack, Algorithm 13.5 will never again look at the original location x.fi. This means we can use x.fi to store one element of the stack itself! This all-too-clever idea is called pointer reversal, because x.fi will be made to point back to the record from which x was reached. Then, as the stack is popped, the field x.fi will be restored to its original value. Algorithm 13.6 requires a field in each record called done, which indicates how many fields in that record have been processed. This takes only a few bits per record (and it can also serve as the mark field).

ALGORITHM 13.6: Depth-first search using pointer reversal.
function DFS(x)
   if x is a pointer and record x is not marked
      t ← nil
      mark x; done[x] ← 0
      while true
         i ← done[x]
         if i < # of fields in record x
            y ← x.fi
            if y is a pointer and record y is not marked
               x.fi ← t; t ← x; x ← y
               mark x; done[x] ← 0
            else
               done[x] ← i + 1
         else
            y ← x; x ← t
            if x = nil then return
            i ← done[x]
            t ← x.fi; x.fi ← y
            done[x] ← i + 1
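Pointer reversal is easiest to believe after watching it run. The sketch below (mine, not the book's code) follows Algorithm 13.6 in the same toy model, storing done as an ordinary dict entry; the point to check afterward is that every reversed pointer has been restored:

```python
def make_record(*fields):
    return {"marked": False, "done": 0, "fields": list(fields)}

def dfs(heap, x):
    """Algorithm 13.6: DFS via pointer reversal; no auxiliary stack.
    The stack is threaded through the records' own fields and undone
    as the traversal retreats; nil (None) is the bottom of the chain."""
    if not (isinstance(x, int) and not heap[x]["marked"]):
        return
    t = None                                 # t ← nil
    heap[x]["marked"] = True
    heap[x]["done"] = 0
    while True:
        i = heap[x]["done"]
        if i < len(heap[x]["fields"]):
            y = heap[x]["fields"][i]
            if isinstance(y, int) and not heap[y]["marked"]:
                heap[x]["fields"][i] = t     # reverse: x.fi ← t
                t = x                        # x joins the reversed chain
                x = y                        # descend into y
                heap[x]["marked"] = True
                heap[x]["done"] = 0
            else:
                heap[x]["done"] = i + 1      # skip non-pointer / marked
        else:                                # all fields done: pop
            y = x
            x = t
            if x is None:
                return
            i = heap[x]["done"]
            t = heap[x]["fields"][i]         # next record down the chain
            heap[x]["fields"][i] = y         # restore the reversed field
            heap[x]["done"] = i + 1

# A chain 0 → 1 → 2; the traversal borrows, then restores, each link.
heap = [make_record(1, None), make_record(2), make_record(None)]
dfs(heap, 0)
print([r["marked"] for r in heap])           # → [True, True, True]
print(heap[0]["fields"], heap[1]["fields"])  # → [1, None] [2] (restored)
```

The traversal uses only the constant-size variables x, y, t, i; everything else lives in the done fields of the records themselves.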
The variable t serves as the top of the stack; every record x on the stack is already marked, and if i = done[x], then x.fi is the "stack link" to the next node down. When popping the stack, x.fi is restored to its original value.

An array of freelists

The sweep phase is the same no matter which marking algorithm is used: it just puts the unmarked records on the freelist, and unmarks the marked records. But if records are of many different sizes, a simple linked list will not be very efficient for the allocator. When allocating a record of size n, it may have to search a long way down the list for a free block of that size. A good solution is to have an array of several freelists, so that freelist[i] is a linked list of all records of size i. The program can allocate a node of size i just by taking the head of freelist[i]; the sweep phase of the collector can put each node of size j at the head of freelist[j]. If the program attempts to allocate from an empty freelist[i], it can try to grab a larger record from freelist[j] (for j > i) and split it (putting the unused portion back on freelist[j − i]). If this fails, it is time to call the garbage collector to replenish the freelists.

Fragmentation

It can happen that the program wants to allocate a record of size n, and there are many free records smaller than n but none of the right size. This is called external fragmentation. On the other hand, internal fragmentation occurs when the program uses a too-large record without splitting it, so that the unused memory is inside the record instead of outside.
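The array-of-freelists allocator, including the split-a-larger-block fallback, can be sketched as follows. This is a toy of my own: MAX (the largest record size handled) and the placeholder block objects are assumptions, and "call the collector" is reduced to raising an error. Splitting exactly at size n is what avoids internal fragmentation here.

```python
class Allocator:
    """Segregated freelists: freelist[i] holds free blocks of size i."""
    MAX = 8                                    # assumed max record size

    def __init__(self):
        self.freelist = [[] for _ in range(self.MAX + 1)]

    def free(self, size):
        # The sweep phase (or a split) puts a size-`size` block back.
        self.freelist[size].append(object())   # placeholder block

    def alloc(self, n):
        if self.freelist[n]:                   # exact fit: take the head
            return self.freelist[n].pop()
        for j in range(n + 1, self.MAX + 1):   # else split a larger block
            if self.freelist[j]:
                block = self.freelist[j].pop()
                self.free(j - n)               # remainder → freelist[j−n]
                return block
        raise MemoryError("time to run the garbage collector")

a = Allocator()
a.free(7)                  # one free 7-word block
a.alloc(3)                 # splits it; 4-word remainder is kept
print(len(a.freelist[4]))  # → 1
```

A real allocator would of course carve the remainder out of the same memory block rather than conjure a new object; only the bookkeeping is shown here.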