REFERENCE COUNTS
One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons." Moon patiently told the student the following story: "One day a student came to Moon and said: 'I understand how to make a better garbage collector …'"
(MIT-AI koan by Danny Hillis)
Mark-sweep collection identifies the garbage by first finding out what is reachable. Instead, it can be done directly by keeping track of how many pointers point to each record: This is the reference count of the record, and it is stored with each record. The compiler emits extra instructions so that whenever p is stored into x. fi, the reference count of p is incremented, and the reference count of what x. fi previously pointed to is decremented. If the decremented reference count of some record r reaches zero, then r is put on the freelist and all the other records that r points to have their reference counts decremented. Instead of decrementing the counts of r. fi when r is put on the freelist, it is better to do this "recursive" decrementing when r is removed from the freelist, for two reasons:
- It breaks up the "recursive decrementing" work into shorter pieces, so that the program can run more smoothly (this is important only for interactive or real-time programs).
- The compiler must emit code (at each decrement) to check whether the count has reached zero and put the record on the freelist, but the recursive decrementing will be done only in one place, in the allocator.
Reference counting seems simple and attractive. But there are two major problems:
- Cycles of garbage cannot be reclaimed. In Image 13.1, for example, there is a loop of list cells (whose keys are 7 and 9) that are not reachable from program variables; but each has a reference count of 1.
- Incrementing the reference counts is very expensive indeed. In place of the single machine instruction x. fi ← p, the program must execute
A naive reference counter will increment and decrement the counts on every assignment to a program variable. Because this would be extremely expensive, many of the increments and decrements are eliminated using dataflow analysis: As a pointer value is fetched and then propagated through local variables, the compiler can aggregate the many changes in the count to a single increment, or (if the net change is zero) no extra instructions at all. However, even with this technique there are many ref-count increments and decrements that remain, and their cost is very high. There are two possible solutions to the "cycles" problem. The first is simply to require the programmer to explicitly break all cycles when she is done with a data structure. This is less annoying than putting explicit free calls (as would be necessary without any garbage collection at all), but it is hardly elegant. The other solution is to combine reference counting (for eager and nondisruptive reclamation of garbage) with an occasional mark-sweep collection (to reclaim the cycles).
On the whole, the problems with reference counting outweigh its advantages, and it is rarely used for automatic storage management in coding language environments.