EFFICIENT TAIL RECURSION

Functional programs express loops and other control flow by function calls. Where Program 15.7b has a while loop, Program 15.12 has a function call to doListX. Where Program 15.7b's putInt simply returns to its two points of call within printTable, Program 15.11 has continuation functions. The FunJava compiler must compile the calls to doListX, doRest, and again as efficently as the MiniJava compiler compiles loops and function returns. Many of the function calls in Program 15.11 are in tail position. A function call f(x) within the body of another function g(y) is in tail position if "calling f is the last thing that g will do before returning." More formally, in each of the following expressions, the B_i are in tail contexts, but the C_i are not:

{int x = C₁; return B₁;}
C₁(C₂)
if C₁ B₁ else B₂
C₁ + C₂

For example, C₂ in expression 4 is not in a tail context, even though it seems to be "last", because after C₂ completes there will still need to be an add instruction. But B₁ in expression 3 is in a tail context, even though it is not "last" syntactically. If a function call f(x) is in a tail context with respect to its enclosing expression, and that expression is in a tail context, and so on all the way to the body of the enclosing function definition int g(y) B, then f(x) is a tail call. Tail calls can be implemented more efficiently than ordinary calls. Given

 int g(int y) {int x = h(y); return f(x)}

then h(y) is not a tail call, but f(x) is. When f(x) returns some result z, then z will also be the result returned from g. Instead of pushing a new return address for f to return to, g could just give f the return address given to g, and have f return there directly. That is, a tail call can be implemented more like a jump than a call. The steps for a tail call are

Move actual parameters into argument registers.
Restore callee-save registers.
Pop the stack frame of the calling function, if it has one.
Jump to the callee.

In many cases, item 1 (moving parameters) is eliminated by the copy-propagation (coalescing) phase of the compiler. Often, items 2 and 3 are eliminated because the calling function has no stack frame - any function that can do all its computation in caller-save registers needs no frame. Thus, a tail call can be as cheap as a jump instruction. In Program 15.12, every call is a tail call! Also, none of the functions in this program needs a stack frame. This need not have been true; for example, the call to double in Program 15.6 is not in tail position, and this nontail call only disappeared because the inline-expander did away with it. Tail calls implemented as jumps The compilation of Programs 15.12 and 15.7b is instructive. Image 15.13 shows that the pure functional program and the imperative program are executing almost exactly the same instructions! The figure does not show the functional program's fetching from static-link records; and it does not show the imperative program's saving and restoring callee-save registers.

printTable: allocate object r1
 jump to doListX doListX: allocate record r2
 if l=nil goto doneL
 i = l.head
 allocate object r3
 jump to putInt again: add this.i+this.i
 jump to putInt doRest: jump to doListX doneL : jump to this.c

printTable: allocate stack frame
 jump to whileL whileL:
 if l=nil goto doneL
 i := l.head
 call putInt
 add i+i
 call putInt
 jump to whileL doneL: return

(a) Functional program

(b) Imperative program

Image 15.13: printTable as compiled.

The remaining inefficiency in the functional program is that it creates three heap-allocated objects, r1,r2,r3, while the imperative program creates only one stack frame. However, more advanced closure-conversion algorithms can succeed in creating only one record (at the beginning of printTable). So the difference between the two programs would be little more than a heap-record creation versus a stack-frame creation. Allocating object on the garbage-collected heap may be more expensive than pushing and popping a stack frame. Optimizing compilers for functional languages solve this problem in different ways:

Compile-time escape analysis can identify which closures do not outlive the function that creates them. These objects can be stack-allocated. In the case of printTable, this would make the "functional" code almost identical to the "imperative" code.
Or heap allocation and garbage collection can be made extremely cheap. Then creating (and garbage collecting) a heap-allocated object takes only four or five instructions, making the functional printTable almost as fast as the imperative one (see ).