EFFICIENT TAIL RECURSION

Functional programs express loops and other control flow by function calls. Where has a while loop, has a function call to doListX. Where 's putInt simply returns to its two points of call within printTable, has continuation functions. The FunJava compiler must compile the calls to doListX, doRest, and again as efficently as the MiniJava compiler compiles loops and function returns. Many of the function calls in are in tail position. A function call f(x) within the body of another function g(y) is in tail position if "calling f is the last thing that g will do before returning." More formally, in each of the following expressions, the Bi are in tail contexts, but the Ci are not:

  1. {int x = C1; return B1;}
  2. C1(C2)
  3. if C1 B1 else B2
  4. C1 + C2

For example, C2 in expression 4 is not in a tail context, even though it seems to be "last", because after C2 completes there will still need to be an add instruction. But B1 in expression 3 is in a tail context, even though it is not "last" syntactically. If a function call f(x) is in a tail context with respect to its enclosing expression, and that expression is in a tail context, and so on all the way to the body of the enclosing function definition int g(y) B, then f(x) is a tail call. Tail calls can be implemented more efficiently than ordinary calls. Given

 int g(int y) {int x = h(y); return f(x)}

then h(y) is not a tail call, but f(x) is. When f(x) returns some result z, then z will also be the result returned from g. Instead of pushing a new return address for f to return to, g could just give f the return address given to g, and have f return there directly. That is, a tail call can be implemented more like a jump than a call. The steps for a tail call are

  1. Move actual parameters into argument registers.
  2. Restore callee-save registers.
  3. Pop the stack frame of the calling function, if it has one.
  4. Jump to the callee.

In many cases, item 1 (moving parameters) is eliminated by the copy-propagation (coalescing) phase of the compiler. Often, items 2 and 3 are eliminated because the calling function has no stack frame - any function that can do all its computation in caller-save registers needs no frame. Thus, a tail call can be as cheap as a jump instruction. In , every call is a tail call! Also, none of the functions in this program needs a stack frame. This need not have been true; for example, the call to double in is not in tail position, and this nontail call only disappeared because the inline-expander did away with it. Tail calls implemented as jumps The compilation of and is instructive. shows that the pure functional program and the imperative program are executing almost exactly the same instructions! The figure does not show the functional program's fetching from static-link records; and it does not show the imperative program's saving and restoring callee-save registers.

Java Start Image
printTable: allocate object r1
 jump to doListX doListX: allocate record r2
 if l=nil goto doneL
 i = l.head
 allocate object r3
 jump to putInt again: add this.i+this.i
 jump to putInt doRest: jump to doListX doneL : jump to this.c
printTable: allocate stack frame
 jump to whileL whileL:
 if l=nil goto doneL
 i := l.head
 call putInt
 add i+i
 call putInt
 jump to whileL doneL: return

(a) Functional program

(b) Imperative program

Java End Image

Image 15.13: printTable as compiled.

The remaining inefficiency in the functional program is that it creates three heap-allocated objects, r1,r2,r3, while the imperative program creates only one stack frame. However, more advanced closure-conversion algorithms can succeed in creating only one record (at the beginning of printTable). So the difference between the two programs would be little more than a heap-record creation versus a stack-frame creation. Allocating object on the garbage-collected heap may be more expensive than pushing and popping a stack frame. Optimizing compilers for functional languages solve this problem in different ways: