LAZY EVALUATION
Equational reasoning aids in understanding functional programs. One important principle of equational reasoning is β-substitution: if f(x) = B for some function body B, then any application f(E) to an expression E is equivalent to B with every occurrence of x replaced by E:

f(x) = B   implies that   f(E) ≡ B[x ↦ E]

But consider the following two PureFunJava program fragments:
{int loop (int z) {
   return if (z>0) z
          else loop(z);
 }
 int f (int x) {
   return if (y>8) x
          else -y;
 }
 return f(loop(y));
}

{int loop (int z) {
   return if (z>0) z
          else loop(z);
 }
 int f (int x) {
   return if (y>8) x
          else -y;
 }
 return if (y>8) loop(y)
        else -y;
}
If the expression B is if (y>8) x else -y, and the expression E is loop(y), then clearly the first program contains f(E) and the second program contains B[x ↦ E]. So these programs are equivalent, using equational reasoning. However, the programs do not always behave the same! If y = 0, then the second program will return 0, but the first program will first get stuck in a call to loop(0), which infinite-loops. Clearly, if we want to claim that two programs are equivalent, then they must behave the same. In PureFunJava, if we obtain program A by doing substitution on program B, then A and B will never give different results if they both halt; but A and B might not halt on the same set of inputs. To remedy this (partial) failure of equational reasoning, we can introduce lazy evaluation into the programming language. Haskell is the most widely used lazy language. A program compiled with lazy evaluation will not evaluate any expression unless its value is demanded by some other part of the computation. In contrast, strict languages such as MiniJava, PureFunJava, ML, C, and Java evaluate each expression as the control flow of the program reaches it. To explore the compilation of lazy functional languages, we will use the LazyJava language. Its syntax is identical to PureFunJava, and its semantics are almost identical, except that lazy evaluation is used in compiling it.
CALL-BY-NAME EVALUATION
Most programming languages (Pascal, C, ML, Java, MiniJava, PureFunJava) use call-by-value to pass function arguments: To compute f(g(x)), first g(x) is computed, and this value is passed to f. But if f did not actually need to use its argument, then computing g(x) will have been unnecessary. To avoid computing expressions before their results are needed, we can use call-by-name evaluation. Essentially, each variable is not a simple value, but a thunk: a function that computes the value on demand. The compiler replaces each expression of type int with a function value of type ()->int, and similarly for all other types. At each place where a variable is created, the compiler creates a function value; and everywhere a variable is used, the compiler inserts a function application. Thus the LazyJava program
{int a = 5+7; return a + 10; }
is automatically transformed to
{int a() {return 5+7;} return a() + 10; }
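The same idea can be sketched in ordinary Java, where a thunk of type ()->int can be represented by a java.util.function.Supplier<Integer>. This is only a hand-written illustration of the transformation, not compiler output, and the class and method names below are invented:

import java.util.function.Supplier;

class CallByNameDemo {
    // Call-by-name: the variable a becomes a parameterless function that
    // recomputes 5+7 every time it is applied.
    static int transformed() {
        Supplier<Integer> a = () -> 5 + 7;   // thunk for the bound expression
        return a.get() + 10;                 // every use of a becomes a call
    }

    public static void main(String[] args) {
        System.out.println(transformed());   // prints 22
    }
}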
Where are variables created? At variable declarations and at function-parameter bindings. Thus, each variable turns into a function, and at each function-call site we need a little function declaration for each actual-parameter expression. Program 15.14 illustrates this transformation applied to the look function of Program 15.3a.

PROGRAM 15.14. Call-by-name transformation applied to Program 15.3a.
type c_void_int = () -> int;
type c_void_tree = () -> tree;

class tree {
   c_void_String key;
   c_void_int binding;
   c_void_tree left;
   c_void_tree right;
}

public c_void_int look(c_void_tree t, c_void_String k) {
   int c = t().key().compareTo(k());
   if (c < 0) return look(t().left, k);
   else if (c > 0) return look(t().right, k);
   else return t().binding;
}
The problem with call-by-name is that each thunk may be executed many times, each time (redundantly) yielding the same value. For example, suppose there is a tree represented by a thunk t1. Each time look(t1,k) is called, t1() is evaluated, which rebuilds the (identical) tree every time!
CALL-BY-NEED
Lazy evaluation, also called call-by-need, is a modification of call-by-name that never evaluates the same thunk twice. Each thunk is equipped with a memo slot to store the value. When the thunk is first created, the memo slot is empty. Each evaluation of the thunk checks the memo slot: If full, it returns the memo-ized value; if empty, it calls the thunk function. To streamline this process, we will represent a lazy thunk as an object with a thunk function, a memo slot, and (as with any closure object) instance variables to represent free variables as necessary for use by the thunk function. An unevaluated thunk has an empty memo slot, and the thunk function, when called, computes a value and puts it in the memo slot. An evaluated thunk has the previously computed value in its memo slot, and its thunk function just returns the memo-slot value. For example, the LazyJava declaration int twenty = addFive(15) (in Program 15.1) is compiled in a context where the environment pointer EP will point to a record containing the addFive function. The representation of addFive(15) is not a function call that will go and compute the answer now, but a thunk that will remember how to compute it on demand, later. We might translate this fragment of the LazyJava program into FunJava as follows:
/* this already points to a record containing addFive */
intThunk twenty = new twentyThunk(this.addFive);
which is supported by the auxiliary declarations
class intThunk {public int eval(); int memo; boolean done; }
class c_int_int {public int exec(int x); }
class intFuncThunk {public c_int_int eval(); c_int_int memo; boolean done; }

class twentyThunk extends intThunk {
   intFuncThunk addFive;
   public int eval() {
      if (!done) { memo = addFive.eval().exec(15); done = true; }
      return memo;
   }
   twentyThunk(intFuncThunk addFive) { this.addFive = addFive; }
}

twentyThunk twenty = new twentyThunk(...);
To create a thunk such as twenty, it must be given values for its free variables (in this case, addFive) so that when it later evaluates, it has all the information it needs; this is just the same as closure conversion. To touch a lazy thunk t, we just compute t.eval(). The first time t.eval() is executed, it will see that done is false, and it will calculate the result and put it in memo. Any subsequent time that t is touched, t.eval() will simply return the memo field.
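In ordinary Java the same machinery can be sketched with a small memoizing wrapper around Supplier. The Lazy class below is a hand-written illustration of a call-by-need thunk, using only the standard library; it is not the representation a LazyJava compiler would actually emit, and the names are invented:

import java.util.function.Supplier;

// A call-by-need thunk: the suspended computation runs at most once,
// and the result is cached in the memo slot.
class Lazy<T> {
    private Supplier<T> thunk;   // discarded once the value has been computed
    private T memo;
    private boolean done = false;

    Lazy(Supplier<T> thunk) { this.thunk = thunk; }

    T eval() {
        if (!done) {
            memo = thunk.get();  // first touch: run the suspended computation
            done = true;
            thunk = null;        // allow the closure to be garbage-collected
        }
        return memo;             // later touches just return the memo slot
    }
}

class CallByNeedDemo {
    static int addFive(int n) { return n + 5; }

    public static void main(String[] args) {
        Lazy<Integer> twenty = new Lazy<>(() -> addFive(15));
        System.out.println(twenty.eval());  // computes 20 and memoizes it
        System.out.println(twenty.eval());  // returns the memoized 20
    }
}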
EVALUATION OF A LAZY PROGRAM
Here is a program that uses the enter function of Program 15.3b to build a tree mapping {three ↦ 3!, -one ↦ (−1)!}:
{int fact(int i) {
    return if (i==0) 1 else i * fact(i-1);
 }
 tree t0 = new tree("", 0, null, null);
 tree t1 = t0.enter("-one", fact(-1));
 tree t2 = t1.enter("three", fact(3));
 return putInt(t2.look("three"), exit);
}
A curious thing about this program is that fact(-1) is undefined. Thus, if this program is compiled by a (strict) PureFunJava compiler, it will infinite-loop (or will eventually overflow the machine's arithmetic as it keeps subtracting 1 from a negative number). But if compiled by a LazyJava compiler, the program will succeed, printing three factorial! First, variable t1 is defined; but this does not actually call enter - it merely makes a thunk which will do so on demand. Then, t2 is defined, which also does nothing but make a thunk. Then a thunk is created for look(t2,"three") (but look is not actually called). Finally, a thunk for the expression putInt(…,exit) is created. This is the result of the program. But the runtime system then "demands" an answer from this program, which can be computed only by calling the outermost thunk. So the body of putInt executes, which immediately demands the integer value of its first argument; this causes the look(t2,"three") thunk to evaluate. The body of look needs to compare k with t.key. Since k and t are each thunks, we can compute a key by evaluating k.eval() and a tree by evaluating t.eval(). From the tree we can extract the key field; but each field is itself a thunk, so we must actually evaluate t.eval().key.eval() to get the key value.
When look reaches the node whose key is "-one", the comparison sends the search into the right subtree. The program never evaluates the binding thunk in the -one node, so fact(-1) is never given a chance to infinite-loop.
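The effect can be imitated in miniature in plain Java: if each binding is suspended in a thunk (here a bare Supplier, without memoization), a binding whose computation would never terminate is harmless as long as it is never demanded. The names below are invented for illustration:

import java.util.function.Supplier;

class UndemandedThunkDemo {
    static int fact(int i) { return i == 0 ? 1 : i * fact(i - 1); }

    public static void main(String[] args) {
        // Each binding is suspended in a thunk instead of being evaluated here.
        Supplier<Integer> minusOneBinding = () -> fact(-1); // would never finish: it keeps recursing
        Supplier<Integer> threeBinding = () -> fact(3);

        // Only the "three" binding is ever demanded, so fact(-1) never runs.
        System.out.println(threeBinding.get());   // prints 6
    }
}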
OPTIMIZATION OF LAZY FUNCTIONAL PROGRAMS
Lazy functional programs are subject to many of the same kinds of optimizations as strict functional programs, or even imperative programs. Loops can be identified (these are simply tail-recursive functions), induction variables can be identified, common subexpressions can be eliminated, and so on. In addition, lazy compilers can do some kinds of optimizations that strict functional or imperative compilers cannot, using equational reasoning. For example, given a loop
type intfun = int->int;
intfun f (int i) {
   public int g(int j) { return h(i) * j; }
   return g;
}
an optimizer might like to hoist the invariant computation h(i) out of the function g. After all, g may be called thousands of times, and it would be better not to recompute h(i) each time. Thus we obtain
type intfun = int->int;
intfun f (int i) {
   int hi = h(i);
   public int g(int j) { return hi * j; }
   return g;
}
and now each time g is called, it runs faster. This is valid in a lazy language. But in a strict language, this transformation is invalid! Suppose that, after intfun a = f(8), the function a is never called at all, and suppose h(8) infinite-loops; before the "optimization" the program would have terminated successfully, but afterward we get a nonterminating program. Of course, the transformation is also invalid in an impure functional language, because h(8) might have side effects, and we are changing the number of times h(8) is executed.

Dead-code removal
Another subtle problem with strict programming languages is the removal of dead code. Suppose we have
int f(int i) {
   int d = g(x);
   return i+2;
}
The variable d is never used; it is dead at its definition. Therefore, the call to g(x) should be removed. In a conventional programming language, such as MiniJava or FunJava, we cannot remove g(x), because it might have side effects that are necessary to the operation of the program. In a strict, purely functional language such as PureFunJava, removing the computation g(x) could optimize a nonterminating computation into a terminating one! Though this seems benign, it can be very confusing to the programmer. We do not want programs to change their input/output behavior when compiled with different levels of optimization. In a lazy language, it is perfectly safe to remove dead computations such as g(x).

Deforestation
In any language, it is common to break a program into one module that produces a data structure and another module that consumes it. Program 15.15 is a simple example; range(i,j) generates a list of the integers from i to j, squares(l) returns the square of each number, and sum(0,l) adds up all the numbers.

PROGRAM 15.15. Summing the squares.
class intList {
   int head;
   intList tail;
   intList(int head, intList tail) {...}
}
type intfun = int->int;
type int2fun = (int,int) -> int;

public int sumSq(intfun inc, int2fun mul, int2fun add) {
   public intList range(int i, int j) {
      return if (i>j) null
             else new intList(i, range(inc(i), j));
   }
   public intList squares(intList l) {
      return if (l==null) null
             else new intList(mul(l.head, l.head), squares(l.tail));
   }
   int sum(int accum, intList l) {
      return if (l==null) accum
             else sum(add(accum, l.head), l.tail);
   }
   return sum(0, squares(range(1,100)));
}
First range builds a list of 100 integers; then squares builds another list of 100 integers; finally, sum traverses this list. It is wasteful to build these lists. A transformation called deforestation removes intermediate lists and trees (hence the name) and does everything in one pass. The deforested sumSq program looks like this:
public int sumSq(intfun inc, int2fun mul, int2fun add) {
   public int f(int accum, int i, int j) {
      return if (i>j) accum
             else f(add(accum, mul(i,i)), inc(i), j);
   }
   return f(0, 1, 100);
}
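As a sanity check on what deforestation buys, here is a hypothetical Java rendering of the two versions, one building intermediate lists and one fused into a single accumulating loop; both compute 338350 (the helper names are invented, and java.util.List stands in for the intList class above):

import java.util.ArrayList;
import java.util.List;

class DeforestationDemo {
    // Before deforestation: build two intermediate lists, then fold.
    static int sumSqLists() {
        List<Integer> range = new ArrayList<>();
        for (int i = 1; i <= 100; i++) range.add(i);
        List<Integer> squares = new ArrayList<>();
        for (int x : range) squares.add(x * x);
        int sum = 0;
        for (int s : squares) sum += s;
        return sum;
    }

    // After deforestation: one accumulating loop, no intermediate lists.
    static int sumSqFused() {
        int accum = 0;
        for (int i = 1; i <= 100; i++) accum += i * i;
        return accum;
    }

    public static void main(String[] args) {
        System.out.println(sumSqLists());  // 338350
        System.out.println(sumSqFused());  // 338350
    }
}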
In impure functional languages (where functions can have side effects) deforestation is not usually valid. Suppose, for example, that the functions inc, mul, and add alter global variables, or print on an output file. The deforestation transformation has rearranged the order of calling these functions; instead of
inc(1), inc(2), … inc(100),
mul(1, 1), mul(2, 2), … mul(100, 100),
add(0, 1), add(1, 4), … add(328350, 10000)
the functions are called in the order
mul(1, 1), add(0, 1), inc(1),
mul(2, 2), add(1, 4), inc(2),
⋮
mul(100, 100), add(328350, 10000), inc(100)
Only in a pure functional language is this transformation always legal.
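A small Java experiment (with invented, printing stand-ins for inc, mul, and add) makes the reordering visible: the fused loop interleaves the three kinds of calls exactly as shown above, whereas the list-building version would finish all the inc calls before any mul or add:

class SideEffectOrderDemo {
    // Side-effecting stand-ins: deforestation changes the order in which
    // this output appears, so the transformation is unsafe here.
    static int inc(int i) { System.out.println("inc(" + i + ")"); return i + 1; }
    static int mul(int a, int b) { System.out.println("mul(" + a + "," + b + ")"); return a * b; }
    static int add(int a, int b) { System.out.println("add(" + a + "," + b + ")"); return a + b; }

    public static void main(String[] args) {
        // Fused loop over 1..3: prints mul, add, inc interleaved per iteration.
        int accum = 0;
        for (int i = 1; i <= 3; ) {
            accum = add(accum, mul(i, i));
            i = inc(i);
        }
        System.out.println("sum of squares 1..3 = " + accum);  // 14
    }
}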
STRICTNESS ANALYSIS
Although laziness allows certain new optimizations, the overhead of thunk creation and thunk evaluation is very high. If no attention is paid to this problem, then the lazy program will run slowly no matter what other optimizations are enabled. The solution is to put thunks only where they are needed. If a function f(x) is certain to evaluate its argument x, then there is no need to pass a thunk for x; we can just pass an evaluated x instead. We are trading an evaluation now for a certain eventual evaluation.

Definition of strictness
We say a function f(x) is strict in x if, whenever some actual parameter a would fail to terminate, then f(a) would also fail to terminate. A multi-argument function f(x1, …, xn) is strict in xi if, whenever a would fail to terminate, then f(b1, …, bi−1, a, bi+1, …, bn) also fails to terminate, regardless of whether the bj terminate. Let us take an example:
int f(int x, int y) { return x + x + y; }
int g(int x, int y) { return if (x>0) y else x; }
tree h(String x, int y) { return new tree(x, y, null, null); }
int j(int x) { return j(0); }
The function f is strict in its argument x, since if the result f(x,y) is demanded, then f will certainly touch (demand the value of) x. Similarly, f is strict in argument y, and g is strict in x. But g is not strict in its second argument, because g can sometimes compute its result without touching y. The function h is not strict in either argument. Even though it appears to "use" both x and y, it does not demand (string or integer) values from them; instead it just puts them into a data structure, and it could be that no other part of the program will ever demand values from the key or binding fields of that particular tree. Curiously, by our definition of strictness, the function j is strict in x even though it never uses x. But the purpose of strictness analysis is to determine whether it is safe to evaluate x before passing it to the function j: Will this cause a terminating program to become nonterminating? In this case, if j is going to be called, it will infinite-loop anyway, so it doesn't matter if we perform a (possibly nonterminating) evaluation of x beforehand.

Using the result of strictness analysis
Program 15.16 shows the result of transforming the look function (of Program 15.3a) using strictness information. A call-by-name transformation has been applied here, as in Program 15.14, but the result would be similar using call-by-need. Function look is strict in both its arguments t and k. Thus, when comparing k < t.key, it does not have to touch k and t. However, the t.key field still points to a thunk, so it must be touched.

PROGRAM 15.16. Partial call-by-name using the results of strictness analysis; compare with Program 15.14.
bindingThunk look(tree t, key k) {
   return if (k < t.key.eval()) look(t.left.eval(), k)
          else if (k > t.key.eval()) look(t.right.eval(), k)
          else t.binding;
}
Since look is strict, callers of look are expected to pass evaluated values, not thunks. This is illustrated by the recursive calls, which must explicitly touch t.left and t.right to turn them from thunks into values.

Approximate strictness analysis
In some cases, such as the functions f, g, and h above, the strictness or nonstrictness of a function is obvious - and easily determined by an optimizing compiler. But in general, exact strictness analysis is not computable - like exact dynamic liveness analysis (see page 210) and many other dataflow problems. Thus, compilers must use a conservative approximation; where the exact strictness of a function argument cannot be determined, the argument must be assumed nonstrict. Then a thunk will be created for it; this slows down the program a bit, but at least the optimizer will not have turned a terminating program into an infinite-looping one. Algorithm 15.17 shows an algorithm for computing strictness. It maintains a set H of tuples of the form (f, (b1, …, bn)), where n is the number of arguments of f and the bi are booleans. The meaning of a tuple (f, (1, 1, 0)) is this: If f is called with three arguments (thunks), and the first two may halt but the third never halts, then f may halt.

ALGORITHM 15.17. First-order strictness analysis.
[Algorithm 15.17 has three parts: the definition of the function M; the calculation of H, which repeatedly adds tuples until no more can be added; and the strictness test, applied after the calculation of H terminates: f is strict in its ith argument if the tuple marking only that argument as nonterminating is never put into H.]
If (f, (1, 1, 0)) is in the set H, then f might not be strict in its third argument. If (f, (1, 1, 0)) is never put into H, then f must be strict in its third argument. We also need an auxiliary function to calculate whether an expression may terminate. Given an expression E and a set of variables σ, we say that M(E, σ) means "E may terminate if all the variables in σ may terminate." If E1 is i+j, and there is some possibility that the thunks i and j may halt, then it is also possible that E1 will halt too: M(i + j, {i, j}) is true. But if E2 is if (k>0) i else j, where i and j could conceivably halt but k never does, then certainly E2 will not halt, so M(E2, {i, j}) is false.
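To make the calculation concrete, here is a hedged Java sketch of the H-set computation for a toy first-order expression language (constants, variables, a binary primitive, conditionals, and calls). All of the names - Exp, FunDef, mayHalt, strictIn - are invented for this illustration, and the code assumes Java 16 or later for records and pattern matching; it follows the description of M and H given above, not the book's Algorithm 15.17 verbatim:

import java.util.*;

final class Strictness {
    interface Exp {}
    record Num(int value) implements Exp {}
    record Var(String name) implements Exp {}
    record Prim(Exp left, Exp right) implements Exp {}            // e.g. e1 + e2
    record If(Exp cond, Exp then, Exp otherwise) implements Exp {}
    record Call(String fun, List<Exp> args) implements Exp {}
    record FunDef(String name, List<String> params, Exp body) {}

    // H contains (f, b1..bn), meaning: f applied to arguments whose halting
    // behavior is described by b1..bn may itself halt.
    private final Set<List<Object>> H = new HashSet<>();
    private final Map<String, FunDef> defs = new HashMap<>();

    Strictness(List<FunDef> program) {
        for (FunDef d : program) defs.put(d.name(), d);
        boolean changed = true;
        while (changed) {                        // iterate until H stops growing
            changed = false;
            for (FunDef d : defs.values())
                for (List<Boolean> bs : allVectors(d.params().size())) {
                    Set<String> sigma = new HashSet<>();
                    for (int i = 0; i < bs.size(); i++)
                        if (bs.get(i)) sigma.add(d.params().get(i));
                    if (mayHalt(d.body(), sigma) && H.add(List.of(d.name(), bs)))
                        changed = true;
                }
        }
    }

    // M(E, sigma): E may halt if every variable in sigma may halt.
    boolean mayHalt(Exp e, Set<String> sigma) {
        if (e instanceof Num) return true;
        if (e instanceof Var v) return sigma.contains(v.name());
        if (e instanceof Prim p) return mayHalt(p.left(), sigma) && mayHalt(p.right(), sigma);
        if (e instanceof If i)
            return mayHalt(i.cond(), sigma)
                && (mayHalt(i.then(), sigma) || mayHalt(i.otherwise(), sigma));
        Call c = (Call) e;
        List<Boolean> bs = new ArrayList<>();
        for (Exp a : c.args()) bs.add(mayHalt(a, sigma));
        return H.contains(List.of(c.fun(), bs));
    }

    // f is strict in argument i if making only that argument nonterminating
    // guarantees that f fails to terminate.
    boolean strictIn(String f, int i) {
        int n = defs.get(f).params().size();
        List<Boolean> bs = new ArrayList<>(Collections.nCopies(n, true));
        bs.set(i, false);
        return !H.contains(List.of(f, bs));
    }

    private static List<List<Boolean>> allVectors(int n) {
        List<List<Boolean>> out = new ArrayList<>();
        for (int m = 0; m < (1 << n); m++) {
            List<Boolean> v = new ArrayList<>();
            for (int i = 0; i < n; i++) v.add((m & (1 << i)) != 0);
            out.add(v);
        }
        return out;
    }
}

The computation is monotone: tuples are only ever added to H, so the loop terminates, and an argument is reported strict only if the tuple that marks just that argument as nonterminating never enters H.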
Algorithm 15.17 will not work on the full LazyJava language, because it does not handle functions passed as arguments or returned as results. But for first-order programs (without higher-order functions), it does a good job of computing (static) strictness. More powerful algorithms for strictness analysis handle higher-order functions.
FURTHER READING
Church [1941] developed the λ-calculus, a "programming language" of nested functions that can be passed as arguments and returned as results. He was hampered by having no machines to compile for.

Closures
Landin [1964] showed how to interpret λ-calculus on an abstract machine, using closures allocated on a heap. Steele [1978] used closure representations specialized to different patterns of function usage, so that in many cases nonlocal variables are passed as extra arguments to an inner function to avoid heap-allocating a record. Cousineau et al. [1985] showed how closure conversion can be expressed as a transformation back into the source language, so that closure analysis can be cleanly separated from other phases of code generation. Static links are actually not the best basis for doing closure conversion; for many reasons it is better to consider each nonlocal variable separately, instead of always grouping together all the variables at the same nesting level. Kranz et al. [1986] performed escape analysis to determine which closures can be stack-allocated because they do not outlive their creating function, and also integrated closure analysis with register allocation to make a high-performance optimizing compiler. Shao and Appel [1994] integrate closures with the use of callee-save registers to minimize the load/store traffic caused by accessing local and nonlocal variables. Appel [1992, chapters 10 and 12] has a good overview of closure conversion.

Continuations
Tail calls are particularly efficient and easy to analyze. Strachey and Wadsworth [1974] showed that the control flow of any program (even an imperative one) can be expressed as function calls, using the notion of continuations. Steele [1978] transformed programs into continuation-passing style early in compilation, turning all function calls into tail calls, to simplify all the analysis and optimization phases of the compiler. Kranz et al. [1986] built an optimizing compiler for Scheme using continuation-passing style; Appel [1992] describes a continuation-based optimizing compiler for ML.

Inline expansion
Cocke and Schwartz [1970] describe inline expansion of function bodies; Scheifler [1977] shows that it is particularly useful for languages supporting data abstraction, where there tend to be many tiny functions implementing operations on an abstract data type. Appel [1992] describes practical heuristics for controlling code explosion.

Continuation-based I/O
Wadler [1995] describes the use of monads to generalize the notion of continuation-based interaction.

Lazy evaluation
Algol-60 [Naur et al. 1963] used call-by-name evaluation for function arguments, implemented using thunks - but also permitted side effects, so programmers needed to know what they were doing! Most of its successors use call-by-value. Henderson and Morris [1976] and Friedman and Wise [1976] independently invented lazy evaluation (call-by-need). Hughes [1989] argues that lazy functional languages permit clearer and more modular programming than imperative languages.
Several lazy pure functional languages were developed in the 1980s; the community of researchers in this area designed and adopted the language Haskell [Hudak et al. 1992] as a standard. Peyton Jones [1987; 1992] describes many implementation and optimization techniques for lazy functional languages; Peyton Jones and Partain [1993] describe a practical algorithm for higher-order strictness analysis. Wadler [1990] describes deforestation.