STACK FRAMES

The simplest notion of a stack is a data structure that supports two operations, push and pop. However, it turns out that local variables are pushed in large batches (on entry to functions) and popped in large batches (on exit). Furthermore, when local variables are created they are not always initialized right away. Finally, after many variables have been pushed, we want to continue accessing variables deep within the stack. So the abstract push and pop model is just not suitable. Instead, we treat the stack as a big array, with a special register - the stack pointer - that points at some location. All locations beyond the stack pointer are considered to be garbage, and all locations before the stack pointer are considered to be allocated. The stack usually grows only at the entry to a function, by an increment large enough to hold all the local variables for that function, and, just before the exit from the function, shrinks by the same amount. The area on the stack devoted to the local variables, parameters, return address, and other temporaries for a function is called the function's activation record or stack frame. For historical reasons, run-time stacks usually start at a high memory address and grow toward smaller addresses. This can be rather confusing: Stacks grow downward and shrink upward, like icicles. The design of a frame layout takes into account the particular features of an instruction set architecture and the coding language being compiled. However, the manufacturer of a computer often prescribes a "standard" frame layout to be used on that architecture, where possible, by all compilers for all coding languages. Sometimes this layout is not the most convenient one for a particular coding language or compiler. But by using the "standard" layout, we gain the considerable benefit that functions written in one language can call functions written in another language. Image 6.2 shows a typical stack frame layout. The frame has a set of incoming arguments (technically these are part of the previous frame but they are at a known offset from the frame pointer) passed by the caller. The return address is created by the CALL instruction and tells where (within the calling function) control should return upon completion of the current function. Some local variables are in this frame; other local variables are kept in machine registers. Sometimes a local variable kept in a register needs to be saved into the frame to make room for other uses of the register; there is an area in the frame for this purpose. Finally, when the current function calls other functions, it can use the outgoing argument space to pass parameters.
Image 6.2: A stack frame.

THE FRAME POINTER

Suppose a function g(…) calls the function f(a₁,…, a_n). We say g is the caller and f is the callee. On entry to f, the stack pointer points to the first argument that g passes to f . On entry, f allocates a frame by simply subtracting the frame size from the stack pointer SP. The old SP becomes the current frame pointer FP. In some frame layouts, FP is a separate register; the old value of FP is saved in memory (in the frame) and the new FP becomes the old SP. When f exits, it just copies FP back to SP and fetches back the saved FP. This arrangement is useful if f 's frame size can vary, or if frames are not always contiguous on the stack. But if the frame size is fixed, then for each function f the FP will always differ from SP by a known constant, and it is not necessary to use a register for FP at all − FP is a "fictional" register whose value is always SP + framesize.

Why talk about a frame pointer at all? Why not just refer to all variables, parameters, etc., by their offset from SP, if the frame size is constant? The frame size is not known until quite late in the compilation process, when the number of memory-resident temporaries and saved registers is determined. But it is useful to know the offsets of formal parameters and local variables much earlier. So, for convenience, we still talk about a frame pointer. And we put the formals and locals right near the frame pointer at offsets that are known early; the temporaries and saved registers go farther away, at offsets that are known later.

REGISTERS

A modern machine has a large set of registers (typically 32 of them). To make compiled programs run fast, it's useful to keep local variables, intermediate results of expressions, and other values in registers instead of in the stack frame. Registers can be directly accessed by arithmetic instructions; on most machines, accessing memory requires separate load and store instructions. Even on machines whose arithmetic instructions can access memory, it is faster to access registers. A machine (usually) has only one set of registers, but many different procedures and functions need to use registers. Suppose a function f is using register r to hold a local variable and calls procedure g, which also uses r for its own calculations. Then r must be saved (stored into a stack frame) before g uses it and restored (fetched back from the frame) after g is finished using it. But is it f 's responsibility to save and restore the register, or g's? We say that r is a caller-save register if the caller (in this case, f) must save and restore the register, and r is callee-save if it is the responsibility of the callee (in this case, g). On most machine architectures, the notion of caller-save or callee-save register is not something built into the hardware, but is a convention described in the machine's reference manual. On the MIPS computer, for example, registers 16-23 are preserved across procedure calls (callee-save), and all other registers are not preserved across procedure calls (caller-save).

Sometimes the saves and restores are unnecessary. If f knows that the value of some variable x will not be needed after the call, it may put x in a caller-save register and not save it when calling g. Conversely, if f hasalocal variable i that is needed before and after several function calls, it may put i in some callee-save register r_i, andsave r_i just once (upon entry to f )and fetch it back just once (before returning from f ). Thus, the wise selection of a caller-save or callee-save register for each local variable and temporary can reduce the number of stores and fetches a program executes. We will rely on our register allocator to choose the appropriate kind of register for each local variable and temporary value.

PARAMETER PASSING

On most machines whose calling conventions were designed in the 1970s, function arguments were passed on the stack.^[1] But this causes needless memory traffic. Studies of actual programs have shown that very few functions have more than four arguments, and almost none have more than six. Therefore, parameter-passing conventions for modern machines specify that the first k arguments (for k = 4 or k = 6, typically) of a function are passed in registers r_p, …, r_{p+ k−1}, and the rest of the arguments are passed in memory. Now, suppose f(a₁, …, a_n) (which received its parameters in r₁, …, r_n, for example) calls h(z). It must pass the argument z in r₁; so f saves the old contents of r₁ (the value a₁) in its stack frame before calling h. But there is the memory traffic that was supposedly avoided by passing arguments in registers! How has the use of registers saved any time? There are four answers, any or all of which can be used at the same time:

Some procedures don't call other procedures - these are called leaf procedures. What proportion of procedures are leaves? Well, if we make the (optimistic) assumption that the average procedure calls either no other procedures or calls at least two others, then we can describe a "tree" of procedure calls in which there are more leaves than internal nodes. This means that most procedures called are leaf procedures.
Leaf procedures need not write their incoming arguments to memory. In fact, often they don't need to allocate a stack frame at all. This is an important savings.
Some optimizing compilers use interprocedural register allocation, analyzing all the functions in an entire program at once. Then they assign different procedures different registers in which to receive parameters and hold local variables. Thus f(x) might receive x in r₁, but call h(z) with z in r₇.
Even if f is not a leaf procedure, it might be finished with all its use of the argument x by the time it calls h (technically, x is a dead variable at the point where h is called). Then f can overwrite r₁ without saving it.
Some architectures have register windows, so that each function invocation can allocate a fresh set of registers without memory traffic.

If f needs to write an incoming parameter into the frame, where in the frame should it go? Ideally, f 's frame layout should matter only in the implementation of f . A straightforward approach would be for the caller to pass arguments a₁, …; a_k in registers and a_k+1, …; a_n at the end of its own frame - the place marked outgoing arguments in Image 6.2 - which become the incoming arguments of the callee. If the callee needed to write any of these arguments to memory, it would write them to the area marked local variables. The C coding language actually allows you to take the address of a formal parameter and guarantees that all the formal parameters of a function are at consecutive addresses! This is the varargs feature that printf uses. Allowing programmers to take the address of a parameter can lead to a dangling reference if the address outlives the frame - as the address of x will in int *f(int x){return &x;} - and even when it does not lead to bugs, the consecutive-address rule for parameters constrains the compiler and makes stack-frame layout more complicated. To resolve the contradiction that parameters are passed in registers, but have addresses too, the first k parameters are passed in registers; but any parameter whose address is taken must be written to a memory location on entry to the function. To satisfy printf, the memory locations into which register arguments are written must all be consecutive with the memory locations in which arguments k + 1, k + 2, etc., are written. Therefore, C programs can't have some of the arguments saved in one place and some saved in another - they must all be saved contiguously. So in the standard calling convention of many modern machines the calling function reserves space for the register arguments in its own frame, next to the place where it writes argument k + 1. But the calling function does not actually write anything there; that space is written into by the called function, and only if the called function needs to write arguments into memory for any reason.

A more dignified way to take the address of a local variable is to use call-by-reference. With call-by-reference, the programmer does not explicitly manipulate the address of a variable x. Instead, if x is passed as the argument to f(y), where y is a "by-reference" parameter, the compiler generates code to pass the address of x instead of the contents of x. At any use of y within the function, the compiler generates an extra pointer dereference. With call-by-reference, there can be no "dangling reference", since y must disappear when f returns, and f returns before x's scope ends.

RETURN ADDRESSES

When function g calls function f, eventually f must return. It needs to know where to go back to. If the call instruction within g is at address a, then (usually) the right place to return to is a + 1, the next instruction in g. This is called the return address. On 1970s machines, the return address was pushed on the stack by the call instruction. Modern science has shown that it is faster and more flexible to pass the return address in a register, avoiding memory traffic and also avoiding the need to build any particular stack discipline into the hardware.

On modern machines, the call instruction merely puts the return address (the address of the instruction after the call) in a designated register. A nonleaf procedure will then have to write it to the stack (unless interprocedural register allocation is used), but a leaf procedure will not.

FRAME-RESIDENT VARIABLES

So a modern procedure-call convention will pass function parameters in registers, pass the return address in a register, and return the function result in a register. Many of the local variables will be allocated to registers, as will the intermediate results of expression evaluation. Values are written to memory (in the stack frame) only when necessary for one of these reasons:

the variable will be passed by reference, so it must have a memory address (or, in the C language the & operator is anywhere applied to the variable);
the variable is accessed by a procedure nested inside the current one;^[2]
the value is too big to fit into a single register;^[3]
the variable is an array, for which address arithmetic is necessary to extract components;
the register holding the variable is needed for a specific purpose, such as parameter passing (as described above), though a compiler may move such values to other registers instead of storing them in memory;
or there are so many local variables and temporary values that they won't all fit in registers, in which case some of them are "spilled" into the frame.

We will say that a variable escapes if it is passed by reference, its address is taken (using C's & operator), or it is accessed from a nested function.

When a formal parameter or local variable is declared, it's convenient to assign it a location - either in registers or in the stack frame - right at that point in processing the program. Then, when occurrences of that variable are found in expressions, they can be translated into machine code that refers to the right location. Unfortunately, the conditions in our list don't manifest themselves early enough. When the compiler first encounters the declaration of a variable, it doesn't yet know whether the variable will ever be passed by reference, accessed in a nested procedure, or have its address taken; and it doesn't know how many registers the calculation of expressions will require (it might be desirable to put some local variables in the frame instead of in registers). An industrial-strength compiler must assign provisional locations to all formals and locals, and decide later which of them should really go in registers.

STATIC LINKS

In languages that allow nested function declarations (such as Pascal, ML, and Java), the inner functions may use variables declared in outer functions. This language feature is called block structure. For example, consider Program 6.3, which is written with a Pascal-like syntax. The function write refers to the outer variable output, and indent refers to outer variables n and output. To make this work, the function indent must have access not only to its own frame (for variables i and s) but also to the frames of show (for variable n) and prettyprint (for variable output).Nested functions.

1 type tree = {key: string, left: tree, right: tree}
2
3 function prettyprint(tree: tree) : string =
4 let
5 var output := ""
6
7 function write(s: string) =
8 output := concat(output,s)
9
10 function show(n:int, t: tree) =
11 let function indent(s: string) =
12 (for i := 1 to n
13 do write(" ");
14 output := concat(output,s); write("\n"))
15 in if t=nil then indent(".")
16 else (indent(t.key);
17 show(n+1,t.left);
18 show(n+1,t.right))
19 end
20
21 in show(0,tree); output
22 end

There are several methods to accomplish this:

Whenever a function f is called, it can be passed a pointer to the frame of the function statically enclosing f ; this pointer is the static link.
A global array can be maintained, containing - in position i - a pointer to the frame of the most recently entered procedure whose static nesting depth is i. This array is called a display.
When g calls f, each variable of g that is actually accessed by f (or by any function nested inside f ) is passed to f as an extra argument. This is called lambda lifting.

We will describe in detail only the method of static links. Which method should be used in practice? See Exercise 6.6. Whenever a function f is called, it is passed a pointer to the stack frame of the "current" (most recently entered) activation of the function g that immediately encloses f in the text of the program. For example, in Program 6.3: Line# 21 prettyprint calls show, passing prettyprint's own frame pointer as show's static link. 10 show stores its static link (the address of prettyprint's frame) into its own frame. 15 show calls indent, passing its own frame pointer as indent's static link. 17 show calls show, passing its own static link (not its own frame pointer) as the static link. 12 indent uses the value n from show's frame. To do so, it fetches at an appropriate offset from indent's static link (which points at the frame of show). 13 indent calls write. It must pass the frame pointer of prettyprint as the static link. To obtain this, it first fetches at an offset from its own static link (from show's frame), the static link that had been passed to show. 14 indent uses the variable output from prettyprint'sframe.Todosoit starts with its own static link, then fetches show's, then fetches output.^[4]

So on each procedure call or variable access, a chain of zero or more fetches is required; the length of the chain is just the difference in static nesting depth between the two functions involved.

FRAMES IN THE MiniJava COMPILER

What sort of stack frames should the MiniJava compiler use? Here we face the fact that every target-machine architecture will have a different standard stack frame layout. If we want MiniJava functions to be able to call C functions, we should use the standard layout. But we don't want the specifics of any particular machine intruding on the implementation of the translation module of the MiniJava compiler. Thus we must use abstraction. Just as the Symbol module provides a clean interface, and hides the internal representation of Symbol.Table from its clients, we must use an abstract representation for frames. The frame interface will look something like this:

package Frame;
import Temp.Temp; import Temp.Label;
public abstract class Access { ... }
public abstract class AccessList {...head;...tail;... }
public abstract class Frame {
 abstract public Frame newFrame(Label name,
 Util.BoolList formals);
 public Label name;
 public AccessList formals;
 abstract public Access allocLocal(boolean escape);
 /* ... other stuff, eventually ... *
}

The abstract class Frame is implemented by a module specific to the target machine. For example, if compiling to the MIPS architecture, there would be

package Mips;
class Frame extends Frame.Frame { ... }

In general, we may assume that the machine-independent parts of the compiler have access to this implementation of Frame; for example,

// in class Main.Main:
Frame.Frame frame = new Mips.Frame(...);

In this way the rest of the compiler may access frame without knowing the identity of the target machine (except an occurrence of the word Mips here and there). The class Frame holds information about formal parameters and local variables allocated in this frame. To make a new frame for a function f with k formal parameters, call newFrame(f, l), where l is a list of k booleans: true for each parameter that escapes and false for each parameter that does not. (In MiniJava, no parameters ever escape.) The result will be a Frame object. For example, consider a three-argument function named g whose first argument escapes (needs to be kept in memory). Then

frame.newFrame(g,new BoolList(true,
 new BoolList(false,
 new BoolList(false, null))))

returns a new frame object. The Access class describes formals and locals that may be in the frame or in registers. This is an abstract data type, so its implementation as a pair of subclasses is visible only inside the Frame module:

package Mips;
class InFrame extends Frame.Access {int offset; ... }
class InReg extends Frame.Access {Temp temp; ... }

InFrame(X) indicates a memory location at offset X from the frame pointer; InReg(t₈₄) indicates that it will be held in "register" t₈₄. Frame.Access is an abstract data type, so outside of the module the InFrame and InReg constructors are not visible. Other modules manipulate accesses using interface functions to be described in the next chapter. The formals field is a list of k "accesses" denoting the locations where the formal parameters will be kept at run time, as seen from inside the callee. Parameters may be seen differently by the caller and the callee. For example, if parameters are passed on the stack, the caller may put a parameter at offset 4 from the stack pointer, but the callee sees it at offset 4 from the frame pointer. Or the caller may put a parameter into register 6, but the callee may want to move it out of the way and always access it from register 13. On the Sparc architecture, with register windows, the caller puts a parameter into register o1, butthe save instruction shifts register windows so the callee sees this parameter in register i1. Because this "shift of view" depends on the calling conventions of the target machine, it must be handled by the Frame module, starting with newFrame. For each formal parameter, newFrame must calculate two things:

How the parameter will be seen from inside the function (in a register, or in a frame location).
What instructions must be produced to implement the "view shift."

For example, a frame-resident parameter will be seen as "memory at offset X from the frame pointer", and the view shift will be implemented by copying the stack pointer to the frame pointer on entry to the procedure.

REPRESENTATION OF FRAME DESCRIPTIONS

The implementation module Frame is supposed to keep the representation of Frame objects secret from any clients of the Frame module. But really it's an object holding:

the locations of all the formals,
instructions required to implement the "view shift",
the number of locals allocated so far,
and the label at which the function's machine code is to begin (see page 131).

Table 6.4 shows the formals of the three-argument function g (see page 127) as newFrame would allocate them on three different architectures: the Pentium, MIPS, and Sparc. The first parameter escapes, so it needs to be InFrame on all three machines. The remaining parameters are InFrame on the Pentium, but InReg on the other machines.

Table 6.4: Formal parameters for g(x₁, x₂, x₃) where x₁ escapes.
	Pentium	MIPS	Sparc

1	`InFrame`(8)	`InFrame`(0)	`InFrame`(68)
Formals 2	`InFrame`(12)	`InReg`(t₁₅₇)	`InReg`(t₁₅₇)
3	`InFrame`(16)	`InReg`(t₁₅₈)	`InReg`(t₁₅₈)
	M[sp + 0] ← fp	sp ← sp − K	`save %sp,-K,%sp`

View Shift	fp ← sp	M[sp + K + 0] ← `r4`	M[fp + 68] ← `i0`
	sp ← sp − K	t₁₅₇ ← `r5`	t₁₅₇ ← `i1`
		t₁₅₈ ← `r6`	t₁₅₈ ← `i2`

The freshly created temporaries t₁₅₇ and t₁₅₈, and the move instructions that copy r4 and r5 into them (or on the Sparc, i1 and i2), may seem superfluous. Why shouldn't the body of g just access these formals directly from the registers in which they arrive? To see why not, consider

void m(int x, int y) { h(y,y); h(x,x); }

If x stays in "parameter register 1" throughout m, and y is passed to h in parameter register 1, then there is a problem. The register allocator will eventually choose which machine register should hold t₁₅₇. If there is no interference of the type shown in function m, then (on the MIPS) the allocator will take care to choose register r4 to hold t₁₅₇ and r5 to hold t₁₅₈. Then the move instructions will be unnecessary and will be deleted at that time.

See also pages 157 and 251 for more discussion of the view shift.

LOCAL VARIABLES

Some local variables are kept in the frame; others are kept in registers. To allocate a new local variable in a frame f, the translation phase calls

f.allocLocal(false)

This returns an InFrame access with an offset from the frame pointer. For example, to allocate two local variables on the Sparc, allocLocal would be called twice, returning successively InFrame(-4) and InFrame(-8), which are standard Sparc frame-pointer offsets for local variables. The boolean argument to allocLocal specifies whether the new variable escapes and needs to go in the frame; if it is false, then the variable can be allocated in a register. Thus, f.allocLocal(false) might create InReg(t₄₈₁). For MiniJava, that no variables escape. This is because in MiniJava:

there is no nesting of classes and methods;
it is not possible to take the address of a variable;
integers and booleans are passed by value; and
objects, including integer arrays, can be represented as pointers that are also passedbyvalue.

The calls to allocLocal need not come immediately after the frame is created. In many languages, there may be variable-declaration blocks nested inside the body of a function. For example,

void f()
{int v=6;
 print(v);
 {int v=7;
 print(v);
 }
 print(v);
 {int v=8;
 print(v);
 }
 print(v);
}

In this case there are three different variables v. The program will print the sequence 6 7 6 8 6. As each variable declaration is encountered in processing the program, we will allocate a temporary or new space in the frame, associated with the name v. As each end (or closing brace) is encountered, the association with v will be forgotten - but the space is still reserved in the frame. Thus, there will be a distinct temporary or frame slot for every variable declared within the entire function.

The register allocator will use as few registers as possible to represent the temporaries. In this example, the second and third v variables (initialized to 7 and 8) could be held in the same register. A clever compiler might also optimize the size of the frame by noticing when two frame-resident variables could be allocated to the same slot.

TEMPORARIES AND LABELS

The compiler's translation phase will want to choose registers for parameters and local variables, and choose machine-code addresses for procedure bodies. But it is too early to determine exactly which registers are available, or exactly where a procedure body will be located. We use the word temporary to mean a value that is temporarily held in a register, and the word label to mean some machine-language location whose exact address is yet to be determined - just like a label in assembly language. Temps are abstract names for local variables; labels are abstract names for static memory addresses. The Temp module manages these two distinct sets of names.

package Temp;
public class Temp {
 public String toString();
 public Temp();
}
public class Label {
 public String toString();
 public Label();
 public Label(String s);
 public Label(Symbol s);
}
public class TempList {...}
public class LabelList {...}

new Temp.Temp() returns a new temporary from an infinite set of temps. new Temp.Label() returns a new label from an infinite set of labels. And new Temp.Label(string)) returns a new label whose assembly-language name is string.

When processing the declaration m(…), a label for the address of m's machine code can be produced by new Temp.Label(). It's tempting to call new Temp.Label("m") instead - the assembly-language program will be easier to debug if it uses the label m instead of L213 - but unfortunately there could be two different methods named m in different classes. A better idea is to call new Temp.Label("C"+"$"+"m"), where C is the name of the class in which the method m occurs.

MANAGING STATIC LINKS

The Frame module should be independent of the specific source language being compiled. Many source languages, including MiniJava, do not have nested function declarations; thus, Frame should not know anything about static links. Instead, this is the responsibility of the translation phase. The translation phase would know that each frame contains a static link. The static link would be passed to a function in a register and stored into the frame. Since the static link behaves so much like a formal parameter, we can treat it as one (as much as possible). ^[1]Before about 1960, they were passed not on the stack but in statically allocated blocks of memory, which precluded the use of recursive functions. ^[2]However, with register allocation across function boundaries, local variables accessed by inner functions can sometimes go in registers, as long as the inner function knows where to look. ^[3]However, some compilers spread out a large value into several registers for efficiency. ^[4]This program would be cleaner if show called write here instead of manipulating output directly, but it would not be as instructive.