A FUNCTIONAL INTERMEDIATE FORM

A functional coding language is one in which (as discussed in ) execution proceeds by binding variables to values, and never modifying a variable once it is initialized. This permits equational reasoning, which is useful to the programmer. But equational reasoning is even more useful to the compiler - many compiler optimizations involve the rewriting of a slow program into an equivalent faster program. When the compiler doesn't have to worry about x's value now versus x's value later, then these transformations are easier to express. This single-assignment property is at the heart of both functional coding and SSA form. There is a close relationship between the functional intermediate representations used by functional-language compilers and the SSA form used by imperative-language compilers. Image 19.18 shows the abstract syntax of the kind of intermediate representation used in modern functional-language compilers. It aspires to the best qualities of quadruples, SSA form, and lambda-calculus. As in quadruple notation, expressions are broken down into primitive operations whose order of evaluation is specified, every intermediate result is an explicitly named temporary, and every argument of an operator or function is an atom (variable or constant). As in SSA form and lambda-calculus, every variable has a single assignment (or binding), and every use of the variable is within the scope of the binding. As in lambda-calculus, scope is a simple syntactic notion, not requiring calculation of dominators.

atom → c Constant integer
atom → s Constant string pointer
atom → v Variable
exp → let fundefs in exp Function declaration
exp → let v = atom in exp Copy
exp → let v = binop(atom, atom) in exp Arithmetic operator
exp → let v = M[atom] in exp Fetch from memory
exp → M[atom]:=atom; exp Store to memory
exp → if atom relop atom then exp else exp Conditional branch
exp → atom(args) Tail call
exp → let v = atom(args) in exp Non-tail call
exp → return atom Return
args →
args → atom args fundefs →
fundefs → fundefs function v(formals) = exp formals →
formals → v formals
binop → plus | minus | mul | ...
relop → eq | ne | lt | ...

Image 19.18: Functional intermediate representation. Binding occurrences of variables are underlined.

Scope No variable name can be used in more than one binding. Every binding of a variable has a scope within which all the uses of that variable must occur. For a variable bound by let v = … in exp, the scope of v is just the

exp. The scope of a function variable f_i bound in
 let function f₁(...) = exp₁
 :
 :
 function f_k(...) = exp_k
 in exp

includes all the exp_j (to allow for mutually recursive functions) as well as the exp. For a variable bound as the formal parameter of a function, the scope is the body of that function. These scope rules make many optimizations easy to reason about; we will take inline expansion of functions as an example. As discussed in , when we have a definition f(x) = E and a use f(z) we can replace the use by a copy of E but with all the x's replaced by z's. In the Tree language of this is difficult to express because there are no functions; in the functional notation of the substitution can get complicated if z is a nonatomic expression (as shown in Algorithm 15.8b). But in the functional intermediate form of Image 19.18, where all actual parameters are atomic, inline expansion becomes very simple, as shown in Algorithm 15.8a. Translating SSA into functional form Any SSA program can be translated into this functional form, as shown in Algorithm 19.20. Each control-flow node with more than one predecessor becomes a function. The arguments of that function are precisely the variables for which there are φ-functions at the node. If node f dominates node g, then the function for g will be nested inside the body of the function for f . Instead of jumping to a node, a control-flow edge into a φ-containing node is represented by a function call. Program 19.19 shows how a translated program looks.SSA program of Image 19.4g converted to functional intermediate form.

ALGORITHM 19.20: Translating SSA to functional intermediate form.

Translate(node)=
 let C be the children of node in the dominator tree
 let p₁,..., p_n be the nodes of C that have more than one predecessor
 for i ← 1 to n
 let a₁,..., a_k be the targets of φ functions in p_i (possibly k = 0)
 let S_i = Translate(p_i)
 let F_i = "function f _pi (a₁,..., a_k) = S_i"
 let F = F₁ F₂ ... F_n
 return Statements(node, 1, F)
Statements(node, j, F)=
if there are < j statements in node
 then let s be the successor of node
 if s has only one predecessor
 then return Statements(s, 1, F)
 else s has m predecessors
 suppose node is the ith predecessor of s
 suppose the φ-functions in s are a₁ ← φ(a₁₁,..., a₁_m), ...
 a_k ← φ(a_k₁,..., a_km)
 return "let F in f_s(a₁_i,..., a_ki )"
else if the jth statement of node is a φ-function
 then return Statements(node, j + 1, F)
else if the jth statement of node is "return a"
 then return "let F in return a"
else if the jth statement of node is a ← b ⊕ c The cases for a ← b,
then let S = Statements(node, j + 1, F) a ← M[b], and M[a] ← b
return "let a = b ⊕ c in S" are similar.
else if the jth statement of node is "if a < b goto s₁ else s₂"
 then (in edge-split SSA form) s₁ has only one predecessor, as does s₂
 let S₁ = Translate(s₁)
 let S₂ = Translate(s₂)
 return "let F in if a < b then S₁ else S₂"

Translating functional programs into functional intermediate form A functional program in a language such as PureFun-MiniJava starts in a form that obeys all the scope rules, but arguments are not atomic and variables are not unique. It is a simple matter to introduce well-scoped intermediate temporaries by a recursive walk of expression trees; dominator and SSA calculations are unnecessary.

All of the SSA-based optimization algorithms work equally well on a functional intermediate form; so will the optimizations and transformations on functional programs described in . Functional intermediate forms can also be made explicitly typed, type-checkable, and polymorphic as described in . All in all, this kind of intermediate representation has much to recommend it.