
OPTIMIZATION ALGORITHMS USING SSA

Since we are primarily interested in SSA form because it provides quick access to important dataflow information, we should pay some attention to data-structure representations of the SSA graph. The objects of interest are statements, basic blocks, and variables:

Statement Fields of interest are containing block, previous statement in block, next statement in block, variables defined, variables used. Each statement may be an ordinary assignment, ϕ-function, fetch, store, or branch.

Variable Has a definition site (statement) and a list of use sites.

Block Has a list of statements, an ordered list of predecessors, a successor (for blocks ending with a conditional branch, more than one successor). The order of predecessors is important for determining the meaning of φ(v1, v2, v3) inside the block.
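To make this concrete, here is one possible Java rendering of these objects; all class and field names are invented for illustration, not taken from any particular compiler. Each use of a variable points back to its own node in the variable's doubly linked use list, which is what makes the constant-time deletions described below possible.

import java.util.List;

// Hypothetical SSA data structures; names are illustrative only.
class Statement {
    enum Kind { ASSIGN, PHI, FETCH, STORE, BRANCH }
    Kind kind;
    Block block;              // containing block
    Statement prev, next;     // previous/next statement in the same block
    List<Variable> defs;      // variables defined here
    List<Use> uses;           // variables used here
}

class Use {                   // one entry in a variable's use list
    Variable var;             // the variable being used
    Statement site;           // the statement containing this use
    int phiIndex;             // if site is a phi-function, which argument this is
    Use prevUse, nextUse;     // doubly linked, so deletion is O(1)
}

class Variable {
    Statement def;            // the unique definition site (single assignment!)
    Use uses;                 // head of the doubly linked list of uses
}

class Block {
    List<Statement> statements;
    List<Block> predecessors; // ordered: the i-th phi argument corresponds
                              //   to the i-th predecessor
    List<Block> successors;   // one successor, or two for a conditional branch
}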

DEAD-CODE ELIMINATION

The SSA data structure makes dead-code analysis particularly quick and easy. A variable is live at its site of definition if and only if its list of uses is not empty. This is true because there can be no other definition of the same variable (it's single-assignment form!) and the definition of a variable dominates every use - so there must be a path from definition to use.[1] This leads to the following iterative algorithm for deleting dead code:

while there is some variable v with no uses
 and the statement that defines v has no other side effects
 do delete the statement that defines v


In deleting a statement v ← x ⊕ y or the statement v ← φ(x, y), we take care to remove the statement from the list of uses of x and of y. This may cause x or y to become dead, if it was the last use. To keep track of this efficiently, Algorithm 19.12 uses a work list W of variables that need to be reconsidered. This takes time proportional to the size of the program plus the number of variables deleted (which itself cannot be larger than the size of the program) - or linear time overall. The only question is how long it takes to delete S from a (potentially long) list of uses of xi. By keeping xi's list of uses as a doubly linked list, and having each use of xi point back to its own entry in this list, the deletion can be done in constant time.

ALGORITHM 19.12: Dead-code elimination in SSA form.
W ← a list of all variables in the SSA program
while W is not empty
    remove some variable v from W
    if v's list of uses is empty
        let S be v's statement of definition
        if S has no side effects other than the assignment to v
            delete S from the program
            for each variable xi used by S
                delete S from the list of uses of xi
                W ← W ∪ {xi}




If run on the program of Image 19.3b, this algorithm would delete the statement b1 ← φ(b0, b2).

A more aggressive dead-code-elimination algorithm, which uses a different definition of dead, is shown on page 426.
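For concreteness, Algorithm 19.12 might be rendered in Java over the hypothetical classes sketched at the start of this section. The helper methods here (hasSideEffects, removeUse, unlink) are assumptions of the sketch, and we assume each assignment defines a single variable.

import java.util.ArrayDeque;
import java.util.Deque;

class DeadCodeElim {
    // A sketch of Algorithm 19.12 over the hypothetical SSA classes above.
    static void run(Iterable<Variable> allVariables) {
        Deque<Variable> w = new ArrayDeque<>();        // work list W
        for (Variable v : allVariables) w.add(v);
        while (!w.isEmpty()) {
            Variable v = w.remove();
            if (v.uses != null) continue;              // v still has uses
            Statement s = v.def;                       // v's statement of definition
            if (s == null || hasSideEffects(s)) continue;
            for (Use u : s.uses) {                     // each xi used by S
                removeUse(u);                          // O(1) doubly linked unlink
                w.add(u.var);                          // xi may now be dead
            }
            unlink(s);                                 // delete S from the program
        }
    }

    static boolean hasSideEffects(Statement s) {
        // Stores are always presumed live (see "Memory dependence" below).
        return s.kind == Statement.Kind.STORE || s.kind == Statement.Kind.BRANCH;
    }

    static void removeUse(Use u) {
        if (u.prevUse != null) u.prevUse.nextUse = u.nextUse;
        else u.var.uses = u.nextUse;                   // u was the head of the list
        if (u.nextUse != null) u.nextUse.prevUse = u.prevUse;
    }

    static void unlink(Statement s) {
        if (s.prev != null) s.prev.next = s.next;
        if (s.next != null) s.next.prev = s.prev;
        s.block.statements.remove(s);
    }
}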

SIMPLE CONSTANT PROPAGATION

Whenever there is a statement of the form v ← c for some constant c, then any use of v can be replaced by a use of c. Any φ-function of the form v ← φ(c1, c2,…, cn), where all the ci are equal, can be replaced by v ← c. Each of these conditions is easy to detect and implement using the SSA data structure, and we can use a simple work-list algorithm to propagate constants:

W ← a list of all statements in the SSA program
while W is not empty
    remove some statement S from W
    if S is v ← φ(c, c, ..., c) for some constant c
        replace S by v ← c
    if S is v ← c for some constant c
        delete S from the program
        for each statement T that uses v
            substitute c for v in T
            W ← W ∪ {T}


If we run this algorithm on the SSA program of Image 19.4g, then the assignment j3 ← i1 will be replaced with j3 ← 1, and the assignment i1 ← 1 will be deleted. Uses of variables j1 and k1 will also be replaced by constants. The following transformations can all be incorporated into this work-list algorithm, so that in linear time all these optimizations can be done at once:

Copy propagation A single-argument φ-function x ← φ(y) or a copy assignment x ← y can be deleted, and y substituted for every use of x.

Constant folding If we have a statement x ← a ⊕ b, where a and b are constant, we can evaluate c ← a ⊕ b at compile time and replace the statement with x ← c.

Constant conditions In block L, a conditional branch if a < b goto L1 else L2, where a and b are constant, can be replaced by either goto L1 or goto L2, depending on the (compile-time) value of a < b. The control-flow edge from L to L2 (or L1, respectively) must be deleted; this reduces the number of predecessors of L2 (or L1), and the φ-functions in that block must be adjusted accordingly (by removing an argument).

Unreachable code Deleting a predecessor may cause block L2 to become unreachable. In this case, all the statements in L2 can be deleted; use lists of all the variables that are used in these statements must be adjusted accordingly. Then the block itself should be deleted, reducing the number of predecessors of its successor blocks.
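The core work-list loop might be rendered in Java as follows; the accessor methods are left abstract, since they depend on the particular statement representation (all the method names below are hypothetical, not the book's code). The three transformations just listed can be added as further cases in the same loop.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

abstract class SimpleConstProp {
    // Hypothetical hooks onto the statement representation; a real compiler
    // implements these against its own instruction classes.
    abstract boolean isConstantPhi(Statement s);      // v <- phi(c, ..., c), all equal
    abstract void rewriteAsConstantMove(Statement s); // turn such a phi into v <- c
    abstract boolean isConstantMove(Statement s);     // v <- c for a constant c
    abstract Variable definedVar(Statement s);
    abstract long constantOf(Statement s);
    abstract void substitute(Statement t, Variable v, long c); // replace v by c in t
    abstract void deleteStatement(Statement s);

    void run(Iterable<Statement> allStatements) {
        Deque<Statement> w = new ArrayDeque<>();      // work list W
        for (Statement s : allStatements) w.add(s);
        while (!w.isEmpty()) {
            Statement s = w.remove();
            if (isConstantPhi(s))                     // v <- phi(c, c, ..., c)
                rewriteAsConstantMove(s);             // becomes v <- c
            if (isConstantMove(s)) {
                Variable v = definedVar(s);
                long c = constantOf(s);
                // Snapshot the use sites first: substitution unlinks use entries.
                List<Statement> sites = new ArrayList<>();
                for (Use u = v.uses; u != null; u = u.nextUse) sites.add(u.site);
                for (Statement t : sites) {
                    substitute(t, v, c);              // substitute c for v in T
                    w.add(t);                         // T may now be constant too
                }
                deleteStatement(s);                   // delete S from the program
            }
        }
    }
}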

CONDITIONAL CONSTANT PROPAGATION

In the program of Image 19.4b, is j always equal to 1? Two claims are self-consistent: that j is never assigned any value other than 1, so that the test j < 20 is always true and block 6 never executes; or that block 6 might execute, so that j might be assigned a value other than 1.

Each of these statements is self-consistent; but which is true in practice? In fact, when this program executes, j is never set to any value other than 1. This is a kind of least fixed point (analogous to what is described on page 209). The "simple" constant-propagation algorithm has the problem of assuming that block 6 might be executed, and therefore that j might not be constant, and therefore that perhaps j ≥ 20, and therefore that block 6 might be executed. Simple constant propagation finds a fixed point that is not the least fixed point.

Why would programmers put never-executed statements in their programs? Many programs have statements of the form if debug then …, where debug is a constant false value; we would not like to let the statements in the debug clauses get in the way of useful optimizations.

The SSA conditional constant propagation algorithm finds the least fixed point: It does not assume a block can be executed until there is evidence that it can be, and it does not assume a variable is nonconstant until there is evidence, and so on. The algorithm tracks the run-time value of each variable as follows:

V[v] = ⊥ We have seen no evidence that any assignment to v is ever executed.

V[v] = 4 We have seen evidence that an assignment v ← 4 is executed, but no evidence that v is ever assigned any other value.

V[v] = ⊤ We have seen evidence that v will have, at various times, at least two different values, or some value (perhaps read from an input file or from memory) that is not predictable at compile time.

Thus we have a lattice of values, with ⊥ meaning never defined, 4 meaning defined as 4, and ⊤ meaning overdefined:

[Diagram: the flat lattice of values - ⊤ on top, the constants (…, −1, 0, 1, 2, …) in the middle, ⊥ at the bottom.]

New information can only move a variable up in the lattice.[2] We also track the executability of each block, as follows:

ε[B] = false We have seen no evidence that block B can ever be executed.

ε[B] = true We have seen evidence that block B can be executed.

Initially we start with V[v] = ⊥ for all variables v, and ε[B] = false for all blocks B. Then we observe the following:

  1. Any variable v with no definition, which is therefore an input to the program, a formal parameter to the procedure, or (horrors!) an uninitialized variable, must have V[v] ← ⊤.

  2. The start block B1 is executable: ε[B1] ← true.
  3. For any executable block B with only one successor C, set ε[C] ← true.
  4. For any executable assignment v ← x ⊕ y, where V[x] = c1 and V[y] = c2, set V[v] ← c1 ⊕ c2.
  5. For any executable assignment v ← x ⊕ y, where V[x] = ⊤ or V[y] = ⊤, set V[v] ← ⊤.
  6. For any executable assignment v ← φ(x1,…, xn), where V[xi] = c1, V[xj] = c2, c1 ≠ c2, the ith predecessor is executable, and the jth predecessor is executable, set V[v] ← ⊤.
  7. For any executable assignment v ← MEM() or v ← CALL(), set V[v] ← ⊤.
  8. For any executable assignment v ← φ(x1,…, xn), where V[xi] = ⊤ and the ith predecessor is executable, set V[v] ← ⊤.
  9. For any assignment v ← φ(x1,…, xn) whose ith predecessor is executable and V[xi] = c1, and where for every j either the jth predecessor is not executable, or V[xj] = ⊥, or V[xj] = c1, set V[v] ← c1.
  10. For any executable branch if x < y goto L1 else L2, where V[x] = ⊤ or V[y] = ⊤, set ε[L1] ← true and ε[L2] ← true.
  11. For any executable branch if x < y goto L1 else L2, where V[x] = c1 and V[y] = c2, set ε[L1] ← true or ε[L2] ← true depending on the value of c1 < c2.

An executable assignment is an assignment statement in a block B with ε[B] = true. These conditions "ignore" any expression or statement in an unexecutable block, and the φ-functions "ignore" any operand that comes from an unexecutable predecessor.

The algorithm can be made quite efficient using work lists: There will be one work list Wv for variables and another work list Wb for blocks. The algorithm proceeds by picking x from Wv and considering conditions 4-9 for any statement in x's list of uses; or by picking a block B from Wb and considering condition 3, and conditions 4-9 for any statement within B. Whenever a block is newly marked executable, it and its executable successors are added to Wb. Whenever V[x] is "raised" from ⊥ to c or from c to ⊤, then x is added to Wv. When both Wv and Wb are empty, the algorithm is finished. The algorithm runs quickly, because for any x it raises V[x] at most twice, and for any B it changes ε[B] at most once.

We use this information to optimize the program as follows. After the analysis terminates, wherever ε[B] = false, delete block B. Wherever V[x] = c, substitute c for x and delete the assignment to x.

Image 19.13 shows the conditional constant propagation algorithm executed on the program of Image 19.4. The algorithm finds that all the j variables are constant (with value 1), k1 is constant (with value 0), and block 6 is not executed. Deleting unreachable blocks, and replacing uses of constant variables with the constant value - deleting their definitions - leads to some empty blocks and a φ-function that has only one argument; these can be simplified, leaving the program of Image 19.13d.
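The value lattice itself admits a compact representation. The following sketch (invented names, not the book's code) shows a flat-lattice value type and the join that implements rules 6, 8, and 9 at φ-functions; arguments from non-executable predecessors are simply skipped by never being joined in.

// Flat lattice for conditional constant propagation: bottom < constants < top.
final class LatticeValue {
    static final LatticeValue BOTTOM = new LatticeValue(null); // never defined
    static final LatticeValue TOP    = new LatticeValue(null); // overdefined
    private final Long constant;            // non-null only for constant values
    private LatticeValue(Long c) { this.constant = c; }
    static LatticeValue constant(long c) { return new LatticeValue(c); }
    boolean isConstant() { return constant != null; }
    long value() { return constant; }

    // Join used at phi-functions, over arguments from executable predecessors.
    LatticeValue join(LatticeValue other) {
        if (this == BOTTOM) return other;               // rule 9: bottom is ignored
        if (other == BOTTOM) return this;
        if (this == TOP || other == TOP) return TOP;    // rule 8
        return constant.equals(other.constant)
             ? this                                     // rule 9: agreeing constants
             : TOP;                                     // rule 6: two different values
    }
}

Because a value can only rise from BOTTOM to a constant to TOP, each variable is raised at most twice; this is precisely what bounds the running time of the work-list algorithm.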

Image 19.13: Conditional constant propagation.

The unique successor or predecessor property is important for the proper operation of this algorithm. Suppose we were to do conditional constant propagation on the graph of Image 19.2b, in a case where M[x] is known to be 1. Then blocks 1, 2, 3, and 4 will be marked executable, but it will not be clear that edge 2 → 4 cannot be taken. In Image 19.2c, block 5 would not be executable, making the situation clear. By using the edge-split SSA form, we avoid the need to mark edges (not just blocks) executable.

PRESERVING THE DOMINANCE PROPERTY

Almost every reasonable optimizing transformation - including the ones described above - preserves the dominance property of the SSA program: The definition of a variable dominates each use (or, when the use is in a φ-function, the predecessor of the use). It is important to preserve this property, since some optimization algorithms (such as Algorithm 19.17) depend on it. Also, the very definition of SSA form - that there is a φ-function at the convergence point of any two dataflow paths - implicitly requires it.

ALGORITHM 19.17: Calculation of live ranges in SSA form, and building the interference graph. The graph-walking algorithm is expressed as a mutual recursion between LiveOutAtBlock, LiveInAtStatement, and LiveOutAtStatement. The recursion is bounded whenever LiveOutAtBlock finds an already walked block, or whenever LiveOutAtStatement reaches the definition of v.

LivenessAnalysis() =
    for each variable v
        M ← {}
        for each site-of-use s of v
            if s is a φ-function with v as its ith argument
                let p be the ith predecessor of the block containing s
                LiveOutAtBlock(p, v)
            else LiveInAtStatement(s, v)

LiveOutAtBlock(n, v) =
    v is live-out at n
    if n ∉ M
        M ← M ∪ {n}
        let s be the last statement in n
        LiveOutAtStatement(s, v)

LiveInAtStatement(s, v) =
    v is live-in at s
    if s is the first statement of some block n
        v is live-in at n
        for each predecessor p of n
            LiveOutAtBlock(p, v)
    else
        let s′ be the statement preceding s
        LiveOutAtStatement(s′, v)

LiveOutAtStatement(s, v) =
    v is live-out at s
    let W be the set of variables that s defines
    for each variable w ∈ (W − {v})
        add (v, w) to interference graph
    if v ∉ W
        LiveInAtStatement(s, v)



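In Java, the mutual recursion of Algorithm 19.17 might look like the following sketch, again over the hypothetical SSA classes from the start of this section; the recording hooks (defs, addInterference, and the liveInAt/liveOutAt overloads) are assumptions of the sketch.

import java.util.HashSet;
import java.util.Set;

abstract class SsaLiveness {
    private final Set<Block> m = new HashSet<>();   // M: blocks already walked

    // Hypothetical hooks: record a liveness fact, or an interference edge.
    abstract Set<Variable> defs(Statement s);       // W: the variables s defines
    abstract void addInterference(Variable v, Variable w);
    abstract void liveOutAt(Block n, Variable v);
    abstract void liveInAt(Block n, Variable v);
    abstract void liveOutAt(Statement s, Variable v);
    abstract void liveInAt(Statement s, Variable v);

    void livenessAnalysis(Iterable<Variable> variables) {
        for (Variable v : variables) {
            m.clear();                              // M <- {}
            for (Use u = v.uses; u != null; u = u.nextUse) {
                if (u.site.kind == Statement.Kind.PHI)
                    // the i-th phi argument belongs to the i-th predecessor
                    liveOutAtBlock(u.site.block.predecessors.get(u.phiIndex), v);
                else
                    liveInAtStatement(u.site, v);
            }
        }
    }

    void liveOutAtBlock(Block n, Variable v) {
        liveOutAt(n, v);                            // v is live-out at n
        if (m.add(n))                               // stop if n already walked
            liveOutAtStatement(lastStatement(n), v);
    }

    void liveInAtStatement(Statement s, Variable v) {
        liveInAt(s, v);                             // v is live-in at s
        if (s.prev == null) {                       // s starts its block n
            liveInAt(s.block, v);
            for (Block p : s.block.predecessors) liveOutAtBlock(p, v);
        } else {
            liveOutAtStatement(s.prev, v);
        }
    }

    void liveOutAtStatement(Statement s, Variable v) {
        liveOutAt(s, v);                            // v is live-out at s
        Set<Variable> w = defs(s);
        for (Variable x : w)
            if (x != v) addInterference(v, x);      // (v, w) for w in W - {v}
        if (!w.contains(v)) liveInAtStatement(s, v); // recursion stops at v's def
    }

    private static Statement lastStatement(Block n) {
        return n.statements.get(n.statements.size() - 1);
    }
}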

But there is one kind of optimization that does not preserve the dominance property. In the program of Image 19.14a, we can prove that - because the condition z < 0 evaluates the same way in blocks 1 and 4 - the use of x2 in block 5 always gets the value x1, never x0. Thus it is tempting to substitute x1 for x2 in block 5. But the resulting graph does not have the dominance property: Block 5 is not dominated by the definition of x1 in block 2.

Image 19.14: This transformation does not preserve the dominance property of SSA form, and should be avoided.

Therefore this kind of transformation - based on the knowledge that two conditional branches test the same condition - is not valid for SSA form.

ARRAYS, POINTERS, AND MEMORY

For many purposes in optimization, parallelization, and scheduling, the compiler needs to know, "how does statement B depend on statement A?" The transformations of constant propagation and dead-code removal have relied on this dependence information. There are several kinds of dependence relations:

Read-after-write dependences are evident in the SSA graph: A defines v, and v's list of uses points to B; or B's use list contains v, and v's def-site is A.

Control dependences will be discussed in .

In SSA form, there are no write-after-write or write-after-read dependences. Statements A and B can never write to the same variable, and any use must be "after" (that is, dominated by) the variable's definition.

MEMORY DEPENDENCE

The discussion thus far of assignments and φ-functions has been only for scalar nonescaping variables. Real programs must also load and store memory words. One way to get a single-assignment property for memory is to ensure that each memory word is written only once. Although this seems severe, it is just what a pure functional programming language does (see ) - with a garbage collector behind the scenes to allow actual reuse of physical memory locations. However, in an imperative language we must do something else. Consider a sequence of stores and fetches such as this one:

1 M[i] ← 4
2 x ← M[j]
3 M[k] ← j


We cannot treat each individual memory location as a separate variable for static-single-assignment purposes, because we don't know whether i, j, and k are the same address. We could perhaps treat memory as a "variable", where the store instruction creates a new value (of the entire memory):

1 M1 ← store(M0, i, 4)
2 x ← load(M1, j)
3 M2 ← store(M1, k, j)


This creates the def-use edges 1 → 2 and 1 → 3. These def-use edges are like any SSA def-use relationship, and we make φ-functions for them at join points in the same way. But there is no edge from 2 → 3, so what prevents the compiler from reordering the statements as follows?

1 M1 ← store(M0, i, 4)
3 M2 ← store(M1, k, j)
2 x ← load(M1, j)


The functional dependences are still correct - if M1 is viewed as a snapshot of memory after statement 1, then statement 2 is still correct in loading from address j in that snapshot. But it is inefficient - to say the least! - for the computer to keep more than one copy of the machine's memory. We would like to say that there is a write-after-read dependence 2 → 3 to prevent the compiler from creating M2 before all uses of M1 have been computed. But calculation of accurate dependence information for memory locations is beyond the scope of this chapter.

A naive but practical solution In the absence of write-after-read and write-after-write dependence information, we will just say that a store instruction is always presumed live - we will not do dead-code elimination on stores - and we will not transform the program in such a way as to interchange a load and a store, or two stores. Store instructions can be unreachable, however, and unreachable stores can be deleted.

The optimization algorithms presented in this chapter do not reorder instructions, and they do not attempt to propagate dataflow information through memory, so they implicitly use this naive model of loads and stores.

[1] As usual, we are considering only connected graphs.

[2] Authors in the subfield of dataflow analysis use ⊥ to mean overdefined and ⊤ to mean never defined; authors in semantics and abstract interpretation use ⊥ for undefined and ⊤ for overdefined; we are following the latter practice.

