Previous Next 
A dataflow analysis of a control flow graph of quadruples collects information about the execution of the program. One dataflow analysis determines how definitions and uses are related to each other, another estimates what values a variable might have at a given point, and so on. The results of these analyses can be used to make optimizing transformations of the program.
For many optimizations we need to see if a particular assignment to a temporary t can directly affect the value of t at another point in the program. We say that an unambiguous definition of t is a particular statement (quadruple) in the program of the form t ← a ⊕ b or t ← M[a]. Given such a definition d, we say that d reaches a statement u in the program if there is some path of controlflow edges from d to u that does not contain any unambiguous definition of t. An ambiguous definition is a statement that might or might not assign a value to t. For example, if t is a global variable, and the statement s is a CALL to a function that sometimes modifies t but sometimes does not, then s is an ambiguous definition. But our MiniJava compiler treats escaping variables as memory locations, not as temporaries subject to dataflow analysis. This means that we never have ambiguous definitions; unfortunately, we also lose the opportunity to perform optimizations on escaping variables. For the remainder of this chapter, we will assume all definitions are unambiguous. We can express the calculation of reaching definitions as the solution of dataflow equations. We label every MOVE statement with a definition ID, and we manipulate sets of definition IDs. We say that the statement d_{1} : t ← x⊕y generates the definition d_{1}, because no matter what other definitions reach the beginning of this statement, we know that d_{1} reaches the end of it. And we say that this statement kills any other definition of t, because no matter what other definitions of t reach the beginning of the statement, they do not reach the end (they cannot directly affect the value of t after this statement). Let us define defs(t) as the set of all definitions (or definition IDs) of the temporary t. Table 17.2 summarizes the generate and kill effects of the different kinds of quadruples.
Statement s 
gen[s] 
kill[s] 



d : t ← b ⊕ c 
{d} 
defs(t) − {d} 
d : t ← M[b] 
{d} 
defs(t) − {d} 
M[a] ← b 
{} 
{} 
if a relop b goto L_{1} else goto L_{2} 
{} 
{} 
goto L 
{} 
{} 
L : 
{} 
{} 
f(a_{1},…, a_{n}) 
{} 
{} 
d : t ← f(a_{1},…, a_{n}) 
{d} 
defs(t) − {d} 
Using gen and kill, we can compute in[n] (and out[n]) the set of definitions that reach the beginning (and end) of each node n:
These equations can be solved by iteration: First in[n] and out[n] are initialized to the empty set, for all n; then the equations are treated as assignment statements and repeatedly executed until there are no changes. We will take Program 17.3 as an example; it is annotated with statement numbers that will also serve as definition IDs. In each iteration, we recalculate in and out for each statement in turn:
PROGRAM 17.31 : a ← 5
2 : c ← 1
3 : L_{1} : if c > a goto L_{2}
4 : c ← c + c
5 : goto L_{1}
6 : L_{2} : a ← c  a
7 : c ← 0
Iteration 3 serves merely to discover that nothing changed since iteration 2.
Having computed reaching definitions, what can we do with the information? The analysis is useful in several kinds of optimization. As a simple example, we can do constant propagation: Only one definition of a reaches statement 3, so we can replace the test c > a with c > 5.
Suppose we want to do commonsubexpression elimination; that is, given a program that computes x ⊕ y more than once, can we eliminate one of the duplicate computations? To find places where such optimizations are possible, the notion of available expressions is helpful. An expression x ⊕ y is available at a node n in the flow graph if, on every path from the entry node of the graph to node n, x ⊕ y is computed at least once and there are no definitions of x or y since the most recent occurrence of x ⊕ y on that path. We can express this in dataflow equations using gen and kill sets, where the sets are now sets of expressions. Any node that computes x ⊕ y generates {x ⊕ y}, and any definition of x or y kills {x ⊕ y}; see Table 17.4.
Statement s 
gen[s] 
kill[s] 

t ← b ⊕ c 
{b ⊕ c} − kill[s] 
expressions containing t 
t ← M[b] 
{M[b]} − kill[s] 
expressions containing t 
M[a] ← b 
{} 
expressions of the form M[x] 
if a > b goto L_{1} else goto L_{2} 
{} 
{} 
goto L 
{} 
{} 
L : 
{} 
{} 
f(a_{1},…, a_{n}) 
{} 
expressions of the form M[x] 
t ← f(a_{1},…, a_{n}) 
{} 
expressions containing t, and expressions of the form M[x] 
Basically, t ← b + c generates the expression b + c. But b ← b + c does not generate b + c, because after b + c there is a subsequent definition of b. The statement gen[s] = {b ⊕ c} − kill[s] takes care of this subtlety. A store instruction (M[a] ← b) might modify any memory location, so it kills any fetch expression (M[x]). If we were sure that a ≠ x, we could be less conservative, and say that M[a] ← b does not kill M[x]. This is called alias analysis; see . Given gen and kill, we compute in and out almost as for reaching definitions, except that we compute the intersection of the out sets of the predecessors instead of a union. This reflects the fact that an expression is available only if it is computed on every path into the node.
To compute this by iteration, we define the in set of the start node as empty, and initialize all other sets to full (the set of all expressions), not empty. This is because the intersection operator makes sets smaller, not bigger as the union operator does in the computation of reaching definitions. This algorithm then finds the greatest fixed point of the equations.
We say that an expression t ← x ⊕ y (in node s of the flow graph) reaches node n if there is a path from s to n that does not go through any assignment to x or y, or through any computation of x ⊕ y. As usual, we can express gen and kill; see Exercise 17.1.
In practice, the reaching expressions analysis is needed by the commonsubexpression elimination optimization only for a small subset of all the expressions in a program. Thus, reaching expressions are usually computed ad hoc, by searching backward from node n and stopping whenever a computation x ⊕ y is found. Or reaching expressions can be computed during the calculation of available expressions; see Exercise 17.4.
has already covered liveness analysis, but it is useful to note that liveness can also be expressed in terms of gen and kill. Any use of a variable generates liveness, and any definition kills liveness:
Statement s 
gen[s] 
kill[s] 



t ← b ⊕ c 
{b, c} 
{t} 
t ← M[b] 
{b} 
{t} 
M[a] ← b 
{a, b} 
{} 
if a > b goto L_{1} else goto L_{2} 
{a, b} 
{} 
goto L 
{} 
{} 
L : 
{} 
{} 
f(a_{1},…, a_{n}) 
{a_{1},…, a_{n}} 
{} 
t ← f(a_{1},…, a_{n}) 
{a_{1},…, a_{n}} 
{t} 

The equations for in and out are similar to the ones for reaching definitions and available expressions, but backward because liveness is a backward dataflow analysis:
JaVa  Previous Next 