Previous    Next

VARIOUS DATAFLOW ANALYSES

A dataflow analysis of a control flow graph of quadruples collects information about the execution of the program. One dataflow analysis determines how definitions and uses are related to each other, another estimates what values a variable might have at a given point, and so on. The results of these analyses can be used to make optimizing transformations of the program.

REACHING DEFINITIONS

For many optimizations we need to see if a particular assignment to a temporary t can directly affect the value of t at another point in the program. We say that an unambiguous definition of t is a particular statement (quadruple) in the program of the form t ← ab or t ← M[a]. Given such a definition d, we say that d reaches a statement u in the program if there is some path of control-flow edges from d to u that does not contain any unambiguous definition of t. An ambiguous definition is a statement that might or might not assign a value to t. For example, if t is a global variable, and the statement s is a CALL to a function that sometimes modifies t but sometimes does not, then s is an ambiguous definition. But our MiniJava compiler treats escaping variables as memory locations, not as temporaries subject to dataflow analysis. This means that we never have ambiguous definitions; unfortunately, we also lose the opportunity to perform optimizations on escaping variables. For the remainder of this chapter, we will assume all definitions are unambiguous. We can express the calculation of reaching definitions as the solution of dataflow equations. We label every MOVE statement with a definition ID, and we manipulate sets of definition IDs. We say that the statement d1 : t ← xy generates the definition d1, because no matter what other definitions reach the beginning of this statement, we know that d1 reaches the end of it. And we say that this statement kills any other definition of t, because no matter what other definitions of t reach the beginning of the statement, they do not reach the end (they cannot directly affect the value of t after this statement). Let us define defs(t) as the set of all definitions (or definition IDs) of the temporary t. Table 17.2 summarizes the generate and kill effects of the different kinds of quadruples.

Java ScreenShot
Table 17.2: Gen and kill for reaching definitions.

Statement s

gen[s]

kill[s]


d : t ← bc

{d}

defs(t) − {d}

d : t ← M[b]

{d}

defs(t) − {d}

M[a] ← b

{}

{}

if a relop b goto L1 else goto L2

{}

{}

goto L

{}

{}

L :

{}

{}

f(a1,…, an)

{}

{}

d : t ← f(a1,…, an)

{d}

defs(t) − {d}

Java ScreenShot

Using gen and kill, we can compute in[n] (and out[n]) the set of definitions that reach the beginning (and end) of each node n:

Java ScreenShot

These equations can be solved by iteration: First in[n] and out[n] are initialized to the empty set, for all n; then the equations are treated as assignment statements and repeatedly executed until there are no changes. We will take Program 17.3 as an example; it is annotated with statement numbers that will also serve as definition IDs. In each iteration, we recalculate in and out for each statement in turn:

PROGRAM 17.3
1 : a ← 5
2 : c ← 1
3 : L1 : if c > a goto L2
4 : c ← c + c
5 : goto L1
6 : L2 : a ← c - a
7 : c ← 0



Java End example
Java ScreenShot

Iteration 3 serves merely to discover that nothing changed since iteration 2.

Having computed reaching definitions, what can we do with the information? The analysis is useful in several kinds of optimization. As a simple example, we can do constant propagation: Only one definition of a reaches statement 3, so we can replace the test c > a with c > 5.

AVAILABLE EXPRESSIONS

Suppose we want to do common-subexpression elimination; that is, given a program that computes xy more than once, can we eliminate one of the duplicate computations? To find places where such optimizations are possible, the notion of available expressions is helpful. An expression xy is available at a node n in the flow graph if, on every path from the entry node of the graph to node n, xy is computed at least once and there are no definitions of x or y since the most recent occurrence of xy on that path. We can express this in dataflow equations using gen and kill sets, where the sets are now sets of expressions. Any node that computes xy generates {xy}, and any definition of x or y kills {xy}; see Table 17.4.

Java ScreenShot
Table 17.4: Gen and kill for available expressions.

Statement s

gen[s]

kill[s]

t ← bc

{bc} − kill[s]

expressions containing t

t ← M[b]

{M[b]} − kill[s]

expressions containing t

M[a] ← b

{}

expressions of the form M[x]

if a > b goto L1 else goto L2

{}

{}

goto L

{}

{}

L :

{}

{}

f(a1,…, an)

{}

expressions of the form M[x]

t ← f(a1,…, an)

{}

expressions containing t, and expressions of the form M[x]

Java ScreenShot

Basically, t ← b + c generates the expression b + c. But b ← b + c does not generate b + c, because after b + c there is a subsequent definition of b. The statement gen[s] = {bc} − kill[s] takes care of this subtlety. A store instruction (M[a] ← b) might modify any memory location, so it kills any fetch expression (M[x]). If we were sure that ax, we could be less conservative, and say that M[a] ← b does not kill M[x]. This is called alias analysis; see . Given gen and kill, we compute in and out almost as for reaching definitions, except that we compute the intersection of the out sets of the predecessors instead of a union. This reflects the fact that an expression is available only if it is computed on every path into the node.

Java ScreenShot

To compute this by iteration, we define the in set of the start node as empty, and initialize all other sets to full (the set of all expressions), not empty. This is because the intersection operator makes sets smaller, not bigger as the union operator does in the computation of reaching definitions. This algorithm then finds the greatest fixed point of the equations.

REACHING EXPRESSIONS

We say that an expression t ← xy (in node s of the flow graph) reaches node n if there is a path from s to n that does not go through any assignment to x or y, or through any computation of xy. As usual, we can express gen and kill; see Exercise 17.1.

In practice, the reaching expressions analysis is needed by the common-subexpression elimination optimization only for a small subset of all the expressions in a program. Thus, reaching expressions are usually computed ad hoc, by searching backward from node n and stopping whenever a computation xy is found. Or reaching expressions can be computed during the calculation of available expressions; see Exercise 17.4.

LIVENESS ANALYSIS

has already covered liveness analysis, but it is useful to note that liveness can also be expressed in terms of gen and kill. Any use of a variable generates liveness, and any definition kills liveness:

Statement s

gen[s]

kill[s]


t ← bc

{b, c}

{t}

t ← M[b]

{b}

{t}

M[a] ← b

{a, b}

{}

if a > b goto L1 else goto L2

{a, b}

{}

goto L

{}

{}

L :

{}

{}

f(a1,…, an)

{a1,…, an}

{}

t ← f(a1,…, an)

{a1,…, an}

{t}


The equations for in and out are similar to the ones for reaching definitions and available expressions, but backward because liveness is a backward dataflow analysis:

Java ScreenShot
JaVaScreenshot Previous    Next
Comments