Previous Next |
Many dataflow analyses - including the ones described in this chapter - can be expressed using simultaneous equations on finite sets. So also can many of the algorithms used in constructing finite automata () and parsers (). The equations can usually be set up so that they can be solved by iteration: by treating the equations as assignment statements and repeatedly executing all the assignments until none of the sets changes any more. There are several ways to speed up the evaluation of dataflow equations.
A set S over a finite domain (that is, where the elements are integers in the range 1 − N or can be put in an array indexed by 1 − N) can be represented by a bit vector. The ith bit in the vector is a 1 if the element i is in the set S. In the bit-vector representation, unioning two sets S and T is done by a bitwise-or of the bit vectors. If the word size of the computer is W, and the vectors are N bits long, then a sequence of N/W or instructions can union two sets. Of course, 2N/W fetches and N/W stores will also be necessary, as well as indexing and loop overhead. Intersection can be done by bitwise-and, set complement can be done by bitwise complement, and so on.
Thus, the bit-vector representation is commonly used for dataflow analysis. It would be inadvisable to use bit vectors for dataflow problems where the sets are expected to be very sparse (so the bit vectors would be almost all zeros), in which case a different implementation of sets would be faster.
Suppose we have a node n in the flow graph that has only one predecessor, p, and p has only one successor, n. Then we can combine the gen and kill effects of p and n and replace nodes n and p with a single node. We will take reaching definitions as an example, but almost any dataflow analysis permits a similar kind of combining. Consider what definitions reach out of the node n: out[n] = gen[n] ∪ (in[n] − kill[n]). We know in[n] is just out[p]; therefore out[n] = gen[n] ∪ ((gen[p] ∪ (in[p] − kill[p])) − kill[n]). By using the identity (A ∪ B) − C = (A − C) ∪ (B − C) and then (A − B) − C = A − (B ∪ C), we have out[n] = gen[n] ∪ ((gen[p] −kill[n]) ∪ (in[p] − (kill[p] ∪ kill[n])). If we want to say that node pn combines the effects of p and n, then this last equation says that the appropriate gen and kill sets for pn are gen[pn] = gen[n] ∪ (gen[p] − kill[n])
kill[pn] = kill[p] ∪ kill[n]. We can combine all the statements of a basic block in this way, and agglomerate the gen and kill effects of the whole block. The control-flow graph of basic blocks is much smaller than the graph of individual statements, so the multipass iterative dataflow analysis works much faster on basic blocks.
Once the iterative dataflow analysis algorithm is completed, we may recover the dataflow information of an individual statement (such as n) within a block (such as pn in our example) by starting with the in set computed for the entire block and - in one pass - applying the gen and kill sets of the statements that precede n in the block.
In a forward dataflow problem (such as reaching definitions or available expressions), the information coming out of a node goes in to the successors. If we could arrange that every node was calculated before its successors, the dataflow analysis would terminate in one pass through the nodes. This would be possible if the control-flow graph had no cycles. We would topologically sort the flow graph - this just gives an ordering where each node comes before its successors - and then compute the dataflow equations in sorted order. But often the graph will have cycles, so this simple idea won't work. Even so, quasi-topologically sorting a cyclic graph by depth-first search helps to reduce the number of iterations required on cyclic graphs; in quasisorted order, most nodes come before their successors, so information flows forward quite far through the equations on each iteration. Depth-first search (Algorithm 17.5) topologically sorts an acyclic graph graph, or quasi-topologically sorts a cyclic graph, quite efficiently. Using sorted, the order computed by depth-first search, the iterative solution of dataflow equations should be computed as
repeat for i ← 1 to N n sorted[i] in ∪_{p∈pred[n]} out[p] out[n] ← gen[n] ∪ (in - kill[n]) until no out set changed in this iteration
Topological-sort: function DFS(i)
N ← number of nodes if mark[i] = false
for all nodes i mark[i] ← true
mark[i] ← false for each successor s of node i
DFS(start-node) DFS(s)
sorted[N] ← i
N N - 1
There is no need to make in a global array, since it is used only locally in computing out.
For backward dataflow problems such as liveness analysis, we use a version of Algorithm 17.5, starting from exit-node instead of start-node, and traversing predecessor instead of successor edges.
Information about reaching definitions can be kept as use-def chains, that is, for each use of a variable x, a list of the definitions of x reaching that use. Use-def chains do not allow faster dataflow analysis per se, but allow efficient implementation of the optimization algorithms that use the results of the analysis. A generalization of use-def chains is static single-assignment form, described in . SSA form not only provides more information than use-def chains, but the dataflow analysis that computes it is very efficient.
One way to represent the results of liveness analysis is via def-use chains: a list, for each definition, of all possible uses of that definition. SSA form also contains def-use information.
If any out set changes during an iteration of the repeat-until loop of an iterative solver, then all the equations are recalculated. This seems a pity, since most of the equations may not be affected by the change. A work-list algorithm keeps track of just which out sets must be recalculated. Whenever node n is recalculated and its out set is found to change, all the successors of n are put onto the work list (if they're not on it already). This is illustrated in Algorithm 17.6.
ALGORITHM 17.6: A work-list algorithm for reaching definitions.W ← the set of all nodes
while W is not empty
remove a node n from W
old ← out[n]
in ← ∪_{p*in;pred[n]} out[p]
out[n] ← gen[n] ∪ (in - kill[n])
if old ≠ out[n]
for each successor s of n
if s ∉ W
put s into W
The algorithm will converge faster if, whenever a node is removed from W for processing, we choose the node in W that occurs earliest in the sorted array produced by Algorithm 17.5.
The coalescing, graph-coloring register allocator described in is an example of a work-list algorithm with many different work lists. describes a work-list algorithm for constant propagation.
Using the results of dataflow analysis, the optimizer can perform program transformations: moving, modifying, or deleting instructions. But optimizations can cascade:
Removal of the dead code a ← b ⊕ c might cause b to become dead in a previous instruction b ← x ⊕ y.
x ← b + c
y ← a + x
u ← b + c
v ← a + u
after u ← b + c is replaced by u ← x, copy propagation changes a + u to a + x, which is a common subexpression and can be eliminated. A simple way to organize a dataflow-based optimizer is to perform a global flow analysis, then make all possible dataflow-based optimizations, then repeat the global flow analysis, then perform optimizations, and so on until no more optimizations can be found. At best this iterates two or three times, so that on the third round there are no more transformations to perform. But the worst case is very bad indeed. Consider a program in which the statement z ← a_{1} + a_{2} + a_{3} +…+ a_{n} occurs where z is dead. This translates into the quadruples
Liveness analysis determines that z is dead; then dead-code elimination removes the definition of z. Then another round of liveness analysis determines that x_{n−2} is dead, and then dead-code elimination removes x_{n−2}, and so on. It takes n rounds of analysis and optimization to remove x_{1} and then determine that there is no more work to do. A similar situation occurs with common-subexpression elimination, when there are two occurrences of an expression such as a_{1} + a_{2} + a_{3} +…+ a_{n} in the program. To avoid the need for repeated, global calculations of dataflow information, there are several strategies:
Cutoff: Perform no more than k rounds of analysis and optimization, for k = 3 or so. Later rounds of optimization may not be finding many transformations to do anyway. This is a rather unsophisticated approach, but at least the compilation will terminate in a reasonable time.
The algorithm maintains a table T, mapping variables to value numbers, and also mapping triples of the form (value number, operator, value number) to value numbers. For efficiency, T should be represented as a hash table. There is also a global number N counting how many distinct values have been seen so far. Using T and N, the value-numbering algorithm (Algorithm 17.7) scans the quadruples of a block from beginning to end. Whenever it sees an expression b + c, it looks up the value number of b and the value number of c. It then looks up hash(n_{b}, n_{c}, +) in T ; if found, it means that b + c repeats the work of an earlier computation; we mark b + c for deletion, and use the previously computed result. If not found, we leave b + c in the program and also enter it in the hash table.
ALGORITHM 17.7: Value numbering.T ← empty N ← 0
for each quadruple a ← b ⊕ c in the block
if (b ↦ k) ∈ T for some k
n_{b} ← k
else
N ← N + 1
n_{b} ← N
put b ↦ n_{b} into T
if (c ↦ k) ∈ T for some k
n_{c} ← k
else
N ← N + 1
n_{c} ← N
put c ↦ n_{c} into T
if ((n_{b}, ⊕, n_{c}) ↦ m) ∈ T for some m
put a ↦ m into T
mark this quadruple a ← b ⊕ c as a common subexpression
else
N ← N + 1
put (n_{b}, ⊕, n_{c}) ↦ N into T
put a ↦ N into T
Image 17.8 illustrates value numbering on a basic block: (a) is the list of quadruples, and (b) is the table (after the algorithm is finished). We can view the table as a directed acyclic graph (DAG), if we view an entry (m; ⊕, n) ↦ q as a node q with edges to nodes m and n, as shown in Image 17.8c.
Value numbering is an example of a single dataflow analysis that calculates the effect of cascaded optimizations: in this case, cascaded common-subexpression elimination. But the optimizer would like to perform a wide variety of transformations - especially when the loop optimizations described in the next chapter are included. It is very hard to design a single dataflow analysis capable of predicting the results of many different optimizations in combination. Instead, we use a general-purpose dataflow analyzer and a general-purpose optimizer; but when the optimizer changes the program, it must tell the analyzer what information is no longer valid. Incremental liveness analysis. For example, an incremental algorithm for liveness analysis must keep enough information so that if a statement is inserted or deleted, the liveness information can be efficiently updated. Suppose we delete this statement s : a ← b ⊕ c from a flow graph on which we have live-in and live-out information for every node. The changes to the dataflow information are as follows:
a is no longer defined here. Therefore, if a is live-out of this node, it will now be live-in where it was not before.
A work-list algorithm will be useful here, since we can just add the predecessor of s to the work list and run until the work list is empty; this will often terminate quickly. Propagating change (1) does the same kind of thing that the original (non-incremental) work-list algorithm for liveness does: It makes the live-sets bigger. Thus, our proof (Exercise 10.2) that the algorithm finds a least fixed point of the liveness equations also applies to the propagation of additional liveness caused by the deletion of the definition of a. Even the proof that the liveness analysis terminates was based on the idea that any change makes things bigger, and there was an a priori limit to how big the sets could get. But change (2) makes live-sets smaller, not bigger, so naively running our original algorithm starting from the previously computed in and out sets may find a fixed point that is not a least fixed point. For example, suppose we have the following program:
Liveness analysis shows that d is live-in at statements 1, 2, 3, 3a, 4, 5. But a is not live-out of statement 3a, so this statement is dead code, and we can delete it. If we then start with the previously computed dataflow information and use Algorithm 10.4 (page 206) until it reaches a fixed point, we will end up with the column Y of Table 10.7, which is not the best possible approximation of the actual liveness information. A more refined liveness analysis Therefore, we must use a better algorithm. The solution is that at each point where a variable d is defined, we must keep track of exactly what uses it might have. Our liveness calculation will be very much like Algorithm 10.4, but it will operate on sets of uses instead of sets of variables. In fact, it is just like the reaching definitions algorithm in reverse. Let uses(v) be the set of all uses of variable v in the program. Given a statement s : a ← b ⊕ c, the set live-out[s] ∩ uses (a) contains all the uses of a that could possibly be reached by this definition. Now, when we delete a quadruple that uses some variable b, we can delete that use of b from all the live-in and live-out sets. This gives the least fixed point, as we desire. Cascades of dead code After deleting statement 3a from the program above, the incremental liveness analysis will find that statement 0 is dead code and can be deleted. Thus, incremental liveness analysis cooperates well with dead-code elimination. Other kinds of dataflow analysis can similarly be made in-cremental; sometimes, as in the case of liveness analysis, we must first refine the analysis.
JaVa | Previous Next |