EFFICIENT COMPUTATION OF THE DOMINATOR TREE
A major reason for using SSA form is that it makes the optimizing compiler faster. Instead of using costly iterative bit-vector algorithms to link uses to definitions (to compute reaching definitions, for example), the compiler can just look up the (unique) definition, or the list of uses, of each variable. For SSA to help make a compiler faster, we must be able to compute the SSA form quickly. The algorithms for computing SSA from the dominator tree are quite efficient. But the iterative set-based algorithm for computing dominators, given in , may be slow in the worst case. An industrial-strength compiler that uses dominators should use a more efficient algorithm for computing the dominator tree. The near-linear-time algorithm of Lengauer and Tarjan relies on properties of the depth-first spanning tree of the control-flow graph. This is just the recursion tree implicitly traversed by the depth-first search (DFS) algorithm, which numbers each node of the graph with a depth-first number (dfnum) as it is first encountered. The algorithm is rather technical; those readers who feel content just knowing that the dominator tree can be calculated quickly can skip to .
DEPTH-FIRST SPANNING TREES
We can use depth-first search to calculate a depth-first spanning tree of the control-flow graph. Image 19.8 shows a CFG and a depth-first spanning tree, along with the dfnum of each node.
Image 19.8: A control-Flow graph and trees derived from it. The numeric labels in part (b) are the dfnums of the nodes.
A given CFG may have many different depth-first spanning trees. From now on we will assume that we have arbitrarily picked one of them - by depth-first search. When we say "a is an ancestor of b" we mean that there is some path from a to b following only spanning-tree edges, or that a = b;"a is a proper ancestor of b" means that a is an ancestor of b and a ≠ b. Properties of depth-first spanning trees The start node r of the CFG is the root of the depth-first spanning tree. If a is a proper ancestor of b, then dfnum(a) < dfnum(b). Suppose there is a CFG path from a to b but a is not an ancestor of b. This means that some edge on the path is not a spanning-tree edge, so b must have been reached in the depth-first search before a was (otherwise, after visiting a the search would continue along tree edges to b). Thus, dfnum(a) > dfnum(b). Therefore, if we know that there is a path from a to b, we can test whether a is an ancestor of b just by comparing the dfnum's of a and b. When drawing depth-first spanning trees, we order the children of a node in the order that they are visited by the depth-first search, so that nodes to the right have a higher dfnum. This means that if a is an ancestor of b, and there is a CFG path from a to b that departs from the spanning tree, it must branch off to the right of the tree path, never to the left. Dominators and spanning-tree paths Consider a nonroot node n in the CFG, and its immediate dominator d. The node d must be an ancestor of n in the spanning tree - because any path (including the spanning-tree path) from r to n must include d. Therefore dfnum(d) < dfnum(n). Now we know that n's immediate dominator must be on the spanning-tree path between r and n; all that's left is to see how high up it is. If some ancestor x does not dominate n, then there must be a path that departs from the spanning-tree path above x and rejoins it below x. The nodes on the bypassing path are not ancestors of n, so their dfnum's are higher than n's. The path might rejoin the spanning-tree path to n either at n or above n.
SEMIDOMINATORS
Paths that bypass ancestors of n are useful for proving that those ancestors do not dominate n. Let us consider, for now, only those bypassing paths that rejoin the spanning tree at n (not above n). Let's find the path that departs from the tree at the highest possible ancestor s, and rejoins the tree at n. We will call s the semidominator of n. Another way of saying this is that s is the node of smallest dfnum having apathto n whose nodes (not counting s and n) are not ancestors of n. This description of semidominators does not explicitly say that s must be an ancestor of n, but of course any nonancestor with a path to n would have a higher dfnum than n's own parent in the spanning tree, which itself has a path to n with no nonancestor internal nodes (actually, no internal nodes at all). Very often, a node's semidominator is also its immediate dominator. But as the figure at right shows, to find the dominator of n it's not enough just to consider bypassing paths that rejoin the tree at n. Here, a path from r to n bypasses n's semidominator s, but rejoins the tree at node y, above n. However, finding the semidominator s is still a useful step toward finding the dominator d.
Semidominator Theorem To find the semidominator of a node n, consider all predecessors v of n in the CFG.
- If v is a proper ancestor of n in the spanning tree (so dfnum(v) < dfnum(n)), then v is a candidate for semi(n).
- If v is a nonancestor of n (so dfnum(v) > dfnum(n)), then for each u that is an ancestor of v (or u = v), let semi(u) be a candidate for semi(n).
Of all these candidates, the one with lowest dfnum is the semidominator of n. Proof See the Further Reading section. Calculating dominators from semidominators Let s be the semidominator of n. If there is a path that departs from the spanning tree above s, bypasses s, and rejoins the spanning tree at some node between s and n, then s does not dominate n. However, if we find the node y between s and n with the smallest-numbered semidominator, and semi(y) is a proper ancestor of s, then y's immediate dominator also immediately dominates n. Dominator Theorem. On the spanning-tree path below semi(n) and above or including n, let y be the node with the smallest-numbered semidominator (minimum dfnum(semi(y))). Then,
Proof See the Further Reading section.
THE LENGAUER-TARJAN ALGORITHM
Using these two theorems, Algorithm 19.9 uses depth-first search (DFS) to compute dfnum's for every node. Then it visits the nodes in order, from highest dfnum to lowest, computing semidominators and dominators. As it visits each node, it puts the node into a spanning forest for the graph. It's called a forest because there may be multiple disconnected fragments; only at the very end will it be a single spanning tree of all the CFG nodes.ALGORITHM 19.9: Lengauer-Tarjan algorithm for computing dominators.
DFS(node p, node n)= if dfnum[n] = 0 dfnum[n] ← N; vertex[N] ← n; parent[n] ← p N ← N + 1 for each successor w of n DFS(n, w) Link(node p, node n)= add edge p → n to spanning forest implied by ancestor array AncestorWithLowestSemi(node n)= in the forest, find the nonroot ancestor of n that has the lowest-numbered semidominator Dominators() = N ← 0;∀n. bucket[n] ←{} ∀n. dfnum[n] ← 0, semi[n] ← ancestor[n] ← idom[n] ← samedom[n] ← none DFS(none, r) for i ← N - 1 downto 1 Skip over node 0, the root node. n ← vertex[i]; p ← parent[n]; s ← p for each predecessor v of n These lines calcu- if dfnum[v]≤ dfnum[n] late the semidom- s′ v inator of n, based else s′ ← semi[AncestorWithLowestSemi(v)] on the Semidom- if dfnum[s′] < dfnum[s] inator Theorem. s ← s′ semi[n] ← s Calculation of n's dominator is deferred bucket[s] ← bucket[s] ∪ {n} until the path from s to n has been linked Link(p, n) into the forest. for each v in bucket[p] Now that the path from p to v has been linked into y ← AncestorWithLowestSemi(v) the spanning forest, these lines calculate the dom- if semi[y] = semi[v] inator of v, based on the first clause of the Domi- idom[v] ← p nator Theorem, or else defer the calculation until else samedom[v] ← y y's dominator is known. bucket[p] ← {} for i ← 1 to N - 1 n ← vertex[i] Now all the deferred dominator calcula- if samedom[n]≠ none tions, based on the second clause of the idom[n] ← idom[samedom[n]] Dominator Theorem, are performed.
Calculating semidominators requires that, given some edge v → n, we look at all ancestors of v in the spanning tree that have a higher dfnum than n. When Algorithm 19.9 processes node n, only nodes with a higher dfnum than n will be in the forest. Thus, the algorithm can simply examine all ancestors of v that are already in the forest. We use the Dominator Theorem to compute the immediate dominator of n, by finding node y with the lowest semidominator on the path from semi[n] to n. When s = semi[n] is being computed, it's not yet possible to determine y; but we will be able to do so later, when s is being added to the spanning forest. Therefore with each semidominator s we keep a bucket of all the nodes that s semidominates; when s is linked into the spanning forest, we can then calculate the idom of each node in [s]. The forest is represented by an ancestor array: For each node v, ancestor[v] points to v's parent. This makes searching upward from v easy. Algorithm 19.10a shows a too-slow version of the AncestorWithLowestSemi and Link functions that manage the spanning forest. Link sets the ancestor relation, and AncestorWithLowestSemi searches upward for the ancestor whose semidominator has the smallest dfnum.ALGORITHM 19.10: Two versions of AncestorWithLowestSemi and Link functions for operations on spanning forest. The naive version (a) takes O(N) per operation (so the algorithm runs in time O(N2)) and the efficient version (b) takes O(log N) amortized time per operation, for an O(N log N) algorithm.
AncestorWithLowestSemi(node v)= u ← v while ancestor[v]≠ none if dfnum[semi[v]] < dfnum[semi[u]] u ← v v ← ancestor[v] return u Link(node p, node n)= ancestor[n] ← p (a) Naive version, O(N) per operation. AncestorWithLowestSemi(node v)= a ← ancestor[v] if ancestor[a] ≠none b ← AncestorWithLowestSemi(a) ancestor[v] ← ancestor[a] if dfnum[semi[b]] < dfnum[semi[best[v]]] best[v] b return best[v] Link(node p, node n)= ancestor[n] ← p; best[n] ← n (b) With path-compression, O(log N) per operation.
But each call to AncestorWithLowestSemi could take linear time (in N, the number of nodes in the graph) if the spanning tree is very deep; and AncestorWithLowestSemi is called once for each node and edge. Thus Algorithm 19.9+19.10a has quadratic worst-case time complexity. Path compression The algorithm may call AncestorWithLowestSemi(v) several times for the same node v. The first time, AncestorWithLowestSemi traverses the nodes from v to a1, some ancestor of v, as shown in Image 19.11a. Then perhaps some new links a3 → a2 → a1 are added to the forest above a1, so the second AncestorWithLowestSemi(v) searches up to a3. But we would like to avoid the duplicate traversal of the path from v to a1. Furthermore, suppose we later call AncestorWithLowestSemi(w) on some child of v. During that search we would like to be able to skip from v to a1.
Image 19.11: Path compression. (a) Ancestor links in a spanning tree; AncestorWithLowestSemi(v) traverses three links. (b) New nodes a2, a3 are linked into the tree. Now AncestorWithLowestSemi(w) would traverse 6 links. (c) AncestorWithLowestSemi(v) with path compression redirects ancestor links, but best[v] remembers the best intervening node on the compressed path between v and a1. (d) Now, after a2 and a3 are linked, AncestorWithLowestSemi(w) traverses only 4 links.
The technique of path compression makes AncestorWithLowestSemi faster. For each node v in the spanning forest, we allow ancestor[v] to point to some ancestor of v that may be far above v's parent. But we must remember - in best[v] - the best node in the skipped-over path between ancestor[v] and v. ancestor[v] = Any node above v in the spanning forest. best[v] = The node whose semidominator has the lowest dfnum, in the skipped over path from ancestor[v] down to v (including v but not ancestor[v]). Now, when AncestorWithLowestSemi searches upwards, it can compress paths by setting ancestor[v] ← ancestor[ancestor[v]], as long as it updates best[v] at the same time. This is shown in Algorithm 19.10b. In a graph of K nodes and E edges, there will be K − 1 calls to Link and E + K − 1 calls to AncestorWithLowestSemi. With path compression it can be shown that this takes O(E log K) time. In terms of the "size" N = E + K of the control-flow graph, Algorithm 19.9+19.10b takes O(N log N) time. Balanced path compression The most sophisticated version of the Lengauer Tarjan algorithm is Algorithm 19.9 with Link and AncestorWithLowestSemi functions that rebalance the spanning trees, so that the work of path compression is undertaken only when it will do the most good. This algorithm has time complexity O(N · α(N)), where α(N) is the slowly growing inverse Ackermann function that is for all practical purposes constant. In practice it appears that this sophisticated algorithm is about 35% faster than the N log N algorithm (when measured on graphs of up to 1000 nodes). See also the Further Reading section of this chapter.