If a loop contains a statement t ← a ⊕ b such that a has the same value each time around the loop, and b has the same value each time, then t will also have the same value each time. We would like to hoist the computation out of the loop, so it is computed just once instead of every time. We cannot always tell whether a will have the same value every time, so as usual we will approximate conservatively. The definition d: t ← a_{1} ⊕ a_{2} is loop-invariant within loop L if, for each operand a_{i},
a_{i} is a constant, or
all the definitions of a_{i} that reach d are outside the loop, or
only one definition of a_{i} reaches d, and that definition is loop-invariant.
This leads naturally to an iterative algorithm for finding loop-invariant definitions: First find all the definitions whose operands are constant or from outside the loop, then repeatedly find definitions whose operands are loop-invariant.
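The iteration can be sketched as follows. This is an illustrative fragment, not the book's code; it assumes quadruples are given as (destination, operand-list) pairs and that a reaching-definitions map has already been computed (all names here are made up):

```python
# Iterative loop-invariant detection (illustrative sketch).
# quads[q] = (dest, operands); an operand is an int constant or a variable name.
# reach[(q, a)] = set of quadruple indices whose definition of variable a
# reaches quadruple q (assumed precomputed by reaching-definitions analysis).

def loop_invariant(quads, in_loop, reach):
    """Return the set of quadruple indices in the loop that are loop-invariant."""
    invariant = set()
    changed = True
    while changed:
        changed = False
        for q in in_loop:
            if q in invariant:
                continue
            _dest, operands = quads[q]
            ok = True
            for a in operands:
                if isinstance(a, int):                    # constant operand
                    continue
                defs = reach[(q, a)]
                if all(d not in in_loop for d in defs):   # all defs outside loop
                    continue
                if len(defs) == 1 and next(iter(defs)) in invariant:
                    continue                              # sole def is invariant
                ok = False
                break
            if ok:
                invariant.add(q)
                changed = True
    return invariant
```

For instance, with t ← a ⊕ 2 (where a is defined outside the loop) followed by u ← t ⊕ 1, the first pass finds t invariant and a later pass finds u invariant, while i ← i + 1 is never marked.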
Suppose t ← a ⊕ b is loop-invariant. Can we hoist it out of the loop? In Image 18.6a, hoisting makes the program compute the same result faster. But in Image 18.6b, hoisting makes the program faster but incorrect: the original program does not always execute t ← a ⊕ b, but the transformed program does, producing an incorrect value for x if i ≥ N initially. Hoisting in Image 18.6c is also incorrect, because the original loop had more than one definition of t, and the transformed program interleaves the assignments to t in a different way. And hoisting in Image 18.6d is wrong because there is a use of t before the loop-invariant definition, so after hoisting, this use will have the wrong value on the first iteration of the loop.
With these pitfalls in mind, we can set the criteria for hoisting d : t ← a ⊕ b to the end of the loop preheader:
d dominates all loop exits at which t is live-out,
and there is only one definition of t in the loop,
and t is not live-out of the loop preheader.
Implicit side effects These rules need modification if t ← a ⊕ b could raise some sort of arithmetic exception or have other side effects; see Exercise 18.7.

Turning while loops into repeat-until loops Condition (1) tends to prevent many computations from being hoisted from while loops; from Image 18.7a it is clear that none of the statements in the loop body dominates the loop exit node (which is the same as the header node). To solve this problem, we can transform the while loop into a repeat loop preceded by an if statement. This requires duplication of the statements in the header node, as shown in Image 18.7b. Statements in the body of a repeat loop dominate the loop exit (unless they are in an inner if, or if there is a break statement), so condition (1) will be satisfied.
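The transformation can be illustrated at the source level. This is a Python sketch; Python has no repeat-until, so the rewritten loop uses an explicit break, and the squaring body is just a placeholder:

```python
# Turning a while loop into a guarded repeat-until (illustrative sketch).
# Both functions sum i*i for i in [0, n); the second form duplicates the
# header test so that the body statements dominate the loop exit.

def while_form(n):
    s, i = 0, 0
    while i < n:          # test at the top: the body does not dominate the exit
        s += i * i
        i += 1
    return s

def repeat_form(n):
    s, i = 0, 0
    if i < n:             # duplicated header test, outside the loop
        while True:       # repeat ... until not (i < n)
            s += i * i
            i += 1
            if not (i < n):
                break
    return s
```

After the rewrite, every statement of the body dominates the bottom-of-loop exit test, so loop-invariant statements in the body become hoistable.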
Some loops have a variable i that is incremented or decremented, and a variable j that is set (in the loop) to i · c + d, where c and d are loop-invariant. Then we can calculate j's value without reference to i; whenever i is incremented by a we can increment j by c · a. Consider, for example, Program 18.8a, which sums the elements of an array. Using induction-variable analysis to find that i and j are related induction variables, strength reduction to replace a multiplication by 4 with an addition, then induction-variable elimination to replace i ≥ n by k′ ≥ 4 · n + a, followed by miscellaneous copy propagation, we get Program 18.8b. The transformed loop has fewer quadruples; it might even run faster. Let us now take the series of transformations one step at a time.
A loop before and after induction-variable optimizations.
(a) Before:

    s ← 0
    i ← 0
L1: if i ≥ n goto L2
    j ← i · 4
    k ← j + a
    x ← M[k]
    s ← s + x
    i ← i + 1
    goto L1
L2:

(b) After:

    s ← 0
    k′ ← a
    b ← n · 4
    c ← a + b
L1: if k′ ≥ c goto L2
    x ← M[k′]
    s ← s + x
    k′ ← k′ + 4
    goto L1
L2:
We say that a variable such as i is a basic induction variable, and j and k are derived induction variables in the family of i. Right after j is defined (in the original loop), we have j = a_{j} + i · b_{j}, where a_{j} = 0 and b_{j} = 4. We can completely characterize the value of j at its definition by (i, a, b), where i is a basic induction variable and a and b are loop-invariant expressions. If there is another derived induction variable with definition k ← j + c_{k} (where c_{k} is loop-invariant), then k is also in the family of i. We can characterize k by the triple (i, a_{j} + c_{k}, b_{j}), that is, k = a_{j} + c_{k} + i · b_{j}. We can characterize the basic induction variable i by a triple in the same way, that is, (i, 0, 1), meaning that i = 0 + i · 1. Thus every induction variable can be characterized by such a triple. If an induction variable changes by the same (constant or loop-invariant) amount in every iteration of the loop, we say it is a linear induction variable. In Image 18.9a, the induction variable i is not linear: It is incremented by b in some iterations and by 1 in other iterations. Furthermore, in some iterations j = i · 4 and in other iterations the derived induction variable j gets (temporarily) left behind as i is incremented.
(a) Before:

    s ← 0
L1: if s > 0 goto L2
    i ← i + b
    j ← i · 4
    x ← M[j]
    s ← s − x
    goto L1
L2: i ← i + 1
    s ← s + j
    if i < n goto L1

(b) After:

    s ← 0
    j′ ← i · 4
    b′ ← b · 4
    n′ ← n · 4
L1: if s > 0 goto L2
    j′ ← j′ + b′
    j ← j′
    x ← M[j]
    s ← s − x
    goto L1
L2: j′ ← j′ + 4
    s ← s + j
    if j′ < n′ goto L1

(a) Before        (b) After
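The triples themselves are easy to represent. The sketch below (illustrative names, not the book's code) encodes (i, a, b) as a small record and checks the characterizations of i, j, and k from the example, taking c_k = 100 as a made-up loop-invariant:

```python
# Illustrative encoding of induction-variable triples (i, a, b), meaning
# value = a + i*b, where a and b are loop-invariant.

from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    basic: str   # name of the basic induction variable i
    a: int       # loop-invariant offset
    b: int       # loop-invariant scale

    def value(self, i):
        """Value of this induction variable when the basic variable equals i."""
        return self.a + i * self.b

# j <- i*4 gives j the triple (i, 0, 4); the basic variable itself is (i, 0, 1).
j = Triple('i', 0, 4)
i = Triple('i', 0, 1)
# k <- j + c_k (with c_k = 100, a made-up loop-invariant) stays in the family:
# (i, a_j + c_k, b_j).
k = Triple('i', j.a + 100, j.b)
```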
Basic induction variables The variable i is a basic induction variable in a loop L with header node h if the only definitions of i within L are of the form i ← i + c or i ← i − c, where c is loop-invariant.

Derived induction variables The variable k is a derived induction variable in loop L if:
There is only one definition of k within L, of the form k ← j · c or k ← j + d, where j is an induction variable and c, d are loop-invariant;
and if j is a derived induction variable in the family of i, then:
the only definition of j that reaches k is the one in the loop, and
there is no definition of i on any path between the definition of j and the definition of k.
Assuming j is characterized by (i, a, b), then k is described by (i, a · c, b · c) or (i, a + d, b), depending on whether k's definition was j · c or j + d. Statements of the form k ← j − c can be treated as k ← j + (−c) for purposes of induction-variable analysis (unless −c is not representable, which can sometimes happen with 2's complement arithmetic).

Division Statements of the form k ← j/c can be rewritten as k ← j · (1/c), so that k could be considered an induction variable. This fact is useful for floating-point calculations, though we must beware of introducing subtle numerical errors if 1/c cannot be represented exactly. If this is an integer division, we cannot represent 1/c at all.
On many machines, multiplication is more expensive than addition. So we would like to take a derived induction variable whose definition is of the form j ← i · c and replace it with an addition. For each derived induction variable j whose triple is (i, a, b), make a new variable j′ (although different derived induction variables with the same triple can share the same j′ variable). After each assignment i ← i + c, make an assignment j′ ← j′ + c · b, where c · b is a loop-invariant expression that may be computed in the loop preheader. If c and b are both constant, then the multiplication may be done at compile time. Replace the (unique) assignment to j with j ← j′. Finally, it is necessary to initialize j′ at the end of the loop preheader, with j′ ← a + i · b.

We say two induction variables x, y in the family of i are coordinated if (x − a_{x})/b_{x} = (y − a_{y})/b_{y} at every time during the execution of the loop, except during a sequence of statements z_{i} ← z_{i} + c_{i}, where c_{i} is loop-invariant. Clearly, all the new variables in the family of i introduced by strength reduction are coordinated with each other, and with i. When the definition of an induction variable j ← … is replaced by j ← j′, we know that j′ is coordinated but j might not be. However, the standard copy propagation algorithm can help here, replacing uses of j by uses of j′ where there is no intervening definition of j′. Thus, instead of using flow analysis to learn whether j is coordinated, we just use j′ instead, where copy propagation says it is legal to do so.

After strength reduction there is still a multiplication, but it is outside the loop. If the loop executes more than one iteration, then the program should run faster with additions instead of multiplications, on many machines. The results of strength reduction may be disappointing on processors that can schedule multiplications to hide their latency.

Example Let us perform strength reduction on Program 18.8a.
We find that j is a derived induction variable with triple (i, 0, 4), and k has triple (i, a, 4). After strength reduction on both j and k, we have
    s ← 0
    i ← 0
    j′ ← 0
    k′ ← a
L1: if i ≥ n goto L2
    j ← j′
    k ← k′
    x ← M[k]
    s ← s + x
    i ← i + 1
    j′ ← j′ + 4
    k′ ← k′ + 4
    goto L1
L2:
We can perform dead-code elimination to remove the statement j ← j′. We would also like to remove all the definitions of the useless variable j′, but technically it is not dead, since it is used in every iteration of the loop.
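The net effect of the whole sequence of transformations can be checked concretely. Below, Program 18.8a and the fully optimized Program 18.8b are rendered in Python, with memory M simulated as a dictionary keyed by byte address; the base address and array contents are made up for the test:

```python
# Program 18.8a (before) and Program 18.8b (after), rendered in Python.
# M is memory as a dict keyed by byte address; a is the array's base
# address and n its element count.

def before(M, a, n):
    s, i = 0, 0
    while i < n:          # if i >= n goto L2
        j = i * 4
        k = j + a
        x = M[k]
        s = s + x
        i = i + 1
    return s

def after(M, a, n):
    s = 0
    kp = a                # k' <- a
    b = n * 4
    c = a + b
    while kp < c:         # if k' >= c goto L2
        x = M[kp]
        s = s + x
        kp = kp + 4       # the multiplication j = i*4 became an addition
    return s
```

Both versions visit the same byte addresses a, a+4, …, a+4(n−1) and produce the same sum.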
After strength reduction, some of the induction variables are not used at all in the loop, and others are used only in comparisons with loop-invariant variables. These induction variables can be deleted. A variable is useless in a loop L if it is dead at all exits from L, and its only use is in a definition of itself. All definitions of a useless variable may be deleted.
In our example, after the removal of j, the variable j′ is useless. We can delete j′ ← j′ + 4. This leaves a definition of j′ in the preheader that can now be removed by deadcode elimination.
A variable k is almost useless if it is used only in comparisons against loop-invariant values and in definitions of itself, and there is some other induction variable in the same family that is not useless. An almost-useless variable may be made useless by modifying the comparison to use the related induction variable. If we have k < n, where j and k are coordinated induction variables in the family of i, and n is loop-invariant, then we know that (j − a_{j})/b_{j} = (k − a_{k})/b_{k}, so the comparison k < n can be written as

a_{k} + b_{k} · (j − a_{j})/b_{j} < n
Now, we can subtract a_{k} from both sides and multiply both sides by b_{j}/b_{k}. If b_{j}/b_{k} is positive, the resulting comparison is

j − a_{j} < b_{j}/b_{k} · (n − a_{k})
but if b_{j}/b_{k} is negative, then the comparison becomes

j − a_{j} > b_{j}/b_{k} · (n − a_{k})
instead. Finally, we add a_{j} to both sides (here we show the positive case):

j < b_{j}/b_{k} · (n − a_{k}) + a_{j}
The entire right-hand side of this comparison is loop-invariant, so it can be computed just once in the loop preheader. Restrictions:
If b_{j} · (n − a_{k}) is not evenly divisible by b_{k}, then this transformation cannot be used, because we cannot hold a fractional value in an integer variable.
If b_{j} or b_{k} is not constant, but is a loop-invariant value whose sign is not known, then we do not know which comparison (less-than or greater-than) to use.
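The rewrite, together with the divisibility and sign restrictions, can be captured in a small helper (hypothetical, not the book's code):

```python
# Rewrite the comparison k < n in terms of a coordinated induction variable j.
# j is characterized by (i, aj, bj), k by (i, ak, bk), and n is loop-invariant.
# Returns (op, bound) meaning "j op bound", or None when the transformation is
# blocked because bj*(n - ak) is not evenly divisible by bk.

def rewrite_comparison(aj, bj, ak, bk, n):
    num = bj * (n - ak)
    if num % bk != 0:
        return None                   # fractional bound: no integer variable fits
    bound = num // bk + aj
    op = '<' if bj * bk > 0 else '>'  # sign of bj/bk picks the direction
    return (op, bound)
```

For the running example, rewriting i < n (i has triple (i, 0, 1)) in terms of k′ (triple (i, a, 4)) with a = 100 and n = 3 gives k′ < 4·n + a = 112.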
Example In our example, the comparison i < n can be replaced by k′ < a + 4 · n. Of course, a + 4 · n is loopinvariant and should be hoisted. Then i will be useless and may be deleted. The transformed program is
    s ← 0
    k′ ← a
    b ← n · 4
    c ← a + b
L1: if k′ ≥ c goto L2
    k ← k′
    x ← M[k]
    s ← s + x
    k′ ← k′ + 4
    goto L1
L2:
Finally, copy propagation can eliminate k ← k′, and we obtain Program 18.8b.
Safe programming languages automatically insert array-bounds checks on every subscript operation (see the sermon on page 147). Of course, in well-written programs all of these checks are redundant, since well-written programs don't access arrays out of bounds. We would like safe languages to achieve the fast performance of unsafe languages. Instead of turning off the bounds checks (which would not be safe) we ask the compiler to remove any checks that it can prove are redundant. We cannot hope to remove all the redundant bounds checks, because this problem is not computable (it is as hard as the halting problem). But many array subscripts are of the form a[i], where i is an induction variable. These the compiler can often understand well enough to optimize. The bounds for an array are generally of the form 0 ≤ i ∧ i < N. When N is nonnegative, as it always is for array sizes, this can be implemented as i <_{u} N, where <_{u} is the unsigned comparison operator.

Conditions for eliminating array-bounds checking Although it seems natural and intuitive that an induction variable must stay within a certain range, and we should be able to tell whether that range does not exceed the bounds of the array, the criteria for eliminating a bounds check from a loop L are actually quite complicated:
There is an induction variable j and a loop-invariant u used in a statement s_{1}, taking one of the following forms:

if j < u goto L_{1} else goto L_{2}
if j ≥ u goto L_{2} else goto L_{1}

where L_{2} is out of the loop.

There is a statement s_{2} of the form

if k <_{u} n goto L_{3} else goto L_{4}

where k is an induction variable coordinated with j, n is loop-invariant, and s_{1} dominates s_{2}.
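The unsigned-comparison trick mentioned above is easy to demonstrate. This Python sketch models 32-bit two's-complement indices by masking; a real compiler simply emits an unsigned compare:

```python
# The two signed tests 0 <= i and i < n collapse into one unsigned test
# i <u n: a negative i, reinterpreted as an unsigned 32-bit value, becomes
# huge and therefore fails the comparison.  n is assumed nonnegative, as
# array sizes always are.

def in_bounds(i, n):
    return 0 <= i and i < n            # two signed comparisons

def in_bounds_unsigned(i, n):
    return (i & 0xFFFFFFFF) < n        # one unsigned comparison (32-bit view)
```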
Often, n will be an array length. In a language with static arrays, an array length n is a constant. In many languages with dynamic arrays, array lengths are loop-invariant. In MiniJava, Java, and ML the length of an array cannot be dynamically modified once the array has been allocated. The array length n will typically be calculated by fetching the length field of some array pointer v. For the sake of illustration, assume the length field is at offset 0 in the array object. To avoid the need for complicated alias analysis, the semantic analysis phase of the compiler should mark the expression M[v] as immutable, meaning that no other store instruction can possibly update the contents of the length field of the array v. If v is loop-invariant, then n will also be loop-invariant. Even if n is not an array length but is some other loop invariant, we can still optimize the comparison k <_{u} n. We want to put a test in the loop preheader that expresses the idea that in every iteration, k ≥ 0 ∧ k < n. Let k_{0} be the value of k at the end of the preheader, and let Δk_{1}, Δk_{2}, …, Δk_{m} be all the loop-invariant values that are added to k inside the loop. Then we can ensure k ≥ 0 by testing

k_{0} ≥ 0 ∧ Δk_{1} ≥ 0 ∧ … ∧ Δk_{m} ≥ 0
at the end of the preheader. Let Δk_{1}, Δk_{2}, …, Δk_{p} be the set of loop-invariant values that are added to k on any path between s_{1} and s_{2} that does not go through s_{1} (again). Then, to ensure k < n at s_{2}, it is sufficient to ensure that k < n − (Δk_{1} + … + Δk_{p}) at s_{1}. Since we know (k − a_{k})/b_{k} = (j − a_{j})/b_{j}, this test becomes

a_{k} + b_{k} · (j − a_{j})/b_{j} < n − (Δk_{1} + … + Δk_{p})
This will always be true (assuming b_{k}/b_{j} is positive) if

a_{k} + b_{k} · (u − a_{j})/b_{j} ≤ n − (Δk_{1} + … + Δk_{p})
since the test j < u dominates the test k <_{u} n. Since everything in this comparison is loop-invariant, we can move it to the preheader as follows. First, ensure that definitions of loop-invariants are hoisted out of the loop. Then, rewrite the loop L as follows: Copy all the statements of L to make a new loop L′ with header L′_{h}. Inside L′, replace the statement if k <_{u} n goto L′_{3} else goto L′_{4} by goto L′_{3}. At the end of the preheader of L, put statements equivalent to

if k_{0} ≥ 0 ∧ Δk_{1} ≥ 0 ∧ … ∧ Δk_{m} ≥ 0 ∧ a_{k} + b_{k} · (u − a_{j})/b_{j} ≤ n − (Δk_{1} + … + Δk_{p}) goto L′_{h} else goto L_{h}
The conditional goto tests whether k will always be between 0 and n. Sometimes we will have enough information to evaluate this complicated condition at compile time. This will be true in at least two situations:
all the loop-invariants mentioned in it are constants; or
u and n are the same temporary variable.
The second situation is quite common, arising from programs such as
let var u := length(A)
    var i := 0
 in while i < u
     do (sum := sum + A[i];
         i := i + 1)
end
The quadruples for length(A) will include u ← M[A], assuming that the length of an array is fetched from offset zero from the array pointer; and the quadruples for A[i] will include n ← M[A], to fetch n for doing the bounds check. Now the expressions defining u and n are common subexpressions, assuming the expression M[A] is marked so that we know that no other STORE instruction is modifying the contents of memory location M[A].
If we can evaluate the big comparison at compile time, then we can unconditionally use loop L or loop L′, and delete the loop that we are not using.

Cleaning up After this optimization, the program may have several loose ends. Statements after the label L′_{4} may be unreachable; there may be several useless computations of n and k within L′. The former can be cleaned up by unreachable-code elimination, and the latter by dead-code elimination.

Generalizations To be practically useful, the algorithm needs to be generalized in several ways:
The loop-exit comparison might take one of the forms
if j ≤ u′ goto L_{1} else goto L_{2}
which compares j ≤ u′ instead of j < u.
The loop-exit test s_{1} might come at the end of the loop body, rather than before it:

if j < u goto L_{1} else goto L_{2}

where L_{2} is out of the loop and s_{2} dominates all the loop back edges. Then the Δk_{i} of interest are the ones between s_{2} and any back edge, and between the loop header and s_{1}.
The induction variable might be incremented along several different paths in the loop, as in

while i < n-1
 do (if sum < 0
      then (i := i+1; sum := sum+i; i := i+1)
      else i := i+2;
     sum := sum + a[i])
Here there are three Δi (of 1, 1, and 2, respectively). Our analysis will assume that any, all, or none of these increments may be applied; but clearly the effect is i ← i + 2 on either path. In such cases, an analysis that hoists (and merges) the increments above the if will be useful.
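Putting the pieces together, the loop-versioning idea described earlier (a checked loop L and a check-free copy L′, selected by a preheader test) can be sketched as follows; the single constant step and the exact form of the preheader test are simplifying assumptions for this illustration:

```python
# Sketch of loop versioning: the original loop L keeps its bounds check,
# a copy L' runs without it, and one preheader test chooses between them.

def checked_get(a, k):
    """The bounds-checked load a safe language would emit."""
    if not (0 <= k < len(a)):
        raise IndexError(k)
    return a[k]

def sum_with_versioning(a, start, step, count):
    s = 0
    last = start + step * (count - 1)
    # Preheader test: k0 >= 0, every increment nonnegative, final index < n.
    if start >= 0 and step >= 0 and last < len(a):
        k = start                      # L': the check provably never fails
        for _ in range(count):
            s += a[k]                  # unchecked load
            k += step
    else:
        k = start                      # L: original loop, check left in
        for _ in range(count):
            s += checked_get(a, k)
            k += step
    return s
```

When the preheader test holds, every iteration runs the unchecked body; otherwise the original checked loop runs, preserving safety.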
Useless loop unrolling.
(a) Before:

L1: x ← M[i]
    s ← s + x
    i ← i + 4
    if i < n goto L1 else L2
L2:

(b) After:

L1: x ← M[i]
    s ← s + x
    i ← i + 4
    if i < n goto L′1 else L2
L′1: x ← M[i]
    s ← s + x
    i ← i + 4
    if i < n goto L1 else L2
L2:

(a) Before        (b) After
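Rendered in Python, the figure's transformation looks like this; both loops compute the same sum, and the unrolled version still executes a test for every copy of the body, which is why this form of unrolling gains little:

```python
# "Useless" loop unrolling: the body is duplicated, but a loop-exit test
# still follows each copy, so per-element overhead is essentially unchanged.
# M is memory as a dict keyed by byte address.

def before_unroll(M, i, n):
    s = 0
    while i < n:              # L1
        s += M[i]
        i += 4
    return s

def after_unroll(M, i, n):
    s = 0
    while i < n:              # L1: first copy of the body
        s += M[i]
        i += 4
        if not (i < n):       # branch to L2
            break
        s += M[i]             # L1': second copy of the body
        i += 4
    return s
```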