Previous    Next

## LOOP-INVARIANT COMPUTATIONS

If a loop contains a statement t←b such that a has the same value each time around the loop, and b has the same value each time, then t will also have the same value each time. We would like to hoist the computation out of the loop, so it is computed just once instead of every time. We cannot always tell if a will have the same value every time, so as usual we will conservatively approximate. The definition d: t ← a1a2 is loopinvariant within loop L if, for each operand ai ,

1. ai is a constant,

2. or all the definitions of ai that reach d are outside the loop,
3. or only one definition of ai reaches d, and that definition is loop-invariant.

This leads naturally to an iterative algorithm for finding loop-invariant definitions: First find all the definitions whose operands are constant or from outside the loop, then repeatedly find definitions whose operands are loopinvariant.

### HOISTING

Suppose t ← ab is loop-invariant. Can we hoist it out of the loop? In Image 18.6a, hoisting makes the program compute the same result faster. But in Image 18.6b, hoisting makes the program faster but incorrect - the original program does not always execute t ← ab, but the transformed program does, producing an incorrect value for x if iN initially. Hoisting in Image 18.6c is also incorrect, because the original loop had more than one definition of t, and the transformed program interleaves the assignments to t in a different way. And hoisting in Image 18.6d is wrong because there is auseof t before the loop-invariant definition, so after hoisting, this use will have the wrong value on the first iteration of the loop.

Image 18.6: Some good and bad candidates for hoisting tab.

With these pitfalls in mind, we can set the criteria for hoisting d : tab to the end of the loop preheader:

1. d dominates all loop exits at which t is live-out,

2. and there is only one definition of t in the loop,
3. 3.and t is not live-out of the loop preheader.

Implicit side effects These rules need modification if tab could raise some sort of arithmetic exception or have other side effects; see Exercise 18.7. Turning while loops into repeat-until loops Condition (1) tends to prevent many computations from being hoisted from while loops; from Image 18.7a it is clear that none of the statements in the loop body dominates the loop exit node (which is the same as the header node). To solve this problem, we can transform the while loop into a repeat loop preceded by an if statement. This requires duplication of the statements in the header node, as shown in Image 18.7b. Statements in the body of a repeat loop dominate the loop exit (unless they are in an inner if, orifthereisa break statement), so condition (1) will be satisfied.

Image 18.7: A while loop (a), transformed into a repeat loop (b).

### INDUCTION VARIABLES

Some loops have a variable i that is incremented or decremented, and a variable j that is set (in the loop) to i ·c+d, where c and d are loop-invariant. Then we can calculate j's value without reference to i; whenever i is incremented by a we can increment j by c · a. Consider, for example, Program 18.8a, which sums the elements of an array. Using induction-variable analysis to find that i and j are related induction variables, strength reduction to replace a multiplication by 4 with an addition, then induction-variable elimination to replace in by k ≥ 4n + a, followed by miscellaneous copy propagation, we get Program 18.8b. The transformed loop has fewer quadruples; it might even run faster. Let us now take the series of transformations one step at a time.

A loop before and after induction-variable optimizations.
 s ← 0 s ← 0 i ← 0 k′ ← a L1 : if i≥n goto L2 b n · 4 j ← i · 4 c ← a + b k j + a L1 : if k′ ≥ c goto L2 x ← M[k] x ← M[k′] ss + x ss + x i ← i + 1 k′ ← k′ + 4 goto L1 goto L1 L2 L2 (a) Before (b) After

We say that a variable such as i is a basic induction variable, and j and k are derived induction variables in the family of i. Right after j is defined (in the original loop), we have j = aj + i · bj, where aj = 0 and bj = 4. We can completely characterize the value of j at its definition by (i, a, b), where i is a basic induction variable and a and b are loop-invariant expressions. If there is another derived induction variable with definition kj + ck(where ck is loop-invariant), then k is also in the family of i. We can characterize k by the triple (i, aj + ck, bj ), that is, k = aj + ck + i · bj . We can characterize the basic induction variable i by a triple in the same way, that is (i, 0, 1), meaning that i = 0 + i · 1. Thus every induction variable can be characterized by such a triple. If an induction variable changes by the same (constant or loop-invariant) amount in every iteration of the loop, we say it is a linear induction variable. In Image 18.9a, the induction variable i is not linear: It is incremented by b in some iterations and by 1 in other iterations. Furthermore, in some iterations j D i · 4 and in other iterations the derived induction variable j gets (temporarily) left behind as i is incremented.

 s ← 0 j′ i · 4 b′ b · 4 s ← 0 n′ n · 4 L1 : if s > 0 goto L2 L1 : if s > 0 goto L2 i ← i + b j′ ← j′ + b′ j ← i · 4 j ← j′ x ← M[j] x ← M[j] s ← s − x s ← s − x goto L1 goto L1 L2 : i i + 1 L2 : j′ j′ + 4 s ← s + j s ← s + j if i < n goto L1 if j′ < n′ goto L1 (a) Before (b) After

Image 18.9: The basic induction variable i is incremented by different amounts in different iterations; the derived induction variable j is not changed in every iteration.

#### DETECTION OF INDUCTION VARIABLES

Basic induction variables The variable i is a basic induction variable in a loop L with header node h if the only definitions of i within L are of the form ii + c or iic, where c is loop-invariant. Derived induction variables. The variable k is a derived induction variable in loop L if:

1. There is only one definition of k within L, of the form kj ·c or kj + d, where j is an induction variable and c; d are loop-invariant;

2. and if j is a derived induction variable in the family of i, then:
1. the only definition of j that reaches k is the one in the loop,

2. and there is no definition of i on any path between the definition of j and the definition of k.

Assuming j is characterized by (i, a, b), then k is described by (i a · c, b · c) or (i, a + d, b), depending on whether k's definition was j · c or j + d. Statements of the form kjc can be treated as kj + (−c) for purposes of induction-variable analysis (unless −c is not representable, which can sometimes happen with 2's complement arithmetic). Division Statements of the form kj/c can be rewritten as so that k could be considered an induction variable. This fact is useful for floating-point calculations - though we must beware of introducing subtle numerical errors if 1/c cannot be represented exactly. If this is an integer division, we cannot represent 1/c at all.

#### STRENGTH REDUCTION

On many machines, multiplication is more expensive than addition. So we would like to take a derived induction variable whose definition is of the form ji · c and replace it with an addition. For each derived induction variable j whose triple is (i, a, b), makeanew variable j′ (although different derived induction variables with the same triple can share the same j′ variable). After each assignment ii + c, make an assignment j′ ← j′ + c · b, where c · b is a loop-invariant expression that may be computed in the loop preheader. If c and b are both constant, then the multiplication may be done at compile time. Replace the (unique) assigment to j with jj′. Finally, it is necessary to initialize j′ at the end of the loop preheader, with j′ ← a + i · b. We say two induction variables x, y in the family of i are coordinated if (xax)/=bx = (yay)/by at every time during the execution of the loop, except during a sequence of statements zizi + ci, where ci is loop-invariant. Clearly, all the new variables in the family of i introduced by strength reduction are coordinated with each other, and with i. When the definition of an induction variable j ← … is replaced by j j′, we know that j′ is coordinated but j might not be. However, the standard copy propagation algorithm can help here, replacing uses of j by uses of j′ where there is no intervening definition of j′. Thus, instead of using flow analysis to learn whether j is coordinated, we just use j′ instead, where copy propagation says it is legal to do so. After strength reduction there is still a multiplication, but it is outside the loop. If the loop executes more than one iteration, then the program should run faster with additions instead of multiplication, on many machines. The results of strength reduction may be disappointing on processors that can schedule multiplications to hide their latency. Example Let us perform strength reduction on Program 18.8a. We find that j is a derived induction variable with triple (i, 0, 4), and k has triple (i, a, 4). After strength reduction on both j and k, wehave

``` s ← 0
i ← 0
j′ ← 0
k′ ← a
L1 : if i ≥ n goto L2
j ← j′
k ← k′
x ← M[k]
s ← s + x
i ← i + 1
j′ ← j′ + 4
k′ ← k′ + 4
goto L1
L2
```

We can perform dead-code elimination to remove the statement jj′. We would also like to remove all the definitions of the useless variable j′, but technically it is not dead, since it is used in every iteration of the loop.

#### ELIMINATION

After strength reduction, some of the induction variables are not used at all in the loop, and others are used only in comparisons with loop-invariant variables. These induction variables can be deleted. A variable is useless in a loop L if it is dead at all exits from L, and its only use is in a definition of itself. All definitions of a useless variable may be deleted.

In our example, after the removal of j, the variable j′ is useless. We can delete j′ ← j′ + 4. This leaves a definition of j′ in the preheader that can now be removed by dead-code elimination.

#### REWRITING COMPARISONS

A variable k is almost useless if it is used only in comparisons against loopinvariant values and in definitions of itself, and there is some other induction variable in the same family that is not useless. An almost-useless variable may be made useless by modifying the comparison to use the related induction variable. If we have k < n, where j and k are coordinated induction variables in the family of i, and n is loop-invariant, then we know that (jaj)/bj =(kak)/bk, so therefore the comparison k < n can be written as

Now, we can subtract ak from both sides and multiply both sides by bj /bk. If bj /bk is positive, the resulting comparison is

but if bj /bk is negative, then the comparison becomes

instead. Finally, we add aj to both sides (here we show the positive case):

The entire right-hand side of this comparison is loop-invariant, so it can be computed just once in the loop preheader. Restrictions:

1. If bj (nak) is not evenly divisible by bk, then this transformation cannot be used, because we cannot hold a fractional value in an integer variable.

2. If bj or bk is not constant, but is a loop-invariant value whose sign is not known, then the transformation cannot be used since we won't know which comparison (less-than or greater-than) to use.

Example In our example, the comparison i < n can be replaced by k′ < a + 4 · n. Of course, a + 4 · n is loop-invariant and should be hoisted. Then i will be useless and may be deleted. The transformed program is

``` s ← 0
k′ ← a
b ← n · 4
c ← a + b
L1 : if k′ < c goto L2
k ← k′
x ← M[k]
s ← s + x
k′ ← k′ + 4
goto L1
L2
```

Finally, copy propagation can eliminate kk′, and we obtain Program 18.8b.

### ARRAY-BOUNDS CHECKS

Safe coding languages automatically insert array-bounds checks on every subscript operation (see the sermon on page 147). Of course, in well-written programs all of these checks are redundant, since well-written programs don't access arrays out of bounds. We would like safe languages to achieve the fast performance of unsafe languages. Instead of turning off the bounds checks (which would not be safe) we ask the compiler to remove any checks that it can prove are redundant. We cannot hope to remove all the redundant bounds checks, because this problem is not computable (it is as hard as the halting problem). But many array subscripts are of the form a[i], where i is an induction variable. These the compiler can often understand well enough to optimize. The bounds for an array are generally of the form 0 ≤ ii < N. When N is nonnegative, as it always is for array sizes, this can be implemented as iu N, where ≤u is the unsigned comparison operator. Conditions for eliminating array-bounds checking. Although it seems natural and intuitive that an induction variable must stay within a certain range, and we should be able to tell whether that range does not exceed the bounds of the array, the criteria for eliminating a bounds check from a loop L are actually quite complicated:

1. There is an induction variable j and a loop-invariant u used in a statement s1, taking one of the following forms:

• if j < u goto L1 else goto L2

• if ju goto L2 else goto L1
• if u > j goto L1 else goto L2
• if uj goto L2 else goto L1

where L2 is out of the loop.

2. There is a statement s2 of the form, if k <u n goto L3 else goto L4, where k is an induction variable coordinated with j, n is loop-invariant, and s1 dominates s2.
3. There is no loop nested within L containing a definition of k.
4. k increases when j does, that is, b j /bk > 0.

Often, n will be an array length. In a language with static arrays an array length n is a constant. In many languages with dynamic arrays, array lengths are loop-invariant. In MiniJava, Java, and ML the length of an array cannot be dynamically modified once the array has been allocated. The array length n will typically be calculated by fetching the length field of some array pointer v. For the sake of illustration, assume the length field is at offset 0 in the array object. To avoid the need for complicated alias analysis, the semantic analysis phase of the compiler should mark the expression M[v] as immutable, meaning that no other store instruction can possibly update the contents of the length field of the array v.If v is loop-invariant, then n will also be loop-invariant. Even if n is not an array length but is some other loop invariant, we can still optimize the comparison k <u n. We want to put a test in the loop preheader that expresses the idea that in every iteration, k ≥ 0 ∧ k < n. Let k0 be the value of k at the end of the preheader, and let Δk1, Δk2,…, Δkm be all the loop-invariant values that are added to k inside the loop. Then we can ensure k ≥ 0 by testing

at the end of the preheader. Let Δk1, Δk2,…, Δkp be the set of loop-invariant values that are added to k on any path between s1 and s2 that does not go through s1 (again). Then, to ensure k < n at s2, it is sufficient to ensure that k < n − (Δk1 + … + Δkp) at s1. Since we know (kak)/bk = (jaj) /bj, this test becomes

This will always be true if

since the test j < u dominates the test k < n. Since everything in this comparison is loop-invariant, we can move it to the preheader as follows. First, ensure that definitions of loop-invariants are hoisted out of the loop. Then, rewrite the loop L as follows: Copy all the statements of L to make a new loop L′ with header Lh. Inside L′, replace the statement if k < n goto L3 else goto L4 by goto L3. At the end of the preheader of L, put statements equivalent to

The conditional goto tests whether k will always be between 0 and n. Sometimes we will have enough information to evaluate this complicated condition at compile time. This will be true in at least two situations:

1. all the loop-invariants mentioned in it are constants; or

2. n and u are the same temporary variable, ak = aj, bk = bj, and there are no Δk's added to k between s1 and s2. In a language like MiniJava or Java or ML, this could happen if the programmer writes,
```let var u := length(A)
var i := 0
in while i<u
do (sum := sum + A[i];
i := i+1)
end
```

The quadruples for length(A) will include uM[A], assuming that the length of an array is fetched from offset zero from the array pointer; and the quadruples for A[i] will include nM[A], to fetch n for doing the bounds check. Now the expressions defining u and n are common subexpressions, assuming the expression M[A] is marked so that we know that no other STORE instruction is modifying the contents of memory location M[A].

If we can evaluate the big comparison at compile time, then we can unconditionally use loop L or loop L′, and delete the loop that we are not using. Cleaning up After this optimization, the program may have several loose ends. Statements after the label L4 may be unreachable; there may be several useless computations of n and k within L′. The former can be cleaned up by unreachable-code elimination, and the latter by dead-code elimination. Generalizations To be practically useful, the algorithm needs to be generalized in several ways:

1. The loop-exit comparison might take one of the forms

• if ju′ goto L1 else goto L2

• if j > u′ goto L2 else goto L1
• if u′ ≥ j goto L1 else goto L2
• if u′ < j goto L2 else goto L1

which compares ju′ instead of j < u.

2. The loop-exit test might occur at the bottom of the loop body, instead of before the array-bounds test. We can describe this situation as follows: There is a test s2 : if j < u goto L1 else goto L2

where L2 is out of the loop and s2 dominates all the loop back edges. Then the Δki of interest are the ones between s2 and any back edge, and between the loop header and s1.

3. We should handle the case where bj/bk < 0.
4. We should handle the case where j counts downward instead of up, and the loop-exit test is something like jl, for l a loop-invariant lower bound.
5. The induction-variable increments might be "undisciplined"; for example,
```while i<n-1
do (if sum<0
then (i:=i+1; sum:= sum+i; i:=i+1)
else i := i+2;
sum := sum + a[i])
```

Here there are three Δi, (of 1, 1, and 2, respectively). Our analysis will assume that any, all, or none of these increments may be applied; but clearly the effect is ii + 2 on either path. In such cases, an analysis that hoists (and merges) the increments above the if will be useful.

Useless loop unrolling.
 L1 : x ← M[i] s ← s + x i ← i + 4 if i < n goto L′1 else L2 L1 : x ← M[i] L′1 : x ← M[i] s ← s + x s ← s + x i ← i + 4 i ← i + 4 if i < n goto L1 else L2 if i < n goto L′1 else L2 L2 L2 (a) Before (b) After

 JaVa Previous    Next