
## EXERCISES

• 20.1 Schedule the following loop using the Aiken-Nicolau algorithm:

```
for i ← 1 to N
    a : a ← X[i − 2]
    b : b ← Y[i − 1]
    c : c ← a × b
    d : d ← U[i]
    e : e ← X[i − 1]
    f : f ← d + e
    g : g ← d × c
    h : X[i] ← g
    j : Y[i] ← f
```

1. Label all the scalar variables with subscripts i and i − 1. Hint: In this loop there are no loop-carried scalar-variable dependences, so none of the subscripts will be i − 1.

2. Perform scalar replacement on uses of X[] and Y []. Hint: Now you will have subscripts of i − 1 and i − 2.
3. Perform copy propagation to eliminate variables a, b, and e.
4. Draw a data-dependence graph of statements c, d, f, g, h, j; label intraiteration edges with 0 and loop-carried edges with 1 or 2, depending on the number of iterations difference there is in the subscript.
5. Show the Aiken-Nicolau tableau (as in Table 20.6a).
6. Find the identical groups separated by increasing gaps. Hint: The identical groups will be c cycles apart, where in this case c is greater than one!
7. Show the steepest-slope group. Hint: The slope is not an integer.
8. Unroll the loop k times, where k is the denominator of the slope.
9. Draw the data-dependence graph of the unrolled loop.
10. Draw the tableau for the schedule of the unrolled loop.
11. Find the slope of the steepest-slope group. Hint: Now it should be an integer.
12. Move the shallow-slope group(s) down to close the gap.
13. Identify the loop body, the prologue, and the epilogue.
14. Write a schedule showing placement of the prologue, loop body, and epilogue in specific cycles, like Image 20.7.
15. Eliminate the subscripts on variables in the loop body, inserting move instructions where necessary, as in Image 20.8.
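A rough way to check parts 5–7 by machine is to build the tableau greedily: schedule each statement as early as its predecessors allow, assuming unlimited issue width, and measure each group's slope. The sketch below is only that, a check under assumptions; the node names and edge distances are my reconstruction of the dependence graph from parts 1–4, not the book's answer key.

```python
# An edge (src, dst, k) means: dst in iteration i uses the value that
# src produced in iteration i - k. Every instruction takes one cycle.
# These edges are my reconstruction after scalar replacement and copy
# propagation (X[i-1] = g from one iteration back, X[i-2] = g from two
# back, Y[i-1] = f from one back).
EDGES = [
    ("d", "f", 0), ("d", "g", 0), ("c", "g", 0),   # intra-iteration
    ("g", "h", 0), ("f", "j", 0),
    ("f", "c", 1), ("g", "f", 1), ("g", "c", 2),   # loop-carried
]
NODES = ["c", "d", "f", "g", "h", "j"]   # topologically ordered w.r.t. 0-edges

def tableau(n_iters):
    """time[(node, i)] = earliest cycle, assuming unlimited issue width."""
    time = {}
    for i in range(n_iters):
        for n in NODES:
            ready = 0
            for src, dst, k in EDGES:
                if dst == n and i - k >= 0:
                    ready = max(ready, time[(src, i - k)] + 1)
            time[(n, i)] = ready
    return time

t = tableau(9)
# Slope of each statement's group: cycles per iteration, measured far
# enough from the start that the schedule has settled.
slopes = {n: (t[(n, 8)] - t[(n, 4)]) / 4 for n in NODES}
print(slopes)
```

With these edges the recurrence c → g → f → c contains three instructions spanning two iterations' distance, which is where a fractional steepest slope of 3/2 comes from, and hence the unroll factor in part 8.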
• 20.2 Do parts 1–4 of Exercise 20.1. Then use iterated modulo scheduling to schedule the loop for a machine that can issue three instructions at a time, of which at most one can be a memory instruction and at most one can be a multiply instruction. Every instruction completes in one cycle.
1. Explicitly represent the increment instruction i : i ← i + 1 and the loop branch k : if i ≤ N goto loop in the data-dependence graph, with an edge from i to itself (labeled by 1), from i to k (labeled by 0), and from k to every node in the loop body (labeled by 1).

2. Calculate Δmin based on data-dependence cycles, the 3-instruction-per-cycle limit, the 1-load/store-per-cycle limit, and the 1-multiply-per-cycle limit. Remark: The Δ required for a data-dependence cycle is the length of the cycle divided by the sum of the edge labels (where edge labels show iteration distance, as described in part 4 of Exercise 20.1).
3. Run Algorithm 20.9, showing the SchedTime and Resource tables each time a variable has to be removed from the schedule, as in Image 20.10. Use the priority order H = [i, k, c, d, g, f, h, j].
4. Eliminate the subscripts on variables in the loop body, inserting move instructions where necessary, as in Image 20.8. If the move instructions don't fit into the 3-instruction-issue limit, then it's time to increase Δ and try again.
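As a rough check on part 2, Δmin is the larger of the resource bound and the recurrence bound. The sketch below computes both; the instruction classification and the list of dependence cycles are my reading of the Exercise 20.1 loop, not the book's answer key.

```python
from math import ceil

N_ISSUE, N_MEM, N_MUL = 3, 1, 1          # machine limits per cycle

instrs  = ["c", "d", "f", "g", "h", "j", "i", "k"]
mem_ops = ["d", "h", "j"]                 # one load, two stores
mul_ops = ["c", "g"]

# (total latency of the cycle, sum of iteration distances on its edges),
# per the Remark in part 2; Δ must be an integer, hence the ceilings.
dep_cycles = [(1, 1),                     # i -> i
              (2, 2),                     # c -> g -> c       (0 + 2)
              (3, 2)]                     # c -> g -> f -> c  (0 + 1 + 1)

resource_min = max(ceil(len(instrs) / N_ISSUE),
                   ceil(len(mem_ops) / N_MEM),
                   ceil(len(mul_ops) / N_MUL))
recurrence_min = max(ceil(lat / dist) for lat, dist in dep_cycles)
delta_min = max(resource_min, recurrence_min)
print(delta_min)
```

Under these assumptions the memory-port and issue-width bounds dominate the recurrence bound, so Δmin is set by resources rather than by any dependence cycle.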
• 20.3 Consider the following program:
```
L :                       L :
a : a ← U[i]              a : a ← U[i]
b : b ← a × a             d : d ← d × a
c : V[i] ← b              b : b ← a × a
i : i ← i + 1             c : V[i] ← b
d : d ← d × a             i : i ← i + 1
e : if d ≤ 1.0 goto L     e : if d ≤ 1.0 goto L

(I) Unscheduled           (II) Scheduled
```

Suppose these loops are to be run on an out-of-order execution machine with these characteristics: Each instruction takes exactly one cycle, and may be executed as soon as its operands are ready and all preceding conditional branches have been executed. Several instructions may be executed at once, except that there is only one multiply unit. If two multiply instructions are ready, the instruction from an earlier iteration, or occurring first in the same iteration, is executed. The program was originally written as shown in loop (I); the compiler has rescheduled it as loop (II). For each of the two loops:

1. Draw the data-dependence graph, showing loop-carried dependences with a dashed line.

2. Add the control dependence as a loop-carried edge from e to each of the other nodes.
3. To simulate how the machine will execute the loop, show the Aiken-Nicolau tableau, with the restriction that b and d must never be put in the same cycle. In a cycle where b and d's predecessors are both ready, prefer the instruction from the earlier iteration, or from earlier in the same iteration.
4. Compute the steepest slope in the tableau; how many cycles per iteration does the loop take?
5. Can compiler scheduling be useful for dynamically rescheduling (out-of-order execution) machines?
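The hand simulation in parts 3–4 can be cross-checked mechanically. The sketch below is a minimal cycle-by-cycle model of the machine described above (1-cycle latencies, one multiply unit, earlier iteration or earlier program position winning ties, and a control dependence from the previous iteration's branch e); the dependence lists are my reading of the two loops, not the book's answer key.

```python
def simulate(order, deps, muls, n_iters, branch="e"):
    """Return the issue cycle of every dynamic instruction (name, iter)."""
    issued, cycle = {}, 0
    pending = [(n, i) for i in range(n_iters) for n in order]
    while pending:
        def ready(inst):
            n, i = inst
            preds = list(deps[n])
            if i > 0:
                preds.append((branch, 1))     # control dependence
            for src, k in preds:
                if i - k < 0:
                    continue                  # no such earlier iteration
                t = issued.get((src, i - k))
                if t is None or t >= cycle:   # operand not yet available
                    return False
            return True
        avail = [x for x in pending if ready(x)]
        # one multiply per cycle: earlier iteration wins, then earlier
        # position in the loop body
        ms = sorted((x for x in avail if x[0] in muls),
                    key=lambda x: (x[1], order.index(x[0])))
        for loser in ms[1:]:
            avail.remove(loser)
        for x in avail:
            issued[x] = cycle
            pending.remove(x)
        cycle += 1
    return issued

deps = {"a": [("i", 1)], "b": [("a", 0)], "c": [("b", 0)],
        "i": [("i", 1)], "d": [("a", 0), ("d", 1)], "e": [("d", 0)]}
t1 = simulate(["a", "b", "c", "i", "d", "e"], deps, {"b", "d"}, 6)  # loop (I)
t2 = simulate(["a", "d", "b", "c", "i", "e"], deps, {"b", "d"}, 6)  # loop (II)
print(t1[("e", 5)] - t1[("e", 4)],    # steady-state cycles/iteration, loop (I)
      t2[("e", 5)] - t2[("e", 4)])    # steady-state cycles/iteration, loop (II)
```

Under this model, scheduling d before b lets the branch e resolve a cycle sooner each iteration, so the rescheduled loop runs faster even though the hardware reorders instructions; that is the point of part 5.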
• 20.4 On many machines, instructions after a conditional branch can be executed even before the branch condition is known (the instructions do not commit until after the branch condition is verified). Suppose we have an out-of-order execution machine with these characteristics: An add or branch takes one cycle; a multiply takes 4 cycles; each instruction may be executed as soon as its operands are ready. Several instructions may be executed at once, except that there is only one multiply unit. If two multiply instructions are ready, the instruction from an earlier iteration, or occurring first in the same iteration, is executed. For a machine with this behavior, do parts 1–5 of Exercise 20.3 for the following programs:
```
L :                       L :
a : a ← e × u             b : b ← e × v
b : b ← e × v             a : a ← e × u
c : c ← a + w             c : c ← a + w
d : d ← c + x             d : d ← c + x
e : e ← d + y             e : e ← d + y
f : if e > 0.0 goto L     f : if e > 0.0 goto L

(I) Unscheduled           (II) Scheduled
```

• 20.5 Write a short program that contains an instance of each of the branch-prediction heuristics (pointer, call, return, loop header, loop preheader, guard) described on pages 457-458. Label each instance.
• 20.6 Use branch-prediction heuristics to predict the direction of each of the conditional branches in the programs of Exercise 8.6 (page 175) and Image 18.7b (page 386); explain which heuristic applies to each prediction.
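For Exercise 20.5, a sketch of the kind of program and labeling that is being asked for: a linked-list search whose branches each exhibit one heuristic. The function, names, and the one-line heuristic glosses are my own illustration; check each label against the definitions on pages 457-458 before relying on it.

```python
class Node:
    def __init__(self, key, rest=None):
        self.key, self.rest = key, rest

def log_miss(key):                # any procedure call will do
    print("miss:", key)

def find(lst, key):
    # pointer heuristic: a comparison of a pointer against nil is
    # predicted false (pointers are usually non-nil), and
    # loop-preheader heuristic: the successor that sets up the loop
    # below is predicted taken
    if lst is not None:
        node = lst
        # loop-header heuristic: the branch back to the loop header is
        # predicted taken (another iteration is expected)
        while node is not None:
            # guard heuristic: node.key is tested here and used again
            # in the successor
            if node.key == key:
                return node       # return heuristic: a successor that
            node = node.rest      # returns is predicted not taken
    # call heuristic: a successor containing a call (and not
    # post-dominating the branch) is predicted not taken
    log_miss(key)
    return None
```

A quick use: `find(Node(1, Node(2)), 2)` returns the second node, while a missing key falls through to the call-heuristic successor.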
