
List of Images


Introduction

Image 1.1: Phases of a compiler, and interfaces between them.
Image 1.4: Tree representation of a straight-line program.

Lexical Analysis

Image 2.1: Regular expression notation.
Image 2.2: Regular expressions for some tokens.
Image 2.3: Finite automata for lexical tokens. The states are indicated by circles; final states are indicated by double circles. The start state has an arrow coming in from nowhere. An edge labeled with several characters is shorthand for many parallel edges.
Image 2.4: Combined finite automaton.
Image 2.5: The automaton of Image 2.4 recognizes several tokens. The symbol | indicates the input position at each successive call to the lexical analyzer, the symbol ⊥ indicates the current position of the automaton, and ⊤ indicates the most recent position in which the recognizer was in a final state.
Image 2.6: Translation of regular expressions to NFAs.
Image 2.7: Four regular expressions translated to an NFA.
Image 2.8: NFA converted to DFA.


Parsing

Image 3.3: Parse tree.
Image 3.4: Two parse trees for the same sentence using Grammar 3.1.
Image 3.6: Two parse trees for the sentence 1-2-3 in Grammar 3.5.
Image 3.7: Two parse trees for the sentence 1+2*3 in Grammar 3.5.
Image 3.9: Parse trees that Grammar 3.8 will never produce.
Image 3.14: Predictive parsing table for Grammar 3.12.
Image 3.18: Shift-reduce parse of a sentence. Numeric subscripts in the Stack are DFA state numbers; see Table 3.19.
Image 3.21: LR(0) states for Grammar 3.20.
Image 3.24: LR(0) states and parsing table for Grammar 3.23.
Image 3.25: SLR parsing table for Grammar 3.23.
Image 3.27: LR(1) states for Grammar 3.26.
Image 3.29: A hierarchy of grammar classes.
Image 3.33: SableCC shift-reduce error message for Grammar 3.32.
Image 3.39: Burke-Fisher parsing, with an error-repair queue. Image 3.18 shows the complete parse of this string according to Table 3.19.

Abstract Syntax

Image 4.6: Orthogonal directions of modularity.
Image 4.9: Abstract syntax for the MiniJava language.

Semantic Analysis

Image 5.1: Several active environments at once.
Image 5.3: Hash tables.
Image 5.4: Binary search trees.
Image 5.7: A MiniJava program and its symbol table.

Activation Records

Image 6.2: A stack frame.

Translation to Intermediate Code

Image 7.1: Compilers for five languages and four target machines: (a) without an IR, (b) with an IR.
Image 7.2: Intermediate representation trees.
Image 7.4: Object initialization.

Basic Blocks and Traces

Image 8.1: Identities on trees (see also Exercise 8.1).
Image 8.3: Different trace coverings for the same program.

Instruction Selection

Image 9.1: Arithmetic and memory instructions. The notation M[x] denotes the memory word at address x.
Image 9.2: A tree tiled in two ways.
Image 9.4: The Schizo-Jouette architecture.

Liveness Analysis

Image 10.2: Liveness of variables a, b, c.
Image 10.9: Representations of interference.

Register Allocation

Image 11.3: Simplification stack, and a possible coloring.
Image 11.4: Graph coloring with coalescing.
Image 11.6: A coloring, with coalescing, for Graph 11.1.
Image 11.7: Moving a callee-save register to a fresh temporary.

Garbage Collection

Image 13.1: A heap to be garbage collected. Class descriptors are not shown in the diagram.
Image 13.4: Mark-and-sweep collection.
Image 13.7: Copying collection.
Image 13.10: Breadth-first copying collection.
Image 13.12: Generational collection. The bold arrow is one of the rare pointers from an older generation to a newer one.

Object-Oriented Languages

Image 14.2: Single inheritance of data fields.
Image 14.3: Class descriptors for dynamic method lookup.
Image 14.4: Multiple inheritance of data fields.
Image 14.5: Field offsets in descriptors for multiple inheritance.

Functional Programming Languages

Image 15.2: Closures for execution of twice(add(5)). SL=static link; RV=return value; EP=escaping-variables-pointer or environment-pointer.
Image 15.13: printTable as compiled.

Dataflow Analysis

Image 17.8: An illustration of value numbering. (a) A basic block; (b) the table created by the value-numbering algorithm, with hidden bindings shown crossed out; (c) a view of the table as a DAG.

Loop Optimizations

Image 18.1: Some loops; in each case, 1 is the header node.
Image 18.2: None of these contains a loop. Dotted lines indicate reduction of graph (c) by deleting edges and collapsing nodes.
Image 18.3: (a) A flow graph; (b) its dominator tree.
Image 18.4: The loop-nest tree for Image 18.3a. Each loop header is shown in the top half of each oval (nodes 1, 2, 5, 8); a loop comprises a header node (e.g., node 5), all the other nodes shown in the same oval (e.g., node 10), and all the nodes shown in subtrees of the loop-nest-tree node (e.g., 8, 9).
Image 18.5: (a) A loop; (b) the same loop with a preheader.
Image 18.6: Some good and bad candidates for hoisting t ← a ⊕ b.
Image 18.7: A while loop (a), transformed into a repeat loop (b).
Image 18.9: The basic induction variable i is incremented by different amounts in different iterations; the derived induction variable j is not changed in every iteration.

Static Single-Assignment Form

Image 19.1: (a) A straight-line program. (b) The program in single-assignment form.
Image 19.2: (a) A program with a control-flow join; (b) the program transformed to single-assignment form; (c) edge-split SSA form.
Image 19.3: (a) A program with a loop; (b) the program transformed to edge-split single-assignment form. a0, b0, c0 are initial values of the variables before block 1.
Image 19.4: Conversion of a program to static single-assignment form. Node 7 is a postbody node, inserted to make sure there is only one loop edge (see Exercise 18.6); such nodes are not strictly necessary but are sometimes helpful.
Image 19.5: Node 5 dominates all the nodes in the grey area. (a) The dominance frontier of node 5 includes the nodes (4, 5, 12, 13) that are targets of edges crossing from the region dominated by 5 (the grey area, including node 5 itself) to the region not strictly dominated by 5 (the white area, plus node 5 itself, since no node strictly dominates itself). (b) Any node in the dominance frontier of n is also a point of convergence of nonintersecting paths, one from n and one from the root node. (c) Another example of converging paths P1,5 and P5,5.
Image 19.8: A control-flow graph and trees derived from it. The numeric labels in part (b) are the dfnums of the nodes.
Image 19.11: Path compression. (a) Ancestor links in a spanning tree; AncestorWithLowestSemi(v) traverses three links. (b) New nodes a2, a3 are linked into the tree. Now AncestorWithLowestSemi(w) would traverse 6 links. (c) AncestorWithLowestSemi(v) with path compression redirects ancestor links, but best[v] remembers the best intervening node on the compressed path between v and a1. (d) Now, after a2 and a3 are linked, AncestorWithLowestSemi(w) traverses only 4 links.
Image 19.13: Conditional constant propagation.
Image 19.14: This transformation does not preserve the dominance property of SSA form, and should be avoided.
Image 19.15: Construction of the control-dependence graph.
Image 19.16: Aggressive dead-code elimination.
Image 19.18: Functional intermediate representation. Binding occurrences of variables are underlined.

Pipelining and Scheduling

Image 20.1: Functional unit requirements of instructions (on the MIPS R4000 processor). This machine's floating-point ADD instruction uses the instruction-fetch unit for one cycle; reads registers for one cycle; unpacks exponent and mantissa; then for the next cycle uses a shifter and an adder; then uses both the adder and a rounding unit; then the rounding unit and a shifter; then writes a result back to the register file. The MULT and CONV instructions use functional units in a different order.
Image 20.2: If there is only one functional unit of each kind, then an ADD cannot be started at the same time as a MULT (because of numerous resource hazards shown in boldface); nor three cycles after the MULT (because of Add, Round, and Write hazards); nor four cycles later (because of Add and Round hazards). But if there were two adders and two rounding units, then an ADD could be started four cycles after a MULT. Or with dual fetch units, multiple-access register file, and dual unpackers, the MULT and ADD could be started simultaneously.
Image 20.3: Data dependence. (Above) If the MULT produces a result that is an operand to ADD, the MULT must write its result to the register file before the ADD can read it. (Below) Special bypassing circuitry can route the result of MULT directly to the Shift and Add units, skipping the Write, Read, and Unpack stages.
Image 20.7: Pipelined schedule. Assignments in each row happen simultaneously; each right-hand side refers to the value before the assignment. The loop exit test i < N + 1 has been "moved past" three increments of i, so appears as i < N − 2.
Image 20.8: Pipelined schedule, with move instructions.
Image 20.10: Iterative modulo scheduling applied to Program 20.4b. Graph 20.5a is the data-dependence graph; Δmin = 5 (see page 451); H = [c, d, e, a, b, f, j, g, h].
Image 20.11: Dependence of ADD's instruction-fetch on result of BRANCH.

The Memory Hierarchy

Image 21.1: Organization of a direct-mapped cache. The key field of the address is used to index the tags array and the data blocks; if tags[key] matches the tag field of the address, then the data is valid (cache hit). The word index is used to select a word from the cache block.
Image 21.2: The memory hierarchy.
Image 21.3: Alignment of data objects (or basic blocks) to avoid crossing cache-block boundaries is often worthwhile, even at the cost of empty space between objects.
Image 21.4: If x is rarely true, basic-block placement (a) will occupy three in-cache blocks, while (b) will usually occupy only two.
Image 21.5: Execution of a dot-product loop, with 4-word cache blocks.
(a) Without prefetching, on a machine with dynamic instruction reordering, the number of outstanding instructions (reserved registers) grows proportionally to the cache-miss latency.
(b) With prefetching, the hardware reservation table never grows large. (Steady-state behavior is shown here, not the initial transient.)
Image 21.7: Matrix multiplication. Each element of C is computed from a row of A and a column of B. With blocking, a c × c block of the C matrix is computed from a c × N block of A and an N × c block of B.
