
There are good algorithms for finding optimum and optimal tilings, but the algorithms for optimal tilings are simpler, as you might expect. *Complex instruction set computers (CISC)* have instructions that accomplish several operations each. The tiles for these instructions are quite large, and the difference between optimum and optimal tilings - while never very large - is at least sometimes noticeable. Most architectures of modern design are *reduced instruction set computers (RISC)*. Each RISC instruction accomplishes just a small number of operations (all the *Jouette* instructions except MOVEM are typical RISC instructions). Since the tiles are small and of uniform cost, there is usually no difference at all between optimum and optimal tilings. Thus, the simpler tiling algorithms suffice.

The algorithm for optimal tiling is called *maximal munch*. It is quite simple. Starting at the root of the tree, find the largest tile that fits. Cover the root node - and perhaps several other nodes near the root - with this tile, leaving several subtrees. Now repeat the same algorithm for each subtree. As each tile is placed, the instruction corresponding to that tile is generated. The maximal munch algorithm generates the instructions *in reverse order* - after all, the instruction at the root is the first to be generated, but it can only execute after the other instructions have produced operand values in registers.

The "largest tile" is the one with the most nodes. For example, the tile for ADD has one node, the tile for SUBI has two nodes, and the tiles for STORE and MOVEM have three nodes each. If two tiles of equal size match at the root, then the choice between them is arbitrary. Thus, in the tree of Image 9.2, STORE and MOVEM both match, and either can be chosen.

Maximal munch is quite straightforward to implement in Java. Simply write two recursive functions, `munchStm` for statements and `munchExp` for expressions. Each clause of `munchExp` will match one tile. The clauses are ordered in order of tile preference (biggest tiles first). Program 9.3 is a partial example of a *Jouette* code generator based on the maximal munch algorithm. Executing this program on the tree of Image 9.2 will match the first clause of `munchStm`; this will call `munchExp` to emit all the instructions for the operands of the STORE, followed by the STORE itself. Program 9.3 does not show how the registers are chosen and operand syntax is specified for the instructions; we are concerned here only with the pattern-matching of tiles.

```
void munchMove(MEM dst, Exp src) {
    // MOVE(MEM(BINOP(PLUS, e1, CONST(i))), e2)
    if (dst.exp instanceof BINOP && ((BINOP)dst.exp).oper==BINOP.PLUS
            && ((BINOP)dst.exp).right instanceof CONST)
        {munchExp(((BINOP)dst.exp).left); munchExp(src); emit("STORE");}
    // MOVE(MEM(BINOP(PLUS, CONST(i), e1)), e2)
    else if (dst.exp instanceof BINOP && ((BINOP)dst.exp).oper==BINOP.PLUS
            && ((BINOP)dst.exp).left instanceof CONST)
        {munchExp(((BINOP)dst.exp).right); munchExp(src); emit("STORE");}
    // MOVE(MEM(e1), MEM(e2))
    else if (src instanceof MEM)
        {munchExp(dst.exp); munchExp(((MEM)src).exp); emit("MOVEM");}
    // MOVE(MEM(e1), e2)
    else
        {munchExp(dst.exp); munchExp(src); emit("STORE");}
}
void munchMove(TEMP dst, Exp src) {
    // MOVE(TEMP(t1), e)
    munchExp(src); emit("ADD");
}
void munchMove(Exp dst, Exp src) {
    // MOVE(d, e)
    if (dst instanceof MEM) munchMove((MEM)dst, src);
    else if (dst instanceof TEMP) munchMove((TEMP)dst, src);
}
void munchStm(Stm s) {
    if (s instanceof MOVE) munchMove(((MOVE)s).dst, ((MOVE)s).src);
    ⋮ // CALL, JUMP, CJUMP unimplemented here
}
void munchExp(Exp e)
    // clauses shown as tile patterns, biggest tiles first:
    MEM(BINOP(PLUS, e1, CONST(i)))  ⇒  munchExp(e1); emit("LOAD");
    MEM(BINOP(PLUS, CONST(i), e1))  ⇒  munchExp(e1); emit("LOAD");
    MEM(CONST(i))                   ⇒  emit("LOAD");
    MEM(e1)                         ⇒  munchExp(e1); emit("LOAD");
    BINOP(PLUS, e1, CONST(i))       ⇒  munchExp(e1); emit("ADDI");
    BINOP(PLUS, CONST(i), e1)       ⇒  munchExp(e1); emit("ADDI");
    CONST(i)                        ⇒  emit("ADDI");
    BINOP(PLUS, e1, e2)             ⇒  munchExp(e1); munchExp(e2); emit("ADD");
    TEMP(t)                         ⇒  {}
```

If, for each node-type in the `Tree` language, there exists a single-node tile pattern, then maximal munch cannot get "stuck" with no tile to match some subtree.

Maximal munch always finds an optimal tiling, but not necessarily an optimum one. A dynamic-programming algorithm can find the optimum. In general, dynamic programming is a technique for finding optimum solutions for a whole problem based on the optimum solutions of each subproblem; here the subproblems are the tilings of the subtrees.

The dynamic-programming algorithm assigns a *cost* to every node in the tree. The cost is the sum of the instruction costs of the best instruction sequence that can tile the subtree rooted at that node. This algorithm works bottom-up, in contrast to maximal munch, which works top-down. First, the costs of all the children (and grandchildren, etc.) of node *n* are found recursively. Then, each tree pattern (tile kind) is matched against node *n*.

Each tile has zero or more *leaves*. In Image 9.1 the leaves are represented as edges whose bottom ends exit the tile. The leaves of a tile are places where subtrees can be attached. For each tile *t* of cost *c* that matches at node *n*, there will be zero or more subtrees *s*_{i} corresponding to the leaves of the tile. The cost of matching tile *t* is *c* plus the (already computed) costs of the subtrees *s*_{i}; of all the tiles that match at *n*, the minimum-cost match is chosen, and that minimum becomes the cost of node *n*.
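To make the bottom-up cost computation concrete, here is a minimal sketch in Java covering a small subset of the *Jouette* tiles. The node classes and the uniform tile cost of 1 are simplified stand-ins for illustration, not the Tree interfaces of Program 9.3:

```java
// Bottom-up cost assignment for a tiny subset of Jouette tiles.
// Node classes and unit tile costs are simplified stand-ins.
abstract class Exp { int cost; }  // cost of the best tiling rooted at this node

class CONST extends Exp { int value; CONST(int v) { value = v; } }
class PLUS  extends Exp { Exp left, right; PLUS(Exp l, Exp r) { left = l; right = r; } }
class MEM   extends Exp { Exp exp; MEM(Exp e) { exp = e; } }

class TileCost {
    // Every tile costs one instruction, so a tile's total cost is
    // 1 plus the (already computed) costs of the subtrees at its leaves.
    static int cost(Exp e) {
        if (e instanceof CONST) {
            e.cost = 1;                                   // ADDI  ri <- r0 + i
        } else if (e instanceof PLUS) {
            PLUS p = (PLUS) e;
            int best = 1 + cost(p.left) + cost(p.right);  // ADD tile: two leaves
            if (p.right instanceof CONST)                 // ADDI folds the constant,
                best = Math.min(best, 1 + p.left.cost);   // ignoring CONST's own cost
            if (p.left instanceof CONST)
                best = Math.min(best, 1 + p.right.cost);
            e.cost = best;
        } else {
            MEM m = (MEM) e;
            int best = 1 + cost(m.exp);                   // plain LOAD  ri <- M[rj + 0]
            if (m.exp instanceof CONST)
                best = Math.min(best, 1);                 // LOAD  ri <- M[r0 + i]
            if (m.exp instanceof PLUS) {
                PLUS p = (PLUS) m.exp;
                if (p.right instanceof CONST)             // LOAD  ri <- M[rj + i]
                    best = Math.min(best, 1 + p.left.cost);
                if (p.left instanceof CONST)
                    best = Math.min(best, 1 + p.right.cost);
            }
            e.cost = best;
        }
        return e.cost;
    }
}
```

Running `cost` on the tree MEM(+(CONST 1, CONST 2)) yields 2: one ADDI tile, plus one LOAD tile whose addressing mode covers the +, CONST, and MEM nodes.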

Consider a simple tree, MEM(+(CONST 1, CONST 2)). The only tile that matches CONST 1 is an ADDI instruction, with cost 1. Similarly, CONST 2 has cost 1. Several tiles match the + node:

| Tile | Instruction | Tile cost | Leaves cost | Total cost |
|---|---|---|---|---|
| +(*e*_{1}, *e*_{2}) | ADD | 1 | 1 + 1 | 3 |
| +(*e*_{1}, CONST) | ADDI | 1 | 1 | 2 |
| +(CONST, *e*_{1}) | ADDI | 1 | 1 | 2 |

The ADD tile has two leaves, but the ADDI tile has only one leaf. In matching the first ADDI pattern, we are saying "though we computed the cost of tiling CONST 2, we are not going to use that information." If we choose to use the first ADDI pattern, then CONST 2 will not be the root of any tile, and its cost will be ignored. In this case, either of the two ADDI tiles leads to the minimum cost for the + node, and the choice is arbitrary. The + node gets a cost of 2. Now, several tiles match the MEM node:

| Tile | Instruction | Tile cost | Leaves cost | Total cost |
|---|---|---|---|---|
| MEM(*e*_{1}) | LOAD | 1 | 2 | 3 |
| MEM(+(*e*_{1}, CONST)) | LOAD | 1 | 1 | 2 |
| MEM(+(CONST, *e*_{1})) | LOAD | 1 | 1 | 2 |

Either of the last two matches will be optimum. Once the cost of the root node (and thus the entire tree) is found, the *instruction emission* phase begins. The algorithm is as follows:

Emission(node *n*): for each leaf *l*_{i} of the tile selected at node *n*, perform Emission(*l*_{i}). Then emit the instruction matched at node *n*.

Emission(*n*) does *not* recur on the children of node *n*, but on the *leaves of the tile* that matched at *n*. For example, after the dynamic-programming algorithm finds the optimum cost of the simple tree above, the emission phase emits

ADDI r1 ← r0 + 1
LOAD r1 ← M[r1 + 2]

but no instruction is emitted for any tile rooted at the + node, because this was not a leaf of the tile matched at the root.
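The emission phase itself is only a few lines. In this sketch, each node is assumed to record the instruction of the tile chosen for it and the subtrees at that tile's leaves; the `Tiled` and `Emitter` names are illustrative, not the book's interfaces:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Emission: recur on the LEAVES of the chosen tile, not on all
// children. Nodes interior to a tile never appear in tileLeaves, so no
// instruction is ever emitted for them.
class Tiled {
    final String instr;                               // instruction of the chosen tile
    final List<Tiled> tileLeaves = new ArrayList<>(); // subtrees at the tile's leaves

    Tiled(String instr, Tiled... leaves) {
        this.instr = instr;
        for (Tiled l : leaves) tileLeaves.add(l);
    }
}

class Emitter {
    final List<String> out = new ArrayList<>();

    void emission(Tiled n) {
        for (Tiled leaf : n.tileLeaves) emission(leaf);  // operands first
        out.add(n.instr);                                // then the instruction at n
    }
}
```

For the simple tree above, the LOAD tile covers the MEM, +, and CONST 2 nodes, leaving the ADDI tile (covering CONST 1) as its only leaf, so the ADDI is emitted first and nothing is ever emitted for the + node.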

For machines with complex instruction sets and several classes of registers and addressing modes, there is a useful generalization of the dynamic-programming algorithm. Suppose we make a brain-damaged version of *Jouette* with two classes of registers: *a* registers for addressing, and *d* registers for "data." The instruction set of the *Schizo-Jouette* machine (loosely based on the Motorola 68000) is shown in Image 9.4.

Image 9.4: The *Schizo-Jouette* architecture

The root and leaves of each tile must be marked with *a* or *d* to indicate which kind of register is implied. Now the dynamic-programming algorithm must keep track, for each node, of the min-cost match as an *a* register, *and also* the min-cost match as a *d* register. At this point it is useful to use a context-free grammar to describe the tiles; the grammar will have nonterminals *s* (for statements), *a* (for expressions calculated into an *a* register), and *d* (for expressions calculated into a *d* register). Context-free grammars were used earlier to describe source-language syntax; here we use them for quite a different purpose. The grammar rules for the LOAD, MOVEA, and MOVED instructions might look like this:

d → MEM(+(a, CONST))
d → MEM(+(CONST, a))
d → MEM(CONST)
d → MEM(a)
d → a
a → d

Such a grammar is highly ambiguous: There are many different parses of the same tree (since there are many different instruction sequences implementing the same expression). For this reason, ordinary parsing techniques are not very useful for this application. However, a generalization of the dynamic-programming algorithm works quite well: The minimum-cost match at each node *for each nonterminal of the grammar* is computed.

Though the dynamic-programming algorithm is conceptually simple, it becomes messy to write down directly in a general-purpose programming language such as Java. Thus, several tools have been developed. These *code-generator generators* process grammars that specify machine instruction sets; for each rule of the grammar, a cost and an action are specified. The costs are used to find the optimum tiling, and then the actions of the matched rules are used in the *emission* phase.

Like Yacc and Lex, the output of a code-generator generator is usually a program in C or Java that operates a table-driven matching engine with the action fragments (written in C or Java) inserted at the appropriate points. Such tools are quite convenient. Grammars can specify addressing modes of treelike CISC instructions quite well. A typical grammar for the VAX has 112 rules and 20 nonterminal symbols; one for the Motorola 68020 has 141 rules and 35 nonterminal symbols. However, instructions that produce more than one result - such as autoincrement instructions on the VAX - are difficult to express using tree patterns.
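One way to sketch the per-nonterminal bookkeeping: each node carries one best cost per nonterminal, and the chain rules a → d and d → a are applied as a closure step after the ordinary tiles have been matched. The `NTCosts` class and the unit move costs below are illustrative assumptions, not the actual *Schizo-Jouette* costs:

```java
// Sketch of the per-nonterminal cost table for a two-register-class machine.
// Each node keeps one best cost per nonterminal (a-register and d-register).
class NTCosts {
    final int a;  // best cost computing this node into an address register
    final int d;  // best cost computing this node into a data register
    NTCosts(int a, int d) { this.a = a; this.d = d; }

    // Apply the chain rules a -> d (a MOVEA from a d-register) and
    // d -> a (a MOVED from an a-register), each assumed to cost one instruction.
    NTCosts closure() {
        int bestA = Math.min(a, d + 1);  // compute into d, then move to a
        int bestD = Math.min(d, a + 1);  // compute into a, then move to d
        return new NTCosts(bestA, bestD);
    }
}
```

With only two nonterminals and nonnegative move costs, a single closure pass suffices: chaining through both move rules can never beat staying put.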

Code-generator generators are probably overkill for RISC machines. The tiles are quite small, there aren't very many of them, and there is little need for a grammar with many nonterminal symbols.

Maximal munch and the dynamic-programming algorithm must examine, for each node, all the tiles that match at that node. A tile matches if each nonleaf node of the tile is labeled with the same operator (MEM, CONST, etc.) as the corresponding node of the tree. The naive algorithm for matching would be to examine each tile in turn, checking each node of the tile against the corresponding part of the tree. However, there are better approaches. To match a tile at node *n* of the tree, the label at *n* can be used in a `case` statement:

match(n) { switch (label(n)) { case MEM: ... case BINOP: ... case CONST: ... } }

Once the clause for one label (such as MEM) is selected, only those patterns rooted in that label remain in consideration. Another `case` statement can use the label of the child of *n* to begin distinguishing among those patterns.
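A minimal sketch of such a two-level dispatch follows; the `Label` enum and the single-child field are simplifying assumptions for illustration, not the real Tree representation:

```java
// Two-level decision tree for tile matching: a switch on the root's label
// narrows the candidate tiles, and a nested switch on the child's label
// narrows them further, so no tree node is examined twice.
class TNode {
    enum Label { MEM, BINOP, CONST, TEMP }
    final Label label;
    final TNode child;  // first child, or null (illustrative simplification)
    TNode(Label label, TNode child) { this.label = label; this.child = child; }
}

class Matcher {
    // Returns the tile chosen at n, as a descriptive string.
    static String match(TNode n) {
        switch (n.label) {
            case MEM:
                switch (n.child.label) {
                    case BINOP: return "LOAD ri <- M[rj + i]";  // folded addressing
                    case CONST: return "LOAD ri <- M[r0 + i]";
                    default:    return "LOAD ri <- M[rj + 0]";
                }
            case CONST:  return "ADDI ri <- r0 + i";
            case BINOP:  return "ADD or ADDI";
            default:     return "(temp: no instruction)";
        }
    }
}
```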

The organization and optimization of decision trees for pattern matching is beyond the scope of this tutorial. However, for better performance the naive sequence of clauses in function `munchExp` should be rewritten as a sequence of comparisons that never looks twice at the same tree node.

How expensive are maximal munch and dynamic programming? Let us suppose that there are *T* different tiles, and that the average matching tile contains *K* nonleaf (labeled) nodes. Let *K*′ be the largest number of nodes that ever need to be examined to see which tiles match at a given subtree; this is approximately the same as the size of the largest tile. And suppose that, on the average, *T*′ different patterns (tiles) match at each tree node. For a typical RISC machine we might expect *T* = 50, *K* = 2, *K*′ = 4, *T*′ = 5. Suppose there are *N* nodes in the input tree. Then maximal munch will have to consider matches at only *N/K* nodes because, once a "munch" is made at the root, no pattern-matching needs to take place at the nonleaf nodes of the tile. To find all the tiles that match at one node, at most *K*′ tree nodes must be examined; but (with a sophisticated decision tree) each of these nodes will be examined only once. Then each of the successful matches must be compared to see if its cost is minimal. Thus, the matching at each node costs *K*′ + *T*′, for a total cost proportional to (*K*′ + *T*′)*N/K*. The dynamic-programming algorithm must find all the matches at *every* node, so its cost is proportional to (*K*′ + *T*′)*N*. However, the constant of proportionality is higher than that of maximal munch, since dynamic programming requires two tree-walks instead of one.
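Plugging in the sample RISC numbers gives a quick arithmetic check (the work units, node examinations plus cost comparisons, are illustrative):

```java
// Instantiating the cost estimates from the text with K = 2, K' = 4, T' = 5.
class TilingWork {
    // Maximal munch attempts matches at only ~N/K nodes.
    static int maximalMunch(int n, int k, int kPrime, int tPrime) {
        return (kPrime + tPrime) * n / k;
    }
    // Dynamic programming must attempt matches at every node.
    static int dynamicProgramming(int n, int kPrime, int tPrime) {
        return (kPrime + tPrime) * n;
    }
}
```

For *N* = 1000 input nodes, these estimates give 4500 units for maximal munch versus 9000 for dynamic programming: both linear in *N*, with dynamic programming roughly *K* times more matching work.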

Since *K, K*′, and *T*′ are constant, the running time of all of these algorithms is linear. In practice, measurements show that these instruction selection algorithms run very quickly compared to the other work performed by a real compiler - even lexical analysis is likely to take more time than instruction selection.
