CANONICAL TREES

Let us define canonical trees as having these properties:

  1. No SEQ or ESEQ.
  2. The parent of each CALL is either EXP(…) or MOVE(TEMP t,…).

TRANSFORMATIONS ON ESEQ

How can the ESEQ nodes be eliminated? The idea is to lift them higher and higher in the tree, until they can become SEQ nodes. gives some useful identities on trees.
Image 8.1: Identities on trees (see also Exercise 8.1).

Identity (1) is obvious. So is identity (2): Statement s is to be evaluated; then e1; then e2; then the sum of the expressions is returned. If s has side effects that affect e1 or e2, then either the left-hand side or the right-hand side of the first equation will execute those side effects before the expressions are evaluated. Identity (3) is more complicated, because of the need not to interchange the evaluations of s and e1. For example, if s is MOVE(MEM(x), y) and e1 is BINOP(PLUS, MEM(x), z), then the program will compute a different result if s is evaluated before e1 instead of after. Our goal is simply to pull s out of the BINOP expression; but now (to preserve the order of evaluation) we must pull e1 out of the BINOP with it. To do so, we assign e1 into a new temporary t, and put t inside the BINOP. It may happen that s causes no side effects that can alter the result produced by e1. This will happen if the temporaries and memory locations assigned by s are not referenced by e1 (and s and e1 don't both perform external I/O). In this case, identity (4) can be used. We cannot always tell if two expressions commute. For example, whether MOVE(MEM(x), y) commutes with MEM(z) depends on whether x = z, which we cannot always determine at compile time. So we conservatively approximate whether statements commute, saying either "they definitely do commute" or "perhaps they don't commute." For example, we know that any statement "definitely commutes" with the expression CONST(n), so we can use identity (4) to justify special cases likeJava ScreenShot

The commute function estimates (very naively) whether a statement commutes with an expression:

static boolean commute(Tree.Stm a, Tree.Exp b) {
 return isNop(a)
 || b instanceof Tree.NAME
 || b instanceof Tree.CONST;
}
static boolean isNop(Tree.Stm a) {
 return a instanceof Tree.EXP
 && ((Tree.EXP)a).exp instanceof Tree.CONST;
}

A constant commutes with any statement, and the empty statement commutes with any expression. Anything else is assumed not to commute.

GENERAL REWRITING RULES

In general, for each kind of Tree statement or expression we can identify the subexpressions. Then we can make rewriting rules, similar to the ones in , to pull the ESEQs out of the statement or expression. For example, in [e1, e2, ESEQ(s, e3)], the statement s must be pulled leftward past e2 and e1. If they commute, we have (s; [e1, e2, e3]). But suppose e2 does not commute with s; then we must haveJava ScreenShot

Or if e2 commutes with s but e1 does not, we haveJava ScreenShot

The reorder function takes a list of expressions and returns a pair of (statement, expression-list). The statement contains all the things that must be executed before the expression-list. As shown in these examples, this includes all the statement-parts of the ESEQs, as well as any expressions to their left with which they did not commute. When there are no ESEQs at all we will use EXP(CONST 0), which does nothing, as the statement. Algorithm Step one is to make a "subexpression-extraction" method for each kind. Step two is to make a "subexpression-insertion" method: Given an ESEQ-clean version of each subexpression, this builds a new version of the expression or statement. These will be methods of the Tree.Exp and Tree.Stm classes:

package Tree;
abstract public class Exp {
 abstract public ExpList kids();
 abstract public Exp build(ExpList kids);
}
abstract public class Stm {
 abstract public ExpList kids();
 abstract public Stm build(ExpList kids);
}

Each subclass Exp or Stm must implement the methods; for example,

package Tree;
public class BINOP extends Exp {
 public int binop;
 public Exp left, right;
 public BINOP(int b, Exp l, Exp r) {binop=b; ...}
 public final static int PLUS=0, MINUS=1, MUL=2, DIV=3,
 AND=4,OR=5,LSHIFT=6,RSHIFT=7,ARSHIFT=8,XOR=9;
 public ExpList kids() {return new ExpList(left,
 new ExpList(right,null));}
 public Exp build(ExpList kids) {
 return new BINOP(binop,kids.head,kids.tail.head);
 }
}

Other subclasses have similar (or even simpler) kids and build methods. Using these build methods, we can write functions

static Tree.Stm do_stm(Tree.Stm s)
static Tree.ESEQ do_exp (Tree.Exp e)

that pull all the ESEQs out of a statement or expression, respectively. That is, do_stm uses s.kids() to get the immediate subexpressions of s, which will be an expression-list l. It then pulls all the ESEQs out of l recursively, yielding a clump of side-effecting statements s1 and a cleaned-up list l′. Then SEQ(s1, s.build(l′)) constructs a new statement, like the original s but with no ESEQs. These functions rely on auxiliary functions reorder_stm and reorder_exp for help; see also Exercise 8.3. The left-hand operand of the MOVE statement is not considered a subexpression, because it is the destination of the statement - its value is not used by the statement. However, if the destination is a memory location, then the address acts like a source. Thus we have,

public class MOVE extends Stm {
 public Exp dst, src;
 public MOVE(Exp d, Exp s) {dst=d; src=s;}
 public ExpList kids() {
 if (dst instanceof MEM)
 return new ExpList(((MEM)dst).exp,
 new ExpList(src,null));
 else return new ExpList(src,null);
 }
 public Stm build(ExpList kids) {
 if (dst instanceof MEM)
 return new MOVE(new MEM(kids.head), kids.tail.head);
 else return new MOVE(dst, kids.head);
 }
}

Now, given a list of "kids", we pull the ESEQs out, from right to left.

MOVING CALLS TO TOP LEVEL

The Tree language permits CALL nodes to be used as subexpressions. However, the actual implementation of CALL will be that each function returns its result in the same dedicated return-value register TEMP(RV). Thus, if we haveJava ScreenShot

the second call will overwrite the RV register before the PLUS can be executed. We can solve this problem with a rewriting rule. The idea is to assign each return value immediately into a fresh temporary register, that isJava ScreenShot

Now the ESEQ-eliminator will percolate the MOVE up outside of its containing BINOP (etc.) expressions. This technique will generate a few extra MOVE instructions, which the register allocator () can clean up. The rewriting rule is implemented as follows: reorder replaces any occurrence of CALL(f, args) byJava ScreenShot

and calls itself again on the ESEQ. But do_stm recognizes the patternJava ScreenShot

and does not call reorder on the CALL node in that case, but treats the f and args as the children of the MOVE node. Thus, reorder never "sees" any CALL that is already the immediate child of a MOVE. Occurrences of the pattern EXP(CALL(f, args)) are treated similarly.

A LINEAR LIST OF STATEMENTS

Once an entire function body s0 is processed with do_stm, the result is a tree s0′ where all the SEQ nodes are near the top (never underneath any other kind of node). The linearize function repeatedly applies the ruleJava ScreenShot

The result is that s0 is linearized into an expression of the formJava ScreenShot

Here the SEQ nodes provide no structuring information at all, and we can just consider this to be a simple list of statements,Java ScreenShot

where none of the si contain SEQ or ESEQ nodes. These rewrite rules are implemented by linearize, with an auxiliary function linear:

static Tree.StmList linear(Tree.SEQ s, Tree.StmList l) {
 return linear(s.left,linear(s.right,l));
}
static Tree.StmList linear(Tree.Stm s, Tree.StmList l) {
 if (s instanceof Tree.SEQ) return linear((Tree.SEQ)s,l);
 else return new Tree.StmList(s,l);
}
static public Tree.StmList linearize(Tree.Stm s) {
 return linear(do_stm(s), null);
}