Basic Blocks and Traces
ca-non-i-cal: reduced to the simplest or clearest schema possible
Webster's Dictionary
OVERVIEW
The trees generated by the semantic analysis phase must be translated into assembly or machine language. The operators of the Tree language are chosen carefully to match the capabilities of most machines. However, certain aspects of the Tree language do not correspond exactly with machine languages, and some aspects of the Tree language interfere with compile-time optimization analyses. For example, it is useful to be able to evaluate the subexpressions of an expression in any order. But the subexpressions of Tree.Exp can contain side effects: ESEQ and CALL nodes that contain assignment statements and perform input/output. If tree expressions did not contain ESEQ and CALL nodes, then the order of evaluation would not matter. Some of the mismatches between Trees and machine-language programs are:
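To make the evaluation-order problem concrete, consider the tree BINOP(PLUS, TEMP a, ESEQ(MOVE(TEMP a, CONST 5), CONST 1)). The Java sketch below hard-codes that one tree rather than building a general interpreter. The class EvalOrder and its methods are hypothetical illustrations, not part of the Tree package. Because Java itself always evaluates operands left to right, the sketch sequences the two operand evaluations explicitly.

```java
// Hypothetical illustration of BINOP(PLUS, TEMP a, ESEQ(MOVE(TEMP a, CONST 5), CONST 1)).
// Not part of the Tree package; each method stands in for one subtree.
class EvalOrder {
    static int a; // stands in for TEMP a

    // Left operand: TEMP a, which simply reads a.
    static int left() { return a; }

    // Right operand: ESEQ that first performs MOVE(TEMP a, CONST 5), then yields CONST 1.
    static int right() { a = 5; return 1; }

    // Evaluate the left operand before the right one.
    static int leftFirst() {
        a = 0;
        int l = left();
        int r = right();
        return l + r;
    }

    // Evaluate the right operand before the left one.
    static int rightFirst() {
        a = 0;
        int r = right();
        int l = left();
        return l + r;
    }

    public static void main(String[] args) {
        System.out.println(leftFirst());  // 1  (0 + 1)
        System.out.println(rightFirst()); // 6  (5 + 1)
    }
}
```

The same tree yields 1 or 6 depending only on evaluation order, which is exactly the freedom an optimizer would like to have but cannot safely use while ESEQ is present.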
- The CJUMP instruction can jump to either of two labels, but real machines' conditional jump instructions fall through to the next instruction if the condition is false.
- ESEQ nodes within expressions are inconvenient, because they make different orders of evaluating subtrees yield different results.
- CALL nodes within expressions cause the same problem.
- CALL nodes within the argument-expressions of other CALL nodes will cause problems when trying to put arguments into a fixed set of formal-parameter registers.
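The first mismatch can be seen by lowering a two-way CJUMP to a one-target conditional branch: what must be emitted depends on which label immediately follows the CJUMP. This is a minimal sketch assuming a hypothetical textual instruction format ("bLT L1" means "branch to L1 if less than") and a hypothetical RelOp enum; it covers the three cases of false label next, true label next (negate the condition), or neither (add an unconditional jump).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: lowering CJUMP(op, a, b, t, f) to a machine-style branch
// that falls through to the next instruction when its condition is false.
class FallThrough {
    // Relational operators, mirroring those a CJUMP can test.
    enum RelOp {
        EQ, NE, LT, GT, LE, GE;

        // Logical negation: !(a < b) is (a >= b), and so on.
        RelOp not() {
            switch (this) {
                case EQ: return NE;
                case NE: return EQ;
                case LT: return GE;
                case GE: return LT;
                case GT: return LE;
                default: return GT; // LE
            }
        }
    }

    // `next` is the label of the instruction that immediately follows the CJUMP.
    static List<String> lower(RelOp op, String t, String f, String next) {
        List<String> out = new ArrayList<>();
        if (next.equals(f)) {
            out.add("b" + op + " " + t);       // false case falls through to f
        } else if (next.equals(t)) {
            out.add("b" + op.not() + " " + f); // negate so that t is the fall-through
        } else {
            out.add("b" + op + " " + t);       // neither label follows:
            out.add("jmp " + f);               // branch to t, else jump to f explicitly
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lower(RelOp.LT, "L1", "L2", "L2")); // [bLT L1]
        System.out.println(lower(RelOp.LT, "L1", "L2", "L1")); // [bGE L2]
        System.out.println(lower(RelOp.LT, "L1", "L2", "L3")); // [bLT L1, jmp L2]
    }
}
```

The third case costs an extra jump, which is why it pays to arrange code so that a CJUMP's false label follows it directly.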
Why does the Tree language allow ESEQ and two-way CJUMP, if they are so troublesome? Because they make the Translate phase of the compiler (translation to intermediate code) much more convenient. We can take any tree and rewrite it into an equivalent tree without any of the cases listed above. Without these cases, the only possible parent of a SEQ node is another SEQ; all the SEQ nodes will be clustered at the top of the tree. This makes the SEQs entirely uninteresting; we might as well get rid of them and make a linear list of Tree.Stm statements.
The transformation is done in three stages: first, a tree is rewritten into a list of canonical trees without SEQ or ESEQ nodes; then this list is grouped into a set of basic blocks, which contain no internal jumps or labels; then the basic blocks are ordered into a set of traces in which every CJUMP is immediately followed by its false label. Thus the package Canon has these tree-rearrangement functions:
    package Canon;

    public class Canon {
        static public Tree.StmList linearize(Tree.Stm s);
    }

    public class BasicBlocks {
        public StmListList blocks;
        public Temp.Label done;
        public BasicBlocks(Tree.StmList stms);
    }

    public class StmListList {
        public Tree.StmList head;
        public StmListList tail;
        public StmListList(Tree.StmList head, StmListList tail);
    }

    public class TraceSchedule {
        public TraceSchedule(BasicBlocks b);
        public Tree.StmList stms;
    }
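Because SEQs can only appear at the top of a tree once ESEQs are gone, turning a canonical tree into a linear list amounts to walking the SEQ nodes left to right and collecting the statements at the leaves. The following is a minimal sketch over a hypothetical two-class miniature of Tree.Stm (Seq and Leaf), assuming ESEQ removal has already happened; it is not the book's Canon.linearize, which also performs that removal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of Tree.Stm with just two kinds of node: SEQ and an
// opaque leaf. Assumes ESEQs are already gone, so SEQs appear only at the top.
class Flatten {
    interface Stm {}

    // SEQ(left, right): execute left, then right.
    static final class Seq implements Stm {
        final Stm left, right;
        Seq(Stm left, Stm right) { this.left = left; this.right = right; }
    }

    // Any non-SEQ statement (MOVE, EXP, JUMP, CJUMP, LABEL), kept as text.
    static final class Leaf implements Stm {
        final String text;
        Leaf(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    // In-order walk: every statement under `left` precedes every one under `right`.
    static void collect(Stm s, List<Stm> out) {
        if (s instanceof Seq) {
            Seq seq = (Seq) s;
            collect(seq.left, out);
            collect(seq.right, out);
        } else {
            out.add(s);
        }
    }

    static List<Stm> linear(Stm s) {
        List<Stm> out = new ArrayList<>();
        collect(s, out);
        return out;
    }

    public static void main(String[] args) {
        Stm tree = new Seq(
            new Seq(new Leaf("LABEL L1"), new Leaf("MOVE t1 <- 0")),
            new Seq(new Leaf("MOVE t2 <- t1"), new Leaf("JUMP L2")));
        System.out.println(linear(tree)); // [LABEL L1, MOVE t1 <- 0, MOVE t2 <- t1, JUMP L2]
    }
}
```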
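Linearize removes the ESEQs and moves the CALLs to top level, so that in each resulting canonical tree the parent of every CALL is either EXP(...) or MOVE(TEMP t, ...). Then BasicBlocks groups the statements into sequences of straight-line code. Finally, TraceSchedule orders the blocks so that every CJUMP is immediately followed by its false label.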
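The BasicBlocks step can be sketched as a single scan: a LABEL starts a new block, and a JUMP or CJUMP ends the current one. Where a block lacks a leading label, one is invented; where a block would fall into its successor, an explicit JUMP to the successor's label is added; and the last block jumps to a `done` label. In this sketch statements are plain strings, and the class Blocks, the invented labels B0, B1, ..., and the label name "done" are hypothetical, not the Canon package's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of basic-block formation over statements written as
// strings: "LABEL x", "JUMP x", "CJUMP ...", or any other straight-line stm.
class Blocks {
    static boolean isJump(String s) {
        return s.startsWith("JUMP ") || s.startsWith("CJUMP ");
    }

    // Every returned block begins with a LABEL and ends with a JUMP or CJUMP.
    static List<List<String>> blocks(List<String> stms, String done) {
        List<List<String>> result = new ArrayList<>();
        List<String> cur = null; // block under construction, or null between blocks
        int fresh = 0;           // counter for invented labels B0, B1, ...
        for (String s : stms) {
            if (s.startsWith("LABEL ")) {
                if (cur != null) {
                    // Previous block would fall through: make that an explicit JUMP.
                    cur.add("JUMP " + s.substring("LABEL ".length()));
                    result.add(cur);
                }
                cur = new ArrayList<>();
                cur.add(s);
            } else {
                if (cur == null) { // block has no leading label: invent one
                    cur = new ArrayList<>();
                    cur.add("LABEL B" + (fresh++));
                }
                cur.add(s);
                if (isJump(s)) {   // a jump always ends the current block
                    result.add(cur);
                    cur = null;
                }
            }
        }
        if (cur != null) {         // last block continues to the epilogue
            cur.add("JUMP " + done);
            result.add(cur);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stms = List.of(
            "MOVE t1 <- 0",
            "LABEL L1",
            "CJUMP LT t1 10 L2 L3",
            "LABEL L2",
            "MOVE t1 <- t1 + 1",
            "JUMP L1",
            "LABEL L3",
            "MOVE rv <- t1");
        for (List<String> b : blocks(stms, "done")) {
            System.out.println(b);
        }
        // [LABEL B0, MOVE t1 <- 0, JUMP L1]
        // [LABEL L1, CJUMP LT t1 10 L2 L3]
        // [LABEL L2, MOVE t1 <- t1 + 1, JUMP L1]
        // [LABEL L3, MOVE rv <- t1, JUMP done]
    }
}
```

Because every block now ends in an explicit jump, the blocks can be laid out in any order without changing the program's meaning, which is what leaves TraceSchedule free to choose an order.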