Putting It All Together

de-bug: to eliminate errors in or malfunctions of
Webster's Dictionary

OVERVIEW

s 2-11 have described the fundamental components of a good compiler: a front end, which does lexical analysis, parsing, construction of abstract syntax, type-checking, and translation to intermediate code; and a back end, which does instruction selection, dataflow analysis, and register allocation. What lessons have we learned? We hope that the reader has learned about the algorithms used in different components of a compiler and the interfaces used to connect the components. But the authors have also learned quite a bit from the exercise. Our goal was to describe a good compiler that is, to use Einstein's phrase, "as simple as possible - but no simpler." we will now discuss the thorny issues that arose in designing the MiniJava compiler. Structured l-values Java (and MiniJava) have no record or array variables, as C, C++, and Pascal do. Instead, all object and array values are really just pointers to heap-allocated data. Implementing structured l-values requires some care but not too many new insights. Tree intermediate representation The Tree language has a fundamental flaw: It does not describe procedure entry and exit. These are handled by opaque procedures inside the Frame module that generate Tree code. This means that a program translated to Trees using, for example, the Pentium-Frame version of Frame will be different from the same program translated using SparcFrame - the Tree representation is not completely machine-independent. Also, there is not enough information in the trees themselves to simulate the execution of an entire program, since the view shift (page 128) is partly done implicitly by procedure prologues and epilogues that are not represented as Trees. Consequently, there is not enough information to do whole-program optimization (across function boundaries). The Tree representation is a low-level intermediate representation, useful for instruction selection and intraprocedural optimization. A high-level intermediate representation would preserve more of the source-program semantics, including the notions of nested functions (if applicable), nonlocal variables, object creation (as distinguished from an opaque external function call), and so on. Such a representation would be more tied to a particular family of source languages than the general-purpose Tree language is. Register allocation Graph-coloring register allocation is widely used in real compilers, but does it belong in a compiler that is supposed to be "as simple as possible"? After all, it requires the use of global dataflow (liveness) analysis, construction of interference graphs, and so on. This makes the back end of the compiler significantly bigger.

It is instructive to consider what the MiniJava compiler would be like without it. We could keep all local variables in the stack frame, fetching them into temporaries only when they are used as operands of instructions. The redundant loads within a single basic block can be eliminated by a simple intrablock liveness analysis. Internal nodes of Tree expressions could be assigned registers using Algorithms 11.10 and 11.9. But other parts of the compiler would become much uglier: The TEMPs introduced in canonicalizing the trees (eliminating ESEQs) would have to be dealt with in an ad hoc way, by augmenting the Tree language with an operator that provides explicit scope for temporary variables; the Frame interface, which mentions registers in many places, would now have to deal with them in more complicated ways. To be able to create arbitrarily many temps and moves, and rely on the register allocator to clean them up, greatly simplifies procedure-calling sequences and code generation.