MODULES AND INTERFACES
Any large software system is much easier to understand and implement if the designer takes care with the fundamental abstractions and interfaces. Image 1.1 shows the phases in a typical compiler. Each phase is implemented as one or more software modules.
Image 1.1: Phases of a compiler, and interfaces between them.
Breaking the compiler into this many pieces allows for reuse of the components. For example, to change the target machine for which the compiler produces machine language, it suffices to replace just the Frame Layout and Instruction Selection modules. To change the source language being compiled, only the modules up through Translate need to be changed. The compiler can be attached to a language-oriented syntax editor at the Abstract Syntax interface.

The learning experience of coming to the right abstraction by several iterations of think-implement-redesign is one that should not be missed. However, the student trying to finish a compiler project in one semester does not have this luxury. Therefore, we present in this tutorial the outline of a project where the abstractions and interfaces are carefully thought out, and are as elegant and general as we are able to make them.

Some of the interfaces, such as Abstract Syntax, IR Trees, and Assem, take the form of data structures: for example, the Parsing Actions phase builds an Abstract Syntax data structure and passes it to the Semantic Analysis phase. Other interfaces are abstract data types; the Translate interface is a set of functions that the Semantic Analysis phase can call, and the Tokens interface takes the form of a function that the Parser calls to get the next token of the input program.
DESCRIPTION OF THE PHASES
Each chapter of Part I of this tutorial describes one compiler phase, as shown in Table 1.2.
| Chapter | Phase | Description |
|---|---|---|
| 2 | Lex | Break the source file into individual words, or tokens. |
| 3 | Parse | Analyze the phrase structure of the program. |
| 4 | Semantic Actions | Build a piece of abstract syntax tree corresponding to each phrase. |
| 5 | Semantic Analysis | Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase. |
| 6 | Frame Layout | Place variables, function parameters, etc. into activation records (stack frames) in a machine-dependent way. |
| 7 | Translate | Produce intermediate representation trees (IR trees), a notation that is not tied to any particular source language or target-machine architecture. |
| 8 | Canonicalize | Hoist side effects out of expressions, and clean up conditional branches, for the convenience of the next phases. |
| 9 | Instruction Selection | Group the IR-tree nodes into clumps that correspond to the actions of target-machine instructions. |
| 10 | Control Flow Analysis | Analyze the sequence of instructions into a control flow graph that shows all the possible flows of control the program might follow when it executes. |
| 10 | Dataflow Analysis | Gather information about the flow of information through variables of the program; for example, liveness analysis calculates the places where each program variable holds a still-needed value (is live). |
| 11 | Register Allocation | Choose a register to hold each of the variables and temporary values used by the program; variables not live at the same time can share the same register. |
| 12 | Code Emission | Replace the temporary names in each machine instruction with machine registers. |
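To give a concrete flavor of the first phase in the table, here is a minimal lexer for a toy expression language, written in Python purely for illustration (the token names and patterns are assumptions of this sketch, not this tutorial's Lex module):

```python
import re

# Token patterns for a toy expression language (illustrative only).
# Order matters: each alternative is tried left to right.
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source):
    """Break the source string into (kind, text) tokens, skipping whitespace."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:
            raise SyntaxError(f"illegal character at position {pos}")
        pos = m.end()
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
```

Running `list(lex("x + 42"))` yields `[("ID", "x"), ("PLUS", "+"), ("NUM", "42")]`; a real Lex phase would also track source positions for error messages.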
This modularization is typical of many real compilers. But some compilers combine Parse, Semantic Analysis, Translate, and Canonicalize into one phase; others put Instruction Selection much later than we have done, and combine it with Code Emission. Simple compilers omit the Control Flow Analysis, Dataflow Analysis, and Register Allocation phases.
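As an illustration of the dataflow-analysis style used by the liveness phase, the classic backward fixpoint computation can be sketched in a few lines of Python (the instruction encoding here, a list of `(defs, uses, successors)` triples, is an assumption of this sketch, not this tutorial's Assem representation):

```python
def liveness(instrs):
    """Iterate to a fixpoint the standard dataflow equations:
       in[n]  = use[n] | (out[n] - def[n])
       out[n] = union of in[s] over successors s of n."""
    n = len(instrs)
    live_in = [set() for _ in range(n)]
    live_out = [set() for _ in range(n)]
    changed = True
    while changed:
        changed = False
        for i in reversed(range(n)):   # backward order converges faster
            defs, uses, succs = instrs[i]
            out = set().union(*(live_in[s] for s in succs))
            inn = set(uses) | (out - set(defs))
            if inn != live_in[i] or out != live_out[i]:
                live_in[i], live_out[i] = inn, out
                changed = True
    return live_in, live_out
```

For the straight-line program `a := 0; b := a + 1; return b`, encoded as `[({"a"}, set(), [1]), ({"b"}, {"a"}, [2]), (set(), {"b"}, [])]`, the analysis finds that `a` is live only between the first two instructions and `b` only between the last two, so a register allocator could assign `a` and `b` the same register.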
We have designed the compiler in this tutorial to be as simple as possible, but no simpler. In particular, in those places where corners are cut to simplify the implementation, the structure of the compiler allows for the addition of more optimization or fancier semantics without violence to the existing interfaces.