LIVENESS IN THE MiniJava COMPILER

The flow analysis for the MiniJava compiler is done in two stages: First, the control flow of the Assem program is analyzed, producing a control-flow graph; then, the liveness of variables in the control-flow graph is analyzed, producing an interference graph.

GRAPHS

To represent both kinds of graphs, let's make a Graph abstract data type (Program 10.10).The Graph abstract data type.

package Graph;
public class Graph {
 public Graph();
 public NodeList nodes();
 public Node newNode();
 public void addEdge(Node from, Node to);
 public void rmEdge(Node from, Node to);
 public void show(java.io.PrintStream out);
}
public class Node {
 public Node(Graph g);
 public NodeList succ();
 public NodeList pred();
 public NodeList adj();
 public int outDegree();
 public int inDegree();
 public int degree();
 public boolean goesTo(Node n);
 public boolean comesFrom(Node n);
 public boolean adj(Node n);
 public String toString();
}

The constructor Graph() creates an empty directed graph; g.newNode() makes a new node within a graph g. A directed edge from n to m is created by g.addEdge(n,m); after that, m will be found in the list n.succ() and n will be in m.pred(). When working with undirected graphs, the function adj is useful: m.adj() = m.succ() ∪ m.pred(). To delete an edge, use rmEdge. To test whether m and n are the same node, use m==n. When using a graph in an algorithm, we want each node to represent something (an instruction in a program, for example). To make mappings from nodes to the things they are supposed to represent, we use a Hashtable. The following idiom associates information x with node n in a mapping mytable.

java.util.Dictionary mytable = new java.util.Hashtable();
 ... mytable.put(n,x);

CONTROL-FLOW GRAPHS

The FlowGraph package manages control-flow graphs. Each instruction (or basic block) is represented by a node in the flow graph. If instruction m can be followed by instruction n (either by a jump or by falling through), then there will be an edge (m, n) in the graph.

public abstract class FlowGraph extends Graph.Graph {
 public abstract TempList def(Node node);
 public abstract TempList use(Node node);
 public abstract boolean isMove(Node node);
 public void show(java.io.PrintStream out);
}

Each Node of the flow graph represents an instruction (or, perhaps, a basic block). The def() method tells what temporaries are defined at this node (destination registers of the instruction). use() tells what temporaries are used at this node (source registers of the instruction). isMove tells whether this instruction is a MOVE instruction, one that could be deleted if the def and use were identical. The AssemFlowGraph class provides an implementation of FlowGraph for Assem instructions.

package FlowGraph;
public class AssemFlowGraph extends FlowGraph {
 public Instr instr(Node n);
 public AssemFlowGraph(Assem.InstrList instrs);
}

The constructor AssemFlowGraph takes a list of instructions and returns a flow graph. In making the flow graph, the jump fields of the instrs are used in creating control-flow edges, and the use and def information (obtained from the src and dst fields of the instrs) is attached to the nodes by means of the use and def methods of the flowgraph. Information associated with the nodes For a flow graph, we want to associate some use and def information with each node in the graph. Then the liveness-analysis algorithm will also want to remember live-in and live-out information at each node. We could make room in the Node class to store all of this information. This would work well and would be quite efficient. However, it may not be very modular. Eventually we may want to do other analyses on flow graphs, which remember other kinds of information about each node. We may not want to modify the data structure (which is a widely used interface) for each new analysis. Instead of storing the information in the nodes, a more modular approach is to say that a graph is a graph, and that a flow graph is a graph along with separately packaged auxiliary information (tables, or functions mapping nodes to whatever). Similarly, a dataflow algorithm on a graph does not need to modify dataflow information in the nodes, but modifies its own privately held mappings.

There may be a trade-off here between efficiency and modularity, since it may be faster to keep the information in the nodes, accessible by a simple pointer-traversal instead of a hash-table or search-tree lookup.

LIVENESS ANALYSIS

The RegAlloc package has an abstract class InterferenceGraph to indicate which pairs of temporaries cannot share a register:

package RegAlloc;
abstract public class InterferenceGraph extends Graph.Graph{
 abstract public Graph.Node tnode(Temp.Temp temp);
 abstract public Temp.Temp gtemp(Node node);
 abstract public MoveList moves();
 public int spillCost(Node node);
}

The method tnode relates a Temp to a Node, and gtemp is the inverse map. The method moves tells what MOVE instructions are associated with this graph (this is a hint about what pairs of temporaries to try to allocate to the same register). The spillCost(n) is an estimate of how many extra instructions would be executed if n were kept in memory instead of in registers; for a naive spiller, it suffices to return 1 for every n. The class Liveness produces an interference graph from a flow graph:

package RegAlloc;
public class Liveness extends InterferenceGraph {
 public Liveness(FlowGraph flow);
}

In the implementation of the Liveness module, it is useful to maintain a data structure that remembers what is live at the exit of each flow-graph node:

private java.util.Dictionary liveMap =
 new java.util.Hashtable();

where the keys are nodes and objects are TempLists. Given a flow-graph node n, the set of live temporaries at that node can be looked up in a global liveMap. Having calculated a complete liveMap, we can now construct an interference graph. At each flow node n where there is a newly defined temporary d ∈ def(n), and where temporaries {t₁, t₂;…} are in the liveMap, we just add interference edges (d, t₁), (d, t₂),…. For MOVEs, these edges will be safe but suboptimal; pages 213-214 describe a better treatment.

What if a newly defined temporary is not live just after its definition? This would be the case if a variable is defined but never used. It would seem that there's no need to put it in a register at all; thus it would not interfere with any other temporaries. But if the defining instruction is going to execute (perhaps it is necessary for some other side effect of the instruction), then it will write to some register, and that register had better not contain any other live variable. Thus, zero-length live ranges do interfere with any live ranges that overlap them.