ALIAS ANALYSIS

The analyses we have described in this chapter consider only the values of Tree-language temporaries. Variables that escape are represented (by the front end of the compiler) in memory locations with explicit fetches and stores, and we have not tried to analyze the definitions, uses, and liveness of these variables. The problem is that a variable or memory location may have several different names, or aliases, so that it is hard to tell which statements affect which variables. Variables that can be aliases include:

variables passed as call-by-reference parameters (in Pascal, C++, Fortran);
variables whose address is taken (in C, C++);
l-value expressions that dereference pointers, such as p.x in MiniJava or *p in C;
l-value expressions that explicitly subscript arrays, such as a[i];
and variables used in inner-nested procedures (in Pascal, MiniJava, ML).

A good optimizer should optimize these variables. For example, in the program fragment

p.x := 5; q.x := 7; a := p.x

we might want our reaching definitions analysis to show that only one definition of p.x (namely, 5) reaches the definition of a. But the problem is that we cannot tell if one name is an alias for another. Could q point to the same record as p? If so, there are two definitions (5 and 7) that could reach a. Similarly, with call-by-reference parameters, in the program

function f( ref i: int, ref j: int) =
 (i := 5; j := 7; return i)

a naive computation of reaching definitions would miss the fact that i might be the same variable as j, if f is called with f(x,x). The may-alias relation We use alias analysis, a kind of dataflow analysis, to learn about different names that may point to the same memory locations. The result of alias analysis is a may-alias relation: p may-alias q if, in some run of the program, p and q might point to the same data. As with most dataflow analyses, static (compile-time) information cannot be completely accurate, so the may-alias relation is conservative: We say that p may-alias q if we cannot prove that p is never an alias for q.

ALIAS ANALYSIS BASED ON TYPES

For languages with strong typing (such as Pascal, Java, ML, MiniJava) where if two variables have incompatible types they cannot possibly be names for the same memory location, we can use the type information to provide a useful may-alias relation. Also in these languages the programmer cannot explicitly make a pointer point to a local variable, and we will use that fact as well. We divide all the memory locations used by the program into disjoint sets, called alias classes. For MiniJava, here are the classes we will use:

For every frame location created by Frame.allocLocal(true), wehave a new class;
For every record field of every record type, a new class;
For every array type a, a new class.

The semantic analysis phase of the compiler must compute these classes, as they involve the concept of type, of which the later phases are ignorant. Each class can be represented by a different integer. The Translate functions must label every fetch and store (that is, every MEM node in the Tree language) with its class. We will need to modify the Tree data structure, putting an aliasClass field into the MEM node. Given two MEM nodes M_i[x] and M_j[y], where i and j are the alias classes of the MEM nodes, we can say that M_i[x] may-alias M_j[y] if i = j.

This works for MiniJava and Java. But it fails in the presence of call-by-reference or type casting.

ALIAS ANALYSIS BASED ON FLOW

Instead of, or in addition to, alias classes based on types, we can also make alias classes based on point of creation. In Program 17.9a, even though p and q are the same type, we know they point to different records. Therefore we know that a must be assigned 0; the definition q.head:=5 cannot affect a. Similarly, in Program 17.9b we know p and q cannot be aliases, so a must be 0.p and q are not aliases.

type list = {head: int,
 tail: list}
var p : list := nil var q : list := nil q := list{head=0, tail=nil};
p := list{head=0, tail=q};
q.head := 5;
a := p.head
(a) MiniJava program

{int *p, *q;
 int h,i;
 p = &h;
 q = &i;
 *p = 0;
 *q = 5;
 a = *p;
}
(b) C program

To catch these distinctions automatically, we will make an alias class for each point of creation. That is, for every different statement where a record is allocated (that is, for each call to malloc in C or new in Pascal or Java) we make a new alias class. Also, each different local or global variable whose address is taken is an alias class. A pointer (or call-by-reference parameter) can point to variables of more than one alias class. In the program

 1 p := list {head=0, tail=nil};
 2 q := list {head=6, tail=p};
 3 if a=0
 4 then p:=q;
 5 p.head := 4;

at line 5, q can point only to alias class 2, but p might point to alias class 1 or 2, depending on the value of a. So we must associate with each MEM node a set of alias classes, not just a single class. After line 2 we have the information p ↦ {1}, q ↦ {2}; out of line 4 we have p ↦ {2}, q ↦ {2}. But when two branches of control flow merge (in the example, we have the control edges 3 → 5 and 4 → 5) we must merge the alias class information; at line 5 we have p ↦ {1, 2}, q ↦ {2}. Algorithm The dataflow algorithm manipulates sets of tuples of the form (t, d, k) where t is a variable and d, k is the alias class of all instances of the kth field of a record allocated at location d. The set in[s] contains (t, d, k) if t − k might point to a record of alias class d at the beginning of statement s. This is an example of a dataflow problem where bit vectors will not work as well as a tree or hash-table representation better suited to sparse problems. Instead of using gen and kill sets, we use a transfer function: We say that if A is the alias information (set of tuples) on entry to a statement s, then trans_s(A) is the alias information on exit. The transfer function is defined by Table 17.10 for the different kinds of quadruples.

Table 17.10: Transfer function for alias flow analysis.
Statement s	trans_s(A)
t ← b	(A − Σ_t) ∪ {(t, d, k) ‖ (b, d, k) ∈ A}
t ← b + k (k is a constant)	(A − Σ_t) ∪ {(t, d, i) ‖ (b, d, i − k) ∈ A}
t ← b ⊕ c	(A − Σ_t) ∪ {(t, d, i) ‖ (b, d, j) ∈ A} ∩ (c, d, k) ∈ A}
t ← M[b]	A ∪ Σ_t
M[a] ← b	A
if a > b goto L₁ else L₂	A
goto L	A
L :	A
f(a₁,…, a_n)	A
d : t ← `allocRecord`(a)	(A − Σ_t) ∪ {(t, d, 0)}
t ← f(a₁,…, a_n)	A ∪ Σ_t

The initial set A₀ includes the binding (FP, frame,0), where frame is the special alias class of all frame-allocated variables of the current function. We use the abbreviation Σ_t to mean the set of all tuples (t, d, k), where d, k is the alias class of any record field whose type is compatible with variable t. Cooperation from the front end in providing a "small" Σ_t for each t makes the analysis more accurate. Of course, in a typeless language, or one with type-casts, Σ_t might have to be the set of all alias classes. The set equations for alias flow analysis are Java ScreenShot

and we can compute a solution by iteration in the usual way. Producing may-alias information. Finally, we say that p may-alise q at statement s

if there exists d; k such that (p, d, k) ∈ in[s] and (q, d, k) ∈ in[s].

USING MAY-ALIAS INFORMATION

Given the may-alias relation, we can treat each alias class as a "variable" in dataflow analyses such as reaching definitions and available expressions. To take available expressions as an example, we modify one line of Table 17.4, the gen and kill sets:

statement s	gen[s]	kill[s]

M[a] ← b	{}	{M[x]\| a may alise x at s}

Now we can analyze the following program fragment:

1 : u ← M[t]
2 : M[x] ← r
3 : w ← M[t]
4 : b ← u + w

Without alias analysis, the store instruction in line 2 would kill the availability of M[t], since we would not know whether t and x were related. But suppose alias analysis has determined that t may-alias x at 2 is false; then M[t] is still available at line 3, and we can eliminate the common subexpression; after copy propagation, we obtain:

1 : z ← M[t]
2 : M[x] ← r
4 : b ← z + z

What we have shown here is intraprocedural alias analysis. But an interprocedural analysis would help to analyze the effect of CALL instructions. For example, in the program

1 : t ← fp + 12
2 : u ← M[t]
3 : f(t)
4 : w ← M[t]
5 : b ← u + w

does the function f modify M[t]? If so, then M[t] is not available at line 4.

However, interprocedural alias analysis is beyond the scope of this tutorial.

ALIAS ANALYSIS IN STRICT PURE-FUNCTIONAL LANGUAGES

Some languages have immutable variables that cannot change after their initialization. For example, const variables in the C language, most variables in the ML language, and all variables in PureFun-MiniJava (see ) are immutable. Alias analysis is not needed for these variables. The purpose of alias analysis is to determine whether different statements in the program interfere, or whether one definition kills another. Though it is true that there could be many pointers to the same value, none of the pointers can cause the value to change, i.e., no immutable variable can be killed.

This is a good thing for the optimizer, and also for the the programmer. The optimizer can do constant propagation and loop-invariant detection (see ) without being bothered by aliases; and the programmer can also understand what a segment of the program is doing without the confusion and complexity introduced by stores through aliased pointers.