Static Single-Assignment Form

dom-i-nate: to exert the supreme determining or guiding influence on
Webster's Dictionary

OVERVIEW

Many dataflow analyses need to find the use sites of each defined variable or the definition sites of each variable used in an expression. The def-use chain is a data structure that makes this efficient: For each statement in the flow graph, the compiler can keep a list of pointers to all the use sites of variables defined there, and a list of pointers to all definition sites of the variables used there. In this way the compiler can hop quickly from use to definition to use to definition. An improvement on the idea of def-use chains is static single assignment form, or SSA form, an intermediate representation in which each variable has only one definition in the program text. The one (static) definition site may be in a loop that is executed many (dynamic) times, thus the name static single assignment form instead of single assignment form (in which variables are never redefined at all). The SSA form is useful for several reasons:

Dataflow analysis and optimization algorithms can be made simpler when each variable has only one definition.
If a variable has N uses and M definitions (which occupy about N + M instructions in a program), it takes space (and time) proportional to N · M to represent def-use chains - a quadratic blowup (see Exercise 19.8). For almost all realistic programs, the size of the SSA form is linear in the size of the original program (but see Exercise 19.9).
Uses and defs of variables in SSA form relate in a useful way to the dominator structure of the control-flow graph, which simplifies algorithms such as interference graph construction.

a ← x + y

a₁ ← x + y

b ← a − 1

b₁ ← a₁ − 1

a ← y + b

a₂ ← y + b₁

b ← x · 4

b₂ ← x · 4

a ← a + b

a₃ ← a₂ + b₂

(a)

(b)

Image 19.1: (a) A straight-line program. (b) The program in singleassignment form.
Unrelated uses of the same variable in the source program become different variables in SSA form, eliminating needless relationships. An example is the program, for i ← to N do A[i] ← 0
for i ← to M do s ← s + B[i]
where there is no reason that both loops need to use the same machine register or intermediate code temporary variable to hold their respective loop counters, even though both are named i.

In straight line code, such as within a basic block, it is easy to see that each instruction can define a fresh new variable instead of redefining an old one, as shown in Image 19.1. Each new definition of a variable (such as a) is modified to define a fresh new variable (a₁, a₂,…), and each use of the variable is modified to use the most recently defined version. This is a form of value numbering (see page 365). But when two control-flow paths merge together, it is not obvious how to have only one assignment for each variable. In Image 19.2a, if we were to define a new version of a in block 1 and in block 3, which version should be used in block 4? Where a statement has more than one predecessor, there is no notion of "most recent."
Image 19.2: (a) A program with a control-Flow join; (b) the program trans formed to single assignment form; (c) edge-split SSA form.

To solve this problem we introduce a notational fiction, called a φ-function. Image 19.2b shows that we can combine a₁ (defined in block 1) and a₂ (defined in block 3) using the function a₃ ← φ(a₁, a₂). But unlike ordinary mathematical functions, φ (a₁, a₂) yields a₁ if control reaches block 4 along the edge 2 → 4, and yields a₂ if control comes in on edge 3 → 4. How does the φ-function know which edge was taken? That question has two answers:

If we must execute the program, or translate it to executable form, we can "implement" the φ-function using a MOVE instruction on each incoming edge, as shown in .
In many cases, we simply need the connection of uses to definitions, and don't need to "execute" the φ-functions during optimization. In these cases, we can ignore the question of which value to produce.

Consider Image 19.3a, which contains a loop. We can convert this to static single-assignment form as shown in Image 19.3b. Note that variables a and c each need a φ-function to merge their values that arrive on edges 1 → 2and 2 → 2. The φ-function for b₁ can later be deleted by dead-code elimination, since b₁ is a dead variable. The variable c is live on entry (after conversion to SSA, the implicit definition c₀ is live); this might be an uninitialized variable, or perhaps c is a formal parameter of the function whose body this is.
Image 19.3: (a) A program with a loop; (b) the program transformed to edgesplit single-assignment form. a₀, b₀, c₀ are initial values of the variables before block 1.

The assignment c₁ ← c₂ + b₂ will be executed many times; thus the variable c₁ is updated many times. This illustrates that we do not have a program with dynamic single-assignment (like a pure functional program), but a program in which each variable has only one static site of definition.

a ← x + y	a₁ ← x + y
b ← a − 1	b₁ ← a₁ − 1
a ← y + b	a₂ ← y + b₁
b ← x · 4	b₂ ← x · 4
a ← a + b	a₃ ← a₂ + b₂
(a)	(b)