Static Single-Assignment Form

dom-i-nate: to exert the supreme determining or guiding influence on

Webster's Dictionary

OVERVIEW

Many dataflow analyses need to find the use sites of each defined variable or the definition sites of each variable used in an expression. The def-use chain is a data structure that makes this efficient: For each statement in the flow graph, the compiler can keep a list of pointers to all the use sites of variables defined there, and a list of pointers to all definition sites of the variables used there. In this way the compiler can hop quickly from use to definition to use to definition. An improvement on the idea of def-use chains is static single assignment form, or SSA form, an intermediate representation in which each variable has only one definition in the program text. The one (static) definition site may be in a loop that is executed many (dynamic) times, thus the name static single assignment form instead of single assignment form (in which variables are never redefined at all). The SSA form is useful for several reasons:

  1. Dataflow analysis and optimization algorithms can be made simpler when each variable has only one definition.
  2. If a variable has N uses and M definitions (which occupy about N + M instructions in a program), it takes space (and time) proportional to N · M to represent def-use chains - a quadratic blowup (see Exercise 19.8). For almost all realistic programs, the size of the SSA form is linear in the size of the original program (but see Exercise 19.9).
  3. Uses and defs of variables in SSA form relate in a useful way to the dominator structure of the control-flow graph, which simplifies algorithms such as interference graph construction.
    Java Start Image

    ax + y

    a1x + y

    ba − 1

    b1a1 − 1

    ay + b

    a2y + b1

    bx · 4

    b2x · 4

    aa + b

    a3a2 + b2

    (a)

    (b)

    Java End Image

    Image 19.1: (a) A straight-line program. (b) The program in singleassignment form.
  4. Unrelated uses of the same variable in the source program become different variables in SSA form, eliminating needless relationships. An example is the program, for ito N do A[i] ← 0
    for ito M do ss + B[i]

    where there is no reason that both loops need to use the same machine register or intermediate code temporary variable to hold their respective loop counters, even though both are named i.

In straight line code, such as within a basic block, it is easy to see that each instruction can define a fresh new variable instead of redefining an old one, as shown in . Each new definition of a variable (such as a) is modified to define a fresh new variable (a1, a2,…), and each use of the variable is modified to use the most recently defined version. This is a form of value numbering (see page 365). But when two control-flow paths merge together, it is not obvious how to have only one assignment for each variable. In , if we were to define a new version of a in block 1 and in block 3, which version should be used in block 4? Where a statement has more than one predecessor, there is no notion of "most recent."
Image 19.2: (a) A program with a control-Flow join; (b) the program trans formed to single assignment form; (c) edge-split SSA form.

To solve this problem we introduce a notational fiction, called a φ-function. shows that we can combine a1 (defined in block 1) and a2 (defined in block 3) using the function a3 ← φ(a1, a2). But unlike ordinary mathematical functions, φ (a1, a2) yields a1 if control reaches block 4 along the edge 2 → 4, and yields a2 if control comes in on edge 3 → 4. How does the φ-function know which edge was taken? That question has two answers:

Consider , which contains a loop. We can convert this to static single-assignment form as shown in . Note that variables a and c each need a φ-function to merge their values that arrive on edges 1 → 2and 2 → 2. The φ-function for b1 can later be deleted by dead-code elimination, since b1 is a dead variable. The variable c is live on entry (after conversion to SSA, the implicit definition c0 is live); this might be an uninitialized variable, or perhaps c is a formal parameter of the function whose body this is.
Image 19.3: (a) A program with a loop; (b) the program transformed to edgesplit single-assignment form. a0, b0, c0 are initial values of the variables before block 1.

The assignment c1c2 + b2 will be executed many times; thus the variable c1 is updated many times. This illustrates that we do not have a program with dynamic single-assignment (like a pure functional program), but a program in which each variable has only one static site of definition.