TRANSLATION OF POLYMORPHIC PROGRAMS

After a polymorphic program is type-checked, it must be translated into machine code. There are several ways this can be done, four of which we will discuss:

Expansion: Don't generate code for a generic class such as Cons<X>; instead, create a new Cons class for each different class at which <X> is instantiated. Casting: Generate only a single Cons class, and use Java-style checked runtime casts to operate upon it. Erasure: Generate only a single Cons class, and operate on it directly.
Type-passing: Generate code for a Cons template class, and pass type parameters at run time.

Expansion and casting have the advantage that they are compatible with standard Java Virtual Machines; erasure is more efficient but incompatible; and type-passing has interesting advantages and disadvantages of its own. Expansion It's entirely possible to expand out all the polymorphic class instantiations into ordinary Java classes. This is called the heterogenous translation of GJ into Java, because the Cons_Int class will be entirely different and essentially unrelated to the Cons_Bool class, and so on. Templates in C++ work this way as well: They are expanded out into ordinary classes. Expansion is much like inline expansion of ordinary functions; but inline expansion of functions usally can't be done so completely as to eliminate all function definitions, because recursive functions or function calls within loops would expand the program infinitely. In contrast, instantiation of generic classes never depends on program variables and is not recursive. The advantages of expansion are that the resulting classes are generally compatible with ordinary Java (so that GJ can run in an ordinary Java Virtual Machine), and that the compiled code is fairly efficient (similar to ordinary Java). The disadvantages are that it makes many copies of the same code - at worst, it can cause exponential blowup of the program, but in practice this exponential behavior is rarely seen (expansion of C++ templates is tolerably efficient). Also, expansion interacts badly with the package mechanism of Java (see Exercise 16.3). Casting In the homogenous translation of GJ into Java, all the type parameters are simply erased. When a type variable is used in the declaration of a local variable or parameter, it is replaced by its bound. Thus, the translation of GPair (from page 339) would be

class GPair {
 Printable a;
 Printable b;
 void print_me() { a.print_me(); b.print_me(); }
}

which is well-typed Java code. However, a use of GPair such as

int sum(GPair<PrintableInt> p) {
 return p.a.x + p.b.x;
}

must be translated with casts in order to be legal Java:

int sum(GPair p) {
 return ((PrintableInt)(p.a)).x + ((PrintableInt)(p.b)).x;
}

Unfortunately, these casts from a superclass to a subclass require a run-time check in ordinary Java, even though (if the Java program results from the translation of a well-typed GJ program) the cast will always succeed. Another (minor) disadvantage of the homogenous translation is that class construction cannot be applied to type variables; i.e., new X(), where X is a type variable, cannot really be replaced by new C(), where C is the bound of X, because the wrong class will be constructed. Erasure If the GJ program is translated directly into machine code, the homogenous translation can safely be done without inserting casts: Just erase the type parameters from the program. The advantage is that there's no duplication of code and there's no extra run-time casting. Unfortunately, bypassing Java also means that the Java Virtual Machine Language (JVML, or "Java byte codes") must also be bypassed, since the Java bytecode verifier uses the Java type system, not the GJ type system. This translation technique is therefore incompatible with existing Java Virtual Machines that accept programs in JVML. Type-passing Instead of erasing the type parameters, they can be turned into value parameters. A polymorphic method

<X1 extends C1> int m (X1 x, int y)

can be translated (approximately) as

int m (Class X1, X1 x, int y)

where a class descriptor is really passed as a run-time argument. One advantage of this translation is that class construction can now be applied to type variables. An even more significant advantage is that, in principle, it may be possible to divorce class descriptors from the objects that belong to the classes. That is, instead of an object of class PrintableInt (see page 338) requiring two words to represent it - the class descriptor and the x field - now only one word would be required. Any place in the program that manipulates a PrintableInt would also have an explicit class parameter passed to it in an associated local variable; from this parameter, the virtual method table can be accessed as needed, and the garbage collecter can learn what it needs to know about the layout of objects. The disadvantage of type-passing is that there is a (small) run-time cost to passing the types, and that it is incompatible with Java and with standard JVMs.

POINTERS, INTEGERS, AND BOXING

Polymorphism in GJ works for object types, but not for int and boolean. Even in ordinary Java, the class extension mechanism works for objects but not integers. The solution in Java and GJ for programmers who wish to get the benefits of subclassing or polymorphism for intsistomakeawrapper class Integer that contains an instance variable of type int. This is called a boxed integer, compared to the raw int, which is an unboxed value. Boxed values - implemented by pointers to objects containing raw values - are much easier than unboxed values to compile polymorphically:

They are all the same size (the size of a pointer) so that the same machine code can obliviously manipulate boxed values of different types.
They can contain type or class descriptors (in the pointed-to object) so that the garbage collector can understand how to traverse them.

Some coding languages (such as ML and C#) automatically box values as necessary to support polymorphism. That is, the programmer never needs to distinguish between int and Integer because the compiler inserts the coercions automatically. There is still a cost at run time to box and unbox, but it is the same cost that would be paid if the programmer explicitly wrapped the integers in boxes.