POLYMORPHIC TYPE-CHECKING
Type-checking for a polymorphic language is not as straightforward as for a monomorphic language. Before embarking on an implementation, we must be clear about what the typing rules are. The types used in the basic MiniJava compiler could all be represented as atomic strings: int, boolean, int[], and class identifiers such as IntList. In GJ we have three kinds of type expressions:
Primitive types such as int and boolean;
Type apps of the form c 〈t1, t2,… tn〉, where c is a type constructor - a polymorphic class such as List in our example - and t1 through tn are type expressions; and Type variables such as the X in the type expression ListX.
All class identifiers will be considered polymorphic, but those with no type arguments can be considered as types of the form C<>. In this chapter we will omit discussion of array types, though in a "MiniGJ" compiler int[] could be treated as a primitive type. Syntactic conventions The type-checking algorithm will have several kinds of variables:
- T, Ti, U, V stand for type expressions;
- N, Ni, P, Q stand for nonvariable type expressions (that do not contain type variables);
- X, Xi, Y, Z stand for type variables;
- C stands for class names;
- m stands for method names;
- f stands for field names;
- e stands for program expressions; and
- x stands for program variables (local variables of methods).
We will often abbreviate the keyword extends with the symbol ◃. In discussing the type-checking algorithm we will require that every TyParams have an explicit bound, so instead of writing class List<X> we would write class List<X◃Object>. Method types It's convenient to separate the type of a method from the name of the method. For example, consider the declaration
<X extends Printable> GPair<X> firstTwo(List<X> x) { ... }
For any type X that's a subclass of Printable, the firstTwo method takes a list of X's, and returns a pair of X's. The type of the firstTwo method is 〈X ◃ Printable〉 List 〈X〉 → GPair 〈X〉. (Notice that the binding occurrence of X requires a bounding clause 〈X ◃ Printable〉, but the applied occurrences of X do not use the ◃ symbol.) In general, the form of a method type is
meaning that, with type parameters TyParams and value parameters whose types are TyList, the method returns a value of type Ty. Substitution Suppose we want to know the type of the firstTwo method when it's applied to a list of PrintableInt. We take List 〈X〉 → GPair 〈X〉 and substitute PrintableInt for X, yielding the method type
List 〈PrintableInt〉 → GPair 〈PrintableInt〉
We can write a substitution function in the compiler's type-checker to do this; in this chapter we'll write [V1,…, Vk/X1,…, Xk]U to mean the substitution of type Vi for every occurrence of type variable Xi in type expression (or function prototype) U. Because class parameterization is not nested in GJ - that is, U cannot contain class declarations - we don't need to worry about internal redeclarations of the type variables Xi; the problem of avoiding variable capture (described in ) does not occur. Class table, type table, and var table explained that, because classes can mutually refer to each other, type-checking must proceed in two phases: First, build a symbol table (the class table) mapping class names to class declarations; second, type-check the interior of the class declarations. The same approach works for GJ. The first phase is quite straightforward; the rest of this section explains the algorithm for the second phase. This phase will use the class table, and maintain a type table mapping formal type parameters to their bounds, and a var table mapping ordinary (value) variables to their types. For example, in processing the firstTwo method (described above), the type table would map X to Printable, and the var table would map x to List<X>. Well-formed types A type is well formed if every type variable has a bound in the appropriate type table. Thus, the type List<X> is well formed when processing the declaration of firstTwo because X ◃ Printable is in the type table. In general, we must also account for substitutions. Algorithm 16.1 shows the procedure checktype for checking wellformedness of types.ALGORITHM 16.1: Checking wellformedness of types and subtyping.
checkType(T) = if T is primitive (int, boolean, etc.) then OK else if T is Object then OK else if (T ↦ ...) is in the type table then OK else if T is C 〈T1,..., Tn〉 look up C in the class table, yielding class C 〈X1 ◃ N1,..., Xn ◃ Nn〉 ◃ N{...} for each Ti do checkType(Ti for each Ti do checkSubtype(Ti, [T1,..., Tn)X1,..., Xn]Ni checkSubtype(T, U) = if T = U then OK else if (T ↦ T′) is in the type table then subtype(T′, U else if T is C 〈T1,..., Tn〉 look up C in the class table, yielding class C 〈X1 ◃ N1,..., Xn ◃ Nn〉 ◃ N{...} subtype([T1,..., Tn/X1,..., Xn]N, U) else error
Subtyping In type-checking Java (and GJ) it is often necessary to test whether one type is a subtype of another: A is a subtype of B if, whenever a value of type B is required, a value of type A will suffice. A subtype is not exactly the same as a subclass, but still, in class-based languages like Java, if class A extends B, then there is also a subtype relationship. Checking subtyping in GJ is made more complex by the need to perform type substitution and the need to look up type variables in the type table; Algorithm 16.1 shows how to do it. Finding the bound of a type The formal type parameters of a polymorphic class or method are of the form hX ◃ Ni. Inside that class or method, suppose there's a local variable declaration X x, where X is the type and x is the variable. When type-checking expressions such as the field access x. f, we need to know what class x is an instance of. We don't know exactly what class x belongs to - it belongs to some unknown class X - but we do know that X is a subclass of N. Therefore, we take the type X and look up its bound, N, in the type table. The function getBound(T) looks up type bounds. Looking up field and method types When type-checking a field access e: f or a method call e.m 〈T 〉 (), one must look up the field f or method m in e's class. This may involve a search through superclasses, and in the process, types must be substituted for type variables. These searches are handled by fieldType and methodType in Algorithm 16.2. Algorithm 16.3 shows how to type-check expressions using all the auxiliary procedures defined thus far.ALGORITHM 16.2: Field and method search.
getBound(T) = if T is a type variable X then if (X ↦ N) is in the type table then N else error else if T is a nonvariable type N then N else error fieldType(f;) = if T is C 〈T1,..., Tn〉 look in the class table for class C 〈X1 ◃ N1,..., Xn ◃ Nn〉 ◃ N{fields, ...} if field f with type S is in fields return [T1,..., Tn/X1,..., Xn]S else return fieldType(f, [T1,..., Tn/X1,..., Xn]N) else error methodType(m, T ) = if T is C 〈T1,..., Tn〉 look in the class table for class C 〈X1 ◃ N1,..., Xn ◃ Nn〉 ◃ N{..., methods} if method m is in methods; with the declaration 〈Y1 ◃ P1,..., Yk ◃ Pk〉 U m(U1 x1 ... Ul xl){return e} then return [T1,..., Tn/X1,..., Xn](〈Y1 ◃ P1,..., Yk ◃ Pk〉 (U1 x1,..., Ul xl) → U) else return methodType(m, [T1,..., Tn/X1,..., Xn]N)
checkExp(e) = if e is a variable x look up (x ↦ T ) in the var table; return T else if e is a field-selection e0. f return fieldType(f; getBound(checkExp(e0))) else if e is a method-call e0m 〈V1,..., Vn〉 (e1,..., el) call methodType(m, getBound(checkExp(e0))), yielding 〈Y1 ◃ P1,..., Yk ◃ Pk〉 (U1 x1,..., Ul xl) → U) for each Vi do checkType(Vi) for each Vi do checkSubtype(Vi, [V1,..., Vk/Y1,..., Yk]Pi) for i ∈ 1 ... l do checkSubtype(checkExp(ei), [V1,..., Vk/Y1,..., Yk]Ui) return [V1,..., Vk/Y1,..., Yk]U else if e is new N) checktype(N/) return N
Algorithm 16.4 shows how to check class and method declarations. The formal type parameters of the class are added to the type table, then all the methods are checked. For each method, the formal type parameters and the (ordinary) value parameters of the method are added to the type table and value table, then the body of the method is type-checked. The four lines ending at the last call to checkSubtype are to make sure that the method over-riding is valid; in GJ as in Java, one can only override a method at the same argument and result type as in the superclass, but for GJ there are more substitutions to perform in order to check this.ALGORITHM 16.4: Type-checking class declarations.
checkClass(cdecl) = suppose cdecl is class C 〈X1 ◃ N1,..., Xn ◃ Nn〉 ◃ N{fields, methods} add (X1 ↦ N1,... Xk ↦ Nk) to the type table checkType(N) for each Ni do checkType(Ni) for each mdecl in methods suppose mdecl is 〈Y1 ◃ P1,..., Yj〉 ◃ Pj〉 T m(T1 x1 ... Tl xl) {return e} add (Y1 ↦ P1,... Yj ↦ Pj) to the type table checkType(T) for each Ti do checkType(Ti) for each Pi do checkType(Pi) add (this ↦ C 〈X1,..., Xn〉) to the var table add (x1 ↦ T1,... xl ↦ Tl) to the var table checkSubtype(checkType(e), T ) suppose methodType(m, N) is 〈Z1 ◃ Q1,..., Zj ◃ Qj 〉 (U1 x1,..., Ul xl) → U for each Ti check that Ti = ([Y1,..., Yj/Z1,..., Zj]Ui) for each Pi check that Pi = ([Y1,..., Yj/Z1,..., Zj]Qi) checkSubtype(T, [Y1,..., Yj/Z1,..., Zj]U) pop j most recent bindings from the type table pop l + 1 most recent bindings from the var table pop k most recent bindings from the type table