TYPE-CHECKING MiniJava
With what should a symbol table be filled - that is, what is a binding? To enable type-checking of MiniJava programs, the symbol table should contain all declared type information:
- each variable name and formal-parameter name should be bound to its type;
- each method name should be bound to its parameters, result type, and local variables; and
- each class name should be bound to its variable and method declarations.
For example, consider Image 5.7, which shows a program and its symbol table. The two class names B and C are each mapped to two tables for fields and methods. In turn, each method is mapped to both its result type and tables with its formal parameters and local variables.
Image 5.7: A MiniJava Program and its symbol table
The primitive types in MiniJava are int and boolean; all other types are either integer array, written int [], or class names. For simplicity, we choose to represent each type as a string, rather than as a symbol; this allows us to test type equality by doing string comparison. Type-checking of a MiniJava program proceeds in two phases. First, we build the symbol table, and then we type-check the statements and expressions. During the second phase, the symbol table is consulted for each identifier that is found. It is convenient to use two phases because, in Java and MiniJava, the classes are mutually recursive. If we tried to do type-checking in a single phase, then we might need to type-check a call to a method that is not yet entered into the symbol table. To avoid such situations, we use an approach with two phases. The first phase of the type-checker can be implemented by a visitor that visits nodes in a MiniJava syntaxtree and builds a symbol table. For instance, the visit method in Program 5.8 handles variable declarations. It will add the variable name and type to a data structure for the current class which later will be added to the symbol table. Notice that the visit method checks whether a variable is declared more than once and, if so, then it prints an appropriate error message.A visit method for variable declarations
class ErrorMsg { boolean anyErrors; void complain(String msg) { anyErrors = true; System.out.println(msg); } } // Type t; // Identifier i; public void visit(VarDecl n) { Type t = n.t.accept(this); String id = n.i.toString(); if (currMethod == null) { if (!currClass.addVar(id,t)) error.complain(id + "is already defined in " + currClass.getId()); } else if (!currMethod.addVar(id,t)) error.complain(id + "is already defined in " + currClass.getId() + "." + currMethod.getId()); }
The second phase of the type-checker can be implemented by a visitor that type-checks all statements and expressions. The result type of each visit method is String, for representing MiniJava types. The idea is that when the visitor visits an expression, then it returns the type of that expression. If the expression does not type-check, then the type-check is terminated with an error message. Let's take a simple case: an addition expression e1 + e2. In MiniJava, both operands must be integers (the type-checker must check this) and the result will be an integer (the type-checker will return this type). The visit method for addition is easy to implement; see Program 5.9.A visit method for plus expressions
// Exp e1,e2; public Type visit(Plus n) { if (! (n.e1.accept(this) instanceof IntegerType) ) error.complain("Left side of LessThan must be of type integer"); if (! (n.e2.accept(this) instanceof IntegerType) ) error.complain("Right side of LessThan must be of type integer"); return new IntegerType(); }
In most languages, addition is overloaded: The + operator stands for either integer addition or real addition. If the operands are both integers, the result is integer; if the operands are both real, the result is real. And in many languages if one operand is an integer and the other is real, the integer is implicitly converted into a real, and the result is real. Of course, the compiler will have to make this conversion explicit in the machine code it generates. For an assignment statement, it must be checked that the left-hand side and the right-hand side have the same type. When we allow extension of classes, the requirement is less strict: It is sufficient to check that the right-hand side is a subtype of the left-hand side. For method calls, it is necessary to look up the method identifier in the symbol table to get the parameter list and the result type. For a call e.m(…), where e has type C, we look up the definition of m in class C. The parameter types must then be matched against the arguments in the function-call expression. The result type of the method becomes the type of the method call as a whole. Every kind of statement and expression has its own type-checking rules, but in all the cases we have not already described, the rules can be derived by reference to the Java Language Specification.
ERROR HANDLING
When the type-checker detects a type error or an undeclared identifier, it should print an error message and continue - because the programmer would like to be told of all the errors in the program. To recover after an error, it's often necessary to build data structures as if a valid expression had been encountered. For example, type-checking
{int i = new C(); int j = i + i; ... }
even though the expression new C() doesn't match the type required to initialize an integer, it is still useful to enter i in the symbol table as an integer so that the rest of the program can be type-checked.
If the type-checking phase detects errors, then the compiler should not produce a compiled program as output. This means that the later phases of the compiler - translation, register allocation, etc. - will not be executed. It will be easier to implement the later phases of the compiler if they are not required to handle invalid inputs. Thus, if at all possible, all errors in the input program should be detected in the front end of the compiler (parsing and type-checking).