CISC MACHINES
A typical modern RISC machine has
- registers,
- only one class of integer/pointer registers,
- arithmetic operations only between registers,
- "three-address" instructions of the form r1 ← r2 ⊕ r3,
- load and store instructions with only the M[reg+const] addressing mode,
- every instruction exactly 32 bits long,
- one result or effect per instruction.
Many machines designed between 1970 and 1985 are complex instruction set computers (CISC). Such computers have more complicated addressing modes that encode instructions in fewer bits, which was important when computer memories were smaller and more expensive. Typical features found on CISC machines include
- few registers (16, or 8, or 6);
- registers divided into different classes, with some operations available only on certain registers;
- arithmetic operations can access registers or memory through "addressing modes";
- "two-address" instructions of the form r1 ← r1 ⊕ r2;
- several different addressing modes;
- variable-length instructions, formed from variable-length opcode plus variablelength addressing modes;
- instructions with side effects such as "autoincrement" addressing modes.
Most computer architectures designed since 1990 are RISC machines, but most general-purpose computers installed since 1990 are CISC machines: the Intel 80386 and its descendants (486, Pentium). The Pentium, in 32-bit mode, has six general-purpose registers, a stack pointer, and a frame pointer. Most instructions can operate on all six registers, but the multiply and divide instructions operate only on the eax register. In contrast to the "three-address" instructions found on RISC machines, Pentium arithmetic instructions are generally "two-address", meaning that the destination register must be the same as the first source register. Most instructions can have either two register operands (r1 ← r1 ⊕ r2), or one register and one memory operand, for example M[r1 + c] ← M[r1 + c] ⊕ r2 or r1 ← r1 ⊕ M[r2 + c], but not M[r1 + c1] ← M[r1 + c1] ⊕ M[r2 + c2] We will cut through these Gordian knots as follows:
- Few registers: We continue to generate TEMP nodes freely, and assume that the register allocator will do a good job.
- Classes of registers: The multiply instruction on the Pentium requires that its left operand (and therefore destination) must be the eax register. The highorder bits of the result (useless to a MiniJava program) are put into register edx. The solution is to move the operands and result explicitly; to implement t1 ← t2 × t3:
mov eax, t2 eax t2 mul t3 eax ← eax × t3; edx ← garbage mov t1, eax t1 ← eax
This looks very clumsy; but one job that the register allocator performs is to eliminate as many move instructions as possible. If the allocator can assign t1 or t3 (or both) to register eax, then it can delete one or both of the move instructions.
- Two-address instructions: We solve this problem in the same way as we solve the previous one: by adding extra move instructions. To implement t1 ← t2 + t3 we produce
mov t1,t2 t1 ← t2 add t1, t3 t1 ← t1 + t3
Then we hope that the register allocator will be able to allocate t1 and t2 to the same register, so that the move instruction will be deleted.
- Arithmetic operations can address memory: The instruction selection phase turns every TEMP node into a "register" reference. Many of these "registers" will actually turn out to be memory locations. The spill phase of the register allocator must be made to handle this case efficiently; see . The alternative to using memory-mode operands is simply to fetch all the operands into registers before operating and store them back to memory afterwards. For example, these two sequences compute the same thing:
mov eax, [ebp - 8] add eax, ecx add [ebp - 8], ecx mov [ebp - 8], eax
The sequence on the right is more concise (and takes less machine-code space), but the two sequences are equally fast. The load, register-register add, and store take 1 cycle each, and the memory-register add takes 3 cycles. On a highly pipelined machine such as the Pentium Pro, simple cycle counts are not the whole story, but the result will be the same: The processor has to perform the load, add, and store, no matter how the instructions specify them.
The sequence on the left has one significant disadvantage: It trashes the value in register eax. Therefore, we should try to use the sequence on the right when possible. But the issue here turns into one of register allocation, not of instruction speed; so we defer its solution to the register allocator.
- Several addressing modes: An addressing mode that accomplishes six things typically takes six steps to execute. Thus, these instructions are often no faster than the multi-instruction sequences they replace. They have only two advantages: They "trash" fewer registers (such as the register eax in the previous example), and they have a shorter instruction encoding. With some work, treematching instruction selection can be made to select CISC addressing modes, but programs can be just as fast using the simple RISC-like instructions.
- Variable-length instructions: This is not really a problem for the compiler; once the instructions are selected, it is a trivial (though tedious) matter for the assembler to emit the encodings.
- Instructions with side effects: Some machines have an "autoincrement" memory fetch instruction whose effect is
r2 ← M[r1]; r1 ← r1 + 4
This instruction is difficult to model using tree patterns, since it produces two results. There are three solutions to this problem:
- Ignore the autoincrement instructions, and hope they go away. This is an increasingly successful solution, as few modern machines have multiple-side-effect instructions.
- Try to match special idioms in an ad hoc way, within the context of a tree pattern-matching code generator.
- Use a different instruction algorithm entirely, one based on DAG patterns instead of tree patterns.
Several of these solutions depend critically on the register allocator to eliminate move instructions and to be smart about spilling; see .