Optimization Tricks
In this section, we go over some optimization tricks that HotSpot doesn't automatically do for you. Many of these are strength-reduction optimizations, which replace an expensive calculation with a cheaper one that produces the same result. We also talk about lookup tables and fixed-point math.
Algorithms
First, though, let's talk about algorithms. Choosing a better algorithm will give you better performance than applying other optimization tricks. An example is using a BSP tree instead of sorting polygons by z-depth or arranging objects so that fewer collision tests need to be made. Use your head and think creatively: Whenever you're using a "brute force" algorithm, try to come up with something better. Here are a few examples of using the right algorithms in the Java collection classes:
- Use a fast sort algorithm, such as QuickSort instead of Bubble Sort. Arrays.sort() uses a tuned QuickSort for arrays of primitives and a fast merge sort for objects, and Collections.sort() uses the same merge sort.
- Use a HashMap for quick searches. A HashMap lookup is much faster than iterating through a List to find a matching element (see the sketch after this list).
- Choose the right list: LinkedList has faster insertion and removal, but ArrayList has faster random access. The Vector class is like ArrayList, but with synchronized methods.
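Here's the HashMap sketch mentioned in the second tip. It's a minimal, self-contained example (the class name, keys, and scores are made up for illustration) that stores values by key so that a lookup is a single hash operation rather than a linear scan:

import java.util.HashMap;
import java.util.Map;

public class ScoreLookup {
    public static void main(String[] args) {
        // store scores keyed by player name; get() is a constant-time
        // hash lookup instead of a linear scan through a List
        Map scores = new HashMap();
        scores.put("alice", new Integer(4200));
        scores.put("bob", new Integer(3100));

        Integer bobScore = (Integer)scores.get("bob");
        System.out.println("bob's score: " + bobScore);
    }
}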
Also with lists, think about how you traverse them. Because an ArrayList has random access, you can use something like this:
// works fine for ArrayList, slower for LinkedList
for (int i=0; i<list.size(); i++) {
    System.out.println(list.get(i));
}
But this wouldn't work as well for a LinkedList because the get() method searches through the list to find the index you're looking for. Instead, use an Iterator to traverse the items in a LinkedList:
// works better for LinkedList
Iterator i = list.iterator();
while (i.hasNext()) {
    System.out.println(i.next());
}
Besides these basic examples, a great way to find some more ideas is to just pick up an algorithm textbook. (Well, you might have to read it, too.)
Strength Reduction: Bit Shifting
Remember that internally, the computer stores numbers as binary values, such as 1001011. With binary math, you can perform all sorts of neat operations, such as ultra-fast bit shifting. Shifting the bits to the left by one has the same effect as multiplying by 2. In Java, shifting to the left looks like this:
(x << 1)
This expression shifts the bits to the left by 1, which is the same as multiplying by 2. Likewise, shifting by 2 bits is the same as multiplying by 4, and shifting by 3 bits is the same as multiplying by 8. In short, multiplication by a power of 2 can be reduced to a bit shift. Instead of this:
int a = x * 32;
the bit-shifting version would be this:
int a = x << 5;
Likewise, the same trick works for division by shifting to the right. This division:
int a = x / 32;
becomes this:
int a = x >> 5;
This optimization varies in usefulness depending on the processor. Some processors show no speed increase because integer multiplication and division are pretty fast anyway, but others show a modest improvement. Also note that for negative values, shifting to the right rounds toward negative infinity while integer division rounds toward zero, so the division trick gives identical results only for non-negative numbers.
Strength Reduction: Modulus
If you're performing a modulus operation on a power of 2, such as this:
int a = x % 32;
you can use the super-fast & operator instead. This involves using a bitmask, where the bitmask is the modulus number minus 1. For example, the binary value of 32 is 100000, and the bitmask is 011111 (or 31 in decimal form). So, using the & operator and the bitmask, you get this:
int a = x & 31;
Note that this works only for powers of 2, and the results differ for negative values of x: the % operator can return a negative result, but the & version always returns a value from 0 to 31, which is usually what you want when wrapping an index.
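A common place this comes up in a game is wrapping a counter around a fixed-size cycle, such as an animation with a power-of-2 number of frames. Here's a minimal sketch of my own (the frame count and class name are made up) that wraps with & instead of %:

public class FrameWrap {
    // number of animation frames; must be a power of 2 for the & trick
    private static final int NUM_FRAMES = 32;
    private static final int FRAME_MASK = NUM_FRAMES - 1;

    public static void main(String[] args) {
        for (int tick = 0; tick < 100; tick++) {
            // same as (tick % NUM_FRAMES) for non-negative values
            int frame = tick & FRAME_MASK;
            System.out.println("tick " + tick + " -> frame " + frame);
        }
    }
}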
Strength Reduction: Multiplication
Sometimes you want to calculate an entire series of multiplications, like this:
for (int i=0; i<array.length; i++) {
    array[i] = x*i;
}
In this example, if x was 5, the result would be 0, 5, 10, 15, 20, and so on. See the pattern? Instead of multiplying in each iteration of the loop, you can use addition, which is faster than multiplication. Here, you start with a value of 0 and add x to this value every iteration of the loop:
int value = 0;
for (int i=0; i<array.length; i++) {
    array[i] = value;
    value+=x;
}
Strength Reduction: Exponentiation
Similarly to how you reduced multiplication to addition, you can reduce exponentiation to multiplication. Here's an example:
for (int i=0; i<array.length; i++) {
    array[i] = (int)Math.pow(x,i);
}
So, if x was 5, the result would be 1, 5, 25, 125, 625, and so on. Here, each number is simply the previous number times 5. Instead of calculating using the expensive pow() function, you can use a series of multiplications:
int value = 1;
for (int i=0; i<array.length; i++) {
    array[i] = value;
    value*=x;
}
Both the multiplication and exponentiation examples show how to recognize a pattern in a series of numbers and base each calculation on the previous result rather than computing it from scratch. Be sure to use this idea for other number patterns as well.
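As one more illustration of the idea (a sketch of my own, not one of the earlier listings), the squares 0, 1, 4, 9, 16, and so on can be generated with additions only, because each square differs from the previous one by the next odd number:

public class Squares {
    public static void main(String[] args) {
        int[] array = new int[10];
        int value = 0;   // current square
        int odd = 1;     // next odd number to add
        for (int i = 0; i < array.length; i++) {
            array[i] = value;   // i*i, computed without any multiplication
            value += odd;       // (i+1)*(i+1) = i*i + (2*i + 1)
            odd += 2;
            System.out.println(array[i]);
        }
    }
}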
More Loop Invariant Hoisting
It's generally a good idea to move as much code out of a loop as possible so that those statements aren't executed repeatedly within the loop. HotSpot cannot do every loop invariant optimization itself. Here are a couple more examples. In this first example, the if statement is executed for every iteration of the loop, even though the result of the statement will never change while in the loop:
for (int i=0; i < 1000; i++) {
    if (bool) {
        doSomething();
    }
}
So, move the if statement out of the loop:
if (bool) {
    for (int i=0; i < 1000; i++) {
        doSomething();
    }
}
In this next example, the exact same Color objects are created every time a line is drawn:
for (int i=0; i < 100; i++) {
    g.setColor(new Color(0xff3366));
    g.drawLine(i*10, 0, i*10+100, 100);
    g.setColor(new Color(0x6633ff));
    g.drawLine(i*10+1, 0, i*10+101, 100);
}
So, you can move the object creation out of the loop:
Color red = new Color(0xff3366);
Color blue = new Color(0x6633ff);
for (int i=0; i < 100; i++) {
    g.setColor(red);
    g.drawLine(i*10, 0, i*10+100, 100);
    g.setColor(blue);
    g.drawLine(i*10+1, 0, i*10+101, 100);
}
Lookup Tables
Another age-old optimization trick is to use lookup tables for computationally expensive functions. Lookup tables work by calculating many values of an expensive function and storing the results in a table. Later, the game can just look up a value from the table instead of computing it every time. This takes up a small amount of memory but makes things run faster. For example, instead of computing an equation:
y = computeFunction(x);
you can get the value from a table, where x is the index in the table:
y = table[x];
The drawback to lookup tables is that you have to limit the number of values that you precompute. However, in many cases you won't need every possible value anyway. As an example, you'll create a sine and cosine lookup table. For these trig functions, you usually want to compute values for angles from 0 to 2π radians. However, you can't use the values from 0 to 2π as indices in an array: there are only six integer values between those numbers, and you want far more than six precomputed values. In this case, use your own angle system instead of radians. In this system, you create 4,096 precomputed values, with indices from 0 to 4095. The value 2048 is the same as π, and 4096 is the same as 2π. To get a visual picture, take a look at the following figure.
Figure: The conversion from radians (0 to 2π) to your own system (0 to 4096).
I pretty much picked 4096 arbitrarily. A value as low as 256, for instance, wouldn't be enough granularity, but 4096 gives pretty good results. Also, by using 4,096 values, which is a power of 2, you can easily keep values within a valid range. If you want the cosine of -200 or the sine of 5,000, you can translate the value using the & operator with the bitmask:
x = x & 4095;
Also, you can take advantage of one little trick with this equation:
cos(x) = sin(π/2 - x)
Using this equation, you can get the cosine of a value from the sine lookup table. So, for both cosine and sine, you only need one table, not two. Now you'll implement a lookup table. The complete code is in Listing 16.3 and is part of the MoreMath class.
Listing 16.3 Trig Table Functions in MoreMath.java
package com.brackeen.javagamebook.util;

/**
    The MoreMath class provides functions not contained in
    the java.lang.Math or java.lang.StrictMath classes.
*/
public class MoreMath {

    // a trig table with 4096 entries
    private static final int TABLE_SIZE_BITS = 12;
    private static final int TABLE_SIZE = 1 << TABLE_SIZE_BITS;
    private static final int TABLE_SIZE_MASK = TABLE_SIZE - 1;
    private static final int HALF_PI = TABLE_SIZE / 4;
    private static final float CONVERSION_FACTOR =
        (float)(TABLE_SIZE / (2 * Math.PI));

    private static float[] sinTable;

    // init trig table when this class is loaded
    static {
        init();
    }

    private static void init() {
        sinTable = new float[TABLE_SIZE];
        for (int i=0; i<TABLE_SIZE; i++) {
            sinTable[i] = (float)Math.sin(i / CONVERSION_FACTOR);
        }
    }

    /**
        Cosine function, where angle is from 0 to 4096 instead of
        0 to 2pi.
    */
    public static float cos(int angle) {
        return sinTable[(HALF_PI-angle) & TABLE_SIZE_MASK];
    }

    /**
        Sine function, where angle is from 0 to 4096 instead of
        0 to 2pi.
    */
    public static float sin(int angle) {
        return sinTable[angle & TABLE_SIZE_MASK];
    }

    /**
        Converts an angle in radians to the system used by this
        class (0 to 2pi becomes 0 to 4096)
    */
    public static int angleConvert(float angleInRadians) {
        return (int)(angleInRadians * CONVERSION_FACTOR);
    }
}
When this class loads, the static init() method is called, which initializes the lookup table for sine. Also included is a function to convert radians to the system this class uses, the angleConvert() method. The sin() and cos() functions return the floating-point value for the specified angle. Of course, lookup tables don't have to be limited to trig functions. They can be used for more complicated formulas. You could also use them on the server side, to cache expensive lookups from the database. Use tables wherever they will help speed things up.
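As a non-trig example, here's a small sketch of my own (not part of MoreMath; the gamma value and class name are made up) that precomputes a gamma-correction table for every possible 8-bit color component, so Math.pow() isn't called per pixel:

public class GammaTable {
    private static final float GAMMA = 2.2f;
    private static final int[] gammaTable = new int[256];

    // precompute the gamma-corrected value for every possible
    // 8-bit color component (0 to 255) when the class loads
    static {
        for (int i = 0; i < 256; i++) {
            gammaTable[i] = (int)Math.round(
                255 * Math.pow(i / 255.0, 1.0 / GAMMA));
        }
    }

    public static int correct(int colorComponent) {
        // table lookup instead of calling Math.pow() every time
        return gammaTable[colorComponent];
    }

    public static void main(String[] args) {
        System.out.println(correct(128));
    }
}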
Fixed-Point Math
We covered fixed-point math in the "3D Graphics" chapter, so I won't repeat it here. Just keep in mind that floating-point arithmetic can sometimes be too slow, especially for operations such as converting a float to an int. Fixed-point math can help speed this up, but it also has a couple of drawbacks: less accuracy and overflow issues when multiplying two fixed-point values.
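As a quick refresher, here's a minimal sketch of my own using a 16.16 format (the exact format in the 3D graphics chapter may differ). The drawbacks show up in the multiply, which needs a shift to correct the scale and needs a long intermediate to avoid overflow:

public class FixedPoint {
    // 16.16 fixed point: 16 bits for the integer part, 16 for the fraction
    private static final int FRACTION_BITS = 16;
    private static final int ONE = 1 << FRACTION_BITS;

    public static int toFixed(float f) {
        return (int)(f * ONE);
    }

    public static float toFloat(int fixed) {
        return (float)fixed / ONE;
    }

    public static int mul(int a, int b) {
        // use a long for the intermediate result; multiplying two
        // 16.16 values directly in an int overflows easily
        return (int)(((long)a * b) >> FRACTION_BITS);
    }

    public static void main(String[] args) {
        int x = toFixed(3.5f);
        int y = toFixed(2.0f);
        System.out.println(toFloat(mul(x, y)));   // prints 7.0
    }
}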
Exceptions
Using exceptions can hurt HotSpot compilation and also limit what HotSpot can inline. First, don't catch exceptions if you don't need to. For example:
try {
    return array[index];
}
catch (ArrayIndexOutOfBoundsException ex) {
    // do nothing
}
Instead, just perform the bounds check yourself:
if (index >= 0 && index < array.length) {
    return array[index];
}
Of course, exceptions are commonly used in object-oriented systems, and many times throwing an exception won't noticeably hurt performance. Take this example:
public void takeItem(Item item) throws InventoryFullException;
In a game, a player might not pick up an item very often, so something like this would not hurt performance.
Input/Output
I/O can be one of the bottlenecks of a game, mainly because you have to wait for something slow to occur, such as accessing the hard drive or getting bytes from a painstakingly slow modem. Here are a few tips on making I/O a little easier to deal with:
- Use BufferedInputStream (for binary files) and BufferedReader (for text files). These classes read chunks of data at a time instead of 1 byte at a time. This will help no matter what you're reading from, whether a file or a stream from the server (see the sketch after this list).
- Perform I/O in a separate thread from your main loop. Reading and writing to a file or communicating with a server takes some time, so there's no reason to cause the game to appear to "freeze" when you're performing I/O.
- Consider combining several little files into one big file so that fewer files have to be opened.
- For large files (several megabytes), consider using memory-mapped files.
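Here's the buffered-reading sketch promised in the first tip. It's a minimal example with a made-up file name; BufferedInputStream wraps the raw FileInputStream so each read() call pulls from an in-memory buffer instead of going to the disk for every byte:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedRead {
    public static void main(String[] args) throws IOException {
        // wrap the raw stream so reads come from an in-memory buffer
        InputStream in = new BufferedInputStream(
            new FileInputStream("level.dat"));
        try {
            int b;
            while ((b = in.read()) != -1) {
                // process each byte here
            }
        }
        finally {
            in.close();
        }
    }
}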
What? Memory-mapped files, what's that all about? Hey, I'm glad you asked.
Memory-Mapped Files
Memory-mapped files are files on, say, a hard drive, that are mapped to a region of memory. Programs can read from memory-mapped files just like reading from memory. Sections of the file are copied and removed from memory as needed. To memory-map a file for reading, just open the file as usual and use the FileChannel class to map it to a buffer:
File file = new File(filename);
FileInputStream stream = new FileInputStream(file);
MappedByteBuffer buffer = stream.getChannel().map(
    FileChannel.MapMode.READ_ONLY, 0, file.length());
stream.close();
The map() method allows you to map any part of the file you want; here, you're mapping the entire file. The method returns a MappedByteBuffer. After the file is mapped, you can close the original stream (stream.close()). The operating system handles the memory management of the file. Typically, none of the file is loaded into memory until you start reading from it. You can read from the file using any of MappedByteBuffer's get() methods. Here's an example that copies a portion of the file into a 1KB array:
byte[] array = new byte[1024];
buffer.get(array);
A memory-mapped file is unmapped after its buffer is garbage-collected. So, when you're done with the file, you can just clear the reference so that the garbage collector can do its work:
buffer = null;
Memory-mapped files don't always give a performance increase. With smaller files, memory-mapped files will probably be slower, and you won't see much of a difference if you're reading a file sequentially. But memory-mapped files are great for large files that need random access, such as databases or large map files with embedded textures.