Optimizing Texture Mapping

Unfortunately, the Graphics2D class doesn't provide a fast way to draw one pixel at a time, each with a different color. And there are currently no interfaces in Java to directly access video memory. Theoretically, this could be accomplished using a ByteBuffer that points directly to video memory, but you'd still have the JNI overhead of drawing one pixel at a time. To make up for this, you'll first draw pixels to a BufferedImage and then draw the BufferedImage to the screen. This is like the concept of double buffering discussed in the "2D Graphics and Animation" chapter. With BufferedImages, you can extract the data from the image as an array and copy pixel data directly to it. Copying an element into an array is one of the fastest operations you can do. The drawback here is the extra time it takes to copy the image to video memory using a Graphics.drawImage() call. This takes about 5% to 10% of the processor on the machines I have tested. Still, the benefits outweigh the costs.

One decision to make is what color depth to use for the BufferedImage. Of course, the BufferedImage should have the same color depth and pixel layout as the display so that the image can be blitted to the screen quickly without any color conversion. So, you must decide on a display depth of 8-bit, 16-bit, or 24-bit color. With 8-bit color, you're forced to use a particular color palette, which visually limits things such as shading. 24-bit (or 32-bit) color gives you the best color quality, but 16-bit color is a faster choice because there is less data to push around. 16-bit color has less color quality than 24-bit color, but it is good enough for a 3D game with texture mapping and lighting. With a 16-bit BufferedImage, you can extract the BufferedImage's underlying array like this:

BufferedImage doubleBuffer;
short[] doubleBufferData;
...
// get the buffer data
DataBuffer dest = doubleBuffer.getRaster().getDataBuffer();
doubleBufferData = ((DataBufferUShort)dest).getData();


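To tie the pieces together, here's a minimal sketch of the per-frame flow, assuming a 16-bit display. Names such as screenWidth, screenHeight, g, and color are not from the original code and are only illustrative:

// create a 16-bit double buffer that matches the display size
doubleBuffer = new BufferedImage(screenWidth, screenHeight,
    BufferedImage.TYPE_USHORT_565_RGB);
doubleBufferData = ((DataBufferUShort)doubleBuffer.getRaster()
    .getDataBuffer()).getData();
...
// write pixels directly into the array (very fast);
// color is a 16-bit (short) pixel value
doubleBufferData[y * screenWidth + x] = color;
...
// at the end of each frame, blit the buffer to the display
// (g is the Graphics2D of the screen surface)
g.drawImage(doubleBuffer, 0, 0, null);
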
Next we discuss the storage format of textures.

Texture Storage

Just as the double buffer has the same color depth and pixel format as the display, you want the textures to have the same color depth and pixel format. You could use BufferedImages for textures as you did in the first example, but you really don't need all the functionality that a BufferedImage provides. Instead, create an abstract Texture class, shown in Listing 8.4, that simply lets the caller get the color at a particular (x,y) location within the texture.

Listing 8.4 Texture.java
package com.brackeen.javagamebook.graphics3D.texture;

/**
    The Texture class is an abstract class that represents a
    16-bit color texture.
*/
public abstract class Texture {

    protected int width;
    protected int height;

    /**
        Creates a new Texture with the specified width and height.
    */
    public Texture(int width, int height) {
        this.width = width;
        this.height = height;
    }

    /**
        Gets the width of this Texture.
    */
    public int getWidth() {
        return width;
    }

    /**
        Gets the height of this Texture.
    */
    public int getHeight() {
        return height;
    }

    /**
        Gets the 16-bit color of this Texture at the specified
        (x,y) location.
    */
    public abstract short getColor(int x, int y);
}


The Texture class is abstract, so you have to extend it to get any use out of it. The getColor() method needs to be implemented to return the 16-bit color value of the texture at the specified (x,y) location within the texture. Remember that you want to be able to tile a texture across a polygon if you have to. You can do this easily if you restrict the width and height of each texture to be a power of 2, such as 16, 32, 128, and so on. For example, let's say you have a texture with a width of 32 pixels. The value of 32 in binary is represented (in 8 bits) as this:

00100000


This number minus one, 31, is called the mask. For a power of 2, subtracting 1 has the same effect as converting the 1 to a 0 and the following 0s to 1s. It is represented in binary as follows:

00011111


This mask represents all the valid bits for this range of numbers—or, in this case, the texture x coordinate. If you perform a bitwise AND operation (using &) with the incoming texture x coordinate, you chop off the unwanted upper bits and get a value within the range you need: 0 to 31. Remember, this works only for powers of 2. This idea is summed up in the PowerOf2Texture class shown in Listing 8.5.
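
To see the wrap-around with concrete numbers, here's a quick illustrative check (not part of the book's code):

int widthMask = 32 - 1;        // 00011111 in binary
int x = 37;                    // a coordinate past the right edge (valid range is 0-31)
int wrappedX = x & widthMask;  // 37 & 31 == 5, so the texture tiles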

Listing 8.5 PowerOf2Texture.java
package com.brackeen.javagamebook.graphics3D.texture;
/**
    The PowerOf2Texture class is a Texture with a width and height
    that are a power of 2 (32, 128, etc.).
*/
public class PowerOf2Texture extends Texture {

    private short[] buffer;
    private int widthBits;
    private int widthMask;
    private int heightBits;
    private int heightMask;

    /**
        Creates a new PowerOf2Texture with the specified buffer.
        The width of the bitmap is 2 to the power of widthBits, or
        (1 << widthBits). Likewise, the height of the bitmap is 2
        to the power of heightBits, or (1 << heightBits).
    */
    public PowerOf2Texture(short[] buffer,
        int widthBits, int heightBits)
    {
        super(1 << widthBits, 1 << heightBits);
        this.buffer = buffer;
        this.widthBits = widthBits;
        this.heightBits = heightBits;
        this.widthMask = getWidth() - 1;
        this.heightMask = getHeight() - 1;
    }

    /**
        Gets the 16-bit color of the pixel at location (x,y) in
        the bitmap.
    */
    public short getColor(int x, int y) {
        return buffer[
            (x & widthMask) +
            ((y & heightMask) << widthBits)];
    }
}


The PowerOf2Texture class keeps track of a short array to hold the texture data, along with the width and height masks. The getColor() method uses the masks to wrap the x and y values so you can tile the texture across the polygon. Finally, you need an easy way to extract data from BufferedImages to create your textures. To do this, add a static createTexture() method to the Texture class. For now, it creates only PowerOf2Textures, but later you'll extend it to create other texture types.

/**
    Creates a Texture from the specified image.
*/
public static Texture createTexture(BufferedImage image) {

    int type = image.getType();
    int width = image.getWidth();
    int height = image.getHeight();

    if (!isPowerOfTwo(width) || !isPowerOfTwo(height)) {
        throw new IllegalArgumentException(
            "Size of texture must be a power of two.");
    }

    // convert the image to a 16-bit image
    if (type != BufferedImage.TYPE_USHORT_565_RGB) {
        BufferedImage newImage = new BufferedImage(
            image.getWidth(), image.getHeight(),
            BufferedImage.TYPE_USHORT_565_RGB);
        Graphics2D g = newImage.createGraphics();
        g.drawImage(image, 0, 0, null);
        g.dispose();
        image = newImage;
    }

    DataBuffer dest = image.getRaster().getDataBuffer();
    return new PowerOf2Texture(
        ((DataBufferUShort)dest).getData(),
        countbits(width-1), countbits(height-1));
}


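The createTexture() method relies on two small helper methods, isPowerOfTwo() and countbits(), that aren't shown in the listing. Here is a minimal sketch of how they could be written; the exact bodies in the book's source may differ:

/**
    Returns true if the specified number is a power of 2.
*/
public static boolean isPowerOfTwo(int n) {
    return (n > 0) && ((n & (n - 1)) == 0);
}

/**
    Counts the number of "on" bits in the specified number. For a
    power-of-2 texture size, countbits(size - 1) gives the number
    of bits needed; for example, countbits(31) is 5 for a width of 32.
*/
public static int countbits(int n) {
    int count = 0;
    while (n > 0) {
        count += (n & 1);
        n >>= 1;
    }
    return count;
}
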
In this function you assume that the 16-bit display has the same pixel format as the BufferedImage type TYPE_USHORT_565_RGB. So far, I haven't found a 16-bit display that doesn't have this pixel format, but theoretically one could exist. Alternatively, you can always use the createCompatibleImage() method from GraphicsConfiguration to create an image compatible with the display.

Next, you create a new PolygonRenderer class, called FastTexturedPolygonRenderer. This new class has the same setup routines as SimpleTexturedPolygonRenderer, calculating the a, b, and c vectors. It also preps the doubleBufferData array, which is the array you copy pixels to. At the end of each frame, the double buffer is blitted to the display. This class also contains an abstract inner class called ScanRenderer, which provides a method for drawing a horizontal scan of a polygon:

/**
    The ScanRenderer class is an abstract inner class of
    FastTexturedPolygonRenderer that provides an interface for
    rendering a horizontal scan line.
*/
public abstract class ScanRenderer {

    protected Texture currentTexture;

    public void setTexture(Texture texture) {
        this.currentTexture = texture;
    }

    public abstract void render(int offset,
        int left, int right);
}


You'll create a few different ScanRenderers for different types of drawing techniques.

Raw Optimization

The simplest ScanRenderer you can create is one that performs only a polygon-fill routine, and it's also the fastest (aside from one that does nothing):

public void render(int offset, int left, int right) {
    for (int x=left; x<=right; x++) {
        // fill with a solid color (0x0007 is a dark blue in 5-6-5 RGB)
        doubleBufferData[offset++] = (short)0x0007;
    }
}


Now that you have the general idea, create a ScanRenderer that performs the same calculations as the SimpleTexturedPolygonRenderer:

public void render(int offset, int left, int right) {
    for (int x=left; x<=right; x++) {
        int tx = (int)(a.getDotProduct(viewPos) /
            c.getDotProduct(viewPos));
        int ty = (int)(b.getDotProduct(viewPos) /
            c.getDotProduct(viewPos));
        doubleBufferData[offset++] =
            currentTexture.getColor(tx, ty);
        viewPos.x++;
    }
}


Poof, you have a much faster texture mapper! You're copying pixel data from the texture to the double buffer much faster now. I tested this scan renderer on two machines by moving the camera close enough to a textured polygon so that it fills an entire 640x480 screen. On a 2.4GHz Pentium 4, this resulted in a speedup of 46 times, and on an 867MHz G4, this resulted in a speedup of 144 times. Not bad at all.

This is a great start, but it could be better. You might look at this and think, okay, just some multiplication and a couple of divides; no big deal. But keep in mind that you're doing those calculations for every pixel on the screen. Here you can apply one of the simplest forms of optimization: moving expensive code out of the loop. In this case, you really don't need to calculate four dot products for every pixel, because the a, b, and c vectors aren't going to change. Also, you know that viewPos.x increases by 1 for every pixel, so you can predict how the dot products will change. Here's the next iteration of the ScanRenderer, which performs these optimizations:

public void render(int offset, int left, int right) {
    float u = a.getDotProduct(viewPos);
    float v = b.getDotProduct(viewPos);
    float z = c.getDotProduct(viewPos);
    float du = a.x;
    float dv = b.x;
    float dz = c.x;
    for (int x=left; x<=right; x++) {
        doubleBufferData[offset++] = currentTexture.getColor(
            (int)(u/z), (int)(v/z));
        u+=du;
        v+=dv;
        z+=dz;
    }
}


Compared to the last scan renderer, this optimization resulted in a speedup of 1.6 times on the Pentium 4 and a similar speedup of 1.4 times on the G4. Next, you take advantage of a trick used with scan converting in the previous chapter: using integers instead of floats. Converting from float to int for every pixel can be expensive, so instead use fixed-point integers the entire time:

public static final int SCALE_BITS = 12;
public static final int SCALE = 1 << SCALE_BITS;
...
public void render(int offset, int left, int right) {
    int u = (int)(SCALE * a.getDotProduct(viewPos));
    int v = (int)(SCALE * b.getDotProduct(viewPos));
    int z = (int)(SCALE * c.getDotProduct(viewPos));
    int du = (int)(SCALE * a.x);
    int dv = (int)(SCALE * b.x);
    int dz = (int)(SCALE * c.x);
    for (int x=left; x<=right; x++) {
        doubleBufferData[offset++] =
            currentTexture.getColor(u/z, v/z);
        u+=du;
        v+=dv;
        z+=dz;
    }
}


The Pentium 4 handles floating-point operations pretty well, so there was no speedup there. The G4, however, showed a speedup of 1.2 times over the last scan renderer. This gets you pretty close to the wire: the only things you're doing are a couple of array references, some integer addition, and a couple of integer divides.

When you look at this, you might notice the biggest bottleneck: those two divides. Addition is cheap, but division can take several clock cycles. But those divides are required, right? You can't get rid of them without sacrificing texture mapping quality. That's the key: sacrificing quality. Luckily, you can sacrifice a bit of quality in a way that is visually acceptable and get a performance boost in the process. The idea is to calculate the correct texture coordinates only every few pixels and then interpolate between these correct values. In mathematics, interpolating between two values means estimating a value between two known values. In the next ScanRenderer, you compute the correct texture coordinates every 16 pixels (or fewer, if there are fewer pixels left in the scan). Between those correct values, you need only a few additions and bit shifts.

public static final int INTERP_SIZE_BITS = 4;
public static final int INTERP_SIZE = 1 << INTERP_SIZE_BITS;
...
public void render(int offset, int left, int right) {
    float u = SCALE * a.getDotProduct(viewPos);
    float v = SCALE * b.getDotProduct(viewPos);
    float z = c.getDotProduct(viewPos);
    float du = INTERP_SIZE * SCALE * a.x;
    float dv = INTERP_SIZE * SCALE * b.x;
    float dz = INTERP_SIZE * c.x;
    int nextTx = (int)(u/z);
    int nextTy = (int)(v/z);
    int x = left;
    while (x <= right) {
        int tx = nextTx;
        int ty = nextTy;
        int maxLength = right-x+1;
        if (maxLength > INTERP_SIZE) {
            u+=du;
            v+=dv;
            z+=dz;
            nextTx = (int)(u/z);
            nextTy = (int)(v/z);
            int dtx = (nextTx-tx) >> INTERP_SIZE_BITS;
            int dty = (nextTy-ty) >> INTERP_SIZE_BITS;
            int endOffset = offset + INTERP_SIZE;
            while (offset < endOffset) {
                doubleBufferData[offset++] =
                    currentTexture.getColor(
                        tx >> SCALE_BITS, ty >> SCALE_BITS);
                tx+=dtx;
                ty+=dty;
            }
            x+=INTERP_SIZE;
        }
        else {
            // variable interpolation size
            int interpSize = maxLength;
            u += interpSize * SCALE * a.x;
            v += interpSize * SCALE * b.x;
            z += interpSize * c.x;
            nextTx = (int)(u/z);
            nextTy = (int)(v/z);
            int dtx = (nextTx-tx) / interpSize;
            int dty = (nextTy-ty) / interpSize;
            int endOffset = offset + interpSize;
            while (offset < endOffset) {
                doubleBufferData[offset++] =
                    currentTexture.getColor(
                        tx >> SCALE_BITS, ty >> SCALE_BITS);
                tx+=dtx;
                ty+=dty;
            }
            x+=interpSize;
        }
    }
}


On the two test machines, this scan renderer resulted in a speedup of 1.6 times on the Pentium 4 and a speedup of 1.4 times on the G4. As mentioned before, this ScanRenderer does sacrifice some image quality, but most of the time it's not noticeable. It is noticeable when the depth of the polygon changes greatly along the horizontal scan line. When you're looking straight at a polygon, its depth doesn't change when scanning from left to right, so it looks fine. Floors also look fine because their depth doesn't change from left to right. However, when you're looking at a vertical polygon from the side, the depth changes a lot; the textures may be slightly off and can even look a little warped. This is particularly noticeable on low-resolution screens, where 16 pixels cover a larger amount of space. One idea for limiting the distortion is to use a variable interpolation size that depends on how much the polygon's depth changes: little or no change in depth could allow a larger interpolation size, while a large change in depth would require a smaller one. I'll leave this enhancement as an exercise for the reader!
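
To give a rough sense of what that could look like, here is a purely illustrative sketch (not code from the book) that picks an interpolation size per scan line from how much the depth changes relative to its value:

// Illustrative only: pick a larger interpolation size when the depth
// is nearly constant across the scan, and a smaller one when it
// changes quickly. The thresholds here are made up.
public int chooseInterpSizeBits(float z, float dz, int scanLength) {
    float relativeChange = Math.abs(dz * scanLength / z);
    if (relativeChange < 0.01f) {
        return 5;   // 32-pixel steps
    }
    else if (relativeChange < 0.1f) {
        return 4;   // 16-pixel steps
    }
    else {
        return 2;   // 4-pixel steps
    }
}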

Method Inlining

There's one other optimization to mention: inlining methods. Calling a method has a certain overhead, and here you are calling texture.getColor() for every pixel. Luckily, the HotSpot VM (the default VM included with the Java 1.4 runtime) is smart enough to inline certain methods. Inlining a method basically moves the code from the method into the caller's code, effectively eliminating the method call. HotSpot inlines short methods that it determines are nonvirtual. (Remember, a nonvirtual method is one that no subclass overrides.) In the case of Texture, only the PowerOf2Texture class implements the getColor() method, and the method is short enough that HotSpot will inline it. Later in this chapter, however, you'll have other Texture subclasses, such as ShadedTexture. When more than one class implements the getColor() method, HotSpot won't inline it, and you'll be stuck with the method call overhead for every pixel. I tried to trick HotSpot into inlining the method anyway by declaring a final local variable, with the idea that HotSpot would know the final local variable's class and could inline the method:

final Texture texture = currentTexture;


Because this local variable is final, the object it refers to never changes, so I thought HotSpot might be able to inline its getColor() method. It would have been nice if this had worked, but it didn't. At the moment, I don't see why HotSpot can't do inlining in this case, but it doesn't, so we must find another solution. A solution that does work is to create a different ScanRenderer for each type of texture. The code for each of the renderers is exactly the same, except for a local variable, texture, that explicitly references the texture's concrete type:

public class PowerOf2TextureRenderer extends ScanRenderer {
    public void render(int offset, int left, int right) {
        PowerOf2Texture texture = (PowerOf2Texture)currentTexture;
        ...
        // texture-mapping code goes here
    }
}


In the case of PowerOf2TextureRenderer, HotSpot now knows that the texture is a PowerOf2Texture and can inline the getColor() method accordingly. It's not the cleanest solution in terms of solid, object-oriented code (if you change one ScanRenderer, you have to change them all), but it works. Another drawback is that you need to make the PowerOf2Texture class (and any other Texture that has a ScanRenderer) final so that no other classes can subclass them and cause HotSpot to stop inlining.

Sometimes you have to work with what you've got, and this is one of those times. Assembly, C, and C++ programmers sometimes have to do crazy things to optimize for a certain piece of hardware or operating system, and Java programmers sometimes have to do crazy things for the VM. On the two test machines, the inlining optimization resulted in a speedup of 2.3 times on the Pentium 4 and a more modest speedup of 1.3 times on the G4. Overall, compared to the original slow texture mapper, this renderer is 284 times faster on the Pentium 4 and 436 times faster on the G4. Nice!
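
As a quick illustration of the final-keyword requirement mentioned above (a sketch; the rest of the class body is unchanged from Listing 8.5):

// prevent subclassing so HotSpot can keep inlining getColor()
public final class PowerOf2Texture extends Texture {
    ...
}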

Fast Texture Mapping Demo

Okay, now you have a Texture class and a FastTexturedPolygonRenderer, so let's make another demo. The TextureMapTest2 demo, shown in the screenshot, creates a convex polyhedron with textures mapped on all sides.

Screenshot: The TextureMapTest2 demo draws a texture-mapped polyhedron at a much faster frame rate than the TextureMapTest1 demo.


TextureMapTest2 uses a TexturedPolygon3D class, which is just like a regular Polygon3D except that it has fields for the polygon's texture and texture bounds. The source code for TextureMapTest2 mostly just creates a bunch of polygons. One method in TextureMapTest2 worth mentioning is setTexture():

public void setTexture(TexturedPolygon3D poly, Texture texture) {
    Vector3D origin = poly.getVertex(0);
    Vector3D dv = new Vector3D(poly.getVertex(1));
    dv.subtract(origin);
    Vector3D du = new Vector3D();
    du.setToCrossProduct(poly.getNormal(), dv);
    Rectangle3D textureBounds = new Rectangle3D(origin, du, dv,
        texture.getWidth(), texture.getHeight());
    poly.setTexture(texture, textureBounds);
}


This method creates the texture bounds for a polygon from its first two vertices and its normal, so you don't have to explicitly create texture bounds for every polygon. The good news is that TextureMapTest2 is much, much faster! As usual, press R to see the frame rate on your system. Next, we move on to sprucing up this demo a bit by adding some shading.
