Profiling

Whether it's blitting sprites to the display or rendering 3D texture-mapped polygons, your game will spend most of its time in small areas of the code. Although many times you can guess where the bulk of the code execution is, there almost always are surprises. Methods you think will be slow might turn out not to be slow at all, and methods you might have glanced over might be more computationally expensive than you thought.

Benchmarking

Benchmarking methods on your own can be wildly inaccurate. You can try to benchmark a method to see how long it takes by doing something like this:

int count = 1000000;
long startTime = System.currentTimeMillis();
for (int i=0; i<count; i++) {
 myCoolMethod();
}
long totalTime = System.currentTimeMillis() - startTime;
System.out.println("total time: " + totalTime);


This code calls a method a million times to see how long it takes. What's happening here is that you're writing a micro-benchmark. Lots of things can have an effect on the results of this benchmark, such as current memory conditions and the granularity of your system timer (which we talk about later in this chapter). Furthermore, a method could perform well by itself but poorly in other situations, such as in the final game. Also, HotSpot could be performing optimizations that would work for the benchmark but not for the actual game. In short, it's best to profile your entire app rather than try to benchmark a method.

Using the HotSpot Profiler

Luckily, HotSpot has a thorough profiler that allows you to easily check where all that time is spent, whether it's in your code or in the Java APIs. Run your game with the -Xprof flag:

java -Xprof MyCoolGameClass


When the game exits, the profiler spits out profiling information to the console. These real-world results are more detailed, more useable, and easier to measure than trying to benchmark your game by hand. The profiling data is broken down for each thread. Each thread has information for the interpreted, (HotSpot) compiled, and native methods that were called in that thread. Compiled methods are the ones that HotSpot compiles into optimized code; the interpreted methods are the method that aren't compiled or optimized by HotSpot. The native methods are the methods in native code, most often operating system–specific code that HotSpot doesn't touch, of course. Take a look at the example in Listing 16.1. This is a profiler output for the main thread of 3D engine from , "Collision Detection."

Listing 16.1 Profiled Main Thread
Flat profile of 179.23 secs (16583 total ticks): main
 Interpreted + native Method
 2.3% 0 + 379 sun.java2d.loops.Blit.Blit
 0.2% 2 + 32 sun.awt.windows.Win32SurfaceData.flip
 0.1% 0 + 21 com.brackeen.javagamebook.graphics.
 ScreenManager.setFullScreen
 0.1% 17 + 0 com.brackeen.javagamebook.test.
 GameCore.gameLoop
 0.1% 0 + 10 sun.java2d.loops.MaskFill.MaskFill
 0.0% 6 + 0 sun.awt.windows.WComponentPeer.flip
 0.0% 0 + 5 sun.java2d.loops.DrawGlyphList.
 DrawGlyphList
 0.0% 0 + 5 java.io.WinNTFileSystem.getLength
 0.0% 1 + 3 java.lang.System.currentTimeMillis
 0.0% 0 + 4 java.lang.Thread.sleep
 3.5% 76 + 509 Total interpreted (including elided)
 Compiled + native Method
 59.9% 9933 + 0 com.brackeen.javagamebook.graphics3D.
 FastTexturedPolygonRenderer$
 ShadedSurfaceRenderer.render
 8.1% 1346 + 0 com.brackeen.javagamebook.bsp2D.
 BSPRenderer.setScanDepth
 2.6% 435 + 0 com.brackeen.javagamebook.bsp2D.
 BSPRenderer.drawCurrentPolygon
 2.3% 384 + 0 com.brackeen.javagamebook.graphics3D.
 ScanConverter.convert
 1.7% 289 + 0 com.brackeen.javagamebook.graphics3D.
 SortedScanConverter$SortedScanList.add
 1.7% 278 + 0 com.brackeen.javagamebook.graphics3D.
 SortedScanConverter.isFilled
 1.3% 218 + 0 com.brackeen.javagamebook.graphics3D.
 SortedScanConverter$SortedScanList.add
 1.0% 164 + 0 com.brackeen.javagamebook.graphics3D.
 SortedScanConverter.convert
 0.7% 111 + 0 com.brackeen.javagamebook.graphics3D.
 SortedScanConverter$
 SortedScanList.insert
 0.5% 87 + 0 com.brackeen.javagamebook.math3D.
 Vector3D.rotateY
 0.5% 80 + 0 com.brackeen.javagamebook.math3D.
 Vector3D.rotateZ
 0.5% 78 + 0 com.brackeen.javagamebook.math3D.
 Vector3D.rotateX
 0.4% 70 + 0 com.brackeen.javagamebook.graphics3D.
 ScanConverter.clearCurrentScan
 0.4% 60 + 0 com.brackeen.javagamebook.math3D.
 ViewWindow.project
 0.4% 60 + 0 com.brackeen.javagamebook.math3D.
 Polygon3D.isFacing
 0.3% 54 + 0 com.brackeen.javagamebook.math3D.
 Polygon3D.clip
 87.2% 14449 + 6 Total compiled (including elided)
 Stub + native Method
 8.1% 2 + 1343 sun.java2d.loops.Blit.Blit
 0.5% 0 + 87 sun.awt.windows.Win32SurfaceData.flip
 0.1% 1 + 12 java.lang.Float.floatToIntBits
 0.0% 1 + 6 sun.awt.image.BufImgSurfaceData.
 getSurfaceData
 0.0% 0 + 6 java.lang.StrictMath.pow
 0.0% 2 + 3 java.lang.StrictMath.floor
 0.0% 1 + 3 java.lang.Runtime.totalMemory
 0.0% 0 + 3 java.lang.StrictMath.atan2
 0.0% 0 + 2 java.lang.Runtime.freeMemory
 0.0% 0 + 1 sun.java2d.loops.MaskFill.MaskFill
 8.9% 8 + 1467 Total stub
 Runtime stub + native Method
 0.1% 18 + 0 interpreter_entries Runtime1 stub
 0.0% 6 + 0 alignment_frame_return Runtime1 stub
 0.1% 24 + 0 Total runtime stubs
 Thread-local ticks:
 0.0% 8 Blocked (of total)
 0.1% 12 Class loader
 0.1% 17 Interpreter
 0.0% 7 Compilation


The amount of time the thread ran (179.23 seconds) is displayed at the top. Most of this listing shows the amount of time spent in each method, in a percentage of the total time. It shows the "ticks" for interpreted, compiled, and native methods. The ticks are a measurement of time spent in each method. At the bottom, you see 0.0% for "Blocked (of total)" This is the percentage of time this thread was blocked from executing. A thread can be blocked for several reasons, such as if it is sleeping or waiting for I/O, or if another thread is running. Because the amount of time this thread was blocked is very low, the nonblocked amount of time was very high, so you can tell this thread is taking up most of the processor for the game. If you optimized this code based on the profiler output in Listing 16.1, obviously the texture mapping would be a first choice because it's hogging up 59.9%. Also note that the entire scan-conversion process involves several methods that total about 9.1%. This makes all the scan-conversion methods take more time than the single setScanDepth() method, which takes 8.1%. In short, scan conversion is taking more time than setting the Z-buffer. Note that the compiled methods are the ones that HotSpot determines are "hot." For a long-running game, you probably won't see a lot of time spent in an interpreted method. Profiling lists information for all the threads, not just the ones you create. You'll see AWT-EventQueue-0 and AWT-Shutdown, among others. These threads will be blocked most of the time. One other thread I want to show is AWT-Windows, here in Listing 16.2.

Listing 16.2 Profiled AWT-Windows Thread
Flat profile of 181.55 secs (16723 total ticks): AWT-Windows
 Interpreted + native Method
 99.9% 0 + 16699 sun.awt.windows.WToolkit.eventLoop
 0.0% 1 + 1 java.util.EventObject.<init>
 0.0% 0 + 2 sun.awt.PostEventQueue.postEvent
 0.0% 1 + 0 java.awt.event.ComponentEvent.<init>
 0.0% 0 + 1 sun.awt.SunToolkit.wakeupEventQueue
 0.0% 0 + 1 java.awt.EventQueue.wakeup
 0.0% 1 + 0 java.awt.AWTEvent.<init>
 0.0% 1 + 0 java.util.WeakHashMap.expungeStaleEntries
 99.9% 4 + 16704 Total interpreted
 Compiled + native Method
 0.0% 0 + 2 sun.awt.AWTAutoShutdown.setToolkitBusy
 0.0% 1 + 0 java.awt.event.InputEvent.getModifiers
 0.0% 1 + 0 java.awt.GraphicsEnvironment.
 getHeadlessProperty
 0.0% 0 + 1 java.awt.EventQueue.wakeup
 0.0% 2 + 3 Total compiled
 Stub + native Method
 0.0% 0 + 1 sun.awt.SunToolkit.wakeupEventQueue
 0.0% 0 + 1 Total stub
 Runtime stub + native Method
 0.0% 1 + 0 monitorexit Runtime1 stub
 0.0% 1 + 0 Total runtime stubs
 Thread-local ticks:
 0.0% 2 Blocked (of total)
 0.0% 6 Compilation


As you can see, this thread also shows that it's blocked 0.0% of the time. Does this mean the AWT-Windows thread is hogging the processor, with the eventLoop() native method taking 99.9% of the time? Luckily, no, it's not. Whew! The profiler isn't foolproof. In this case, the eventLoop() method is blocked most of the time, waiting for events, but it's blocked in native code (all of its 16,699 ticks are shown in the native column). The profiler can't detect that a thread is blocked if it's blocked in native code. Therefore, you can't estimate time spent in native code in cases like this. Finally, at the end of the profiler output you get a summary:

Global summary of 181.75 seconds:
100.0% 16830 Received ticks
 0.1% 13 Received GC ticks
 0.1% 25 Compilation
 0.4% 75 Other VM operations
 0.1% 13 Class loader
 0.1% 18 Interpreter
 0.0% 8 Unknown code


The summary in this case is great news because so little time is spent in garbage collection, compilation, and all the other things the VM does. Another method to profile your code is with the -Xrunhprof flag. This has more detailed information and can also monitor the heap, but it is in a format that is less easy to read. For all the options for this flag, run this command:

java -Xrunhprof:help


Screenshot


   
Comments