Advanced Topics

You must deal with some additional issues to deploy your multi-player game system in the real world. These include handling flaky client connections, dealing with network firewalls, and doing performance profiling and enhancement.

Disconnects and Reconnects

A major headache of game server development is dealing with client disconnections. Players don't take too kindly to a loss of network connection that forces them to forfeit a game. Disconnections can happen for a number of reasons. If the client is on a modem connection, that client might get disconnected. On any connection, there could be heavy network congestion between the client and the server. You must take care to recognize lagging or disconnected clients quickly and to deal with them gracefully. Ideally, players should be able to reconnect and pick up where they left off in the game. Depending on the game genre, this might or might not be possible, but it should at least be considered.

Ping Events

A technique that is often used to keep tabs on client connections is to send periodic ping events (also called heartbeat events) between the client and server. Ping events are useful not only for detecting disconnects, but also for measuring latency of the client connection. When a client fails to send a ping event for a certain period of time, you can assume the client is having a connection problem. A thread that monitors those events can then take appropriate action, such as notifying other players in the client's game.
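As a minimal sketch of the monitoring side, assuming players are identified by a string id and that the server calls `recordPing()` whenever a ping event arrives, the bookkeeping could look like the following. `PingMonitor` and its method names are hypothetical and not part of the example framework; the code uses generics and `java.util.concurrent`, so it needs a newer JDK than the one the framework targets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers the last ping time per player. */
class PingMonitor {
    private final Map<String, Long> lastPing = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    PingMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Call whenever a ping event arrives from a player. */
    void recordPing(String playerId, long nowMillis) {
        lastPing.put(playerId, nowMillis);
    }

    /** Call when a player logs out or has been dropped. */
    void remove(String playerId) {
        lastPing.remove(playerId);
    }

    /** Returns the ids of players whose last ping is older than the timeout. */
    List<String> findStale(long nowMillis) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastPing.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

A monitoring thread would call `findStale(System.currentTimeMillis())` at a regular interval and notify the other players in each stale client's game. Passing the current time in as a parameter keeps the class easy to test.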

The Reaper

The example framework currently allows clients to remain connected indefinitely. This can lead to problems if idle players are wasting valuable server resources or clogging game lobbies. To avoid spending server resources on clients that are not active, it might be useful to disconnect idle clients after a fixed interval. This can be accomplished by having a periodic task (the Reaper) that checks the last action time of each client, which could be the time of the client's last non-ping event. Any clients that have not been active for the timeout period are sent a disconnect event and have their connection closed. java.util.Timer or a custom dedicated thread can be used to implement the Reaper.
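One possible shape for a Timer-based Reaper is sketched below. The `DisconnectHandler` interface is a hypothetical stand-in for whatever the server does to send the disconnect event and close the connection, and players are again assumed to be keyed by a string id.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical callback: send a disconnect event and close the connection. */
interface DisconnectHandler {
    void disconnect(String playerId);
}

class Reaper extends TimerTask {
    private final Map<String, Long> lastActionTime = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;
    private final DisconnectHandler handler;

    Reaper(long idleTimeoutMillis, DisconnectHandler handler) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.handler = handler;
    }

    /** Call for every non-ping event received from a player. */
    void touch(String playerId, long nowMillis) {
        lastActionTime.put(playerId, nowMillis);
    }

    /** Invoked by the Timer on each period. */
    @Override
    public void run() {
        reap(System.currentTimeMillis());
    }

    /** Disconnects every player idle for the timeout or longer; returns the count. */
    int reap(long nowMillis) {
        int reaped = 0;
        Iterator<Map.Entry<String, Long>> it = lastActionTime.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= idleTimeoutMillis) {
                String playerId = entry.getKey();
                it.remove();
                handler.disconnect(playerId);
                reaped++;
            }
        }
        return reaped;
    }

    /** Schedules the reaper on a daemon Timer thread. */
    static Timer schedule(Reaper reaper, long periodMillis) {
        Timer timer = new Timer("Reaper", true);
        timer.schedule(reaper, periodMillis, periodMillis);
        return timer;
    }
}
```

The check period can be much shorter than the idle timeout; the period only controls how promptly an idle client is noticed.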

HTTP Tunneling

In the not too distant past, it was possible to run a server on just about any port that you wished and expect the majority of clients to be able to connect to it. However, the reality of the Internet today is that increasing security threats have sysadmins on the defensive. Because of this, a large number of businesses, home users, and even entire Internet access providers are blocking access to all unprivileged ports (1024 and above) and even common ports such as 21 (FTP) and 25 (SMTP). As a result, one of the few ports (mostly) guaranteed to be open is the HTTP port (80).

So you simply run your server on port 80, right? Well, the trick here comes when using applets for the game client. For unsigned applets, network access is limited to the server from which the applet was downloaded (the origin server). So, if the applet needs to be downloaded from a web server running on port 80, how can our game server also be running on port 80?

Even worse, many firewalls do packet filtering and ensure that traffic on port 80 is indeed legitimate HTTP traffic. Currently, our game traffic looks nothing like the HTTP protocol. To top it off, HTTP is a stateless transactional protocol. In other words, the basic flow is request, response, request, response, and so on. The client always sends a request to the server to get information. For the game server, however, we require that the server be capable of pushing information to the client at any given time; the assumption is that there is a two-way, always-connected pipe.

But don't despair: There are solutions to these problems, and we consider two of them here. First, there are a couple of tricks needed for either method. The first is to wrap the communications to look like valid HTTP communications. Instead of using the over-the-wire protocol header discussed earlier, events would get wrapped in HTTP request and response headers. Additionally, using either XML or other ASCII event formats would be more in tune with using HTTP, but it is still possible to use binary event payloads, if desired.

The second trick is to overcome the lack of full-duplex communication channels. This can be done by using a pair of connections, both originating from the client. Both use HTTP 1.1 keep-alive connections. We will call the first one the server push connection. The client opens this first, sends a single event to the server on it, and then reads events from the server only as they are available. We will call the second connection the client request connection, and it will be used for the client to send all further events to the server.

That defeats the packet filters, but the other issue is that you need to serve the applet files and all related resources (images, sounds, and so on) from the same address as your server communications. Here you have two options: Either actually serve both types of traffic from the same server, or trick the client into thinking that it is coming from the same server.
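To make the first trick (wrapping events so that they look like valid HTTP) more concrete, here is a rough sketch that puts an already-serialized event payload behind HTTP/1.1 request and response headers. The class name, the choice of POST, and the `application/octet-stream` content type are assumptions made for illustration; they are not taken from the example framework.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that frames event payloads as HTTP messages. */
class HttpEventWrapper {

    /** Client side: wraps a payload in an HTTP/1.1 POST request. */
    static byte[] wrapRequest(String host, String path, byte[] payload) {
        String headers = "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "Connection: keep-alive\r\n"
                + "\r\n";
        return concat(headers.getBytes(StandardCharsets.US_ASCII), payload);
    }

    /** Server side: wraps a payload in an HTTP/1.1 200 response. */
    static byte[] wrapResponse(byte[] payload) {
        String headers = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "Connection: keep-alive\r\n"
                + "\r\n";
        return concat(headers.getBytes(StandardCharsets.US_ASCII), payload);
    }

    /** Joins header bytes and body bytes into one message. */
    private static byte[] concat(byte[] head, byte[] body) {
        byte[] out = new byte[head.length + body.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(body, 0, out, head.length, body.length);
        return out;
    }
}
```

The `Content-Length` header is what lets the receiver find the end of each event on a keep-alive connection, so it must always match the payload size exactly.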

Option 1: Combo-Server

The first possible solution is to give your game server the functionality of a web server in addition to its normal responsibilities. When you receive a client request, you just need to determine whether it's game traffic. If so, you need to route it to the correct GameController; if not, you need to send it to a DownloadController that will handle the file serving. A few problems arise with this first solution. The biggest one is that it is just not clean. You're writing a game server here. Why should you have to waste your time working on a finely tuned web server when Apache could do much better in its sleep? What's a poor game developer to do? Unfortunately, not much, but a developer with a bit of cash to blow can buy one of a number of off-the-shelf solutions in the form of a URL-inspecting load balancer, so that hardware can sniff your incoming messages and route them accordingly.

Option 2: URL-Based Load Balancing

For those not familiar with high-end network gear, URL-based load balancing is a feature present in many firewall and load-balancer appliances today. These devices enable you to route traffic coming into a single IP and port combination across a server farm based on a number of criteria. The URL-inspection feature allows that routing to be performed based on the contents of the URL string in an HTTP request. Consider the following pseudo-code for the load-balancing logic:

if (URL_PATH matches "/GameServer*")
 originServer = "gameserver.hypefiend.com"
else
 originServer = "webserver.hypefiend.com"

So, requests like

GET /launchgame.html

GET /applet/gameapplet.jar

would get directed to webserver.hypefiend.com, which can be running a standard Apache or other web server on port 80 and can hold all of the game assets (web pages, graphics, applets, sounds). On the other hand, a request such as the following gets directed to gameserver.hypefiend.com, which runs the game server and now has only to be a game server:

GET /GameServer?gamename=rps&playerid=bret

To the client (in this case, the Java VM), the process is transparent. Both the downloads and the game traffic seem to be coming from the same IP address and port. A good load balancer is essential for any large system deployment, so this hardware might not really be an added expense. We'll leave it as an exercise for you to adapt your GameServer to use HTTP tunneling instead of raw socket connections. It's not a huge stretch: the same basic architecture and all of the GameController code can be left untouched. Mainly you need to do the following:

Adapt SelectAndRead to recognize HTTP headers and pull event payloads from HTTP POST data.
Adapt EventWriter to wrap responses with HTTP headers.
Modify the GameServer and Player interface to allow for two connections per player.
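For the first item in the list above, a minimal sketch of pulling an event payload out of buffered HTTP POST data might look like this. It assumes the request carries a `Content-Length` header and that the bytes read so far are held in a plain byte array; `HttpPayloadReader` is a hypothetical helper, not part of SelectAndRead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: extracts the body of a buffered HTTP request. */
class HttpPayloadReader {

    /**
     * Returns the request body, or null if the first `length` bytes of
     * the buffer do not yet contain the complete headers and body.
     */
    static byte[] extractPayload(byte[] buffer, int length) {
        int headerEnd = indexOfHeaderEnd(buffer, length);
        if (headerEnd < 0) {
            return null; // headers not fully received yet
        }
        String headers = new String(buffer, 0, headerEnd, StandardCharsets.US_ASCII);
        int contentLength = 0;
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0
                    && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        int bodyStart = headerEnd + 4; // skip the blank line (CR LF CR LF)
        if (length - bodyStart < contentLength) {
            return null; // body not fully received yet
        }
        return Arrays.copyOfRange(buffer, bodyStart, bodyStart + contentLength);
    }

    /** Finds the index of the CR LF CR LF sequence that ends the headers. */
    private static int indexOfHeaderEnd(byte[] buf, int length) {
        for (int i = 0; i + 3 < length; i++) {
            if (buf[i] == '\r' && buf[i + 1] == '\n'
                    && buf[i + 2] == '\r' && buf[i + 3] == '\n') {
                return i;
            }
        }
        return -1;
    }
}
```

Returning null for an incomplete request fits a non-blocking read loop: the caller simply keeps the buffer and tries again after the next read.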

Testing with Bots

To fulfill the design goal of handling a large number of simultaneous users, and to be able to test that functionality without having a small horde of volunteer testers, you must find a way to simulate the load of a lot of users. To do this, you do what is commonly referred to as bot testing. Bot testing involves crafting a version of the client app that can provide unattended simulation of a large number of users connected and playing games. Although this will not accurately simulate all aspects of a real high user load, it allows testing of a number of key performance, stability, and longevity factors. Things to look for during bot testing are listed here:

The maximum number of simultaneous bot users the server supports
The event throughput (events/sec) at various levels of simultaneous connections
Connection delay at various levels of simultaneous connections
CPU usage at various levels of simultaneous connections
Event latency (the time it takes from the event being received to the time it starts processing)
EventQueue backlogs (for tuning your Wrap pool sizes)
Memory usage over time, carefully checking for any continuous growth that would indicate a leak (yes, it is possible in Java)
Any unusual exceptions or other errors
Any thread deadlocks

It is recommended that you have as many separate machines doing the bot testing as possible. The fewer connections each machine handles, the closer the test comes to simulating real users. After all, many bots on the same machine actually send their packets to the server sequentially, not truly simultaneously. Definitely avoid running the bots on the same machine as the GameServer. Doing so will severely skew any test results.
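A bot test driver can be quite small. The sketch below assumes each bot is a `Callable` that plays its scripted session and returns the number of events it sent; what a bot actually does (connect, log in, join a game, send moves) is left out, and `BotHarness` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical driver that runs a batch of bots on one machine. */
class BotHarness {

    /** Runs every bot on its own thread and returns the total events sent. */
    static long runBots(List<Callable<Integer>> bots) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, bots.size()));
        try {
            long total = 0;
            // invokeAll blocks until every bot has finished its session
            for (Future<Integer> result : pool.invokeAll(bots)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("bot run interrupted", e);
        } catch (java.util.concurrent.ExecutionException e) {
            throw new IllegalStateException("a bot failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Converts an event count and elapsed time into events per second. */
    static double eventsPerSecond(long events, long elapsedMillis) {
        return events * 1000.0 / Math.max(1, elapsedMillis);
    }
}
```

Timing a call to `runBots()` and feeding the result to `eventsPerSecond()` gives a client-side throughput figure to compare against the server's own statistics.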

Those Pesky Modems

Bot testing on your local network or even on remote well-connected machines will not tell you everything when your target audience might be mostly connected by modems (sometimes as slow as 28.8 Kbps; can you imagine?). Modem connections are notorious for having extremely high latency (typically in the 250ms range just to reach the ISP), sporadic throughput, packet loss, and frequent disconnects. One trick that can be used to simulate at least some of these problems is to use a tool called a modem-emulating proxy server. A number of these tools are available, as a quick Google search will reveal. They provide a proxy server that limits the bandwidth of the connection and introduces artificial latency. One such app is called Sloppy (from slow proxy) and is available at www.dallaway.com/sloppy/. It proxies only HTTP communication, however, not generic TCP traffic, so unless you are using HTTP encapsulation, you'll need to find another tool. Proxy servers get you only so far. It is highly recommended that if your audience will be using modems, you do regular and thorough testing using the same hardware, OS, ISP, and so on as your audience does. It might be painful, but your players will thank you.

Profiling and Performance Stats

To optimize your app, it is first necessary to evaluate the efficiency of your server implementation and identify the bottlenecks. To do this, you need empirical evidence. Most programmers have used profiling tools, which measure the amount of time that a program spends executing each method. Java includes profiling tools as part of the Java Runtime, and a number of third-party tools are available as well. However, it usually becomes necessary to augment those tools with some custom code for generating performance statistics that are unique to your app. For instance, in our GameServer, knowing which methods your app spends most of its time in is certainly useful, but the key metrics of interest are as follows:

Event processing time
Game logic times for each GameEvent type
Event latency (time that an event waits in queue before processing)
Database query times
Time to add a new connection

An easy way to obtain these stats is to insert some code around your key methods to check times and record to a log file for further analysis. Listing 6.14 shows the processEvent() method from RPSController with added performance stat code.

Listing 6.14 `processEvent()` with Timing Code

public void processEvent(GameEvent e) {
    // note the time at which processing starts
    long start = System.currentTimeMillis();
    switch (e.getType()) {
    case GameEventDefault.C_LOGIN:
        login(e);
        break;
    case GameEventDefault.C_LOGOUT:
        logout(e);
        break;
    case GameEventDefault.C_JOIN_GAME:
        join(e);
        break;
    case GameEventDefault.C_QUIT_GAME:
        quit(e);
        break;
    case GameEventDefault.C_CHAT_MSG:
        chat(e);
        break;
    case GameEventDefault.C_MOVE:
        move(e);
        break;
    case GameEventDefault.C_GET_PLAYERS:
        getPlayers(e);
        break;
    }
    // log the method name, the event type,
    // the duration in the queue, time to process,
    // and current queue size
    statslog.info("processEvent," + e.getType() + "," +
        (start - e.getQueuedTime()) + "," +
        (System.currentTimeMillis() - start) + "," +
        eventQueue.size());
}

If you tweak your Log4J Appender to output only the timestamp and the data that you provide, you will have a nice comma-separated value (CSV) file that you can analyze with external tools such as Excel or some Perl scripts.
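If you would rather stay in Java for the analysis, a small summarizer is enough to get started. This sketch assumes each log line ends with the four values written by the listing above (event type, queue time, processing time, queue size) and that anything before them, such as the timestamp and method name, can be ignored. `StatsSummary` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical summarizer for the comma-separated stats log. */
class StatsSummary {

    /** Returns the average processing time in milliseconds per event type. */
    static Map<String, Double> averageProcessTime(Iterable<String> lines) {
        Map<String, long[]> totals = new TreeMap<>(); // type -> {sum, count}
        for (String line : lines) {
            String[] fields = line.trim().split("\\s*,\\s*");
            if (fields.length < 5) {
                continue; // skip lines that are not stats records
            }
            // count from the end so that any leading timestamp is ignored
            String type = fields[fields.length - 4];
            long processMillis = Long.parseLong(fields[fields.length - 2]);
            long[] total = totals.computeIfAbsent(type, k -> new long[2]);
            total[0] += processMillis;
            total[1]++;
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, long[]> entry : totals.entrySet()) {
            long[] total = entry.getValue();
            averages.put(entry.getKey(), (double) total[0] / total[1]);
        }
        return averages;
    }
}
```

The same loop can be extended to track the maximum queue time or the peak queue size, which are often more telling than averages.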

Performance Tweaks

You can tweak performance of server apps in endless ways. The most important thing is not to spend time optimizing code that doesn't need it. It might seem obvious, but often what you think is your bottleneck really isn't. The only way to know for sure is to perform profiling to measure which code is actually wasting most of your CPU time. That being said, because most of your server's effort involves handling GameEvents, there are a couple of simple rules for tweaking that part of the core system:

Make events smaller. The smaller the event, the less time it takes to allocate, serialize, and transmit.
Send fewer events. Every event counts, so if a client doesn't absolutely need the data, don't send it. For example, in GameController.sendBroadcastEvent(), we don't send the event to the originator of the message.

The Evil Trash Collector

By default, the Java VM runs a garbage-collection thread in the background that periodically fires up and does a full garbage-collection run. Another method is available, called incremental garbage collection, that is invoked with the -Xincgc command-line switch. Incremental garbage collection avoids the "stall" that can happen as the garbage collection loads down the server in a short burst. Instead, the garbage collector runs more frequently but does less work in each cycle. As a result, there will not be such a sudden burst of work, but this comes at the expense of worse overall GC performance. Additionally, JDK 1.4.1 includes some experimental garbage-collection options: -XX:+UseConcMarkSweepGC and -XX:+UseParallelGC. These options are nonstandard and should be used with caution, but they might provide great benefits to your app's performance. For more details, see http://java.oracle.com/docs/hotspot/VmOptions.html.
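When comparing collector settings, it helps to measure how much time the VM actually spends collecting. The sketch below uses the `java.lang.management` API, which arrived in Java 5 and is therefore not available on JDK 1.4.1; on that release, the `-verbose:gc` switch is the usual alternative. `GcStats` is a hypothetical helper name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that sums GC activity across all collectors. */
class GcStats {

    /** Approximate total time spent in garbage collection, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) { // -1 means the collector does not report a time
                total += millis;
            }
        }
        return total;
    }

    /** Total number of collections that have occurred. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) { // -1 means the collector does not report a count
                total += count;
            }
        }
        return total;
    }
}
```

Sampling both values before and after a bot test run, once per collector setting, gives a like-for-like comparison of GC overhead under the same load.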

Object Reuse

Speaking of garbage collection, the best route to avoiding it is to use fewer objects in the first place. There are a few tactics for doing this. The first is basic optimization, going through and removing unnecessary allocations. On top of that, simple object reuse, as done with the outgoing chat events in the RPSController, can be helpful. But the best tactic for oft-created objects is to use an object pool. Object pools, like thread pools and database connection pools, provide a mechanism for reusing objects. The pool is created with an initial number of objects of the given class. When you need an instance of the class, instead of using the new operator to create one, you call a method of the pool to fetch an instance. Our game server could benefit greatly from pooling some objects, notably GameEvents. An active GameServer can be creating GameEvents at an astounding rate. But the life of these events is very short. They are created, sent to a client (or clients), and then destroyed. Pooling GameEvents could make a drastic difference in the performance of a loaded game server. The hard part is figuring out how many objects are required in the pool, but this can be determined during automated testing, or the pool can be made adaptive using a low-priority background thread to adjust the size of the pool. Many web and app servers use that technique for managing their request-handling thread pools.
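A bare-bones pool along these lines is sketched below. It is a generic illustration rather than the framework's own code: `ObjectPool` is a hypothetical class, it grows on demand instead of adapting in the background, and callers are assumed to reset an object's state before reusing it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical fixed-start, grow-on-demand object pool. */
class ObjectPool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    /** Pre-creates initialSize instances using the given factory. */
    ObjectPool(Supplier<T> factory, int initialSize) {
        this.factory = factory;
        for (int i = 0; i < initialSize; i++) {
            free.push(factory.get());
        }
    }

    /** Returns a pooled instance, creating a new one only if the pool is empty. */
    synchronized T acquire() {
        return free.isEmpty() ? factory.get() : free.pop();
    }

    /** Hands an instance back for reuse; the caller must not touch it afterward. */
    synchronized void release(T obj) {
        free.push(obj);
    }

    /** Number of instances currently waiting in the pool. */
    synchronized int available() {
        return free.size();
    }
}
```

With a pool like this, code that previously called `new` for each outgoing event would call `acquire()`, fill in the event, and call `release()` once the event has been written to every recipient. Watching `available()` during bot testing is one way to choose the initial size.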

Other Tweaks

Here are some other suggestions for getting the most out of your game server:

Threads. Keep the number of threads down. Performance stats help identify which Wraps need more or fewer threads by checking their average queue size. Every thread, idle or not, adds overhead, so start with the fewest you can.
Synchronization. Keep it tight. Keep statements that don't need to be there out of the synchronized blocks.
Busy loops. Avoid using busy wait loops, like this one:
```
while (true) {
 checkForSomeCondition();
 try {
 Thread.sleep(SLEEP_TIME);
 }
 catch(InterruptedException ie) {}
}
```
While waiting for something, use a blocking queue or similar construct. For executing periodic tasks, use java.util.Timer.
Logging. Remove all logging statements, even ones that won't be logged, in tight inner loops. Keep a check on logging that is enabled in the live code. Too much logging kills a server's performance.
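As an illustration of the busy-loop advice above, here is a sketch of a worker that blocks on a queue instead of sleeping and polling. It uses `java.util.concurrent.BlockingQueue`, which requires Java 5 or later; events are plain strings, and the `STOP` marker is a hypothetical shutdown convention chosen for the example.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical worker that waits on a blocking queue instead of polling. */
class BlockingWorker implements Runnable {
    static final String STOP = "__STOP__"; // hypothetical shutdown marker

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> handled = new CopyOnWriteArrayList<>();

    /** Called by producer threads to hand over an event. */
    void submit(String event) {
        queue.add(event);
    }

    /** Events processed so far, in order. */
    List<String> handled() {
        return handled;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // take() parks the thread until an event arrives,
                // so no CPU time is spent checking an empty queue
                String event = queue.take();
                if (STOP.equals(event)) {
                    return;
                }
                handled.add(event); // stand-in for real event processing
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }
}
```

Unlike the sleep loop, this worker wakes up only when there is work to do, and it reacts to a new event immediately rather than after the remainder of a sleep interval.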