Once we know where our performance problems are, we can look at addressing them. The earlier they are brought to light, the better our chance of eliminating them without the need for major design changes or reworking existing code. There are many techniques for addressing performance or scalability problems. In this section we'll consider some of the most useful.
Before looking at design modifications or code changes, we should ensure that the cause of the problem isn't external to the app. Choice of app server and server configuration have a major influence on app performance, so it's vital to tune the app server to meet the needs of the apps it runs. Unless your code is particularly bad, this will usually produce a better return than optimizing code, at no cost in maintainability. The performance tuning parameters available differ between app servers, but they typically include the following, some of which we've discussed in previous chapters:
Thread pool size. This will affect both the web container and EJB container. Too many threads will result in the JVM running into operating system limitations on how many threads can efficiently run concurrently; too few threads will result in unnecessary throttling of throughput.
Tuning JVM options and operating system configuration is another productive area. Usually the tuning should be appropriate to the app server, rather than to any particular app running on it, so the best place to look for such information is the documentation supplied with your app server. Important JVM options include initial and maximum heap size and garbage-collection parameters. See http://java.oracle.com/docs/hotspot/gc/ for detailed information on Sun 1.3 JVMs. It may also be possible to disable unused app server services, to make more memory available to apps and to eliminate unnecessary background thread activity. App server vendors tend to produce good guidelines on performance tuning and J2EE performance in general.
Database configuration is equally important, and requires specialist skills.
While J2EE app servers provide many valuable services, most of these services are not free. Using them unnecessarily may harm performance. Sometimes by avoiding or eliminating the use of unnecessary container services we can improve performance. For example, we can:
Avoid unnecessary use of EJB. All EJB invocations - even local invocations - carry an overhead of container interception. Thus the use of EJB where it doesn't deliver real value can reduce performance.
By far the most important performance gain is likely to be in avoiding unnecessary remote invocation; we discuss this further below.
A simpler design is often a more performant design, and often leads to no loss in scalability.
One of the most important techniques for improving performance in J2EE apps is caching: storing data that is expensive to retrieve so that it can be returned quickly to clients, without further retrieval from the original source. Caching can be done at many points in a J2EE architecture, but is most beneficial when it enables one architectural tier to avoid some calls to the tier beneath it. Caching can produce enormous performance gains in distributed apps, by eliminating remote calls. In all apps it can avoid calls from the app server to the database, which will probably involve network round trips as well as the overhead of JDBC.

A successful caching strategy will boost the performance even of those parts of the app that don't directly benefit from cached data. Server response time will improve in general because of the reduced workload, and network bandwidth will be freed. The database will have less work to do, meaning that it responds faster and that each database instance may be able to support a larger cluster of J2EE servers.

However, caching poses serious design challenges, whether it is implemented by the J2EE server vendor, the app developer, or a third party, so it should not be used without justification in the form of test results and other solid evidence. If we implement a cache in app code, we may have to write the kind of complex code, such as thread management, that J2EE promised to deliver us from. The requirement that J2EE apps may need to run in a clustered environment can add significant complexity, even if we use a third-party caching solution.
Before we can begin to cache data, there are several questions we should ask. Most relate to the central issues of staleness, contention, and clustering:
How slow is it to get the data without caching it? Will introducing a cache improve the performance of the app enough to justify the additional complexity? Caching to avoid network round trips is most likely to be worthwhile; since caching usually adds complexity, we shouldn't implement it unless it's necessary.
The Pareto Principle (the 80/20 rule) is applicable to caching. Most of the performance gain can often be achieved with a small proportion of the effort involved in tackling the more difficult caching issues.
Important: Data caching can radically improve the performance of J2EE apps. However, caching can add much complexity and is a common cause of bugs. The difficulty of implementing different caching solutions varies greatly. Jump at any quick wins, such as caching read-only data: this adds minimal complexity, and can produce a good performance improvement. Think much more carefully about the alternatives when caching is a harder problem - for example, when it concerns read-write data. Don't rush to implement caching with the assumption that it will be required; base caching policy on performance analysis.
A good app design, with a clean relationship between architectural tiers, will usually facilitate adding any caching required. In particular, interface-based design facilitates caching; we can easily replace any interface with a caching implementation, if business requirements are satisfied. We'll look at an example of a simple cache shortly.
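As an illustrative sketch of this point (the interface and class names here are hypothetical, not taken from the sample app), a caching implementation can simply decorate an existing implementation of the same business interface, so callers need never know a cache is present:

```java
import java.util.HashMap;
import java.util.Map;

public class CachingExample {

    // Hypothetical business interface: callers depend only on this.
    public interface CountryDao {
        String getCountryName(String isoCode);
    }

    // Expensive implementation, e.g. backed by JDBC (simulated here).
    public static class JdbcCountryDao implements CountryDao {
        public String getCountryName(String isoCode) {
            // A real implementation would query the database.
            return "Country-" + isoCode;
        }
    }

    // Caching decorator: implements the same interface, so it can be
    // substituted without any change to business code.
    public static class CachingCountryDao implements CountryDao {
        private final CountryDao delegate;
        private final Map cache = new HashMap();

        public CachingCountryDao(CountryDao delegate) {
            this.delegate = delegate;
        }

        public synchronized String getCountryName(String isoCode) {
            String name = (String) cache.get(isoCode);
            if (name == null) {
                name = delegate.getCountryName(isoCode);
                cache.put(isoCode, name);  // later calls avoid the delegate
            }
            return name;
        }
    }

    public static void main(String[] args) {
        CountryDao dao = new CachingCountryDao(new JdbcCountryDao());
        System.out.println(dao.getCountryName("AU"));  // first call hits the delegate
        System.out.println(dao.getCountryName("AU"));  // second call served from cache
    }
}
```

Because business code holds only a `CountryDao` reference, swapping the caching decorator in or out is purely a matter of configuration.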
As using J2EE naturally produces a layered architecture, there are multiple locations where caching may occur. Some of these types of caching are implemented by the J2EE server or underlying database, and are accessible to the developer via configuration, not code. Other forms of caching must be implemented by developers, and can absorb a large part of total development effort. Let's look at choices for cache locations, beginning from the backend:
| Location of cache | Implemented by | Likely performance improvement | Complexity of implementation | Notes |
|---|---|---|---|---|
| Database | RDBMS vendor | Significant. However, data cached in the RDBMS is still a long way from the user of the app, especially in a distributed app. | No J2EE work required. Some database configuration may be required. Index creation is simple. We may also be able to use more efficient table types, depending on our target database. | RDBMS caching is often forgotten by J2EE developers. Most RDBMSs cache execution plans for common statements and may cache query results. RDBMS indexes amount to caching ahead of time, and can produce dramatic performance improvements. We looked at the use of PreparedStatements in app code in : this can ensure that the RDBMS can perform effective caching. |
| Entity bean cache | EJB container vendor or vendor of CMP implementation | Varies. Can be very significant if it greatly reduces the number of calls to the underlying database. However, this presumes a highly efficient entity bean implementation: in a clustered environment, the cache will need to be distributed, raising problems of transactional integrity and replication. | Nil, or very little. | The J2EE specification does not guarantee caching, meaning that an architecture that performs satisfactorily only with efficient entity bean caching is not portable. However, a third-party persistence manager might be used with multiple app servers. |
| Data cache in the data access tier - that is, web container or EJB container - other than an entity bean cache: for example, a JDO implementation or a third-party O/R mapping solution such as TopLink | Third-party vendor | Benefits similar to an entity bean cache. Also similar problems: will break in a clustered environment unless it is distributed, meaning that only high-end products will be suitable for scalable deployments. | Little. | Introduces dependence on another product besides the EJB container. However, we may have opted for a third-party O/R mapping tool for other reasons. |
| Session EJBs | Developer | Depends on how expensive the retrieval of cached data was. Doesn't eliminate network round trips to the EJB container in distributed apps. | Little-to-moderate. | It's difficult to use the Singleton design pattern in the EJB tier, so cached data may be duplicated in stateless session bean instances, with the caches potentially in different states. However, cached data will benefit all users of a stateless session bean. |
| Business objects running in the web container | Developer or third-party solution | Very significant. Eliminates network round trips to the EJB container in distributed apps. Even when EJBs are collocated in the same JVM, an invocation on an EJB will be slower than an invocation of a local method. | Moderate-to-high. | Quick wins, such as caching reference data, will produce big returns. However, careful thought is advisable before trying to implement more problematic caching solutions. The J2EE infrastructure cannot help address concurrency issues. However, we may be able to use third-party solutions, two of which are discussed below. |
| Web tier | Developer or third-party solution | Very significant. | Moderate-to-high. | There are a host of alternatives here, such as caching custom tags, caching servlets, and caching filters. Caching filters are particularly attractive, as they enable cache settings to be managed declaratively in the web.xml deployment descriptor. |
| Web tier | J2EE server vendor | Very significant. | Little-to-moderate. | Similar to the above, but provided by the app server vendor. Web tier caching is provided by WebSphere and iPlanet/SunONE, among other servers. |
| Cache in front of the J2EE app server, achieved by setting HTTP cache control headers or "edge side" caching | Developer, possibly relying on a third-party caching product | Very significant. | Little. | We'll discuss "front" caching for web solutions under Web Tier Performance Issues below. |
Generally, the closer to the client we can cache, the bigger the performance improvement, especially in distributed apps. The flip side is that the closer to the client we cache, the narrower the range of scenarios that benefit from the cache. For example, if we cache the whole of an app's dynamically generated pages, response time on these pages will be extremely fast (of course, this particular optimization only works for pages that don't contain user-specific information). However, this is a "dumb" form of caching - the cache may have an obvious key for the data (probably the requested URL), but it can't understand the data it is storing, because it is mixed with presentation markup. Such a cache would be of no use to a Swing client, even if the data in the varying fragments of the cached pages were relevant to a Swing client.
Important: J2EE standard infrastructure is really geared only to support the caching of data in entity EJBs. This option isn't available unless we choose to use entity EJBs (and there are many reasons why we might not). It's also of limited value in distributed apps, which face as much of a problem in moving data from the EJB container to a remote client as in moving data from the database to the EJB container.
Thus we often need to implement our own caching solution, or resort to another third-party caching solution. I recommend the following guidelines for caching:
Avoid caching unless it involves reference data (in which case it's simple to implement) or unless performance clearly requires it. In general, distributed apps are much more likely to need to implement data caching than collocated apps.
Let's look at some third-party commercial caching products that can be used in J2EE apps. The main reasons we might spend money on a commercial solution are to achieve reliable replicated caching functionality and to avoid the need to implement and maintain complex caching functionality in-house.

Coherence, from Tangosol (http://www.tangosol.com/products-clustering.jsp), is a replicated caching solution that claims even to support clusters including geographically dispersed servers. Coherence integrates with most leading app servers, including JBoss. Coherence caches are basically alternatives to standard Java map implementations, such as java.util.HashMap, so using them merely requires Coherence-specific implementations of Java core interfaces.

SpiritCache, from SpiritSoft (http://www.spiritsoft.net/products/jms-jcache/overview.html), is also a replicated caching solution, and claims to provide a "universal caching framework for the Java platform". The SpiritCache API is based on the proposed JCache standard API (JSR-107: http://jcp.org/jsr/detail/107.jsp). JCache, proposed by Oracle, defines a standard API for caching and retrieving objects, including an event-based system allowing app code to register for notification of cache events.
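Since such products typically expose themselves through standard Java map interfaces, coding to java.util.Map keeps the choice of cache implementation out of application code. The following minimal sketch (class and key names are illustrative only) shows the idea:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: app code depends only on java.util.Map, so a local
// map can later be replaced by a replicated cache that implements Map
// without changing this class.
public class MapCacheExample {

    private final Map cache;

    // The concrete Map implementation is supplied by configuration.
    public MapCacheExample(Map cache) {
        this.cache = cache;
    }

    public void store(Object key, Object value) {
        cache.put(key, value);
    }

    public Object lookup(Object key) {
        return cache.get(key);
    }

    public static void main(String[] args) {
        // Local deployment: a synchronized HashMap. In a clustered
        // deployment, a distributed Map implementation could be passed
        // in instead, with no change to the code above.
        MapCacheExample cache = new MapCacheExample(
                Collections.synchronizedMap(new HashMap()));
        cache.store("itinerary", "London-Paris");
        System.out.println(cache.lookup("itinerary"));
    }
}
```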
Important: Commercial caching products are likely to prove a very good investment for apps with sophisticated caching requirements, such as the need for caching across a cluster of servers. Developing and maintaining complex caching solutions in-house can prove very expensive. However, even if we use third-party products, running a clustered cache will significantly complicate app deployment, as the caching product - in addition to the J2EE app server - will need to be configured appropriately for our clustered environment.
Since design largely determines performance, code optimization is seldom worth the effort in J2EE apps unless it is targeted at known problem areas, or unless app code is particularly badly written. However, all professional developers should be familiar with performance issues at the code level, to avoid making basic errors. For discussion of Java performance in general, I recommend Java Performance Tuning by Jack Shirazi (O'Reilly) and Java 2 Performance and Idiom Guide (Prentice Hall). There are also many good online resources on performance tuning. Shirazi maintains a performance tuning web site (http://www.javaperformancetuning.com/) that contains an exhaustive directory of code tuning tips from many sources.
Important: Avoid code optimizations that reduce maintainability unless there is an overriding performance imperative. Such "optimizations" are not just a one-off effort, but are likely to prove an ongoing cost and cause of bugs.
The higher-level the coding issue, the bigger the potential performance gain from code optimization. Thus there is often potential to achieve good results through techniques such as reordering the steps of an algorithm so that expensive tasks are executed only if absolutely essential. As with design, an ounce of prevention is worth a pound of cure. While obsession with performance is counter-productive, good programmers don't write grossly inefficient code that will later need optimization. Sometimes, however, it does make sense to try a simple algorithm first, and change the implementation to use a faster but more complex algorithm only if it proves necessary. Really low-level techniques such as loop unrolling are unlikely to bring any benefit to J2EE systems. Any optimization should be targeted, and based on the results of profiling. When looking at profiler output, concentrate on the five slowest methods; effort directed elsewhere will probably be wasted. The following table lists some potential code optimizations (worthwhile and counter-productive), to illustrate some of the tradeoffs between performance and maintainability to be considered:
| Technique | Performance improvement | Effect on maintainability |
|---|---|---|
| Minimize object creation, through techniques such as object pooling and "canonicalizing" objects (preventing the creation of multiple objects representing the same value). | Varies. May reduce the work of garbage collection. The performance benefit may not be very great with the sophisticated garbage collection of newer JVMs. | Implementing such algorithms may be complex. Code may become harder to read. |
| Use the correct collection type: for example, java.util.LinkedList when we don't know how many elements to expect and elements will be added one at a time, or java.util.ArrayList when we know how many elements to expect. Remember all those data structures modules in Computer Science I? Sun has implemented many of the standard data structures as core library collections, so we just need to choose the most appropriate. | Varies. May be very significant if the list grows unpredictably or requires sorting. | None. We should access the collection through its interface (such as java.util.List) rather than its concrete class (such as java.util.LinkedList). |
| Use an exception rather than a check to end a loop. | Varies with virtual machine. | Likely to make code harder to read. This is an example of an optimization that should be avoided if possible. |
| Use final classes and methods. | Slight. | It's often good style to use final classes and methods, so we often use this "optimization" for other reasons (see ). |
| Avoid using System.out. | Significant if a lot of output is involved. | In any case, it's vital that an enterprise app uses a proper logging framework. This is discussed in . |
| Avoid evaluating unnecessary conditions. Java guarantees "short-circuit" evaluation of ands and ors, so we should perform the quickest checks first, potentially avoiding the need to evaluate slower checks. | Can be significant if a piece of code is frequently invoked. | None. |
| Avoid operations on Strings, using StringBuffers in preference. As Strings are immutable, String operations are likely to be inefficient and wasteful, resulting in the creation of many short-lived objects. | May be significant, depending on the JVM. | Due to the significant performance benefit, this is a case where a professional developer should simply get used to reading the slightly more verbose StringBuffer syntax. |
| Avoid unnecessary String or StringBuffer operations. Even StringBuffer operations are relatively slow. | Significant. | None. This is an area where we can safely achieve quick wins. |
| Minimize the use of interfaces, as they may be slower to invoke than classes. | Very slight. | This is the kind of "optimization" that has the potential to wreck a codebase. The marginal performance gain isn't worth the damage this approach can wreak. |
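The short-circuit-evaluation advice above can be sketched as follows (the method names and simulated costs are illustrative, not from the sample app); because `&&` stops evaluating as soon as one operand is false, putting the cheap check first means the slow check often never runs:

```java
// Illustrative sketch of short-circuit evaluation: cheapCheck() runs first,
// so the slow expensiveCheck() is evaluated only when cheapCheck() passes.
public class ShortCircuitExample {

    public static boolean cheapCheck(int i) {
        return i % 2 == 0;  // trivial arithmetic: effectively free
    }

    public static boolean expensiveCheck(int i) {
        // Simulates a slow check, e.g. one involving I/O or parsing.
        try {
            Thread.sleep(1);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return true;
    }

    public static void main(String[] args) {
        int matches = 0;
        for (int i = 0; i < 1000; i++) {
            // Because of short-circuiting, expensiveCheck() runs only for
            // the ~500 even values of i, roughly halving the loop's cost.
            if (cheapCheck(i) && expensiveCheck(i)) {
                matches++;
            }
        }
        System.out.println(matches + " matches");
    }
}
```

Reversing the operand order would force the expensive check on every iteration, with no change in the result.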
String and StringBuffer operations can have a big impact on performance. Even StringBuffer operations are surprisingly expensive, as use of a profiler such as JProbe quickly demonstrates. Be very aware of string operations in heavily used code, making sure they are necessary and as efficient as possible. As an example of this, consider logging in our sample app. The following seemingly innocent statement in our TicketController web controller, performed only once, accounts for a surprisingly high 5% of total execution time if a user requests information about a reservation already held in their session:
logger.fine("Reservation request is [" + reservationRequest + "]");
The problem is not the logging statement itself, but that of performing a string operation (which HotSpot optimizes to a StringBuffer operation) and invoking the toString() method on the ReservationRequest object, which performs several further string operations. Adding a check as to whether the log message will ever be displayed, to avoid creating it if it won't be, will all but eliminate this cost in production, as any good logging package provides highly efficient querying of log configuration:
if (logger.isLoggable(Level.FINE)) logger.fine("Reservation request is [" + reservationRequest + "]");
Of course a 5% performance saving is no big deal in most cases, but such careless use of logging can be much more critical in frequently-invoked methods. Such conditional logging is essential in heavily used code.
Important: Generating log output usually has a minor impact on performance. However, building log messages unnecessarily, especially if it involves unnecessary toString() invocations, can be surprisingly expensive.
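To make the String-versus-StringBuffer tradeoff from the earlier table concrete, here is a minimal sketch (illustrative only) contrasting the two ways of building a string in a loop:

```java
// Illustrative sketch: repeated String concatenation in a loop creates many
// short-lived objects, while a StringBuffer appends into one growing buffer.
// (On modern JVMs, StringBuilder would normally be preferred; StringBuffer
// matches the J2EE-era API discussed in the text.)
public class StringBufferExample {

    public static String slowBuild(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";  // creates a new String (and temporary buffer) each pass
        }
        return s;
    }

    public static String fastBuild(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');  // appends in place
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long t = System.currentTimeMillis();
        slowBuild(10000);
        long slow = System.currentTimeMillis() - t;

        t = System.currentTimeMillis();
        fastBuild(10000);
        long fast = System.currentTimeMillis() - t;

        // Actual timings vary by JVM; the String version is typically much slower.
        System.out.println("String: " + slow + "ms, StringBuffer: " + fast + "ms");
    }
}
```

Both methods produce identical output; only the amount of object creation differs.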
Two particularly tricky issues are synchronization and reflection. These are potentially important because they sit midway between design and implementation. Let's take a closer look at each in turn. Correct use of synchronization is an issue of both design and coding. Excessive synchronization throttles performance and has the potential to deadlock; insufficient synchronization can cause state corruption. Synchronization issues often arise when implementing caching. The essential reference on Java threading is Concurrent Programming in Java: Design Principles and Patterns by Doug Lea (Addison-Wesley). I strongly recommend referring to this book when implementing any complex multi-threaded code. However, the following tips may be useful:
Don't assume that synchronization will always prove disastrous for performance. Base decisions empirically. Especially if operations executed under synchronization execute quickly, synchronization may ensure data integrity with minimal impact on performance. We'll look at a practical example of the issues relating to synchronization later in this chapter.
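As a sketch of keeping synchronized operations quick (the class and method names are hypothetical), the expensive work can be done outside the lock, with only the fast map operations synchronized. Note the deliberate tradeoff: two threads may occasionally compute the same value concurrently, which is usually acceptable for a cache:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: only quick map operations run under the lock, so
// contention is minimal even with many concurrent callers.
public class QuickSyncCache {

    private final Map cache = new HashMap();

    public Object get(Object key) {
        synchronized (cache) {               // quick: just a map lookup
            Object value = cache.get(key);
            if (value != null) {
                return value;
            }
        }
        Object value = expensiveLookup(key); // slow work done outside the lock
        synchronized (cache) {               // quick: just a map put
            cache.put(key, value);
        }
        return value;
    }

    protected Object expensiveLookup(Object key) {
        // Placeholder for a database query or remote call.
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        QuickSyncCache cache = new QuickSyncCache();
        System.out.println(cache.get("x"));  // computed on first access
        System.out.println(cache.get("x"));  // served from the cache
    }
}
```

Holding the lock across `expensiveLookup()` would serialize all callers behind the slow operation, which is exactly the kind of throttling the tip above warns against.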
Reflection has a reputation for being slow. Reflection is central to much J2EE functionality and a powerful tool in writing generic Java code, so it's worth taking a close look at the performance issues involved. Such a look reveals that most of the fear surrounding the performance of reflection is unwarranted. To illustrate this, I ran a simple test to time four basic reflection operations:
Loading a class by name with the Class.forName (String) method. The cost of invoking this method depends on whether the requested class has already been loaded. Any operation - using reflection or not - will be much slower if it requires a class to be loaded for the first time.
The source code for the test can be found in the sample app download, under the path /framework/test/reflection/Tests.Java. The following method was invoked via reflection:
public String foo(int i) { return "This is a string with a number " + i + " in it"; }
The most important results, in running these tests concurrently on a 1GHz Pentium III under JDK 1.3.1_02, were:
10,000 invocations of this method via Method.invoke() took 480ms.
My conclusions, from this and tests I have run in the past, and experience from developing real apps, are that:
Invoking a method using reflection is very fast once a reference to the Method object is available, so when using reflection, try to cache the results of introspection where possible. Remember that a Method can be invoked on any object of the declaring class. If the method does any work at all, the cost of that work is likely to outweigh the cost of reflective invocation.
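A minimal sketch of this guideline (not the actual test class from the sample app download): perform the introspection once, keep the resulting Method reference, and reuse it for repeated invocations:

```java
import java.lang.reflect.Method;

// Illustrative sketch: the expensive step is looking the Method up;
// invoking a cached Method reference is cheap.
public class ReflectionExample {

    public String foo(int i) {
        return "This is a string with a number " + i + " in it";
    }

    public static void main(String[] args) throws Exception {
        ReflectionExample target = new ReflectionExample();

        // Introspection: do this once and cache the result.
        Method m = ReflectionExample.class.getMethod(
                "foo", new Class[] { int.class });

        // Repeated invocations reuse the cached Method object.
        String last = null;
        for (int i = 0; i < 10000; i++) {
            last = (String) m.invoke(target, new Object[] { Integer.valueOf(i) });
        }
        System.out.println(last);
    }
}
```

Re-running the lookup (`getMethod()`) inside the loop, by contrast, would repeat the expensive introspection on every iteration.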
Important: The assumption among many Java developers that "reflection is slow" is misguided, and becoming increasingly anachronistic with maturing JVMs. Avoiding reflection is pointless except in unusual circumstances - for example, in a deeply nested loop. Appropriate use of reflection has many benefits, and its performance overhead is nowhere near sufficient to justify avoiding it. Of course app code will normally use reflection only via an abstraction provided by infrastructure code.