Case Study: The Pet Store

We will use JMeter to gather performance metrics on a new prototype of the AAA Pet Store and compare the performance of a new technology (XML/XSLT) to the existing technology (straight JSP). These numbers will help us decide whether to make the switch in the near future.

Business Need

XML and Extensible Stylesheet Language Transformations (XSLT) have garnered a lot of media attention recently. Therefore, let's imagine that our chief technology strategist has suggested that the design team look into porting the AAA Pet Store's presentation logic to this new technology. Always cautious, the engineering team decided that before we could create any estimates related to XML use, we had to know more. So, we went ahead and built a prototype of the pet store that converts model data into XML using an open-source API called JDOM. One of our enterprising design team members learned XSL and wrote style sheets to turn the XML into HTML. Then, we used Apache's Xalan engine to perform the transformation. Voilà! We had a working prototype.

Management was ready to move again, but a couple of programmers had doubts. Something bothered us about the prototype—it seemed as though we were just adding XML/XSLT on top of our existing code. After all, we still had JSPs pushing out HTML; they were just going through some extra steps. XP says, "Listen to your gut." Before management commits us to a new schedule involving XML, we decide to do some quick performance analysis to see what kind of price we are paying for these extra steps. It may be that the prototype, while functional, doesn't represent a design that will work under load.

Prototype Architecture

Let's take a look at the prototype's structure so we can get a better sense of what we are about to test. The new interface reuses the app's model code, so there's nothing new there. The changes begin at the JSP level. Instead of spitting out HTML, the JSPs generate XML elements using JDOM. The XML generation looks like this (from productlist.jsp):

// Build the JDOM elements that describe the current product
Element eProduct = new Element("CURRENT_PRODUCT");
Element id = new Element("ID");
Element name = new Element("NAME");
Element description = new Element("DESCRIPTION");
Element ePrice = new Element("PRICE");

This page looks like a good candidate for refactoring. Perhaps if we pursue this architecture, we can use some object-to-XML mapping tools. Also, this code probably should execute in a separate class so that we can test it more easily. We take down these ideas as notes for later. The XML that this page produces looks something like the following:
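(The element structure follows the JSP code above; the sample values are invented for illustration.)

```xml
<CURRENT_PRODUCT>
  <ID>1</ID>
  <NAME>Goldfish</NAME>
  <DESCRIPTION>Small, hardy, and great for beginners</DESCRIPTION>
  <PRICE>5.50</PRICE>
</CURRENT_PRODUCT>
```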


Once the XML has been modeled in JDOM objects, the JSP writes the XML out to a String and then passes it to Xalan's XSLT processing engine. We note this operation as a potential performance bottleneck: JDOM may already represent its objects in a form that an XSLT engine can read—reading from and writing to Strings could be unnecessary. The XSLT engine uses a style sheet specified by the JSP to transform the XML output of the page into HTML. Here is the XSL for the product element:

<xsl:if test="CURRENT_PRODUCT">
 <table width="100%" cellpadding="0" cellspacing="0" border="0">
  <tr><td align="center">
   <b><xsl:value-of select="CURRENT_PRODUCT/NAME"/></b><br/>
   <xsl:value-of select="CURRENT_PRODUCT/DESCRIPTION"/><br/>
   <b><xsl:value-of select="CURRENT_PRODUCT/PRICE"/></b>
  </td></tr>
 </table>
</xsl:if>
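The String hand-off we flagged as a bottleneck is easy to sketch with the JDK's built-in transformer (JDOM and Xalan specifics are omitted here; the class name, sample XML, and stylesheet are invented for illustration):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StringRoundTrip {

    // The prototype's approach: the page's XML is serialized to a String,
    // then re-parsed by the XSLT engine. That extra serialize/parse pass
    // is the step we suspect of costing us under load.
    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<CURRENT_PRODUCT><NAME>Goldfish</NAME></CURRENT_PRODUCT>";
        String xsl = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/'>"
                + "<b><xsl:value-of select='CURRENT_PRODUCT/NAME'/></b>"
                + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```

A Source built directly from the in-memory tree would skip the String round-trip entirely; comparing the two paths is exactly the kind of follow-up experiment our later profiling could run.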

The end result is a page that mimics the page created by the regular pet store app (we could even use HttpUnit to verify that they are structurally identical). However, with some refactoring, XML data generated by the page could be transformed in a number of different ways, or even sent to partners without transformation. If those are pressing business needs, maybe this will be a viable architecture. We have to weigh these potential advantages (and XP says we should treat potential advantages with suspicion) against the results of our JMeter testing in order to give management good feedback about the decisions they make.

Creating the Test

We decided to compare the XSL prototype's performance against that of the existing system. By taking this as our goal, we simplify our task. Real-world performance conditions can be hard to replicate, making absolute performance difficult to measure. By testing two alternatives side by side, we develop an idea of their relative worth. When deciding between the two options, that's all the data we need.

The Test Plan

We decide to subject both Web apps to the same series of tests. Several simulated users will access a series of pages over and over again. While the system is under this load, we will gather its performance metrics. We will test each app four times: with 10, 30, 100, and 500 users. We know that the AAA Pet Store gets an average of 10 to 30 concurrent users during the day, but the customer worries about a potential increase in use connected to their upcoming promotional campaign. We model the test case after typical user behavior: We enter at the index page, go to a pet category, and then view a couple of animals within the category. This test exercises every major page in the prototype—another excellent feature.

We note a couple of complicating factors: The Web server, the database, and the testing app (JMeter) are collocated. This fact eliminates one type of test noise (data transfer over an open network) but generates another: If the load becomes too high, the three apps might begin to compete with one another for scarce resources, increasing test times artificially. We decide to accept this risk, especially because Distributed JMeter was developed precisely to eliminate uncertain network bottlenecks. Also, we are testing the two apps side by side, and any resource contention on the box should affect both equally. Still, we minimize our risk by reducing the number of extraneous processes running on the test box.

The simulated users will hit an average of one page per second, with a variation of one second (a user might request another page immediately, or could wait two seconds before doing so). Real users are probably more variable, but we choose to ignore that fact for this test. Without research, we have no idea what a real use pattern would look like, and (again) the side-by-side test minimizes the risk.
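JMeter's Gaussian random timer implements this delay for us, but the distribution we just described is easy to sketch in plain Java (the class and method names here are invented for illustration):

```java
import java.util.Random;

public class ThinkTime {

    // Mimics the timer we configured: a constant offset plus a normally
    // distributed deviation, clamped so the delay is never negative.
    // The 1000/1000 values in main match our test plan.
    static long nextDelay(Random random, long offsetMs, long deviationMs) {
        long delay = offsetMs + (long) (random.nextGaussian() * deviationMs);
        return Math.max(0, delay);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println("think time: "
                    + nextDelay(random, 1000, 1000) + " ms");
        }
    }
}
```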

Creating the Test in JMeter

We create a ThreadGroup in JMeter's test tree area and set the initial number of threads to 10. We will go back and manually edit this value before each test run. Then, we add a Web Testing controller to the ThreadGroup. (By the way, you can see the final test configuration in the sidebar of any of the screenshots in this section.) To the Web Testing controller, we add URL samples for each page we want a test user to visit (index, product, and so on). A Cookie Manager element takes care of maintaining client state for the entire test (important for the pet store's navigation system). Right before the listeners, we add a Gaussian random timer. We set the values as decided earlier.

The Listeners

We add several listeners to the test, each with a different purpose. The View Results listener allows us to verify that all the pages are coming through OK. After we are sure the test works, we remove this listener because it clutters the interface and adds a slight performance drag to the test. The three main visualizers we employ are the graph, the spline visualizer, and the file reporter.

The graph is our best friend: It provides a quick visual analysis while also displaying a running average and deviation count along its side. It also automatically scales its view to keep the data in sight. This works for a quick check, but it prevents good visual comparisons between systems under different loads (a heavily loaded system might appear to have better response times because its response times vary so widely; to keep the data in view, JMeter shrinks the Y axis of the graph). The spline visualizer tracks minimum and maximum response times for us, and also provides a picture of performance over time. This is handy, but because response times can vary so widely (under heavy load, both apps have occasional sharp spikes), the spline visualizer's averaging behavior can create a strange picture of the data. We keep it in this test only for its quick statistics.

The file reporter is probably our biggest asset in the long term. We use it to store all the response times from each test. Then, we can run a more detailed analysis if the data warrants (if both systems perform identically, we can probably skip this step). We manually change the output file for each test run to keep the data for different loads and apps separate (we use names like XSLT_100.txt).
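When the time comes for that detailed analysis, the basic statistics are simple to compute. A sketch in plain Java (the class name and sample times are invented; a real analysis would first parse the response times out of a report file such as XSLT_100.txt):

```java
import java.util.Arrays;

public class ResponseStats {

    // Average response time of a set of samples, in milliseconds.
    static double mean(long[] times) {
        return Arrays.stream(times).average().orElse(0);
    }

    // Population standard deviation: how widely the samples vary
    // around the mean (the "deviation" JMeter's graph reports).
    static double stddev(long[] times) {
        double m = mean(times);
        return Math.sqrt(Arrays.stream(times)
                .mapToDouble(t -> (t - m) * (t - m))
                .average().orElse(0));
    }

    public static void main(String[] args) {
        long[] sample = {220, 310, 180, 950, 270};  // made-up times
        System.out.printf("avg=%.1f ms, stddev=%.1f ms%n",
                mean(sample), stddev(sample));
    }
}
```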


Executing the tests is as simple as clicking Start and then clicking Stop after enough data has been gathered. We do not enforce any particular policy on the length of test runs; average behavior interests us more, and we cannot afford the several hours per run that simulating a real-world test length would require. To switch test apps, we remove the Web Testing controller from the TestPlan and replace it with a new one. This leaves the listeners, timer, and cookie manager properly configured.


Even under the lightest load, the XSLT prototype takes longer to respond. As the user load increases, the XSLT times increase more rapidly than do those of the plain-JSP app. We punch the numbers we gathered (by simply writing down JMeter's analyses) into a table as shown here (all times are shown in milliseconds). Note that the minimum time was usually for the first request, before the system became loaded, and so was always near 0.







[Table not recoverable in this copy: average, minimum, and maximum response times (ms) for the Plain JSP and XSLT versions under loads of 10, 30, 100, and 500 users.]
As you can gather from the statistics, the JSP version of the site with 100 users performs much like the XSLT version with 10. The next two figures show two of the test graphs. We discard most of the graphs generated by JMeter during the test because they are so difficult to compare visually. If we need valid graphs later, we will generate them from the report files.

[Figure: JMeter response-time graphs for the plain JSP and XSLT versions]

Both graphs reveal a strong tendency toward occasional response times much higher than the average response time (the light gray line). The next figure shows how response times degrade as users are added to the load (this happens naturally as JMeter starts more threads).

[Figure: response times degrading as JMeter adds simulated users]


Obviously, the XSLT prototype does not scale as well as the plain JSP solution. However, this is just the first step of a comprehensive evaluation. We begin by determining the actual performance needs of the client. Yes, the JSP version outperforms the XSLT version, but the difference between the two implementations barely registers to the end user under light load. A careful analysis of the pet store's current usage (and/or future projections) could define how much worse XSLT must perform to be ruled out. We also know that the tests do not accurately model user behavior—more time could be spent on that issue. Also, what other performance issues might affect the end user's experience? If the time to retrieve data from a production database (as opposed to a collocated one) adds a reliable one second to every response, the page-view time could change from acceptable to unacceptable. Finally, what is acceptable or unacceptable to a user? How many seconds or milliseconds occupy the space between "zippy" and "I'll take my business elsewhere"? Any or all of these questions could yield avenues for further research.

What Do We Do?

If all the questions have been resolved and the client decides that AAA Pet Store must go forward with XSLT, and that it must handle 100 concurrent users at one second per page or less, then we need to examine what can be done to speed things up. The JMeter tests we ran could be rewritten in HttpUnit and turned into functional tests for the customer—then they would know when we had achieved their performance goals. After that, we could use profiling tools to expose bottlenecks in the system. With the bottlenecks in sight, we could refactor components, testing our performance expectations with JUnitPerf. Our prototype isn't necessarily fatally flawed—some caching and refactoring might bring it well within the functional test's "pass" range.
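To make such a performance goal concrete, the idea behind a JUnitPerf-style timed assertion can be sketched in plain Java (the class name, the 1000 ms budget, and the stand-in operation are all invented for illustration):

```java
public class TimedCheck {

    // Run an operation and report whether it finished within a time
    // budget. A real test would substitute an actual page request for
    // the Runnable and the customer's one-second target for the budget.
    static boolean withinBudget(Runnable operation, long budgetMs) {
        long start = System.nanoTime();
        operation.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return elapsedMs <= budgetMs;
    }

    public static void main(String[] args) {
        boolean ok = withinBudget(() -> { /* stand-in for a page request */ },
                1000);
        System.out.println(ok ? "within budget" : "too slow");
    }
}
```

Wrapping a functional test this way gives the team a pass/fail signal for the performance goal, which is exactly the role JUnitPerf plays in the plan above.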

One thing is certain: A sensible team would not build new functionality around the unaltered prototype without at least devoting some resources to the task of improving performance. Even in that situation, the client would have to hear that there was significant risk in going forward without more exploratory development.


We developed a "rubber to the road" prototype of a proposed redesign of the AAA Pet Store using XSLT. Along the way, we (re)learned the value of XP's rapid feedback principle. A half-day's worth of testing with JMeter was enough to give the team some solid data about the relative performance characteristics of the two approaches. Real numbers (even if gathered quickly) are worth a thousand speculative conversations.

Imagine that the XSLT prototype was constructed to model the future architecture of a multimillion-dollar Web site. Using JMeter to gather feedback about its potential performance could save someone a costly mistake. An XP team always looks for hidden difficulties and lurking problems so they can be avoided. JMeter is a good flashlight to shine into the dark crannies of performance.