Spider Example

HttpUnit's Java foundations mean that it's possible to devise all sorts of interesting testing apps. The API merely offers objects for test developers to employ at their convenience. In fact, verifying page output from individual requests (as we did with the sales report page in our previous examples) is perhaps the lowest-order use of the framework. The code for verifying three simple pages was over 100 lines long. Writing a test like that could take several hours, especially as you slowly tweak it to account for wrinkles in the underlying site. To quote one developer I worked with, "You could check the output manually in three seconds." Clearly, writing code like that is not the best option. Instead, test developers should seek ways to automatically verify portions of Web apps without manually asserting every single page. The spider example illustrates this approach by attempting to solve a common problem: how to quickly verify that a new site deployment has no major errors.

Spider Development: First Iteration

Our criterion for basic verification of a deployment is that each user-accessible link will display a page without an exception (HTTP code 500 Internal Server Error). HttpUnit helpfully turns these exceptions into HttpInternalErrorExceptions, which will fail tests for us. So, we have a starting point. If we can retrieve a response for every link on the site without an exception, the site must at least be up and running.

Start Sidebar
Is This a Valid Test?

Checking all the internal links is not a thorough test for most sites because it omits things like user interaction, customization, correct page sequence (private.jsp should redirect to the login page), and so on. However, imagine a dynamically generated catalog site whose online catalog contains more than 10,000 products. A link-checking spider could automatically verify that every product page at least displays, something even the most dedicated tester would be loath to do. Some added logic to verify the general structure of a product display page could provide testing coverage for more than 90 percent of the accessible pages on such a site.

End Sidebar

With HttpUnit's classes, coding such a spider is easy. All we need to do is start at the front page and try to follow every internal link on it. If we encounter a link we have checked before, we ignore it. For each page we successfully retrieve, we check all the links on it, and so on. Eventually, every link will have been checked and the program will terminate. If any page fails to display, HttpUnit will throw an exception and stop the test. We begin by initializing a WebConversation to handle all the requests and a java.util.HashSet to keep track of which links we have followed so far. Then we write a test method that gets the response for the site's homepage and checks all the links on it:

private WebConversation conversation;
private Set checkedLinks;
private String host = "www.sitetotest.com";

public void setUp() {
    conversation = new WebConversation();
    // URL strings of every link followed so far
    checkedLinks = new HashSet();
}

public void testEntireSite() throws Exception {
    // Start the crawl at the site's home page.
    WebResponse response = conversation.getResponse("http://" + host);
    checkAllLinks(response);
    System.out.println("Site check finished. Links checked: "
            + checkedLinks.size() + " : " + checkedLinks);
}


The checkAllLinks() method is also simple:

private void checkAllLinks(WebResponse response) throws Exception {
    // Only HTML responses contain links that HttpUnit can parse.
    if (!isHtml(response)) {
        return;
    }
    WebLink[] links = response.getLinks();
    System.out.println(response.getTitle() + " -- links found = " + links.length);
    for (int i = 0; i < links.length; i++) {
        // Set.add() returns false if this URL string was already recorded.
        boolean newLink = checkedLinks.add(links[i].getURLString());
        if (newLink) {
            System.out.println("Total links checked so far: " + checkedLinks.size());
            checkLink(links[i]);
        }
    }
}

private boolean isHtml(WebResponse response) {
    return response.getContentType().equals("text/html");
}


The isHtml() method checks that the response whose links we are about to check is in fact HTML (HttpUnit doesn't parse Flash content, for instance). After the available links in the response have been retrieved, checkAllLinks() iterates over the link array and attempts to put the URL string of each link into the checkedLinks set. If it is successful (indicating a new link), checkAllLinks() attempts to verify the link with checkLink():

private void checkLink(WebLink link) throws Exception {
    WebRequest request = link.getRequest();
    java.net.URL url = request.getURL();
    System.out.println("checking link: " + url);
    String linkHost = url.getHost();
    // Only follow links that stay on the site under test.
    if (linkHost.equals(this.host)) {
        // getResponse() is where HttpUnit throws if the page fails.
        WebResponse response = conversation.getResponse(request);
        this.checkAllLinks(response);
    }
}


The java.net.URL is retrieved from the link through getRequest().getURL() (shown in expanded form). If the host part of the URL matches the host under test, checkLink() retrieves the response (this is where an exception would be thrown) and then calls checkAllLinks() on it. Eventually, all of the links will have been checked and the test will terminate.
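The host comparison in checkLink() works on an absolute address even when the href on the page is relative, because a java.net.URL always represents an absolute URL. The sketch below reproduces that resolution step outside HttpUnit using the JDK's java.net.URI, and adds a case-insensitive host check. The class and method names (LinkHosts, hostOf(), isInternal()) are hypothetical. The second iteration's run() calls a notSameHost() helper whose body the text does not show; presumably it performs a comparison like isInternal(), negated.

```java
import java.net.URI;

// Hypothetical helper class; LinkHosts, hostOf() and isInternal() are
// illustrative names, not part of HttpUnit.
class LinkHosts {

    // Resolve an href (possibly relative) against the page it appeared on
    // and return the host of the resulting absolute URI, or null when the
    // result has no host (as with mailto: links).
    static String hostOf(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).getHost();
    }

    // Follow a link only if it stays on the host under test.
    // Host names are case-insensitive, so compare them that way.
    static boolean isInternal(String pageUrl, String href, String hostUnderTest) {
        String host = hostOf(pageUrl, href);
        return host != null && host.equalsIgnoreCase(hostUnderTest);
    }

    public static void main(String[] args) {
        String page = "http://www.sitetotest.com/catalog/index.html";
        System.out.println(hostOf(page, "item.jsp?id=7"));                                  // www.sitetotest.com
        System.out.println(isInternal(page, "/private.jsp", "WWW.SITETOTEST.COM"));          // true
        System.out.println(isInternal(page, "http://example.org/", "www.sitetotest.com"));   // false
        System.out.println(isInternal(page, "mailto:sales@example.org", "www.sitetotest.com")); // false
    }
}
```

One design consequence: if getURLString() returns the href as written on the page, then "item.jsp" and "/catalog/item.jsp" land in checkedLinks as different strings. Keying the set on the resolved URL instead would keep the same page from being fetched twice.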

At this point, the spider is perfectly usable. However, it has one major flaw: all the testing is carried out in one test method. That means that if one link fails, the entire test fails. This approach does not granularize the test output. If 0.1 percent of product pages (10 of 10,000) in a hypothetical catalog suffer from a rare bug caused by bad data in the database, we don't want the whole test to fail. We want to record the error, but have the test results reflect that 99.9 percent of the site is working.
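The first-iteration algorithm, including the flaw just described, can be reproduced without a live server. In the minimal sketch below, which is a stand-in rather than HttpUnit code, the hypothetical LinkGraphSpider walks an in-memory map from page URLs to the links on each page, and a page missing from the map plays the role of a page for which HttpUnit would throw.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a site: each page URL maps to the
// links found on it (the role WebResponse.getLinks() plays in the real test).
class LinkGraphSpider {
    private final Map<String, List<String>> site;
    private final Set<String> checkedLinks = new LinkedHashSet<>();

    LinkGraphSpider(Map<String, List<String>> site) {
        this.site = site;
    }

    // Visit a page, then recurse into every link not seen before. Set.add()
    // returns false for a known link, so cycles between pages still terminate.
    void checkAllLinks(String page) {
        List<String> links = site.get(page);
        if (links == null) {
            // Stand-in for HttpUnit throwing on a broken page.
            throw new IllegalStateException("Page failed: " + page);
        }
        for (String link : links) {
            if (checkedLinks.add(link)) {
                checkAllLinks(link);
            }
        }
    }

    Set<String> checked() {
        return Collections.unmodifiableSet(checkedLinks);
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("/", List.of("/a", "/b"));
        site.put("/a", List.of("/", "/b"));
        site.put("/b", List.of("/a"));
        LinkGraphSpider spider = new LinkGraphSpider(site);
        spider.checkAllLinks("/");
        System.out.println("Links checked: " + spider.checked()); // Links checked: [/a, /, /b]

        // One broken page aborts the whole walk: "/c" is never visited.
        site.put("/", List.of("/broken", "/c"));
        site.put("/c", List.of());
        LinkGraphSpider second = new LinkGraphSpider(site);
        try {
            second.checkAllLinks("/");
        } catch (IllegalStateException e) {
            // Page failed: /broken; checked so far: [/broken]
            System.out.println(e.getMessage() + "; checked so far: " + second.checked());
        }
    }
}
```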

Spider Development: Second Iteration

Because the spider is a testing class, our benchmark for the newly added functionality is that given the same input as our first iteration, it should yield the same results. With that quick and dirty test in mind, we begin integrating the HttpUnit spider into JUnit's framework. The integration shouldn't affect the core logic of the spider; it will still use a Set to keep track of already spidered links, ignore links outside a specified host, and so on. However, the test we ran as a method will be refactored into a separate object. The setup for the test (which we ran in testEntireSite) now exists in a separate class:

import java.util.HashSet;

import junit.framework.Test;
import junit.framework.TestSuite;

import com.meterware.httpunit.GetMethodWebRequest;
import com.meterware.httpunit.WebRequest;

public class SpiderSiteTest {
    public static Test suite() {
        String host = "www.eblox.com";
        TestSuite suite = new TestSuite();
        // The spider starts from the site's home page.
        WebRequest homePage = new GetMethodWebRequest("http://" + host);
        SpiderPageTest test = new SpiderPageTest(homePage, new HashSet(), host);
        suite.addTest(test);
        return suite;
    }

    public static void main(String[] args) {
        /*
        junit.swingui.TestRunner.main(
            new String[]{"xptoolkit.httpUnit.SpiderSiteTest"}
        );
        */
        junit.textui.TestRunner.run(suite());
    }
}


The class declares a static suite() method, which allows JUnit's test runners to access it, as demonstrated in the main() method. The suite consists of a single instance of SpiderPageTest, the object that performs all the logic of spidering pages, checking results, and so on. SpiderSiteTest merely serves as a convenient test launcher. It could be replaced with a more sophisticated launcher class that, say, read the host and initial page from a properties file.

The SpiderPageTest object is the critical piece of the testing app. Its public interface consists of two constructors that parameterize the behavior of the test, plus the methods specified in junit.framework.Test. The constructors specify the WebRequest (or, for convenience, the WebLink from which the request is derived), a java.util.Set of links that have already been tested, and the host of the site to test (offsite links are ignored).

The Test interface specifies two methods: run(junit.framework.TestResult result) and countTestCases(). run() is supposed to execute the Test and report the outcome to the TestResult; we will cover this method in a moment. countTestCases() returns the total number of TestCases run by this Test. In this case, we have to cheat and return 1, because we have no way of knowing in advance how many pages will be tested and, hence, how many tests will run. This hints at imperfect integration with JUnit. JUnit's designers expected TestCases to be the objects that actually execute a test (as opposed to aggregating or modifying tests). Perhaps SpiderPageTest should be a TestCase. We write this down on our task list as something to investigate once SpiderSiteTest runs. After we have all the logic in place, we may be able to see how to reshape SpiderPageTest for better integration. See the following listing for the constructors and class initialization.

private WebConversation conversation = new WebConversation();
private WebRequest request;
private WebLink link;
private Set alreadyChecked;
private String host;

public SpiderPageTest(WebRequest request, Set alreadyChecked, String host) {
    this.request = request;
    this.alreadyChecked = alreadyChecked;
    this.host = host;
}

public SpiderPageTest(WebLink link, Set alreadyChecked, String host) {
    // Keep the originating link and derive the request from it.
    this.link = link;
    this.request = link.getRequest();
    this.alreadyChecked = alreadyChecked;
    this.host = host;
}
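SpiderSiteTest hard-codes its host. A launcher that reads the host and start page from a properties file, as suggested above, needs only java.util.Properties. The following is a minimal sketch under stated assumptions: the class SpiderConfig and the property names spider.host and spider.startPage are hypothetical, and the site is assumed to be served over plain HTTP.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical launcher configuration. The property names "spider.host" and
// "spider.startPage" are assumptions, not part of the original example.
class SpiderConfig {
    final String host;
    final String startUrl;

    private SpiderConfig(String host, String startUrl) {
        this.host = host;
        this.startUrl = startUrl;
    }

    static SpiderConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String host = props.getProperty("spider.host", "").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("spider.host is required");
        }
        // Start at the site root unless a start page (beginning with "/") is given.
        String startPage = props.getProperty("spider.startPage", "/").trim();
        return new SpiderConfig(host, "http://" + host + startPage);
    }

    public static void main(String[] args) {
        SpiderConfig config = load(new StringReader("spider.host=www.eblox.com\n"));
        System.out.println(config.host + " -> " + config.startUrl); // www.eblox.com -> http://www.eblox.com/
    }
}
```

Inside suite(), the hard-coded values would then give way to something like new SpiderPageTest(new GetMethodWebRequest(config.startUrl), new HashSet(), config.host), with the Reader opened on whichever properties file the team chooses.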


JUnit's TestRunners (and, later on, the class itself) will call the run() method of SpiderPageTest to execute the object's testable behavior, so it makes sense to examine this method first:

public void run(TestResult result) {
    // Offsite links are skipped: they count as neither passes nor failures.
    if (notSameHost()) {
        System.out.println(this + " not run because host for test (" + host
                + ") does not match URL being tested.");
        return;
    }
    WebResponse response = runTest(result);
    if (response != null) {
        try {
            spiderPage(response, result);
        }
        catch (SAXException e) {
            // Parsing the page for links failed.
            result.addError(this, e);
        }
    }
}


The method's first step is to verify that the host of the request matches the host under test. If it does not, the method returns; external links are neither failures nor successes: they are simply ignored. Next, the runTest() method checks the page and reports the outcome to the test result. If this step is successful, the response from that page is sent to spiderPage(), which acts almost exactly like checkAllLinks() in the first iteration of the spider (we will cover this method in detail in a moment). The runTest() method takes care of accessing the page and logging exceptions (read: test failures) that occur in the process:

private WebResponse runTest(TestResult result) {
    WebResponse response = null;
    result.startTest(this);
    try {
        response = this.accessPage();
    }
    catch (ThreadDeath e) { throw e; }
    catch (AssertionFailedError e) {
        result.addFailure(this, e);
    }
    catch (Throwable t) {
        result.addError(this, t);
    }
    /* Future requests are wrapped in their own tests,
       so this test ends here. */
    result.endTest(this);
    return response;
}

private WebResponse accessPage() throws Exception {
    return conversation.getResponse(request);
}


First, the start of the test is registered with the result using result.startTest(this). Then accessPage() is run, and any errors or failures are registered with the result using result.addError() or result.addFailure(). Finally, result.endTest(this) is called to signal test completion to the result.
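The startTest/addFailure/addError/endTest sequence, and the error-versus-failure choice discussed in the sidebar that follows, can be seen in isolation with a small stand-in for TestResult. This is a sketch under stated assumptions: MiniResult and its String test names are hypothetical, java.lang.AssertionError stands in for junit.framework.AssertionFailedError, and a finally block replaces the original's catch-everything-then-endTest ordering. Unlike runTest(), it catches Exception rather than Throwable, so Errors other than AssertionError propagate instead of being logged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical, JUnit-free stand-in for junit.framework.TestResult, used only
// to show the startTest / addFailure / addError / endTest sequence.
class MiniResult {
    int runCount;
    final List<String> failures = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    void startTest(String name) { runCount++; }
    void endTest(String name) { }
    void addFailure(String name, Throwable t) { failures.add(name + ": " + t.getMessage()); }
    void addError(String name, Throwable t) { errors.add(name + ": " + t); }

    // Mirrors runTest(): register the start, classify any problem, always
    // register the end, and return null if the page could not be retrieved.
    <T> T runTest(String name, Callable<T> accessPage) {
        T response = null;
        startTest(name);
        try {
            response = accessPage.call();
        } catch (AssertionError e) {   // stand-in for JUnit's AssertionFailedError
            addFailure(name, e);
        } catch (Exception e) {        // anything else, e.g. an HTTP error status
            addError(name, e);
        } finally {
            endTest(name);
        }
        return response;
    }

    public static void main(String[] args) {
        MiniResult result = new MiniResult();
        result.runTest("/ok", () -> "<html>ok</html>");
        result.runTest("/bad-title", () -> { throw new AssertionError("title missing"); });
        result.runTest("/500", () -> { throw new java.io.IOException("HTTP 500"); });
        // 3 run, 1 failures, 1 errors
        System.out.println(result.runCount + " run, " + result.failures.size()
                + " failures, " + result.errors.size() + " errors");
    }
}
```

Moving a page failure from the error column to the failure column, as the sidebar weighs, comes down to which catch clause handles it.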

Start Sidebar
Errors or Failures?

Should we log HttpUnitExceptions (that is, page failures, commonly 404 or 500 status responses) as errors (unanticipated) or failures (anticipated)? There are reasons for either approach. Logging them as failures seems proper because the test is meant to check for exactly this type of exception. Logging them as errors is easier, and because we did not explicitly raise an AssertionFailedError with an assertion, the exception could be regarded as unexpected.

End Sidebar

The run() method calls spiderPage() if there are no failures in runTest(). The only difference between spiderPage() and its earlier incarnation (checkAllLinks()) is that instead of executing a method on each link it finds, it instantiates a new SpiderPageTest and calls run() on it:

SpiderPageTest linkTest = new SpiderPageTest(links[i], alreadyChecked, host);
linkTest.run(result);


Thus, a call to the run() method of one SpiderPageTest will probably trigger calls to the run() methods of several other SpiderPageTests. For this reason, SpiderPageTest is more like a combination of a test runner and a test than a pure test. The full code for spiderPage() appears here:

private void spiderPage(WebResponse response, TestResult result)
        throws SAXException {
    if (!isHtml(response)) {
        return;
    }
    WebLink[] links = response.getLinks();
    for (int i = 0; i < links.length; i++) {
        boolean newLink = alreadyChecked.add(links[i].getURLString());
        if (newLink) {
            System.out.println("Total links checked so far: "
                    + alreadyChecked.size());
            SpiderPageTest linkTest = new SpiderPageTest(links[i], alreadyChecked, host);
            linkTest.run(result);
        }
    }
}

private boolean isHtml(WebResponse response) {
    return response.getContentType().equals("text/html");
}
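Stripped of JUnit and HttpUnit, the second iteration's structure can be sketched as follows. PageCheck, its nested Outcome class, and the in-memory site map are hypothetical stand-ins for SpiderPageTest, TestResult, and real responses. Each check records its own pass or failure, so a broken page no longer stops the rest of the crawl, and the total number of checks is known only after the run, which is why countTestCases() has to guess.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of SpiderPageTest's shape without JUnit or HttpUnit:
// each page check is its own object, records its own outcome, and spawns a
// child check for every newly discovered link. The in-memory site map is an
// assumption standing in for real HTTP responses.
class PageCheck {
    // Shared outcome recorder for the whole run, playing the TestResult role.
    static class Outcome {
        final List<String> passed = new ArrayList<>();
        final List<String> failed = new ArrayList<>();

        int total() { return passed.size() + failed.size(); }
    }

    private final String url;
    private final Map<String, List<String>> site;
    private final Set<String> alreadyChecked;

    PageCheck(String url, Map<String, List<String>> site, Set<String> alreadyChecked) {
        this.url = url;
        this.site = site;
        this.alreadyChecked = alreadyChecked;
    }

    void run(Outcome outcome) {
        List<String> links = site.get(url);
        if (links == null) {
            // Record the broken page and abandon only this branch;
            // the caller goes on to check its remaining links.
            outcome.failed.add(url);
            return;
        }
        outcome.passed.add(url);
        for (String link : links) {
            if (alreadyChecked.add(link)) {
                new PageCheck(link, site, alreadyChecked).run(outcome);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("/", List.of("/a", "/broken", "/b"));
        site.put("/a", List.of("/"));
        site.put("/b", List.of());
        // Seeding the set with the start page keeps pages that link back
        // to it from triggering a second check of "/".
        Set<String> alreadyChecked = new HashSet<>();
        alreadyChecked.add("/");
        Outcome outcome = new Outcome();
        new PageCheck("/", site, alreadyChecked).run(outcome);
        // [/, /a, /b] passed, [/broken] failed, 4 checks in all
        System.out.println(outcome.passed + " passed, " + outcome.failed + " failed, "
                + outcome.total() + " checks in all");
    }
}
```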


Future Work on the Spider

As we discovered in implementing countTestCases with a "cheat" return value, this example still has wrinkles to be ironed out. Tighter integration with JUnit would probably be beneficial. Also, we could extend the spider with new launchers. For example, imagine a site where a logged-in user is able to access more or different areas than a non-authenticated user. With a few lines of HttpUnit code, we might write a launcher that spidered the site, logged in, and then spidered the site again.

