High Availability

Many organizations will initially deploy Web services for non-mission-critical apps. For others, which use Web services to generate revenue, making those services highly available is mandatory. Before starting the journey to deploying mission-critical systems, you must consider the cost of downtime: lost opportunities, lost revenue, and any stranded fixed costs your organization must pay whether it is productive or not.

Some damage is harder to calculate. Loss of goodwill among customers, partners, and suppliers affected by unavailable services gives the impression that your organization is poorly run and incapable of fulfilling their needs. If your organization is in the public health or safety area, an unavailable service could potentially cost lives. An organization must weigh the high cost of downtime. The higher the cost of downtime, the more robust your availability plans must be. The figure below outlines the relative severity of an impact and its resulting costs. The architects of a Java-based Web service must analyze and document what is affected when a system goes down. Here are some additional thoughts to consider.

Screenshot: Downtime cost as a function of impact


Availability refers to the time a service is operational and is expressed as a percentage. High availability usually refers to running for extended periods with minimal unplanned outages, at an availability exceeding 99.99%. This level is referred to as four nines. Similarly, 99.999% is referred to as five nines, which requires all components of your service, including the operating system and the network, and allowing for human error, to have no more than about five minutes of downtime a year (Table 16.4).

Table 16.4: Availability Statistics

Number of nines    Time lost in one year
1 (90%)            36.5 days (876 hours)
2 (99%)            3.65 days (87 hours)

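The relationship between an availability percentage and the time lost in a year is simple arithmetic, and a short sketch can make it concrete. The class and method names below are hypothetical, and the calculation assumes a 365-day year (8,760 hours), which matches the 876-hour figure for 90% availability.

```java
import java.time.Duration;

public class DowntimeBudget {
    // Assumption: a non-leap year of 365 days, i.e. 8,760 hours.
    static final double HOURS_PER_YEAR = 365 * 24;

    /** Returns the yearly downtime implied by the given availability percentage. */
    static Duration allowedDowntime(double availabilityPercent) {
        double downFraction = 1.0 - availabilityPercent / 100.0;
        long seconds = Math.round(downFraction * HOURS_PER_YEAR * 3600);
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        // One nine through five nines.
        double[] levels = {90.0, 99.0, 99.9, 99.99, 99.999};
        for (double level : levels) {
            Duration d = allowedDowntime(level);
            System.out.printf("%7.3f%% -> %d h %d min%n",
                    level, d.toHours(), d.toMinutes() % 60);
        }
    }
}
```

Each additional nine cuts the yearly downtime budget by a factor of ten, which is why five nines leaves only a few minutes a year for every kind of outage combined.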

To provide a highly available service, every component of your infrastructure must support high availability. The level of availability can be no better than that of the weakest link. Increasing availability requires incremental improvement to each component in your network, which is preferable to seeking 100% availability for one specific component. Many organizations assume that achieving higher levels of availability is always expensive. This is not necessarily true, although availability is not free either. It requires executive management attention, hard work, and due diligence to assemble the right people, processes, and tools. It starts with reliable, stable components. Higher levels of availability require methodical system integration and detailed app design.
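
The weakest-link point can be illustrated with a small calculation. The sketch below assumes that a request needs every component in the chain to be up and that components fail independently, so the availabilities multiply. The class name and the sample figures are hypothetical and chosen only for illustration.

```java
public class CompositeAvailability {
    /**
     * Availability of a chain in which every component must be up.
     * Assumes independent failures; each argument is a fraction between 0 and 1.
     */
    static double serial(double... availabilities) {
        double result = 1.0;
        for (double a : availabilities) {
            result *= a;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical figures for network, operating system, app server, and database.
        double chain = serial(0.9999, 0.9995, 0.999, 0.9999);
        System.out.printf("End-to-end availability: %.4f%n", chain);
    }
}
```

Because every factor is at most 1, the product can never exceed the lowest individual figure. Under these assumptions, pushing one component toward 100% helps less than modest improvements to all of them.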

The real cause of many outages tends to be human error. Many organizations have spent millions of dollars on high-availability solutions, only to have a network administrator accidentally unplug a crucial component. Other outages arise from fixing problems that are not the root cause but appear so because of inadequate documentation. To achieve the highest levels of availability, implement a comprehensive change-management strategy that defines repeatable processes.

Service Design Techniques

To increase the availability of your Web service, include your consumers' requirements for performance, reliability, and availability in the design. Availability is hard to incorporate after the fact and must be included in app design from the start. Using a Web service to integrate with legacy systems may require those apps to be redesigned. Architects and developers alike must be educated about the cost of outages so that they can make the appropriate tradeoffs. They should also make sure components are suitable for the negotiated availability needs.

The design should keep the scope of a failure small by isolating important processes, such as sessioning, and by loosely coupling integration with other components. Those who have worked in mainframe shops may have observed that these machines cannot run a batch cycle while online apps are available. Make sure your service does not depend on other components being available. Recovery from a failure should also be fast and intuitive, and should not require a lot of internal or external coordination.

One step any IT shop can take is to employ standard processes and procedures. Naming conventions help reduce errors and improve communications immensely. With standardized conventions in place, compilers can serve as a built-in check for change management and other related activities.

Managing your Web services should take into account the end-to-end view from a business perspective. All resources the Web service uses should be managed within the context of a business process. This allows administrators and the business community to ascertain and understand the business impact of any one resource. It has the additional advantage of allowing administrators to prioritize their next steps during outages, performance slowdowns, and other business-critical situations based on quantifiable business needs.

Another important consideration is a unified view of security. A breach will cause downtime. The main problem is that different roles within an organization, such as architects, developers, security administrators, and operations, typically have their own views of the world. They usually administer security policies only on the resources they directly manage. Organizations that don't use single sign-on usually have multiple identities spread across multiple parties. Even the simple task of disabling a user ID can then leave inconsistent levels of access throughout the services offered.
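
The earlier advice to keep the scope of a failure small, and not to depend on other components being available, can be sketched in code. The following is a minimal illustration rather than a prescribed design: it assumes Java 9 or later (for `completeOnTimeout`), and the class name, method name, and the pricing example are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class IsolatedCall {
    /**
     * Runs the dependency call on another thread and returns the fallback
     * if the call fails or does not finish within the timeout.
     */
    static <T> T callWithFallback(Supplier<T> dependency, T fallback, long timeoutMillis) {
        return CompletableFuture.supplyAsync(dependency)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }

    // Helper that simulates a slow downstream component.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Hypothetical downstream pricing component that is down.
        String failed = callWithFallback(() -> {
            throw new IllegalStateException("pricing service down");
        }, "price unavailable", 200);

        // Hypothetical downstream pricing component that is too slow.
        String slow = callWithFallback(() -> {
            pause(1000);
            return "42.10";
        }, "price unavailable", 200);

        System.out.println(failed);
        System.out.println(slow);
    }
}
```

In this sketch the caller always gets an answer within a bounded time, so a failed or slow dependency degrades one feature instead of taking the whole service down. Whether a fallback value is acceptable is a business decision that belongs with the negotiated availability needs.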

Infrastructure Design Techniques

Redundant, reliable components in your infrastructure are the key to availability. Redundancy in components, systems, and data can eliminate single points of failure (see the figure below). Consider incorporating clustering (hardware and/or software) into the infrastructure. The essential principle is to present a common addressing scheme to the underlying components.

Screenshot: Redundant infrastructure for high availability
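
One way to picture a common addressing scheme is a single entry point that hides a pool of redundant servers and routes around any that are down. The sketch below is only an in-memory illustration of that idea, not how any particular product works. The class names, host names, and utilization figures are hypothetical, and the selection rule (lowest utilization among healthy servers) is just one possible policy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class VirtualEndpoint {
    /** One redundant server behind the shared address; fields are illustrative. */
    static final class Backend {
        final String host;
        final boolean healthy;
        final double utilization; // fraction between 0 and 1

        Backend(String host, boolean healthy, double utilization) {
            this.host = host;
            this.healthy = healthy;
            this.utilization = utilization;
        }
    }

    private final List<Backend> pool;

    VirtualEndpoint(List<Backend> pool) {
        this.pool = pool;
    }

    /** Picks the healthy backend with the lowest utilization, if any is available. */
    Optional<Backend> route() {
        return pool.stream()
                .filter(b -> b.healthy)
                .min(Comparator.comparingDouble(b -> b.utilization));
    }

    public static void main(String[] args) {
        VirtualEndpoint endpoint = new VirtualEndpoint(List.of(
                new Backend("ws1", true, 0.70),
                new Backend("ws2", false, 0.10), // down for maintenance, so skipped
                new Backend("ws3", true, 0.35)));

        endpoint.route().ifPresent(b -> System.out.println("Routing to " + b.host));
    }
}
```

Consumers only ever see the single endpoint, so a server can be marked unhealthy and taken out of rotation without any change on the consumer side.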

As an example, a load-balancing appliance (e.g., Alteon, Cisco CSS, F5 BIG-IP) provides a single IP address for a group of servers. A Web service consumer need only direct a request to the IP address exposed by the load-balancing appliance. The appliance, in turn, directs the request to the server with the least utilization. The appliance also determines when a server in the cluster is not functioning and routes requests away from it. An added benefit of this approach is the ability to take servers offline one at a time for maintenance without affecting users. Many appliances allow you to implement several algorithms for determining where to direct Web service requests, such as load balancing, server utilization, or app affinity.

Many white papers published by industry analysts such as the Gartner Group and Forrester Research cover the business aspects of high availability and help an organization determine whether it requires "five nines." The technology aspect of high availability, especially for Java Web services, requires additional thinking and architectural planning. Let us outline some questions you should ask yourself about each tier: