IT Transformation with SOA

Friday, March 5, 2010

System Availability & Reliability


The ability to provide services from any number of loosely coupled servers is essential to deploying redundant systems, and redundancy is the key to continued availability. Traditional mainframe environments were designed to be monolithic and had fewer components. Following the precept that, if you put all your eggs in one basket, you had better make sure that it's a good basket, mainframe systems were designed to be extremely reliable and to come embedded with high-availability features. SOA systems, on the other hand, tend to include more moving parts, and more moving parts mean a higher possibility of failure. Also, these moving parts may be components that have not been engineered or manufactured with the same high level of quality control applied to the more expensive mainframe. No use debating it: Out of the box, most mainframe systems deliver far higher availability levels.
SOA must overcome these inherent availability issues. The way to do so is to introduce redundant elements, usually via clustering. To take full advantage of the clustering capabilities provided by application server vendors, you should reduce the number of state-dependent services. This reduction facilitates the logical decoupling that allows you to design a very resilient system consisting of active-active components in each layer of the stack: from dual communication links, to redundant routers and switches, to clustered servers and redundant databases.
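To see why stateless services make clustering easier, consider the rough Python sketch below. It is only an illustration, not any vendor's clustering API: the names `call_with_failover`, `ReplicaDown`, `healthy_replica`, and `failed_replica` are hypothetical, and the replicas are tried in a fixed order where a real load balancer would spread the requests across them.

```python
from typing import Callable, Sequence


class ReplicaDown(Exception):
    """Hypothetical error raised by a replica that cannot serve a request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Send the request to each interchangeable replica until one answers."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as err:
            # This replica is unavailable; move on to the next one.
            last_error = err
    raise ReplicaDown("all replicas failed") from last_error


def healthy_replica(request: str) -> str:
    # Stateless: the reply depends only on the request, never on earlier calls,
    # so any other replica could have produced the same answer.
    return request.upper()


def failed_replica(request: str) -> str:
    # Simulates a node that is offline.
    raise ReplicaDown("node offline")


print(call_with_failover([failed_replica, healthy_replica], "ping"))
```

Because no replica holds conversation state, the caller never needs to know which one answered. A state-dependent service would have to replicate or recover that state before another node could take over.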
In the diagram below, a sample mainframe system has, for the sake of discussion, 90% availability. (Mainframe systems usually have much higher availability ratings; I am using this number to simplify the following calculations.)

Now, let’s say that you deploy a two-component SOA environment with each component giving 90% availability...

In this latter SOA system, you should expect the overall system availability to be no greater than 0.90 * 0.90 = 0.81! That is, by virtue of having added another component to the flow, you have gone from 90% availability to 81%. The reason is that the two components are in series, and both have to be functional for the system to operate. In SOA, you must compensate by adding fallback components:
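The arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope model that assumes component failures are independent; the function names `series_availability` and `parallel_availability` are mine, chosen for illustration. It reproduces the 0.81 figure and shows what a redundant pair of 90% components would yield under the same assumption.

```python
from math import prod


def series_availability(availabilities):
    """All components must be up: multiply the individual availabilities."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """At least one redundant component must be up (independent failures assumed)."""
    return 1 - prod(1 - a for a in availabilities)


# Two 90% components in series, as in the example above.
two_in_series = series_availability([0.90, 0.90])
print(round(two_in_series, 4))  # 0.81

# One component backed by an identical fallback (a redundant pair).
redundant_pair = parallel_availability([0.90, 0.90])
print(round(redundant_pair, 4))  # 0.99

# Two such redundant pairs chained in series.
clustered_flow = series_availability([redundant_pair, redundant_pair])
print(round(clustered_flow, 4))  # 0.9801
```

Under this model, the redundant flow ends up more available than the single 90% system it was compared against, even though every individual part is no better than before.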