IT Transformation with SOA

Friday, February 26, 2010

The Role of Engineering with SOA: The Foundation



Let’s face it, developing a new system can be such a “sexy” undertaking that it’s only natural to want to place most of the focus on the cool stuff such as leading-edge technologies (wireless, social media), design and development of algorithms, flashy user interfaces, and the implementation of complex system features.  This type of focus often results in the neglect of the more “pedestrian” aspects of the actual implementation. It’s not much fun dealing with nuanced matters such as ensuring that back-up processes are in place, that the system actually includes fallback and recoverability capabilities, that the system is truly secured, and that the system is stable. 
It’s true that most of the actual engineering processes tend to come from pre-defined, out of the box vendor products (clustering, default configurations, etc.), but the target operational metrics should come from the enterprise needs and not from the vendor defaults.  From the outset your very own engineering planning should focus on ensuring these targets are met as early as possible.
From a governance perspective you will need to ensure you have a dedicated engineering team, able to tackle all detailed implementation and operational questions and also able to interact with the architecture team on a continuous and equal basis.  The engineering team should be able to push back on some architecture elements in order to validate that the solutions are sufficiently practical and implementable. In this sense, the engineer is not unlike the building contractor who interprets the architect’s blueprints and guides the building construction via the selection of actual materials, enforcement of building codes, and performance of the necessary detailed adjustments to the design.  Architecture may be an art, but engineering is a science.
Still, in the same vein as development, engineering needs to be an iterative process.  Engineering must initially deal with high level designs and approaches. However, as additional “construction” data is gathered, the engineering process should also adapt to the various fine-tuning variables: capacity metrics, configuration parameters, availability, performance strategies and others.
In the end, the final acceptance test must include testing of engineering aspects as well as software development. That is, the final testing should take a holistic approach to coverage of the system operation as well as to its functionality. Having a system providing nice applications that do not scale cannot be considered a successful outcome. That’s why the engineering objectives are paramount. These healthy engineering key objectives are known in a tongue-in-cheek fashion as the “-ities” of the system: Availability, Security, Serviceability, Reliability, etc.  I will next cover three of the key engineering areas targeting these “-ities”:
·         System Availability and Reliability
·         Security & Continuance
·         Systems Management


Friday, February 12, 2010

Performance Planning with Modeling & Simulation


SOA environments are characterized by an eclectic mix of components, service flows, processes and infrastructure systems (servers, local area networks, routers, gateways, etc.). This complexity makes it difficult to predict the capacity of the needed infrastructure.  In addition, trying to evaluate the impact of changes in how services are routed or used is more an exercise in the art of divination than in the scientific method.  Understanding the dynamics of an SOA system usually takes place over a period of time. Having to wait months to optimize a system is not usually a good option.
An alternative approach is to create a model of the system in order to simulate its current and future performance. Depending upon the complexity of the model, you will be able to simulate the actual system latency and the predicted response times of a variety of service flows. 
Simulation can help identify potential bottlenecks and streamline processing times by pinpointing areas where resources can be best optimized.  Imagine knowing the answers to these questions in more concrete terms:
·         What are the transaction response times?
·         How many servers, databases or links do I actually need?

Without the ability to simulate, system designers and administrators are left with the choice of deploying what they believe to be the best system, praying, and then taking a reactive approach based on the on-going measurement of actual performance data via monitoring tools. By then, it might be too late or too expensive to fix the system.
In general, simulations fall within one of the following levels of details:
Rapid Model (also known as “NapkinSim”). You’ve probably been simulating in this manner for quite some time. If a clerk takes 10 minutes to serve a customer, and on average two customers arrive every 20 minutes, what is the average wait time? “Simple,” you might say, “the answer is zero.”  The answer is, of course, never simple and it is not zero. The answer depends upon how the customers arrive. If two customers arrive at the same time, one of them will have to wait at least 10 minutes. When running a simulation tool you will soon come to realize the importance that inter-arrival distributions have on simulation results.
Aside from the simplest SOA problems, you cannot predict the desired resource-requester relationship by resorting to simple napkin arithmetic. 
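The clerk example can be checked with a few lines of Python. This is only a sketch: the function name and the two arrival schedules are made up for illustration, and both schedules keep the same average rate of two customers per 20 minutes.

```python
def average_wait(arrival_times, service_time):
    """Average time customers wait before a single clerk starts serving them."""
    clerk_free_at = 0.0
    total_wait = 0.0
    for arrived in sorted(arrival_times):
        start = max(arrived, clerk_free_at)   # wait only if the clerk is still busy
        total_wait += start - arrived
        clerk_free_at = start + service_time
    return total_wait / len(arrival_times)

SERVICE_MINUTES = 10

# Same average arrival rate, two different arrival patterns.
evenly_spaced = [10 * i for i in range(100)]     # one customer every 10 minutes
in_pairs = [20 * (i // 2) for i in range(100)]   # two customers together every 20 minutes

print(average_wait(evenly_spaced, SERVICE_MINUTES))  # 0.0
print(average_wait(in_pairs, SERVICE_MINUTES))       # 5.0
```

The napkin answer of zero holds only for the perfectly even schedule. As soon as customers arrive in pairs, every second customer waits 10 minutes and the average climbs to 5 minutes.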
Mathematical Analysis. Mathematics is not entirely helpful either. Significant work has been done to analyze the so-called M/M/1 problem (a single server fed by one queue, with Poisson arrivals and exponentially distributed service times). However, most mathematical approaches cannot satisfactorily cope with dynamic or transient effects and quickly become too complex for multi-server environments.  In real life, most queuing problems cannot be solved easily by resorting to linear equations. Indeed, the norm is for complexity to quickly drive the problem area to behave like a non-linear system. This in turn requires the assistance of complex mathematics for a reliable solution.  What then is the alternative?
Queuing Simulation. Regardless of the level of abstraction chosen for the system under simulation, you will want to have the most precise and reliable information for the expected behavior of the system. In this case, the type of simulation known as Queuing Simulation can be the most helpful.
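Before turning to that alternative, it may help to see how little the closed-form M/M/1 results cover. The sketch below codes the standard steady-state formulas; the function name and the sample rates are illustrative assumptions.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 results; both rates are in customers per unit of time."""
    if arrival_rate >= service_rate:
        # At or above 100% utilization the queue never settles into a steady state.
        raise ValueError("utilization >= 1: no steady state exists")
    utilization = arrival_rate / service_rate
    return {
        "utilization": utilization,
        "mean_wait_in_queue": utilization / (service_rate - arrival_rate),
        "mean_time_in_system": 1.0 / (service_rate - arrival_rate),
        "mean_number_in_queue": utilization ** 2 / (1.0 - utilization),
    }

# One arrival every 20 minutes on average, 10 minutes of service on average.
print(mm1_metrics(arrival_rate=0.05, service_rate=0.1))
```

With these sample rates the clerk is busy half of the time and the mean wait in queue comes out at about 10 minutes. The clerk example above (two arrivals per 20 minutes, 10 minutes per customer) sits at a utilization of exactly 1, so with random arrivals the formulas yield no finite wait at all, which is why the function refuses to answer.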
Queuing simulation is particularly suited to SOA because you can simulate almost any process in which a “client” requests a service and a “resource” provides that service. No doubt about it, queuing simulation is the most viable and obvious way to model and predict how an SOA system will behave. 
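As a rough illustration of the client/resource idea, here is a minimal queuing simulation sketch in Python. It assumes first-come, first-served handling, identical servers, and exponentially distributed inter-arrival and service times. All names and rates are made up for the example, and a real tool would model far more than this.

```python
import heapq
import random

def simulate_mean_wait(arrival_rate, service_rate, servers, customers, seed=1):
    """Mean queuing delay for a first-come, first-served queue with identical servers."""
    rng = random.Random(seed)
    free_at = [0.0] * servers        # min-heap: time at which each server becomes free
    clock = 0.0
    total_wait = 0.0
    for _ in range(customers):
        clock += rng.expovariate(arrival_rate)    # the next request arrives
        earliest_free = heapq.heappop(free_at)    # the server that frees up first
        start = max(clock, earliest_free)         # wait only if every server is busy
        total_wait += start - clock
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / customers

# "How many servers do I actually need?" Try a few counts and compare.
for count in (1, 2, 3):
    wait = simulate_mean_wait(arrival_rate=0.05, service_rate=0.1,
                              servers=count, customers=100_000)
    print(count, round(wait, 2))
```

With a single server the simulated wait should land near the 10 minutes that M/M/1 theory predicts for these rates, and it drops sharply once a second server is added. Other distributions can be swapped in by replacing the two `expovariate` calls.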

To be clear, the simulation approach is not a panacea. First of all, you have to learn about the simulation tools. Secondly, detailed modeling can be time consuming. Modeling should not be viewed as a quick way to get answers to questions. You should also keep in mind that simulations yield only approximate answers which, in many cases, are difficult to validate. In the end, simulation is merely a more precise way to venture a guess. You should not accept simulation results as gospel. It is easy to forget that the simulation is an abstraction of reality, not reality itself.  A thorough validation of the results must be made, especially prior to publication of the results. Simulations should be supported by careful experiment design, an understanding of the assumptions, and reliable input data for the model. Despite these caveats, you will find that simulation can be an invaluable tool in your day to day business activities.

While you could develop a simulation by writing a program yourself, you could also use one of the many simulation tools on the market. Today’s simulation tools are not as expensive as in the past, but they do demand the discipline to capture and create the base model and to keep the simulation model current for future simulation runs. A modern simulation tool for SOA should provide a visual interactive modeling and simulation tool for queuing systems that has the following attributes:
General purpose.  You can simulate almost anything that involves a request, a queue and a service, whether this includes a complex computer network or the service times at a fast food counter. This capability will give you the option of simulating the SOA system at various levels of granularity; from the underlying packet-level communications layers to the upper service flows.
Real-time. Unlike other costlier programs, you can view how the resources in your system behave as the simulation progresses.
Interactive. You can dynamically modify some essential parameters to adjust the behavior of the simulated components even as the simulation runs!
Visual Oriented. Allows you to enter the necessary information via a simple and intuitive user interface, while removing the need to know a computer language. In addition to running the simulation, it also provides you with important information to help you fine tune it.
Discrete oriented.  Discrete-event systems change state only at discrete points in time, as opposed to continuous systems, which change continuously over time.
Flexible. You can see the dynamic effects of the simulated system, or the accumulated averages representing the overall mean behavior of the system.
 

As a Valentine’s Day gift to the readers of this article, I am making Prophesy—A Complete Workflow Simulation System available for free!
Prophesy is a simulation product that I developed and marketed back in the roaring 90’s (when in retrospect I should have been putting my efforts into developing something for the exploding World Wide Web—but that’s another story). Prophesy meets the requirements listed above, but unfortunately, the product is aged. It’s no longer supported, and it will not run under Windows 7 (“thank” Microsoft for their lack of backward compatibility).
You can visit http://www.abstraction.com/prophesy to download it for free and hopefully to use as a learning tool.
Enjoy!


Friday, February 5, 2010

Best Performance Practices


As mentioned earlier, using “thin” services that require multiple trips to the server to obtain a complete response is one of the most common performance mistakes made with SOA.  Most other SOA performance problems occur due to basic engineering errors such as misconfigurations (low memory pools, bad routings, etc.) which can be fixed with relative ease once identified. Performance problems caused by inappropriate initial design are much harder to correct:
·         Inefficient implementation. The advent of high level and object oriented languages does not excuse the need to tighten algorithms. Many performance problems are the result of badly written algorithms or incorrect assumptions about the way high-level languages handle memory and other resources.

·         Inappropriate resource locks and serialization. Just as it is not a good idea to design a four-lane highway that suddenly becomes a one-lane bridge, best practice design avoids synchronous resource-locking as much as possible. It’s best to implement service queues whenever possible to take advantage of the multitasking and load balancing capabilities provided by modern operating systems.  Still, avoid using asynchronous modes for Query/Reply exchanges.

·         Unbalanced workloads. This is a scenario more likely to occur when services must run from a particular server due to the need to keep state or because the services are not configured correctly. The more you can avoid relying on state, the more capable you will be in avoiding unbalanced workloads.

·         Placing the logic in inappropriate places. Don’t let grandma drive that Lamborghini. Emerging web site implementations were developed with an organic view that placed business logic in the front-end portals.  So-called Content Management Systems were developed to provide flexible frameworks for these web portals. Unfortunately, this architecture pattern leads to monolithic, non-scalable designs. Despite the assumed performance overheads implied by modular designs, it is best to put the business logic in back-end engines that can be accessed via services through front-end portals.
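The service queue idea from the list above can be sketched with Python’s standard `queue` and `threading` modules. The function name and the handler are hypothetical. Callers never hold a lock while work is in progress; they only enqueue requests, and a small pool of workers drains the queue.

```python
import queue
import threading

def run_service_queue(requests, handler, workers=4):
    """Process requests through a queue drained by a pool of worker threads."""
    pending = queue.Queue()     # requests waiting for a free worker
    finished = queue.Queue()    # (request_id, outcome) pairs produced by the workers

    def worker():
        while True:
            item = pending.get()
            if item is None:            # sentinel: no more work for this worker
                return
            request_id, payload = item
            try:
                outcome = handler(payload)
            except Exception as exc:    # hand the error back instead of killing the worker
                outcome = exc
            finished.put((request_id, outcome))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for request_id, payload in enumerate(requests):
        pending.put((request_id, payload))
    for _ in threads:
        pending.put(None)               # one sentinel per worker
    for thread in threads:
        thread.join()
    results = dict(finished.get() for _ in range(len(requests)))
    return [results[i] for i in range(len(requests))]

print(run_service_queue(["a", "b", "c"], str.upper))  # ['A', 'B', 'C']
```

Whichever worker is free picks up the next request, so the load spreads itself without explicit coordination. In CPython, threads mostly help when the handler waits on I/O; CPU-bound handlers would need processes instead.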
Designers aware of SOA’s inherent inefficiencies tend to architect the system in a traditionally monolithic manner.  However, it is a mistake to shy away from the use of services during the design phase just to “preemptively” alleviate performance concerns. You risk reducing flexibility in the design and this defeats one of the main reasons for the use of SOA.
There are many other, better ways to remedy the performance concerns of SOA:
·         Applying best practices in service design. Watch for service granularity, service flows and the use of superfluous execution paths. For example, avoid “in-band” logging of messages (control messages mixed with the application data-carrying messages). That is, quickly copy the messages to be logged and handle them asynchronously to the main execution path. Make the logging process a lower priority than application work (alerts must be the highest priority!).

·         In SOA, caching is essential. Caching is to SOA what oil is to a car’s engine. Without caching, there is no real opportunity to make SOA efficient and thereby effective. However, provided that the necessary enablers are in place (i.e. ability to use caching heavily), performance is an optimization issue to be resolved during system implementation (remember the dictum: Architecture is about flexibility; engineering about performance.)

·         Finally, with SOA there is a need to proactively measure and project the capacity of the system and the projected workloads. Modeling and Simulation must be a part of the SOA performance management toolkit.
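The advice on logging outside the main execution path can be sketched with Python’s standard `logging.handlers.QueueHandler` and `QueueListener`. The logger name and the request handler below are made up for the example. The sketch does not lower the priority of the logging thread, which Python’s standard library does not offer.

```python
import logging
import logging.handlers
import queue

def start_out_of_band_logging(destination_handler, name="soa.messages"):
    """Route log records through a queue so the service path only pays for an enqueue."""
    log_queue = queue.Queue()
    logger = logging.getLogger(name)    # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener drains the queue on its own thread and does the slow writing there.
    listener = logging.handlers.QueueListener(log_queue, destination_handler)
    listener.start()
    return logger, listener

def handle_request(logger, message):
    logger.info("request received: %s", message)   # cheap: the record is copied to the queue
    return message.upper()                         # stand-in for the real application work

logger, listener = start_out_of_band_logging(logging.StreamHandler())
print(handle_request(logger, "get-customer-information"))
listener.stop()   # processes what is still queued, then stops the background thread
```

The application path never waits for the log destination. If the destination is slow, records pile up in the queue rather than delaying responses.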

More on each of these next . . .


Friday, January 29, 2010

Managing SOA—The Control Layer


You should maintain control of your SOA environment by ensuring that all SOA messages in your system comply with a service framework that incorporates a standardized service stub containing necessary control elements for each message.  Whether using a federated ESB or your own canonical approach, you must ensure that every SOA message contains the following elements:
·         Versioning. This will enable you to gracefully introduce new versions of services and interfaces. The service routing fabric (often part of the ESB) will be able to use this information to help decide whether to send the service request to one implementation of the service versus another. Clearly, service versioning should be used sparingly and judiciously as it could become a de-facto means of creating new families of services and thus make future control of service implementations more difficult.

·         Prioritization. The SOA middleware may be in the position to deliver services under pre-defined service level agreements, and a priority element in each message lets it decide which requests to handle first.

·         Sequencing/Time-stamping. It’s always a good idea to introduce an ordinal counter for each service request. Ideally, if the response to the service is atomic and can be associated with a request, the response should also incorporate the ordinal number of the request. This type of information can be used for debugging purposes, or even to give the client the ability to associate a response to a request without having to keep state. Time-stamping all services is a good way to ensure the potential tracking of performance metrics and the ability to debug message routes.

·         Logging level. In principle all service calls should be “log-able”. Once a system has been stabilized, you will probably want to log only a few key service calls. However, given the need, you may want to increase the detail of logging on demand. Setting up a log-level in each service message will enable the middleware to decide whether or not the threshold for logging requires the message to be logged.

·         Caching Ability. This setting works in two ways. From a requester’s perspective, the flag may indicate to a caching entity that under no circumstance should there be a cached response to the request.  From a responder’s perspective, the flag might indicate to the caching entity whether or not the response should be cached.
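To make the control elements above more concrete, here is a minimal Python sketch of what such a header might look like. Every field name, default value and numbering convention in it is an assumption made for illustration; your own service framework would define the real ones.

```python
from dataclasses import dataclass
import itertools
import time

_sequence = itertools.count(1)   # ordinal counter for requests issued by this client

@dataclass(frozen=True)
class ControlHeader:
    version: str       # version of the service interface being requested
    priority: int      # assumed convention: lower number means more urgent
    sequence: int      # ordinal number of the request
    timestamp: float   # seconds since the epoch when the message was created
    log_level: int     # assumed convention: 1 = key call, higher = finer detail
    cacheable: bool    # False means: never serve or store a cached response

def new_request_header(version="1.0", priority=5, log_level=1, cacheable=True):
    return ControlHeader(version, priority, next(_sequence), time.time(),
                         log_level, cacheable)

def response_header_for(request, cacheable=True):
    """Echo the request's ordinal so the client can match the reply without keeping state."""
    return ControlHeader(request.version, request.priority, request.sequence,
                         time.time(), request.log_level, cacheable)

def should_log(header, middleware_detail):
    """Log when the middleware's current detail setting reaches the message's level."""
    return header.log_level <= middleware_detail

def may_cache(request, response):
    """Cache a response only if neither the requester nor the responder opted out."""
    return request.cacheable and response.cacheable

request = new_request_header(log_level=2)
response = response_header_for(request, cacheable=False)
print(response.sequence == request.sequence)           # True
print(should_log(request, middleware_detail=1))        # False
print(may_cache(request, response))                    # False
```

Raising `middleware_detail` on demand is one way to get the more detailed logging described above without touching the services themselves.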
I recommend that you task your architecture group to define the specifics of an Enterprise Service Framework (ESF) to ensure all your applications generate services with the standard headers you’ve defined. The ESF should be instantiated as a common repository of dynamically linked libraries that are a part of your programmers’ toolkit; one that will have the appropriate headers transparently appended during the service call.
In the end, the establishment of standard headers under an ESF is a foundational practice necessary to support system-wide dashboard monitoring, preventative systems management and proactive performance planning.


Friday, December 25, 2009

The Data Visibility Exceptions

The Data Sentinel is not unlike the grumpy bureaucrat processing your driver’s license application forms. After ensuring that you comply with what’s sure to be a ridiculously complicated list of required documents, it isolates you from directly accessing the files in the back.
While you, the applicant, the supplicant, cannot go around the counter and check the content of your files directly (not legally, anyway), the DMV supervisor in the back office is able to directly access any of the office files. After all, the supervisor is authorized to bypass the system processes intended to limit the direct access to the data.  Direct supervisory access to data is one of the exceptions to the data visibility constraints mentioned earlier. 
Next is the case of ETL (Extract, Transform, Load) jobs over large sets of data, as well as reporting on that data. These cases require batch-level access to data in order to process or convert millions of data records and can wreck performance if carelessly implemented. Reporting jobs should ideally run against offline replicated databases, not the online production databases. Better yet is to plan for a proper Data Warehousing strategy that allows you to run business intelligence processes independently of the main Operational Data Store (ODS). Nevertheless, on occasion, you will need to run summary reports or data-intensive real-time processes against the production database. When the report tool is allowed to access the database directly, bypassing the service layer provided by the Data Sentinel, you will need to ensure this access is well-behaved and that it runs as a low priority process and under restricted user privileges. The same control is required for the ETL processes.  Operationally, you should always schedule batch-intensive processes for off-peak times such as nightly runs.
A third potential cause for exception to data visibility is implied by the use of off-the-shelf transaction monitors, requiring direct access to the databases in order to implement the ACID logic discussed earlier.
A fourth exception is demanded by the need to execute large data matching processes. If there is an interactive need to run a process against a large data set with matching keys in a separate database (“for all customers with sales greater than $X, apply a promotion flag equal to the percentage corresponding to the customer’s geographic location in the promotion database”), then it makes no sense trying to implement each step via discrete services. Such an approach would be extremely contrived and inefficient. Instead, use of a Table-Joiner super-service will be required. More on that next.
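As a rough sketch of why a set-based operation beats step-by-step services here, the following Python example runs the promotion rule as a single statement using the standard `sqlite3` module. The table and column names are invented, and for simplicity both tables live in one in-memory database rather than in separate databases as in the scenario above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT, sales REAL, promo_pct REAL);
    CREATE TABLE promotions (region TEXT PRIMARY KEY, pct REAL);
    INSERT INTO customers (id, region, sales) VALUES
        (1, 'east', 1500.0), (2, 'west', 200.0), (3, 'west', 9000.0);
    INSERT INTO promotions VALUES ('east', 5.0), ('west', 7.5);
""")

def apply_promotions(connection, minimum_sales):
    """One set-based statement instead of one service call per customer."""
    cursor = connection.execute(
        """
        UPDATE customers
           SET promo_pct = (SELECT pct FROM promotions
                             WHERE promotions.region = customers.region)
         WHERE sales > ?
        """,
        (minimum_sales,),
    )
    connection.commit()
    return cursor.rowcount   # number of customers that received a promotion flag

print(apply_promotions(db, 1000.0))   # 2
print(db.execute("SELECT id, promo_pct FROM customers ORDER BY id").fetchall())
# [(1, 5.0), (2, None), (3, 7.5)]
```

Doing the same through discrete services would mean one lookup and one update call per qualifying customer, which is the contrived and inefficient path the text warns about.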


Friday, October 2, 2009

On the Granularity of Services


You’re seated in a fancy restaurant ready to enjoy a nice gourmet meal.  The waiter shows up with the menu, but instead of a list of entrees and appetizers, you are confronted with a catalogue of recipes. You order a Tuna Tartare as appetizer. The waiter stares at you with a bewildered expression on his face. “Pardon?” he asks. “I’d like a Tuna Tartare,” you insist. He doesn’t understand and it finally hits you: he’s expecting you to guide him through each step of the recipe. “Heck,” you think, “this must be some kind of novelty gimmick, like Kramer’s make-your-own-pizza idea in a classic Seinfeld episode,” and so you begin the painstaking process of preparing the appetizer:
“Please get 3 ¾ pounds of very fresh tuna. Dice the tuna into 1/4-inch cubes and place it in a large bowl.” The waiter scribbles furiously. “Got this part, sir, I’ll be right back!” he says as he dashes to the kitchen to begin preparing your order.
Reading from the menu, you continue when he returns by requesting that he combine 1 ¼ cups of olive oil, the grated zest of 5 limes and 1 cup of freshly squeezed lime juice in a separate bowl. He runs back to the kitchen before you get a chance to tell him to also add wasabi, soy sauce, hot red pepper sauce, salt, and pepper to the bowl. . .
You get the idea.  There are different ways to ask for services. Let’s think of a more realistic computer design choice. Say you need to calculate the day of the week (What day does 10/2/2009 fall on?). If you were to define “Calculate-Day-of-the-Week” as a service, then you would be expected to allow this service to run on any computer, anywhere in the world (remember the transparency credo I covered earlier!), and to be reachable via a decoupled interface call.  If you were to answer, “Okay! No problem”, I would have to then ask you whether this is actually a sensible option. What would be the potential performance impact of having to reach out to a distant computer every time a day of the week calculation is needed?
Remembering the definition of services that I provided earlier, you insist that “Calculate-Day-of-the-Week” is definitely a service that provides a direct business value.
For SOA purposes a service represents a unit of work that a non-technical person can understand as providing a valuable stand-alone capability.
You can argue that “Calculate-Day-of-the-Week” is in fact a unit of work that the salesperson, a non-technical person, can understand and that she will need to access with her Blackberry. In that case, I would then yield to the argument because you have shown that the calculation has business logic that is relevant to your company.
If, on the other hand, “Calculate-Day-of-the-Week” is needed only by programmers, and there is no requirement for it to be directly accessed by anyone in the business group, then this is something that should be handled as a programming function and not as a service. 
If the reason “Calculate-Day-of-the-Week” is needed is because the calculation is part of a broader computation, say to find out whether a discount applies to a purchase (“10% off on Wednesdays!”), then the real service ought to be “Determine-Discount” and not a day of week calculation. You see, defining what constitutes a service can be somewhat subjective.
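The distinction can be sketched in a few lines of Python. The names are illustrative: the day-of-week calculation stays a local function, while the discount rule is the business-level operation that a service interface would actually expose.

```python
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def day_of_week(day):
    """Plain programming function: runs locally, no service call involved."""
    return DAY_NAMES[day.weekday()]   # weekday() returns 0 for Monday

def determine_discount(purchase_amount, purchase_date):
    """The business operation ("10% off on Wednesdays!") worth exposing as a service."""
    rate = 0.10 if day_of_week(purchase_date) == "Wednesday" else 0.0
    return round(purchase_amount * rate, 2)

print(day_of_week(date(2009, 10, 2)))                # Friday
print(determine_discount(80.0, date(2009, 9, 30)))   # 8.0
print(determine_discount(80.0, date(2009, 10, 2)))   # 0.0
```

A caller of “Determine-Discount” never needs to know that a day-of-week calculation happens inside it, and no remote call is spent on the calendar arithmetic.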
Your team should apply similar reasoning when determining services: Calculating the hash value of a field is a function, not a service.  Obtaining passenger information from an airline reservation system is a service, but appending the prefix “Mr.” or “Ms.” to a name should not be considered a service.
Now, to be fair, there will always be those fuzzy cases that will require your architecture team to make a call on a case-by-case basis.  If obtaining a customer name is needed for a given business flow, then it can be considered a service. However, if obtaining the customer name is part of a business process that is a part of assembling all customer information (address, phone number, etc.) you should really have a “Get-Customer-Information” service so as not to oblige the client to request each information field separately. 
In general, when it comes to services, it is better to start with fewer, coarser services and then move on to less coarse services on a need by need basis. In other words, it’s better to err on the side of being coarse than to immediately expose services that are too granular. It’s ultimately all about using common sense. Remember the restaurant example. When you order food in a restaurant it’s better to simply look at the menu and order a dish by its name.
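The cost of overly granular services is easy to see by counting round trips. The sketch below uses a made-up customer record and an assumed per-call latency purely for illustration. The class stands in for a remote system and only counts the calls that reach it.

```python
# Made-up data for the example.
CUSTOMERS = {42: {"name": "Ada Lopez", "address": "1 Main St", "phone": "555-0100"}}

class CustomerBackend:
    """Stand-in for a remote system that counts how many calls reach it."""
    def __init__(self):
        self.round_trips = 0

    def get_customer_field(self, customer_id, field_name):
        self.round_trips += 1            # fine-grained: one field per call
        return CUSTOMERS[customer_id][field_name]

    def get_customer_information(self, customer_id):
        self.round_trips += 1            # coarse: the whole record in one call
        return dict(CUSTOMERS[customer_id])

fine = CustomerBackend()
record = {f: fine.get_customer_field(42, f) for f in ("name", "address", "phone")}

coarse = CustomerBackend()
same_record = coarse.get_customer_information(42)

ASSUMED_LATENCY_MS = 40   # illustrative cost per round trip, not a measurement
print(fine.round_trips * ASSUMED_LATENCY_MS)     # 120
print(coarse.round_trips * ASSUMED_LATENCY_MS)   # 40
```

Both clients end up with the same record, but the fine-grained one pays the network cost once per field.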
Finally, even if a function is determined not to be a service, and therefore does not need to be managed with the more comprehensive life-cycle process used for services, there is no excuse for not following best-practices when implementing it. Just as with services, make certain the function is reusable, that it does not have unnecessary inter-dependencies, and that it is well tested. You never know when you may need to elevate a function to become a service.
But most importantly, the secret sauce in this SOA recipe is the interface: both services and functions must have well defined interfaces.
More on this next week!
