IT Transformation with SOA

Friday, January 1, 2010

Data Matching and Integration Engines


Encapsulating data behind the Data Sentinel’s data services works well when the data is accessed intermittently and discretely. However, there are cases where the access pattern requires matching large numbers of records in one database against large data volumes in another. An example could be a campaign management application that needs to combine the contents of a customer database with a promotion database defining discount rates based on the customer’s place of residence. Clearly, having this service call a data service for every customer record when performing promotional matches would be unsound and impractical from a performance perspective. The alternative, allowing applications to perform direct database joins against the various databases, is not ideal either. This latter approach would violate many of the objectives SOA tries to achieve by forcing applications to be directly aware of, and dependent on, specific data schemas and database technologies.
Yet another example is the implementation of data extraction via an algorithm such as MapReduce, which necessitates the orchestration of a large number of backend data clusters. This type of complex orchestration against potentially large sets of data cannot be left to the service requester and is best provided by sophisticated front-end servers.
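To make the map/reduce idea concrete, here is a toy, single-process sketch in Java; the “REGION:amount” record format and the figures are invented for illustration, and a real engine would run the same two phases distributed across many backend data clusters rather than over an in-memory list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MapReduceSketch {
    public static void main(String[] args) {
        // Toy input: "REGION:amount" sales records (invented for this sketch).
        List<String> salesRecords = Arrays.asList("WEST:100", "EAST:250", "WEST:75");

        Map<String, Integer> totalsByRegion = salesRecords.stream()
            // "Map" phase: emit a (region, amount) pair for each record.
            .map(r -> Map.entry(r.split(":")[0], Integer.parseInt(r.split(":")[1])))
            // "Reduce" phase: sum the amounts for each region key.
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                                           Collectors.summingInt(Map.Entry::getValue)));

        System.out.println(totalsByRegion); // e.g. {EAST=250, WEST=175}
    }
}
```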
Both examples show the need to make these bulk data matching processes part of the service fabric, available as coarse-grained data services. The solution, then, is to incorporate an abstraction-layer service for this type of bulk data join process. Applications can then trigger the process by calling this coarse-grained service. In practical terms, this means that when implementing the SOA system you should consider the design and deployment of the data matching and integration engines needed to implement this kind of coarsely defined service efficiently and securely. In fact, you are likely to find off-the-shelf products that are, at heart, instances of data matching engines: campaign management engines, business intelligence systems, and reporting engines that serve users by generating multi-view reports.
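As a sketch of the contrast, the fragment below compares the per-record anti-pattern with a single coarse-grained call. The interfaces, method names, and data-set identifiers are assumptions made up for this illustration, not the API of any actual matching engine.

```java
import java.util.List;
import java.util.Map;

// Hypothetical fine-grained data services (one network call per lookup).
interface CustomerDataService {
    List<String> allCustomerIds();
    String residenceOf(String customerId);
}

interface PromotionDataService {
    double discountForRegion(String region);
}

// Hypothetical coarse-grained service: one request, the engine joins the data sets internally.
interface PromotionMatchingEngine {
    Map<String, Double> matchDiscounts(String customerSetId, String promotionSetId);
}

class CampaignExample {
    // Anti-pattern: N customers means roughly 2N service calls across the network.
    static void perRecordMatch(CustomerDataService customers, PromotionDataService promotions) {
        for (String id : customers.allCustomerIds()) {
            String region = customers.residenceOf(id);
            double discount = promotions.discountForRegion(region);
            System.out.println(id + " -> " + discount);
        }
    }

    // Preferred: a single coarse call; the matching engine performs the bulk join near the data.
    static void bulkMatch(PromotionMatchingEngine engine) {
        Map<String, Double> discounts = engine.matchDiscounts("active-customers", "winter-promotions");
        discounts.forEach((id, discount) -> System.out.println(id + " -> " + discount));
    }
}
```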
Now, using off-the-shelf solutions has tremendous benefits, but external engines are likely to introduce varied data formats and protocols into the mix. Notwithstanding the ideal of a canonical data format throughout, there will always be a need to perform data transformations. That’s the next topic.


Friday, December 25, 2009

The Data Visibility Exceptions

The Data Sentinel is not unlike the grumpy bureaucrat processing your driver’s license application forms. After ensuring that you comply with what’s sure to be a ridiculously complicated list of required documents, it isolates you from directly accessing the files in the back.
While you, the applicant, the supplicant, cannot go around the counter and check the content of your files directly (not legally, anyway), the DMV supervisor in the back office is able to directly access any of the office files. After all, the supervisor is authorized to bypass the system processes intended to limit direct access to the data. Direct supervisory access to data is one of the exceptions to the data visibility constraints mentioned earlier.
Next is the case of ETLs (Extract, Transform, Load) of large data sets, as well as their reporting. These cases require batch-level access to data in order to process or convert millions of records, and they can wreck performance if carelessly implemented. Reporting jobs should ideally run against offline replicated databases, not the online production databases. Better yet is to plan a proper data warehousing strategy that allows you to run business intelligence processes independently of the main Operational Data Store (ODS). Nevertheless, on occasion you will need to run summary reports or data-intensive real-time processes against the production database. When the reporting tool is allowed to access the database directly, bypassing the service layer provided by the Data Sentinel, you will need to ensure this access is well behaved, running as a low-priority process under restricted user privileges. The same control is required for ETL processes. Operationally, you should always schedule batch-intensive processes for off-peak times such as nightly runs.
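As a hedged sketch of what such well-behaved direct access could look like, the JDBC fragment below points a summary report at a read replica using a restricted, read-only account. The connection URL, account name, and table are assumptions for illustration; the low-priority and off-peak aspects would be enforced by the database and the job scheduler rather than by this code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

class NightlySummaryReport {
    public static void main(String[] args) throws Exception {
        // Point the report at the replicated reporting database, not the production ODS.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://report-replica.example.com/ods_replica", // assumed replica URL
                "report_reader",                                            // restricted, read-only account
                System.getenv("REPORT_DB_PASSWORD"))) {

            conn.setReadOnly(true); // the reporting job never writes

            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + ": " + rs.getLong("total"));
                }
            }
        }
    }
}
```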
A third potential exception to data visibility is implied by the use of off-the-shelf transaction monitors, which require direct access to the databases in order to implement the ACID logic discussed earlier.
A fourth exception is demanded by the need to execute large data matching processes. If there is an interactive need to run a process against a large database with matching keys in a separate database (“for all customers with sales greater than an $X amount, apply a promotion flag equal to the percentage corresponding to the customer’s geographic location in the promotion database”), then it makes no sense to try to implement each step via discrete services. Such an approach would be extremely contrived and inefficient. Instead, a Table-Joiner super-service will be required. More on that next.
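To make the example concrete, here is a minimal in-memory illustration of what such a hypothetical Table-Joiner super-service would do internally: one set-oriented pass over both data sets rather than a discrete service call per customer. The maps, threshold, and region codes are all invented for this sketch; a real implementation would push the join down to the databases or to a matching engine.

```java
import java.util.HashMap;
import java.util.Map;

class TableJoinerSketch {
    public static void main(String[] args) {
        // Stand-ins for the customer database: total sales and region per customer id.
        Map<String, Double> salesByCustomer = Map.of("C1", 12000.0, "C2", 800.0, "C3", 5600.0);
        Map<String, String> regionByCustomer = Map.of("C1", "WEST", "C2", "EAST", "C3", "EAST");
        // Stand-in for the promotion database: promotion percentage per region.
        Map<String, Double> promotionByRegion = Map.of("WEST", 0.10, "EAST", 0.05);

        double threshold = 1000.0; // "sales greater than an $X amount"
        Map<String, Double> promotionFlags = new HashMap<>();

        // One pass over the customer set, joining against the promotion set by region.
        salesByCustomer.forEach((customer, sales) -> {
            if (sales > threshold) {
                String region = regionByCustomer.get(customer);
                promotionFlags.put(customer, promotionByRegion.get(region));
            }
        });

        System.out.println(promotionFlags); // e.g. {C1=0.1, C3=0.05}
    }
}
```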


Friday, December 4, 2009

Taming the SOA Complexities


Remember when I used to say, “Architect for Flexibility; Engineer for Performance”? Well, this is where we begin to worry about engineering for performance. This section, together with the following SOA Foundation section, represents the Level III architecture phase. Here we endeavor to solve the practical challenges associated with SOA architectures via the application of pragmatic development and engineering principles.


On the face of it, I wish SOA were as smooth as ice cream. However, I regret to inform you that it is anything but. In truth, SOA is not a panacea, and its use requires a fair dose of adult supervision. SOA is about flexibility, but flexibility also opens up more ways to screw up (remember when you were in college and no longer had to follow a curfew?). Best practices should be followed when designing a system around SOA, but there are also some principles that may be counter-intuitive to the “normal” way of doing architecture. So, let me wear the proverbial devil’s advocate hat and give you a list from “The Proverbial Almanac of SOA Grievances & Other Such Things Thusly Worrisome & Utterly Confounding”:
· SOA is inherently complex. Flexibility has its price. By their nature, distributed environments have more “moving” pieces, thereby increasing their overall complexity.
· SOA can be very fragile. SOA has more moving parts, leading to increased component interdependencies. A loosely coupled system has potentially more points of failure.
· It’s intrinsically inefficient. In SOA, computer optimization is not the goal. The goal is to more closely mirror actual business processes. The pursuit of this worthy objective comes at the price of SOA having to “squander” computational resources.
The way to deal with SOA’s intrinsic fragility and inefficiency is by increasing its robustness. Unfortunately, increasing robustness entails fault-tolerant designs that are inherently more complex. Why? Robustness implies the deployment of redundant elements. All this runs counter to Platonic design principles, and it runs counter to the way the Level I architecture is usually defined. There’s a natural tension because high-level architectures tend to be highly optimized, generic, and abstract, referencing only the minimum detail necessary to make the system operate. That is, high-level architectures are usually highly idealized, and there is nothing wrong with that. Striving for an imperfect high-level architecture is something only Homer Simpson would do. But perfection is not a reasonable design goal when it comes to practical SOA implementations. In fact, perfection is not a reasonable design goal when it comes to anything.
Consider how Mother Nature operates.  Evolution’s undirected changes often result in non-optimal designs. Nature solves the problem by “favoring” a certain amount of redundancy to better respond to sudden changes and to better ensure the survival of the organism. “Perfect” designs are not very robust. A single layered roof, for example, will fail catastrophically if a single tile fails. A roof constructed with overlapping tiles can better withstand the failure of a single tile. 
A second reason SOA is more complex is explained by the “complexity rule” I covered earlier: the more simplicity you want to expose, the more complex the underlying system has to be. Primitive technology solutions tend to be difficult to use, even if they are easier to implement. The inherent complexity of the problem they try to solve is more exposed to the user. If you don’t believe me, consider the following instructions from an old Ford Model T user manual:
“How are Spark and Throttle Levers Used? Answer: under the steering wheel are two small levers. The right-hand (throttle) lever controls the amount of mixture (gasoline and air) which goes into the engine. When the engine is in operation, the farther this lever is moved downward toward the driver (referred to as “opening the throttle”) the faster the engine runs and the greater the power furnished. The left-hand lever controls the spark, which explodes the gas in the cylinders of the engine.”
Well, you get the idea. SOA is all about simplifying system user interactions and about mirroring business processes.  These goals force greater complexity upon SOA. There is no way around this law.
There are myriad considerations to take into account when designing a services-oriented system.  Based on my experience I have come up with a list covering some of the specific key techniques I have found effective in taming the inherent SOA complexities.  The techniques relate to the following areas that I will be covering next:
State-Keeping/State Avoidance. Figuring out under what circumstances state should be kept has direct relevance to the ultimate flexibility of the system.
Mapping & Transformation. Even if the ideal is to deploy as homogeneous a system as possible, the reality is that we will eventually need to handle process and data transformations in order to couple diverse systems. This brings up the question of where it is best to perform such transformations.
Direct Access Data Exceptions. As you may recall from my earlier discussion on the Data Sentinel, ideally all data would be brokered by an insulating services layer. In practice, there are cases where data must be accessed directly. The question is how to handle these exceptions.
Handling Bulk Data. SOA is ideal for exchanging discrete data elements. The question is how to handle situations requiring the access, processing, and delivery of large amounts of data.
Handling Transactional Services.  Formalized transaction management imposes a number of requirements to ensure transactions have integrity and coherence. Matching a transaction-based environment to SOA is not obvious.
Caching. Yes, there’s a potential for SOA to exhibit slower performance than grandma driving her large eight-cylinder car on a Sunday afternoon. The way to tame this particular demon is to apply caching extensively and judiciously (a minimal sketch follows this list).
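Here is a minimal sketch of the kind of requester-side response caching this refers to, assuming a simple time-to-live policy; a production deployment would rely on a real caching product with proper eviction and size limits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Tiny TTL cache for service responses (illustrative only).
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns a fresh cached value when available; otherwise calls the (slow) backing service.
    V getOrLoad(K key, Function<K, V> loader) {
        Entry<V> hit = entries.get(key);
        long now = System.currentTimeMillis();
        if (hit != null && now < hit.expiresAtMillis) {
            return hit.value;                        // cache hit: no service call
        }
        V fresh = loader.apply(key);                 // cache miss: go to the service
        entries.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }
}
```

A requester could then wrap a slow lookup, for example cache.getOrLoad(customerId, id -> customerProfileService.profileOf(id)), so that repeated requests within the TTL never touch the network; the service and method names here are, again, hypothetical.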
All the above techniques relate to the actual operational effectiveness of SOA. Later on I will also cover the various considerations related to managing SOA operations.
Let’s begin . . .
