IT Transformation with SOA

Friday, November 20, 2009

The Data Sentinel


Data is what we put into the system, and information is what we expect to get out of it. (Actually, there’s an epistemological argument that what we really crave is knowledge; for now, however, I’ll use the term ‘information’ to refer to the system output.) Data is the dough; information is the cake. When we seek information, we want it to taste good: to be accurate, relevant, current, and understandable. Data is another matter. Data must be acquired and stored in whatever form is best from a utilitarian perspective. Data can be anything. This explains why two digits were used to store the year of a date in pre-millennium systems, leading to the big Y2K brouhaha (more on this later). Also, data is not always flat and homogeneous. It can have a hierarchical structure and come from multiple sources. In fact, data is whatever we choose to call the source of our information.
Google reputedly has hundreds of thousands of servers holding petabytes of data (1 petabyte = 1,024 terabytes), which you and I can access in a matter of milliseconds by typing free-text searches. For many, a response from Google represents information, but to others this output is data to be used in the cooking of new information. As a matter of fact, one of the most exciting areas of research today is the emergence of Collective Intelligence via the mining of free-text information on the web. Or consider the very promising WolframAlpha knowledge engine effort (wolframalpha.com), which very ambitiously taps a variety of databases to provide consolidated knowledge to users. There are still other mechanisms to provide information that rely on the human element as a source of data. Sites such as Mahalo.com or Chacha.com actually use carbon-based intelligent life forms to respond to questions.
Data can be stored in people’s neurons, spreadsheets, 3 x 5 index cards, papyrus scrolls, punched cards, magnetic media, optical disks, or futuristic quantum storage. The point is that the user doesn’t care how the data is stored or how it is structured. In the end, schemas, SQL, rows, columns, indexes, and tables are the ways we IT people store and manage data for our own convenience. But as long as the user can access data in a reliable, prompt, and comprehensive fashion, she couldn’t care less whether the data comes from a super-sophisticated object-oriented database or from a tattered printed copy of the World Almanac.
How should data be accessed then? I don’t recommend handling data in the explicit manner that RDBMS vendors tell you to handle it. Data is at the core of the enterprise, but it does not have to be a “visible” core. You don’t visualize data with SQL. Instead, I suggest that you handle all access to data in an abstract way. You visualize data with services, and this brings up the need for a Data Sentinel Layer. This layer should be, you guessed it, an SOA-enabled component providing data access and maintenance services.
To put it simply, the Data Sentinel is the gatekeeper and abstraction layer for data. Nothing goes into the data stores without the Sentinel first passing it through; nothing gets out without the Sentinel allowing it. Furthermore, the Sentinel decouples how the data is ultimately stored from the way the data is perceived to be stored. Depending upon your needs, you may choose consolidated data stores or, alternatively, you may choose to follow a federated approach to heterogeneous data. It doesn’t matter. The Data Sentinel is responsible for presenting a common SOA façade to the outside world.
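The gatekeeper idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed implementation: the names `CustomerStore`, `InMemoryStore`, `DataSentinel`, `get_customer`, and `update_customer` are hypothetical, and the in-memory dictionary merely stands in for whatever consolidated or federated storage sits behind the façade.

```python
from abc import ABC, abstractmethod
from typing import Optional


class CustomerStore(ABC):
    """Hypothetical storage back end hidden behind the Sentinel."""

    @abstractmethod
    def load(self, customer_id: str) -> Optional[dict]:
        """Return a copy of the stored record, or None if absent."""

    @abstractmethod
    def save(self, customer_id: str, record: dict) -> None:
        """Persist the record under the given id."""


class InMemoryStore(CustomerStore):
    """Stand-in for a real database; a federated store could replace it."""

    def __init__(self) -> None:
        self._rows = {}  # customer_id -> record

    def load(self, customer_id: str) -> Optional[dict]:
        row = self._rows.get(customer_id)
        return dict(row) if row is not None else None

    def save(self, customer_id: str, record: dict) -> None:
        self._rows[customer_id] = dict(record)


class DataSentinel:
    """Single gateway: callers see services, never the storage layout."""

    def __init__(self, store: CustomerStore) -> None:
        self._store = store

    def get_customer(self, customer_id: str) -> Optional[dict]:
        # Nothing gets out without passing through the Sentinel.
        return self._store.load(customer_id)

    def update_customer(self, customer_id: str, record: dict) -> None:
        # Nothing goes in without the Sentinel checking it first.
        # Assumption: a non-empty "name" is the only rule enforced here.
        if not record.get("name"):
            raise ValueError("customer record needs a name")
        self._store.save(customer_id, record)


sentinel = DataSentinel(InMemoryStore())
sentinel.update_customer("c-1", {"name": "Ada"})
print(sentinel.get_customer("c-1"))
```

Swapping `InMemoryStore` for another `CustomerStore` subclass would change where the data lives without changing a single caller, which is the decoupling the Sentinel is meant to provide.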
Clearly, a key tenet should be to not allow willy-nilly access to data by bypassing the Sentinel. You should not allow applications or services (whether atomic or composite) to fire their own SQL statements against a database. If you want to maintain the integrity of your SOA design, make sure to access data only via the data abstraction services provided by the Sentinel.
Then again, this being a world filled with frailty, there are three exceptions where you will have to allow SOA entities to bypass the abstraction layer provided by the Sentinel. Every castle has secret passageways. I will cover the situations where exceptions may apply later: Security/Monitoring, Batch/Reporting, and the Data Joiner Pattern.
Obviously, data abstraction requires attention to performance, data persistence, and data integrity. Thankfully, there are off-the-shelf tools to help facilitate this abstraction and the implementation of a Sentinel layer, such as Object-Relational mapping tools (e.g. Hibernate), automated data replication, and data caching products. Whether you choose to use an off-the-shelf tool or to write your own will depend upon your needs, but the use of those tools is not always sufficient to implement a proper Sentinel. Object-Relational mapping or the use of stored procedures, for example, are means to more easily map data access into SOA-like services, but you still need to ensure that the interfaces comply with the SOA interface criteria covered earlier. In the end, the use of a Data Sentinel Layer is a case of applying abstraction techniques to deal with the challenges of an SOA-based system, but one that also demands engineering work in order to deploy the Sentinel services in front of the databases and other data sources. There are additional techniques and considerations that also apply, and these will be discussed later on.


Thursday, August 20, 2009

The SOA Distributed Processing Pattern

It’s said that one of the keys to human intelligence is the capacity for abstract thought and an instinctive reliance on patterns. By expediently matching new situations to a “library” of pre-existing patterns, normally referred to as “experience”, humans have been able to react more quickly in the face of new challenges. The sky is covered with dark clouds? No matter the shape of the clouds, their darkness and conglomeration indicate a storm is on its way. A large animal growls and salivates as it menacingly stares at you? I doubt you will stop to investigate what this is all about. If you did, your chances at reproducing would be as low as those of an ascetic monk. There’s no question that pattern recognition has been a key to our survival as a species.

Patterns have hierarchies, and the highest level of the pattern hierarchy deals with the overall system structure. I will say more about the use of patterns specific to SOA later, but first I want to discuss the broader Distributed Processing Pattern because the introduction of SOA has forced a rethink of how this pattern is defined. Just as a typical DMV office has the frowning employee at the window, the sullen clerk riffling and stamping papers in the back, and the rack with files along the back wall, most traditional distributed system models have converged on a pattern consisting of these three tiers:

1. A Presentation tier which displays the program’s output and allows the user’s input.

2. A Business-Process tier that deals with the “heart” of the application. The actual business rules and processes are performed here.

3. The Data tier, where the data is stored and retrieved. Applications request user input via the presentation tier, then process the request within the business-process tier and interact with data as appropriate.

This three-component pattern has traditionally been referred to as 3-Tier architecture. Furthermore, traditional proponents of distributed processing use this 3-Tier architecture term to physically map each of the parts to actual distributed components. In this very literal interpretation of the model, the desktop devices perform presentation functions, and an intermediate server computer does some processing and then accesses data, usually via SQL or stored procedures. This fixed distributed model is typical of what was originally promoted by database vendors as part of their preferred architectural model (e.g. Oracle Forms, using PL/SQL). The problem with this view of distributed processing is that it takes such a physical view of the distributed system that it soon becomes very static and inflexible, failing to accommodate new technology capabilities.

Because the PCs emerged outside the realm of the mainframe priesthood, the sad reality is that just as with a very intelligent blonde (not an oxymoron, all joking aside!) desperately trying to get a date, PCs had to sneak into the corporate world by pretending to be dumb terminals, a good fit within the static boundaries of a traditional presentation device. Also, while old intermediate systems were mostly used as communication switches or to act as specialized gateways, the physical view of the 3-Tier model tended to view today’s Servers just as database front-ends. Things have changed significantly. "Access" devices such as today's personal computers and wireless devices like your phone have tremendous power. The traditional 3-Tier view can’t accommodate their broader use.

Whereas the traditional distributed processing pattern separated processing into three physical tiers (presentation, business processing, and data), in reality, data rarely resides in a single source, and business processes cannot always be executed from a single server. Also, in real life, computation can take place anywhere, and even though organizations tend to be hierarchical, the actual business flows look more like a network than a strict hierarchy.

If SOA is to mirror this meshed topology then we must shift the paradigm somewhat. A proper SOA design should support true distributed environments; not just three tiers, but rather an n-Tier meshed topology with an intrinsic 3-Layer logical pattern.

The fundamental distributed pattern with SOA is that there are three layers and multiple tiers, something I describe as the n-Tier/3-Layer SOA Distributed Processing pattern. The shift from Tiers to Layers has important implications: the layers in SOA are logical and are not meant to directly represent the underlying physical systems.

A typical SOA scenario is shown below:


This n-Tier/3-Layer pattern exists independently of the actual number of computers or entities. For example, imagine that the service pattern above depicts airport kiosks displaying flight information. The user inputs the desired airline via the touch-screen terminal P1. This entry originates a service request to business process B1. Business process B1 logs the request by calling an authentication and log service that front-ends the database D1. Once the request has been authenticated, B1 requests the assistance of business process B2 (either one of the two B2’s shown). Process B2 may call on the assistance of B3 for as many services as needed. It then extracts the flight information for the selected airline by calling the service front-ending database D2. Finally, B2 returns the information to B1, which then passes the result on to P1 to output the requested flight information.
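The kiosk walk-through can be traced in a short Python sketch. Everything named here is an assumption made for illustration: the class and function names, the rule that a non-empty kiosk id counts as authenticated, the idea that B3 merely normalizes the airline entry, and the made-up flight strings. A list and a dictionary stand in for databases D1 and D2.

```python
class AuthLogService:
    """Service front-ending database D1 (a list stands in for the database)."""

    def __init__(self):
        self.d1_log = []

    def authenticate_and_log(self, kiosk_id, airline):
        self.d1_log.append((kiosk_id, airline))
        # Assumption: only kiosks with a non-empty id are accepted.
        return bool(kiosk_id)


class FlightDataService:
    """Service front-ending database D2 (a dict stands in for the database)."""

    def __init__(self, d2):
        self._d2 = d2

    def flights_for(self, airline_code):
        return list(self._d2.get(airline_code, []))


def b3_normalize(airline):
    """B3: helper service that cleans up the user's airline entry."""
    return airline.strip().upper()


def b2_flight_lookup(airline, flight_data):
    """B2: asks B3 for help, then reads D2 through its service."""
    return flight_data.flights_for(b3_normalize(airline))


def b1_handle_request(kiosk_id, airline, auth, flight_data):
    """B1: authenticates and logs via D1's service, then delegates to B2."""
    if not auth.authenticate_and_log(kiosk_id, airline):
        return []
    return b2_flight_lookup(airline, flight_data)


def p1_kiosk(airline, auth, flight_data):
    """P1: presentation only; it never touches D1 or D2 directly."""
    flights = b1_handle_request("P1", airline, auth, flight_data)
    return "\n".join(flights) or "No flights found"


auth = AuthLogService()
flight_data = FlightDataService(
    {"XY": ["XY101 to Lima 09:40", "XY202 to Quito 11:15"]}  # made-up data
)
print(p1_kiosk(" xy ", auth, flight_data))
```

Each function only knows the service it calls, not where that service runs, so the same flow holds whether B1, B2, and B3 share one machine or are spread across several.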

When dealing with this level of system design, little is assumed about the physical nature of the environment. It might well be that, initially, all business processes depicted (B1, B2, B3) execute on the same machine on which databases D1 and D2 reside. A second instance could have B2 running on a separate server, and so on.

The system can be scaled up by allowing the deployment of multiple service instances on different systems. Multiple instances also happen to improve the system’s robustness. The presentation services P1 and P2 may or may not reside on separate computers (remember the transparency tenets discussed earlier). Furthermore, assume there is an increase in the number of transactions going to the computer handling the business processes, and so we now wish to move business processes B2 and B3 to another machine. No problem. A key attribute of the n-Tier/3-Layer service-oriented pattern is that there is no need to change applications when deploying services on separate computers.
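One way to picture this is a small service registry that callers go through instead of addressing a machine directly. The sketch below is a simplified, single-process illustration: `ServiceRegistry`, `deploy`, `call`, `make_b2`, and the host names are all hypothetical, and plain round-robin is just one assumed way of spreading requests over several instances.

```python
import itertools


class ServiceRegistry:
    """Maps a logical service name to whatever instances currently serve it."""

    def __init__(self):
        self._cursors = {}

    def deploy(self, name, instances):
        """(Re)deploy a service as one or more instances, used round-robin."""
        self._cursors[name] = itertools.cycle(list(instances))

    def call(self, name, *args):
        """Invoke the next instance of the named service."""
        return next(self._cursors[name])(*args)


def make_b2(host):
    """Build a stand-in B2 instance that reports which host served the call."""
    def b2(airline):
        return f"{airline} flights (served by {host})"
    return b2


def b1(registry, airline):
    """B1 only knows the logical name 'B2', never a machine address."""
    return registry.call("B2", airline)


registry = ServiceRegistry()
registry.deploy("B2", [make_b2("host-a")])
first = b1(registry, "XY")

# Move B2 to other machines and scale it out; b1 itself is untouched.
registry.deploy("B2", [make_b2("host-b"), make_b2("host-c")])
later = [b1(registry, "XY") for _ in range(4)]
print(first)
print(later)
```

The caller `b1` is identical before and after the redeployment; only the registry entry changed.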

Say we find a vendor is offering a cheaper and faster way to do things than our own B3 service. No problem. B3 can then run from the external vendor’s system. As a final note, you may have noticed that not once have I mentioned whether these computers run on Microsoft or Linux software, or are a mainframe or PC. Why not? Because the technology transparency tenet is that all software should be able to run on any given platform.

The concept of Cloud Computing, based on computer infrastructure available as a virtualized computing service via Internet-like mechanisms, is emerging as one of IT’s future directions. The idea is that the higher penetration of standards and the convergence of technologies are driving commoditization to the point that we don’t much care about what kind of technology provides the service we receive. Having an n-Tier/3-Layer pattern is a necessary (but not sufficient) condition to allow your solution to eventually garner the benefits of cloud computing.

Having said this, while it’s easy to appreciate the flexibility that this type of architecture provides, keep in mind that it does have its drawbacks! For starters, there could be overhead in processing and in message delivery latencies. This type of architecture is not designed for performance but for flexibility. On the plus side, a smart service-oriented design can optimize the way services are called and how data is passed between components via judicious use of caching techniques. Secondly, n-Tier/3-Layer can be complex, especially when deployed in a distributed fashion. SOA demands an extra focus on management and control. Thirdly, you’ll need to tighten your deployment guidelines or you might end up with a zoo of redundant services, just like when you see a traffic cop directing traffic even though the traffic lights are working just fine. Lastly, we began with patterns and end with a reaffirmation of the need for them.

A meshed system like the one shown has an exponential number of combinations, and it would not make sense to try and architect specific SOA arrangements over and over. Instead, the industry has now defined a series of SOA patterns that system architects can apply. Managing and taming the complexity of an SOA solution demands a disciplined use of patterns.

The story of how to make SOA work in the face of these challenges will be my next topic.
