The Data Sentinel is not unlike the grumpy bureaucrat processing your driver’s license application forms. After ensuring that you comply with what’s sure to be a ridiculously complicated list of required documents, it isolates you from directly accessing the files in the back.
While you, the applicant, the supplicant, cannot go around the counter and check the content of your files directly (not legally, anyway), the DMV supervisor in the back office is able to directly access any of the office files. After all, the supervisor is authorized to bypass the system processes intended to limit the direct access to the data. Direct supervisory access to data is one of the exceptions to the data visibility constrains mentioned earlier.
Next is the case of ETLs (Extract Transform Loads) of large sets of data as well as its reporting. These cases require batch level access to data in order to process or convert millions of data records and can wreck performance if carelessly implemented. Reporting jobs should ideally run against offline replicated databases; not the on-line production data bases. Better yet is to plan for a proper Data Warehousing strategy that allows you to run business intelligence processes independently of the main Operational Data Store (ODS). Never the less, on occasion, you will need to run summary reports or data-intensive real-time processes against the production database. When the report tool is allowed to access the database directly, bypassing the service layer provided by the Data Sentinel, you will need to ensure this access is well-behaved and that it runs as a low priority process and under restricted user privileges. The same control is required for the ETL processes. Operationally, you should always schedule batch-intensive processes for off-peak times such as nightly runs.
A third potential cause for exception to data visibility is implied by the use of off-the-shelf transaction monitors, requiring direct access to the databases in order to implement the ACID logic discussed earlier.
A fourth exception is demanded by the need to execute large data matching processes. If there is an interactive need to run a process against a large data base set with matching keys in a separate data base (“for all customers with sales greater than an $X amount, apply a promotion flag equal to the percentage corresponding to the customer’s geographic location in the promotion database”), then it makes no sense trying to implement each step via discrete services. Such an approach would be extremely contrived and inefficient. Instead, use of a Table-Joiner super-service will be required. More on that next.
Related to the issue of Session-Keeping is how to ensure that complex business transactions take place in order to meet the following so-called ACID properties:
·Be Atomic. The transaction is indivisible and it either happens or does not.
·Be Consistent. When the transaction is completed all data changes should be accountable. For example, if we are subtracting money from one bank account and transferring it to another account, the transaction should guarantee that the money added to the new account has been subtracted from the original account.
·Act in Isolation. I like to call this the sausage-making rule. No one should be able to see what’s going on during the execution of a transaction. No other transaction should be able to find the backend data in a half-done state. Isolation implies serialization of transactions.
·Be Durable. When the transaction is done, the changes are there and they should not disappear. Having a transaction against a cache that fails to update the data base is an example of non-durability.
Since we are dealing with a distributed processing environment based on services, the main method used to ensure that ACID is met is a process known as Two-Phase Commit. Essentially, a Two-Phase Commit establishes a transaction bracket prior to executing changes, performs the changes, and after ascertaining that all needed changes have occurred a commit to finalize the changes by closing the transaction bracket. If during the process, the system is unable to perform one or more of the necessary changes, a rollback process will occur to undo any prior partial transaction change. This is needed to ensure that, if unsuccessful, the transaction will, at the very least, return the system to its original state. This process is so common-sense that, in fact, all this business of transaction processing has been standardized. The OpenGroup[1] consortium defines transactional standards, and in particular the so-called X/Open protocol and XA compliance standards.
However, transactional flows under SOA tend to be non-trivial. This is because a transaction flow requires the keeping of session states throughout the life of the transaction and, as earlier discussed, state-keeping is to SOA what Kryptonite is to Superman. Say you want to transfer money from one checking account to another. You access the service Subtract X from Account; then you create another service, Add X to Account Y. This simple example puts the burden of transactional integrity on the client of the services. The client should ensure that the Add to Account service has succeeded before subtracting the money from the original account. An approach like this breeds as much complexity as a cat tangling a ball of yarn, and it should be avoided at all costs. Far simpler is to create a service, Transfer X from Account X to Account Y, and then let the service implementation worry about ensuring the integrity of the operation. The question then is what type of implementation is most appropriate.
While SOA based transactional standards are in place [2], actual vendor-based implementations supporting these standards don’t yet exist in the way mature Database XA compliant implementations exist. In general, you’d be better off leveraging the backend transaction facilities provided by RDBMS vendors or by off-the-shelf transaction monitors such as CICS, MTS, or Tuxedo. All in all, it’s probably best to encapsulate these off-the-shelf transaction services behind a very coarse meta-service whenever possible, rather than attempting to re-implement the ACID support via Two-Phase Commit at the services layer.
It should be noted that what I am essentially recommending is an exception to the encapsulating databases via a Data Sentinel when it comes down to implementing transactional services. The reasoning behind this is that integrating with off-the-shelf transactional services will likely require direct database access in order to leverage the XA capabilities of the database vendor.
As more actual off-the-shelf transactional service solutions for SOA appear in the future, we can then remove the exception.
More on the Data Visibility Exceptions will follow. . .
This blog covers the practical techniques, trials and tribulations associated with the transformation of IT systems from legacy technologies to systems using SOA and modern open systems. I also include the occasional interlude with rants about technology in general.
About Me
Name: Israel del Rio
Location: Atlanta, GA, United States
Israel has been recognized by Computerworld as one of their Premiere 100 IT honorees. Israel is a business and technology leader who has contributed the technology vision as key strategist and designer behind the enterprise technology roadmaps of large hospitality and travel companies. Israel has also have developed and deployed various mission-critical systems and in the process, he's been instrumental in creating and building effective and skilled development organizations.