IT Transformation with SOA

Friday, January 15, 2010

Data Mapping & Transformation—Part II



Last week, I outlined the various mapping options. The question I left hanging was this: Which approach is the best one?

My experience is that the transformation should take place as soon as possible. For starters, this means that broker-mediated transformations should be avoided, if possible.  The entity doing the transformation must have an understanding of the business processes being mapped and intermediate brokers usually lack this knowledge.
It is best to establish a canonical (i.e. standard) format and then have both the receiver and the sender translate their respective formats into the chosen canonical form (performance considerations can be dealt with later).  For example, in the modern world, English can be seen as the standard used by all people: a German can communicate with a Japanese person in English.  In SOA terms, this canonical form may well be a specific set of XML structures.
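Here is a minimal sketch of the canonical idea, written in Python rather than XML for brevity. The two systems ("HR" and "payroll"), their field names and the `CanonicalEmployee` structure are all hypothetical; the point is only that each side maps to or from the shared form and never directly to the other side's format.

```python
from dataclasses import dataclass


@dataclass
class CanonicalEmployee:
    """Hypothetical canonical form agreed on by both sides."""
    last_name: str
    first_name: str


def hr_to_canonical(record: dict) -> CanonicalEmployee:
    # Assumption: the sending HR system stores names as "Last, First".
    last, first = [part.strip() for part in record["name"].split(",", 1)]
    return CanonicalEmployee(last_name=last, first_name=first)


def canonical_to_payroll(employee: CanonicalEmployee) -> dict:
    # Assumption: the receiving payroll system wants upper-case, separate fields.
    return {
        "SURNAME": employee.last_name.upper(),
        "GIVEN": employee.first_name.upper(),
    }


canonical = hr_to_canonical({"name": "de la Cruz, Maria"})
payroll_record = canonical_to_payroll(canonical)
print(payroll_record)  # {'SURNAME': 'DE LA CRUZ', 'GIVEN': 'MARIA'}
```

With this arrangement, adding a third system means writing one more pair of mappings to and from the canonical form, not one mapping per existing system.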
If a standard format is feasible, you will need to decide whether this format will be a subset (a lowest common denominator) of all formats, or whether you will allow the format to carry functions that exceed the capabilities of one or both of the communicating entities.   If the former, you will be forced to “dumb down” the functionality; if the latter, you will need to restrict the information conveyed by the canonical format on a case-by-case basis. Still, it’s best to make the standard format as comprehensive as possible. It’s always easier to restrict usage of excess functionality than it is to introduce new features during implementation.
If no standard format is feasible because you can’t control the sender or receiver, then you should adopt either a Sender-Makes-Right or a Receiver-Makes-Right approach. In general, the entity that has the better understanding of the business process should take ownership of the mapping.  For example, if you are a tourist in another country and use of a canonical language (aka “English”) is not possible, then it behooves you to try to speak the local language (i.e. Sender-Makes-Right). After all, it’s unrealistic to expect that the local folks will speak your language. On the other hand, if you are visiting the tourism board in a foreign country, then you may reasonably assume someone there might speak your language.
Typically, the Sender has a better understanding of the meaning (i.e. “semantics”) of a request. Consider the example where the requester searches for an employee record using the name. The name is held in a structured fashion: LastName, FirstName. The server, on the other hand, expects to get the request with a string that contains the “last_name+first_name” (this is a common scenario when the server is a legacy application). The answer is obvious (I admit this is a trivial example!). The requester (the sender) should create the necessary string. Building the string is much easier for the sender than it is for the receiver. The sender knows the true nature of the last name, while the server’s logic could fail if it tried to derive the last name from regular expression parsing.  (I can’t tell you the number of times I have encountered systems that assume that DEL is my middle name!) Clearly, the simple parser used by such software fails to understand that some last names contain a space.
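The employee-name example can be sketched as follows. The function names and the sample name are made up for illustration, and I am assuming the legacy server wants a literal "+" between the two parts. The second function shows the kind of guessing a receiver is reduced to when all it gets is unstructured text.

```python
def build_search_key(last_name: str, first_name: str) -> str:
    # Sender-Makes-Right: the sender already knows which part is the last
    # name, so building the legacy string is a simple join.
    # Assumption: the legacy server wants a literal "+" between the parts.
    return f"{last_name}+{first_name}"


def guess_name_parts(full_name: str) -> dict:
    # What a receiver is reduced to when handed unstructured "First Last"
    # text: assume the final word is the last name.
    words = full_name.split()
    return {
        "first": words[0],
        "middle": " ".join(words[1:-1]),
        "last": words[-1],
    }


sender_key = build_search_key("del Valle", "Ana")
receiver_guess = guess_name_parts("Ana del Valle")

print(sender_key)      # del Valle+Ana
print(receiver_guess)  # {'first': 'Ana', 'middle': 'del', 'last': 'Valle'}
```

The sender's version is one line and cannot get the last name wrong. The receiver's guess turns "del" into a middle name, which is exactly the failure described above.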
This recommendation still leaves open the question of where to do the mapping when everything else is equal. My personal view is that, in that case, you should put the mapping logic in the server of the request (i.e. Receiver-Makes-Right), simply because it gives you a centralized, single point of control for the mappings. Relying on a Sender-Makes-Right scenario places much of the burden on what could eventually become an unmanageable variety of clients. Also, I suggest that once you decide on one or the other, you don’t ever mix the approaches. That is, if you decide to do Sender-Makes-Right, do so throughout the system, or vice versa. Mixing Receiver-Makes-Right with Sender-Makes-Right can make the system far too complex and unmanageable.
A corollary to this discussion is that there is one hybrid approach that I believe provides the most flexibility and solves the great majority of transformation needs: using a comprehensive canonical form combined with Receiver-Makes-Right for cases where the superset capability exceeds the receiver’s ability. The logic of this approach is that it is easier to down-scope features than it is to second-guess a more powerful capability.
Consider a typical search application scenario: a client sends a search request, and the server then prepares a response that includes the found elements plus a ranking score for each item returned. The server (the sender of the response) converts the ranking weight factors from a relational database into a “standardized” ranking score system defined by the canonical form. Now, let’s assume the client (the receiver of the response) is not prepared to get or use this extra information. The receiver simply discards it. The down-scoped information loses some of its value, but the client will still be able to present the search results, even if not in a ranked fashion. As long as the key results are obtained, no major harm occurs. A future, more competent client will be able to use the ranking information. Note that this approach only works if the information being ignored is not essential to the response. If you need to ensure essential information is not discarded, you’ll have to define this information as core to the canonical standard.
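A rough sketch of this down-scoping, assuming a canonical search response in which "title" and "url" are the core fields and "rank_score" is an optional extra. All field names, URLs and scores are invented for the example.

```python
# Hypothetical canonical response: core fields plus an optional ranking score.
CORE_FIELDS = ("title", "url")

canonical_response = [
    {"title": "Data mapping", "url": "http://example.com/b", "rank_score": 0.71},
    {"title": "SOA basics", "url": "http://example.com/a", "rank_score": 0.92},
]


def basic_client_view(items):
    """Receiver-Makes-Right: keep the core fields, discard everything else."""
    view = []
    for item in items:
        missing = [field for field in CORE_FIELDS if field not in item]
        if missing:
            # Essential information must be present; only extras may be dropped.
            raise ValueError(f"missing core fields: {missing}")
        view.append({field: item[field] for field in CORE_FIELDS})
    return view


def ranking_client_view(items):
    """A more capable client uses the extra information to order the results."""
    return sorted(items, key=lambda item: item["rank_score"], reverse=True)


basic_results = basic_client_view(canonical_response)
ranked_results = ranking_client_view(canonical_response)

print([item["title"] for item in basic_results])   # ['Data mapping', 'SOA basics']
print([item["title"] for item in ranked_results])  # ['SOA basics', 'Data mapping']
```

The basic client still shows every result, just in the order received. The check on `CORE_FIELDS` is one way to make sure that only non-essential information is ever dropped.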
Yes, transformation work is sure to have an impact on performance. Next, I will cover a technique used to remediate this problem: caching.


Friday, January 8, 2010

Data Mapping & Transformation—Part I


A basic tenet should be to keep transformations to a minimum. However, it is not always feasible to create completely homogeneous systems. Even if one wishes to use standard communication means between systems, the reality is that there will sometimes be a need to handle data and protocol mismatches. We frequently must interface the new system with legacy components, or support a federated environment with differing protocols and presentations. More often than not, the system will need to interface with a third-party component that does not use the standard format, semantics or protocols.  The question that now emerges is how best to make these components talk to each other. In what components should we deploy the transformation logic?
There are various schools of thought about how to approach the thorny subject of transformation. We end up with the following alternatives: 
·      Broker makes right
·      Sender makes right
·      Receiver makes right
·      Sender and Receiver use a common (“canonical”) format
Broker-Makes-Right is akin to the United Nations with a diplomat giving a speech in, say, Russian, and then having the various translators translating the language for the listeners.
While this works in the UN because the translators are actual human beings capable of human understanding, when relying on automation, the most you can expect the intermediary to perform is straightforward X-to-Y transformations by following pre-defined mapping rules. Unlike UN translators, automated brokers lack the understanding to intelligently optimize the mapping based on cognition.  Broker-mediated transformations only make sense when mapping low-level formatting mismatches. Do you need to convert an integer to a string? No problem. Use an intermediate component. Want to append a null termination to strings coming from A and destined for B? Again, fine.
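To make the "low-level" category concrete, here is a small sketch of the kind of rule a broker can safely apply. The message layout and field names are hypothetical; notice that nothing in the function requires any knowledge of what the data means.

```python
def broker_transform(message: dict) -> dict:
    """Apply fixed, format-level rules only; no business knowledge involved."""
    return {
        # System B expects the identifier as a string, not an integer.
        "employee_id": str(message["employee_id"]),
        # System B expects null-terminated strings.
        "name": message["name"] + "\x00",
    }


outgoing = broker_transform({"employee_id": 42, "name": "Smith"})
print(repr(outgoing["employee_id"]))  # '42'
print(repr(outgoing["name"]))         # 'Smith\x00'
```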
Then there are cases where it is never advisable to let the broker perform automated conversions. For instance, currency conversions from Euros to Dollars may require the knowledge of specific exchange discounts or the application of exchange fees, or anything else that can be dreamed up by government bureaucracies. These types of “business-based” conversions must be placed in the upper layers that are capable of handling business rules, and not in an intermediate broker.
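By contrast, a sketch of a business-based conversion shows how many inputs depend on rules the broker has no way of knowing. The rate, the fee percentage and the function name below are invented purely for illustration.

```python
from decimal import Decimal


def convert_eur_to_usd(amount_eur: Decimal, rate: Decimal, fee_pct: Decimal) -> Decimal:
    """Convert Euros to Dollars, then deduct a percentage exchange fee.

    Which rate and which fee apply is a business decision (customer type,
    contract, regulation), so this belongs in a layer that owns those rules.
    """
    gross = amount_eur * rate
    fee = gross * fee_pct / Decimal("100")
    return (gross - fee).quantize(Decimal("0.01"))


# Hypothetical figures: 100 EUR at a rate of 1.10 with a 2% exchange fee.
net_usd = convert_eur_to_usd(Decimal("100"), Decimal("1.10"), Decimal("2"))
print(net_usd)  # 107.80
```

The arithmetic itself is trivial. The hard part is knowing which `rate` and `fee_pct` to pass in, and that knowledge lives in the business layer.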
Sender-Makes-Right advocates argue that the sender of the message should know the capabilities of the recipient and adjust the format and characteristics of the delivery to match them. An analogy is that of a teacher communicating with a group of kindergarteners. The teacher will not use complex words and will make an effort to adapt her language so that the children will understand. Sender-Makes-Right assumes the sender typically knows it all and has the power to adapt its messages to nearly any recipient.
Receiver-Makes-Right proponents believe there is no way a sender will always know the capabilities and limitations of the receiver. They also argue that the sender is not necessarily the most powerful component of the system, and that the recipient of a message should be able to extract what is needed and transform, as appropriate, the sender’s format.  Obviously, if the scope of information delivered by the sender exceeds what can be handled by the receiver, it is the receiver’s prerogative to dispose of the excess.  If the sender has less knowledge than the receiver, then it is easier for the receiver to map the sender’s format and complement the needed information through other means. An example is the manner in which Google applies complex heuristics to infer what the user is asking for.
A final form is what I call the Esperanto approach, more commonly referred to as the canonical style. Here, both the sender and the receiver agree to use a common language, and both take responsibility for translating their respective formats into this common standard.
Which is the best approach? Clearly each method has both unique issues and advantages. Next week I’ll go over the recommended approaches.
