Friday, January 18, 2008

Dataundermining

Suppose Bob is your investment agent. Bob uses many tools and algorithms to determine the best investment for your money; he always seems to make good decisions and you profit from his skill and judgment. Bob's investing ability is surpassed only by his technological ineptitude. He knows nothing of computers, yet remains meticulous with his record-keeping. After every day of investing, he places a paper log of his investment activities in the second drawer of his desk.

Suppose further that Sue is your accountant. She, too, is great at what she does, as long as she is provided with complete and accurate information. The trouble lies with you: you have a hard time giving her the information she needs about your investments, since Bob takes care of those for you. Luckily, Bob is a trusting soul, and he has given you a key to his office in case of emergencies. You know about the log in his desk drawer and realize how much easier your life would be if Sue could obtain your investment activity straight from the source. In a shady move, you give a copy of the key to Sue and tell her about the log. Periodically, Sue lets herself into Bob's office and searches the log for investment activity he has carried out on your behalf. She uses this information to keep accurate tabs on your investments without having to badger you.

A while later, Bob completes a computer course at the local college. He now recognizes the productivity benefits of electronic record-keeping and transfers the records from the log to his PC. Bob realizes that maintaining a physical copy of his records has no merit and only takes time away from doing his job; he shreds the log and, against his nutritionist's better judgment, fills the desk drawer with soda and snacks.

Later that evening, Sue comes to inspect the log. She's alarmed to find it has been replaced with soda, candy bars and other goodies. You get a phone call from an upset accountant demanding that Bob go back to putting the daily log in his desk drawer. "...or we're back where we started," she says.

This is an example of dataundermining; Bob and his record-keeping system have been undermined, and by my count, you have roughly four options:

  1. Demand that Bob continue his paper logging so that Sue can keep efficient track of your investment activity. (This would also require confessing that you gave Sue the proverbial key to the kingdom and that she has been sneaking data out from under Bob.)

  2. Go back to keeping Sue manually apprised of Bob's investment activities on your behalf.

  3. Do nothing, letting Bob continue on oblivious to the situation and forcing Sue to attempt her job with incomplete information.

  4. Force Bob and Sue to communicate with each other.

I personally don't like any of options 1 through 3.

Option 1 requires Bob to either abandon his path of evolution or do double the work by keeping both his electronic records and his physical records up to date. Neither suits your needs from an investment point of view. You want Bob to be able to grow and evolve with the rest of the world; after all, he's making your bank. You also want Bob to spend the maximum amount of time and resources furthering your investment success, and maintaining two forms of records detracts from his ability to do that.

Option 2 requires extra effort on your part, and keeping Sue updated on your investment activities yourself is just not in the cards. You simply can't deliver information to Sue quickly or accurately enough, which severely handicaps her ability to keep your financial records straight. Let's face it: you're no expert on your investments; that's why you have Bob.

Option 3 is what I call classic. You recognize that you have a broken system, but keeping everybody happy involves too much effort, so you sit on your hands and accept that one piece of the equation is simply going to stay broken.

Clearly, I saved the appropriate option for last. To me, option 4 is the one that should have been chosen in the first place. When Bob and Sue were hired, they should have been told that your financial information needs to be shared with other parties (including each other) whenever one of those parties requests it. In short, they need to open themselves to integration.

The preceding is an analogy for a problem I've wrestled with countless times. A shop creates an application to solve a particular business problem. This application requires a database to persist data for the problem domain, so one is created alongside the application. Sooner or later, the application is deployed and begins its life in the wild. We'll refer to this application as "Application 1".

Later, another application is developed to solve one of the many other problems the business faces. Nothing is really special about this application, which we will call "Application 2" for lack of a better name, except that to fulfill its use-cases efficiently, it needs information created and managed by Application 1. No problem, we'll just hook into Application 1's database and retrieve the data we need. Herein lies our problem: we've just undermined Application 1 in the same way that Sue undermined Bob in our analogy. We don't yet notice any adverse effects, and Application 2 is completed and released into the wild.
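To make that coupling concrete, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for Application 1's database. The `trades` table, its columns, and the function names are hypothetical, chosen only for illustration. The point is that Application 2's query names Application 1's internal table and columns directly, so it silently depends on Application 1's schema.

```python
import sqlite3


def create_app1_v1_schema(conn: sqlite3.Connection) -> None:
    """Application 1's hypothetical v1.0 schema, plus some sample trades."""
    conn.execute(
        "CREATE TABLE trades ("
        "id INTEGER PRIMARY KEY, symbol TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany(
        "INSERT INTO trades (symbol, quantity, price) VALUES (?, ?, ?)",
        [("ACME", 10, 12.5), ("INITECH", 5, 40.0)],
    )
    conn.commit()


def app2_total_invested(conn: sqlite3.Connection) -> float:
    """Application 2 'just hooks in': it queries Application 1's table directly,
    so it depends on Application 1's table name and column names."""
    row = conn.execute("SELECT SUM(quantity * price) FROM trades").fetchone()
    return row[0] or 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_app1_v1_schema(conn)       # Application 1 owns and populates the table
    print(app2_total_invested(conn))  # Application 2 reads it behind App 1's back
```

Nothing in Application 1 records that this query exists, which is exactly why the dependency goes unnoticed.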

Sometime later, the business draws up a list of new features and use-cases for Application 1, along with some obsolete use-cases that can be retired. After reviewing the requirements, the team formulates a design for the new version of Application 1. The design calls for a few additions to existing structures and a few changes to how existing data is organized, which greatly simplifies the development work.

Suppose the development team quickly realizes that Application 2 depends on the data structures in Application 1's database. Dataundermining at its finest. Updating Application 1 is not going to be as easy as the design suggested, and the team faces a very tough choice. Do they update Application 2 as part of the Application 1 upgrade? That sounds like scope creep to me, and the business may not have budgeted for multiple applications to go under the knife. Do they rethink their approach and bend Application 1's database and use-cases to "work around" the dependency that's been created? This would keep intact the data structures Application 2 depends on, but it could create other illogical data structures and increase the overhead and complexity of the interaction between Application 1 and its database. It would also make further evolution of Application 1 more difficult. I hope it's clear that this scenario resembles option 1 of our analogy.
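The "work around" route can be sketched in Python with SQLite as well. Assuming, hypothetically, that v2.0 stores prices as integer cents in a new `trades_v2` table, Application 1 can keep a view under the old `trades` name that re-creates the v1.0 columns purely so Application 2's unchanged query keeps working. All table, view, and function names here are illustrative.

```python
import sqlite3


def create_app1_v2_schema_with_shim(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        -- Hypothetical v2.0 table: prices stored as integer cents.
        CREATE TABLE trades_v2 (
            id INTEGER PRIMARY KEY,
            symbol TEXT,
            quantity INTEGER,
            price_cents INTEGER
        );
        -- Work-around: a view that keeps the v1.0 name and columns alive
        -- only so that Application 2's existing query keeps working.
        CREATE VIEW trades AS
            SELECT id, symbol, quantity, price_cents / 100.0 AS price
            FROM trades_v2;
        INSERT INTO trades_v2 (symbol, quantity, price_cents)
            VALUES ('ACME', 10, 1250);
        """
    )


def app2_total_invested(conn: sqlite3.Connection) -> float:
    # Application 2's v1.0-era query, unchanged.
    row = conn.execute("SELECT SUM(quantity * price) FROM trades").fetchone()
    return row[0] or 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_app1_v2_schema_with_shim(conn)
    print(app2_total_invested(conn))
```

The view works, but it is exactly the kind of extra structure described above: every future change to `trades_v2` now has to be checked against a shape that exists only for another application's benefit.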

Suppose again that the development team recognizes the dependency on Application 1's database. Another solution is to remove the integration piece from Application 2 and have users consult Application 1 themselves before using Application 2, but this brings a greater potential for error and a loss of productivity. In this case, we closely match option 2 of our analogy.

In a different light, suppose the development team is oblivious to processes outside of Application 1. Development ensues, the structures are modified, the update is completed, and Application 1 v2.0 is released. Now the effects of dataundermining become manifest after the fact: Application 2 crashes. What happened? It's rather simple. Application 2 looks in Application 1's database expecting a data structure shaped like v1.0 and finds one shaped like v2.0. Issues naturally follow from a scenario like this, one that closely mirrors option 3 of our analogy.
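Here is a minimal sketch of that failure, again using an in-memory SQLite database as a hypothetical stand-in for Application 1's database. The v2.0 change assumed here (moving prices into a `price_cents` column) is invented for illustration; any rename or restructuring would have the same effect. Application 2's query still names the v1.0 `price` column, so SQLite raises `sqlite3.OperationalError` once the migration has run.

```python
import sqlite3


def create_app1_v1_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical v1.0 schema: prices stored as REAL dollars."""
    conn.execute(
        "CREATE TABLE trades ("
        "id INTEGER PRIMARY KEY, symbol TEXT, quantity INTEGER, price REAL)"
    )
    conn.execute(
        "INSERT INTO trades (symbol, quantity, price) VALUES ('ACME', 10, 12.5)"
    )
    conn.commit()


def migrate_app1_to_v2(conn: sqlite3.Connection) -> None:
    """Hypothetical v2.0 migration: prices move to an integer-cents column."""
    conn.executescript(
        """
        CREATE TABLE trades_v2 (
            id INTEGER PRIMARY KEY,
            symbol TEXT,
            quantity INTEGER,
            price_cents INTEGER
        );
        INSERT INTO trades_v2 (id, symbol, quantity, price_cents)
            SELECT id, symbol, quantity, CAST(ROUND(price * 100) AS INTEGER)
            FROM trades;
        DROP TABLE trades;
        ALTER TABLE trades_v2 RENAME TO trades;
        """
    )


def app2_total_invested(conn: sqlite3.Connection) -> float:
    # Application 2 was written against v1.0 and still expects a `price` column.
    row = conn.execute("SELECT SUM(quantity * price) FROM trades").fetchone()
    return row[0] or 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_app1_v1_schema(conn)
    print(app2_total_invested(conn))  # works while the schema is still v1.0
    migrate_app1_to_v2(conn)          # Application 1 v2.0 is released
    try:
        app2_total_invested(conn)
    except sqlite3.OperationalError as exc:
        print("Application 2 broke:", exc)
```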



The illustration above exemplifies how dataundermining becomes a problem. Application 2 has accessed Application 1's database to fulfill its use-cases, inhibiting Application 1's ability to evolve. The most robust solution uses service-oriented architecture (SOA) techniques so that the applications can communicate without either one acquiring illegitimate access to the other's data store. That communication might be text-based, like Web Services, or binary, like CORBA, DCOM or RMI; either way, the best scenario for all involved is to make each application solely responsible for its own data and to build into each application the ability to share that data on request.
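As a minimal sketch of that alternative, the Python below uses an in-process class as a stand-in for Application 1's service endpoint; in a real deployment it would sit behind a Web Service, CORBA, DCOM or RMI interface. The `Trade` contract, the `App1TradeService` class, and the `price_cents` storage layout are all hypothetical. Only the service touches Application 1's tables, so Application 1 can store prices however it likes (here, as integer cents) while Application 2 sees only the published contract.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """Published contract: what Application 1 promises to share (hypothetical)."""
    symbol: str
    quantity: int
    price: float  # dollars, regardless of how Application 1 stores it


class App1TradeService:
    """In-process stand-in for Application 1's service endpoint.

    Only this class should read or write Application 1's tables.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # Internal storage layout (integer cents), private to Application 1.
        self._conn.execute(
            "CREATE TABLE trades ("
            "id INTEGER PRIMARY KEY, symbol TEXT, quantity INTEGER, price_cents INTEGER)"
        )

    def record_trade(self, trade: Trade) -> None:
        self._conn.execute(
            "INSERT INTO trades (symbol, quantity, price_cents) VALUES (?, ?, ?)",
            (trade.symbol, trade.quantity, round(trade.price * 100)),
        )
        self._conn.commit()

    def list_trades(self) -> list[Trade]:
        # Translate internal storage back into the published contract.
        rows = self._conn.execute(
            "SELECT symbol, quantity, price_cents FROM trades ORDER BY id"
        )
        return [Trade(symbol, qty, cents / 100) for symbol, qty, cents in rows]


def app2_total_invested(service: App1TradeService) -> float:
    """Application 2 depends only on the service contract, not on any table."""
    return sum(t.quantity * t.price for t in service.list_trades())


if __name__ == "__main__":
    service = App1TradeService(sqlite3.connect(":memory:"))
    service.record_trade(Trade("ACME", 10, 12.5))
    service.record_trade(Trade("INITECH", 5, 40.0))
    print(app2_total_invested(service))
```

Because Application 2 only calls `list_trades()`, a later schema change inside Application 1 means updating one method in the service rather than hunting down every external query against its tables.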
