Components of a Real-Time Data Distribution Environment

1. Real-Time Capture: Data capture functionality is provided by the application or the DBMS on a transaction-by-transaction basis. Several different mechanisms are currently available.
a) Template-Enabled Event Capture: Data is captured both from an active transaction and from one or more associated data tables as the transaction is processed. It is immediately written to a data table or a message queue based on a predefined mapping template. This method can capture related context from the database while allowing precise control over the output layout. It may also expose the before and after images of changed data, thus supporting full delta (changed-data) processing. SAP’s IDoc facility is an example of this capability.
b) Transaction Mirroring: The active transaction is written to an external data table or message queue as it is processed. This method captures only the data content embedded in the transaction. It provides neither context from the database nor on-the-fly delta processing. Many off-the-shelf application packages support this functionality or can be configured to do so.
c) DBMS Triggers: Changed data can be captured using database-supported triggers or rules. The outbound data element mapping is implemented by coding the stored procedure or rule. Contextual data can be added through lookups embedded in the trigger, but this is often avoided for performance reasons. These DBMS mechanisms are designed for real-time capture but not real-time delivery. Unless they support output to a message queue, an external periodic polling mechanism is required to move data into the delivery pipeline.
d) Custom Event Traps: Legacy applications that do not support any of the other methods can be modified to capture real-time events. If an application is well structured with centralized transaction management or modularized data writes, a customized code block can be embedded to capture and write event details to a queue.
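As an illustration of mechanism (c), the following minimal Python sketch uses the standard-library sqlite3 module to capture changed data with a trigger and then polls the change table to move rows into a delivery pipeline. The table names (account, account_changes), the trigger name, and the poll_changes helper are hypothetical, and an in-memory deque stands in for a real message queue. It is one possible arrangement under those assumptions, not a description of any particular product.

```python
import sqlite3
from collections import deque


def build_database() -> sqlite3.Connection:
    """Create a source table, a change table, and an update trigger (all hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL);
        CREATE TABLE account_changes (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            account_id INTEGER NOT NULL,
            old_balance REAL,
            new_balance REAL
        );
        -- The trigger records the before and after values of each update.
        CREATE TRIGGER capture_account_update
        AFTER UPDATE OF balance ON account
        BEGIN
            INSERT INTO account_changes (account_id, old_balance, new_balance)
            VALUES (OLD.id, OLD.balance, NEW.balance);
        END;
        """
    )
    return conn


def poll_changes(conn: sqlite3.Connection, queue: deque, last_seq: int) -> int:
    """Move change rows newer than last_seq into the queue; return the new high-water mark."""
    rows = conn.execute(
        "SELECT seq, account_id, old_balance, new_balance "
        "FROM account_changes WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, account_id, old, new in rows:
        queue.append(
            {"account_id": account_id, "before": old, "after": new, "delta": new - old}
        )
        last_seq = seq
    return last_seq


conn = build_database()
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = 140.0 WHERE id = 1")

pipeline = deque()  # stand-in for a message queue
last_seen = poll_changes(conn, pipeline, 0)
```

The trigger handles capture only. The poller is the external mechanism that moves captured rows toward delivery, and its high-water mark keeps a later poll from re-sending rows that were already delivered.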
2. Real-Time Delivery: The prototypical pipeline of the real-time delivery mechanism is a cross-platform message queue such as that provided by Oracle or IBM. Riding on top of this pipeline is a dispatch mechanism that directs each event (message) to every target that requested it. New Era of Networks (NEON) provides a form of publish-and-subscribe dispatch that extends IBM’s MQSeries functionality.
Concerns exist in some quarters regarding how robust and scalable message queue products are today. ERP vendors offer proprietary pipelines with all the limitations of such closed solutions. EAI vendors are attempting to move their more generalized, multi-point technologies into the real-time arena. We believe real-time delivery is a fundamental capability that will be driven by the infrastructure vendors to be as powerful, extensible and bulletproof as the market demands.
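The dispatch mechanism described in this section can be sketched in a few lines of Python. This is a generic in-process publish-and-subscribe illustration, not the NEON or IBM API. The Dispatcher class, the event type names, and the two feeds are all hypothetical, and a production pipeline would sit on a durable message queue rather than in-memory lists.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Dispatcher:
    """Routes each published event to every target that subscribed to its type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, message: dict) -> int:
        """Deliver a copy of the message to each subscriber; return the delivery count."""
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(dict(message))  # copy so one target cannot alter another's view
        return len(handlers)


warehouse_feed: List[dict] = []
billing_feed: List[dict] = []

dispatcher = Dispatcher()
dispatcher.subscribe("order.created", warehouse_feed.append)
dispatcher.subscribe("order.created", billing_feed.append)
dispatcher.subscribe("order.cancelled", billing_feed.append)

delivered = dispatcher.publish("order.created", {"order_id": 17, "amount": 250.0})
```

The source publishes once and has no knowledge of its targets. Adding a new consumer is a matter of one more subscribe call.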
3. Transformation Engine: Data is transformed instance by instance by this continuous processing engine. The engine provides physical format conversion, calculations or derivations, code translation, table lookups and other forms of row-level transforms. When you talk about a transformation engine today, most people think of extract/transform/load (ETL) tools such as ETI and Constellar, and of data mart utilities such as Informatica, Sagent, Ardent and others. All these tools are batch processors, not real-time engines. Most of these products can be upgraded and repositioned to address this need if the vendors see this trend in time. NEON provides basic transformation as a part of the delivery process. Look for more real-time capability from the new wave of enterprise application integration vendors.
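To make the row-level transforms listed above concrete, here is a minimal Python sketch that applies a format conversion, a code translation, a table lookup and a derivation to each event as it arrives. The field names, the COUNTRY_CODES and PRODUCT_LOOKUP tables, and the transform functions are illustrative assumptions and do not reflect any vendor's engine.

```python
from datetime import datetime
from typing import Iterable, Iterator

# Hypothetical reference data used for code translation and table lookup.
COUNTRY_CODES = {"US": "United States", "DE": "Germany"}
PRODUCT_LOOKUP = {"P-100": {"category": "hardware"}}


def transform(event: dict) -> dict:
    """Apply row-level transforms to a single event."""
    product = PRODUCT_LOOKUP.get(event["product_id"], {})
    return {
        # Physical format conversion: MM/DD/YYYY text to an ISO 8601 date.
        "order_date": datetime.strptime(event["order_date"], "%m/%d/%Y").date().isoformat(),
        # Code translation: country code to display name.
        "country": COUNTRY_CODES.get(event["country"], "Unknown"),
        # Table lookup: enrich the event with the product category.
        "category": product.get("category", "unclassified"),
        # Derivation: extended amount computed from quantity and unit price.
        "total": round(event["quantity"] * event["unit_price"], 2),
    }


def transform_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Transform events one at a time as they arrive, rather than in a batch."""
    for event in events:
        yield transform(event)


raw = {
    "order_date": "03/15/1999",
    "country": "DE",
    "product_id": "P-100",
    "quantity": 3,
    "unit_price": 20.0,
}
fact = next(transform_stream([raw]))
```

Because transform_stream is a generator, each event is processed when it is pulled, which matches the instance-by-instance behavior described above rather than a batch pass over a full extract.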
4. Message-Ready Targets: Applications that are "message-ready" are one target for the real-time data distribution environment. These applications are more loosely coupled than application suites while being more intimately interrelated and responsive than batch-interfaced systems. The live feed into these applications can contain raw events as written by the source or transformed facts produced by value-added processing in the transformation engine.
5. Legacy Systems: All existing applications and data warehouse environments are designed to collect data via batch interfaces. They can be incorporated into the real-time data distribution environment (RT/DDE) without modification by building a flat file data staging area from the message queue. The flat files are cycled to match the load frequency of the target, such as an existing operational data store.
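The staging approach described here might look like the following Python sketch, which drains a queue into one flat file per load cycle. The stage_batch function, the file naming scheme, and the field list are assumptions for illustration. The standard-library Queue stands in for a real message queue, and a scheduler (not shown) would call stage_batch at the target's load frequency.

```python
import csv
import tempfile
from pathlib import Path
from queue import Empty, Queue
from typing import List, Optional


def stage_batch(
    message_queue: Queue, staging_dir: Path, cycle_id: int, fields: List[str]
) -> Optional[Path]:
    """Drain the queue into one flat file; return its path, or None if nothing arrived."""
    rows = []
    while True:
        try:
            rows.append(message_queue.get_nowait())
        except Empty:
            break
    if not rows:
        return None
    path = staging_dir / f"batch_{cycle_id:06d}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return path


events: Queue = Queue()
events.put({"order_id": 1, "amount": 10.0})
events.put({"order_id": 2, "amount": 25.5})

staging = Path(tempfile.mkdtemp())  # hypothetical staging area
first = stage_batch(events, staging, 1, ["order_id", "amount"])
second = stage_batch(events, staging, 2, ["order_id", "amount"])
```

Each cycle produces a file that the legacy target can load through its existing batch interface. A cycle with no new messages produces no file.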
6. The Real-Time Data Warehouse: The magic of the real-time data warehouse (RTDW) comes from collecting data continuously in a real-time partition while sweeping a consistent snapshot into the static partition at regular intervals. The real-time partition is also the host for incremental aggregation that allows key summary metrics to be built on the fly. The static partition functions as a traditional data warehouse. It maintains a consistent history of atomic details with persistent storage of incremental net changes.
The term "partition" is used in both a strict and a loose fashion. In the loose sense, the real-time partition could be a separate database or just separate tables in the same database. However, either of these methods results in much more complex data designs and creates more cumbersome synchronization problems. The preferred method involves table-level partitioning where the DBMS supports a single logical image (one design) while enforcing performance isolation between the real-time and static partitions. The ideal solution provides DBMS-managed "consolidation" of real-time data rows into the static data partition.
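The sweep from the real-time partition into the static partition can be sketched with Python's sqlite3 module. SQLite has no table-level partitioning, so this example uses the "loose" form (two tables in one database) with a view to approximate a single logical image. The table names, the sales view, and the ingest and sweep helpers are hypothetical, and a DBMS with native partitioning would handle the consolidation itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_realtime (seq INTEGER PRIMARY KEY, sku TEXT, amount REAL);
    CREATE TABLE sales_static   (seq INTEGER PRIMARY KEY, sku TEXT, amount REAL);
    -- The view presents both partitions to consumers as one logical table.
    CREATE VIEW sales AS
        SELECT seq, sku, amount FROM sales_static
        UNION ALL
        SELECT seq, sku, amount FROM sales_realtime;
    """
)


def ingest(conn: sqlite3.Connection, seq: int, sku: str, amount: float) -> None:
    """Continuous collection: every new event lands in the real-time partition."""
    with conn:
        conn.execute("INSERT INTO sales_realtime VALUES (?, ?, ?)", (seq, sku, amount))


def sweep(conn: sqlite3.Connection) -> int:
    """Move everything up to a fixed cutoff into the static partition; return rows moved."""
    with conn:  # the copy and the delete commit together or not at all
        cutoff = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) FROM sales_realtime"
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO sales_static "
            "SELECT seq, sku, amount FROM sales_realtime WHERE seq <= ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM sales_realtime WHERE seq <= ?", (cutoff,)
        ).rowcount
    return moved


ingest(conn, 1, "A-1", 10.0)
ingest(conn, 2, "B-2", 20.0)
moved = sweep(conn)           # both rows become part of the static history
ingest(conn, 3, "A-1", 5.0)   # arrives after the sweep, stays in the real-time partition
```

Fixing the cutoff before copying is what makes the snapshot consistent: rows that arrive during the sweep have a higher sequence number and wait for the next interval.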
7. Incremental Aggregator: This is the engine that aggregates on the fly, as noted earlier. It supports the dynamic maintenance of multiple levels of aggregation simultaneously. Microsoft’s OLAP Services for SQL Server contains the essential core of this capability.
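A minimal Python sketch of on-the-fly aggregation at several levels follows. It is a generic illustration and does not describe how Microsoft's OLAP Services works internally. The IncrementalAggregator class, the three aggregation levels, and the event fields (day, store, amount) are assumptions chosen for the example.

```python
from collections import defaultdict


class IncrementalAggregator:
    """Maintains several aggregation levels at once, updating each as an event arrives."""

    def __init__(self) -> None:
        self.by_day_store = defaultdict(float)  # finest level: (day, store)
        self.by_day = defaultdict(float)        # rolled up across stores
        self.grand_total = 0.0                  # rolled up across everything

    def apply(self, event: dict) -> None:
        """Fold one event into every level without rescanning earlier events."""
        amount = event["amount"]
        self.by_day_store[(event["day"], event["store"])] += amount
        self.by_day[event["day"]] += amount
        self.grand_total += amount


aggregator = IncrementalAggregator()
for event in [
    {"day": "1999-03-15", "store": "north", "amount": 10.0},
    {"day": "1999-03-15", "store": "south", "amount": 5.0},
    {"day": "1999-03-16", "store": "north", "amount": 2.5},
]:
    aggregator.apply(event)
```

Each event costs a fixed number of updates regardless of how much history has accumulated, which is what lets summary metrics stay current as data streams in.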
8. Preparation Engine: Preparation is a batch process of selecting, collecting, aggregating and projecting data to create specialized access-optimized tables. This engine reads data from the static partition of the data warehouse to populate consumer-specific data marts. The previously mentioned data mart utilities provide these functions.
9. Data Marts: The real-time data warehouse provides seamless coverage for a broader range of analytic needs than a traditional load-and-go snapshot warehouse. Dependent data marts remain an essential component of the complete solution. They continue to provide the best means of supporting multiple consumer-specific data slices and time frames. A data mart can be built from data ranging from the most recent moment to a 10-year trend, in the form best suited to the task.