From an architectural perspective, you can transform your data in two ways:
■ Multistage Data Transformation
■ Pipelined Data Transformation
Multistage Data Transformation
The data transformation logic for most data warehouses consists of multiple steps. For
example, in transforming new records to be inserted into a sales table, there may be
separate logical transformation steps to validate each dimension key.
When using Oracle Database as a transformation engine, a common strategy is to
implement each transformation as a separate SQL operation and to create a separate,
temporary staging table (such as the tables new_sales_step1 and new_sales_
step2 in Figure ) to store the incremental results for each step. This
load-then-transform strategy also provides a natural checkpointing scheme to the
entire transformation process, which enables to the process to be more easily
monitored and restarted. However, a disadvantage to multistaging is that the space
and time requirements increase.
It may also be possible to combine many simple logical transformations into a single
SQL statement or single PL/SQL procedure. Doing so may provide better
performance than performing each step independently, but it may also introduce
difficulties in modifying, adding, or dropping individual transformations, as well as
recovering from failed transformations.
Pipelined Data Transformation
The ETL process flow can be changed dramatically and the database becomes an
integral part of the ETL solution.
The new functionality renders some of the former necessary process steps obsolete
while some others can be remodeled to enhance the data flow and the data
transformation to become more scalable and non-interruptive. The task shifts from
serial transform-then-load process (with most of the tasks done outside the database)
or load-then-transform process, to an enhanced transform-while-loading.
Oracle offers a wide variety of new capabilities to address all the issues and tasks
relevant in an ETL scenario. It is important to understand that the database offers
toolkit functionality rather than trying to address a one-size-fits-all solution. The
underlying database has to enable the most appropriate ETL process flow for a specific customer need, and not dictate or constrain it from a technical perspective.
Figure below illustrates the new functionality
Big Data Analytics
Transformation Flows
11:13 PM
divjeev