Informatica Tutorials

Big Data Analytics

Transformation Flows

From an architectural perspective, you can transform your data in two ways:

■ Multistage Data Transformation
■ Pipelined Data Transformation

Multistage Data Transformation
The data transformation logic for most data warehouses consists of multiple steps. For
example, in transforming new records to be inserted into a sales table, there may be
separate logical transformation steps to validate each dimension key.



When using Oracle Database as a transformation engine, a common strategy is to
implement each transformation as a separate SQL operation and to create a separate,
temporary staging table (such as the tables new_sales_step1 and new_sales_
step2 in Figure ) to store the incremental results for each step. This
load-then-transform strategy also provides a natural checkpointing scheme to the
entire transformation process, which enables to the process to be more easily
monitored and restarted. However, a disadvantage to multistaging is that the space
and time requirements increase.

It may also be possible to combine many simple logical transformations into a single
SQL statement or single PL/SQL procedure. Doing so may provide better
performance than performing each step independently, but it may also introduce
difficulties in modifying, adding, or dropping individual transformations, as well as
recovering from failed transformations.


Pipelined Data Transformation

The ETL process flow can be changed dramatically and the database becomes an
integral part of the ETL solution.
The new functionality renders some of the former necessary process steps obsolete
while some others can be remodeled to enhance the data flow and the data
transformation to become more scalable and non-interruptive. The task shifts from
serial transform-then-load process (with most of the tasks done outside the database)
or load-then-transform process, to an enhanced transform-while-loading.
Oracle offers a wide variety of new capabilities to address all the issues and tasks
relevant in an ETL scenario. It is important to understand that the database offers
toolkit functionality rather than trying to address a one-size-fits-all solution. The
underlying database has to enable the most appropriate ETL process flow for a specific customer need, and not dictate or constrain it from a technical perspective.

Figure below illustrates the new functionality

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home