Informatica Tutorials

Big Data Analytics

Transformation Mechanisms

You have the following choices for transforming data inside the database:

■ Transforming Data Using SQL
■ Transforming Data Using PL/SQL
■ Transforming Data Using Table Functions

Transforming Data Using SQL

Once data is loaded into the database, data transformations can be executed using SQL
operations. There are four basic techniques for implementing SQL data
transformations:

■ CREATE TABLE ... AS SELECT And INSERT /*+APPEND*/ AS SELECT
■ Transforming Data Using UPDATE
■ Transforming Data Using MERGE
■ Transforming Data Using Multitable INSERT


CREATE TABLE ... AS SELECT And INSERT /*+APPEND*/ AS SELECT
The CREATE TABLE ... AS SELECT statement (CTAS) is a powerful tool for
manipulating large sets of data. As shown in the following example, many data
transformations can be expressed in standard SQL, and CTAS provides a mechanism
for efficiently executing a SQL query and storing the results of that query in a new
database table. The INSERT /*+APPEND*/ ... AS SELECT statement offers the same
capabilities with existing database tables.

In a data warehouse environment, CTAS is typically run in parallel using NOLOGGING
mode for best performance. A simple and common type of data transformation is data substitution. In a data substitution transformation, some or all of the values of a single column are modified.

For example, our sales table has a channel_id column. This column indicates
whether a given sales transaction was made by a company's own sales force (a direct
sale) or by a distributor (an indirect sale).
You may receive data from multiple source systems for your data warehouse. Suppose
that one of those source systems processes only direct sales, and thus the source
system does not know indirect sales channels. When the data warehouse initially
receives sales data from this system, all sales records have a NULL value for the
sales.channel_id field. These NULL values must be set to the proper key value.
For example, you can do this efficiently using a SQL function as part of the insertion into the target sales table statement. The structure of source table sales_activity_ direct is as follows:

DESC sales_activity_direct
Name Null? Type
------------ ----- ----------------
SALES_DATE DATE
PRODUCT_ID NUMBER
CUSTOMER_ID NUMBER
PROMOTION_ID NUMBER
AMOUNT NUMBER
QUANTITY NUMBER

The following SQL statement inserts data from sales_activity_direct into the
sales table of the sample schema, using a SQL function to truncate the sales date
values to the midnight time and assigning a fixed channel ID of 3.

INSERT /*+ APPEND NOLOGGING PARALLEL */
INTO sales SELECT product_id, customer_id, TRUNC(sales_date), 3,
promotion_id, quantity, amount
FROM sales_activity_direct;

Transforming Data Using UPDATE
Another technique for implementing a data substitution is to use an UPDATE statement
to modify the sales.channel_id column. An UPDATE will provide the correct
result. However, if the data substitution transformations require that a very large
percentage of the rows (or all of the rows) be modified, then, it may be more efficient to use a CTAS statement than an UPDATE.


Transforming Data Using MERGE
Oracle Database's merge functionality extends SQL, by introducing the SQL keyword
MERGE, in order to provide the ability to update or insert a row conditionally into a
table or out of line single table views. Conditions are specified in the ON clause. This is,besides pure bulk loading, one of the most common operations in data warehouse synchronization.


Merge Examples The following discusses various implementations of a merge. The
examples assume that new data for the dimension table products is propagated to the
data warehouse and has to be either inserted or updated. The table products_delta
has the same structure as products.

Example Merge Operation Using SQL

MERGE INTO products t USING products_delta s
ON (t.prod_id=s.prod_id)
WHEN MATCHED THEN UPDATE SET
t.prod_list_price=s.prod_list_price, t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN INSERT (prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc, prod_category, prod_category_desc, prod_status,
prod_list_price, prod_min_price)
VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
s.prod_subcategory_desc, s.prod_category, s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price);


Transforming Data Using Multitable INSERT
Many times, external data sources have to be segregated based on logical attributes for insertion into different target objects. It is also frequent in data warehouse
environments to fan out the same source data into several target objects. Multitable
inserts provide a new SQL statement for these kinds of transformations, where data
can either end up in several or exactly one target, depending on the business
transformation rules. This insertion can be done conditionally based on business rules or unconditionally.

It offers the benefits of the INSERT ... SELECT statement when multiple tables are
involved as targets. In doing so, it avoids the drawbacks of the two obvious
alternatives. You either had to deal with n independent INSERT … SELECT
statements, thus processing the same source data n times and increasing the
transformation workload n times. Alternatively, you had to choose a procedural
approach with a per-row determination how to handle the insertion. This solution
lacked direct access to high-speed access paths available in SQL.
As with the existing INSERT ... SELECT statement, the new statement can be
parallelized and used with the direct-load mechanism for faster performance.



Example Unconditional Insert

The following statement aggregates the transactional sales information, stored in
sales_activity_direct, on a per daily base and inserts into both the sales and
the costs fact table for the current day.

INSERT ALL
INTO sales VALUES (product_id, customer_id, today, 3, promotion_id,
quantity_per_day, amount_per_day)
INTO costs VALUES (product_id, today, promotion_id, 3,
product_cost, product_price)
SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id,
s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity)
quantity_per_day, p.prod_min_price*0.8 AS product_cost, p.prod_list_price
AS product_price
FROM sales_activity_direct s, products p
WHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE)
GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id,
p.prod_min_price*0.8, p.prod_list_price;



Example Conditional ALL Insert

The following statement inserts a row into the sales and costs tables for all sales
transactions with a valid promotion and stores the information about multiple
identical orders of a customer in a separate table cum_sales_activity. It is possible
two rows will be inserted for some sales transactions, and none for others.

INSERT ALL
WHEN promotion_id IN (SELECT promo_id FROM promotions) THEN
INTO sales VALUES (product_id, customer_id, today, 3, promotion_id,
quantity_per_day, amount_per_day)
INTO costs VALUES (product_id, today, promotion_id, 3,
product_cost, product_price)
WHEN num_of_orders > 1 THEN
INTO cum_sales_activity VALUES (today, product_id, customer_id,
promotion_id, quantity_per_day, amount_per_day, num_of_orders)
SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id,
s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity)
quantity_per_day, COUNT(*) num_of_orders, p.prod_min_price*0.8
AS product_cost, p.prod_list_price AS product_price
FROM sales_activity_direct s, products p
WHERE s.product_id = p.prod_id
AND TRUNC(sales_date) = TRUNC(SYSDATE)
GROUP BY TRUNC(sales_date), s.product_id, s.customer_id,
s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;


Example Conditional FIRST Insert

The following statement inserts into an appropriate shipping manifest according to the total quantity and the weight of a product order. An exception is made for high value orders, which are also sent by express, unless their weight classification is not too high. All incorrect orders, in this simple example represented as orders without a quantity, are stored in a separate table. It assumes the existence of appropriate tables large_freight_shipping, express_shipping, default_shipping, and
incorrect_sales_order.

INSERT FIRST WHEN (sum_quantity_sold > 10 AND prod_weight_class < 5) AND
sum_quantity_sold >=1) OR (sum_quantity_sold > 5 AND prod_weight_class > 5) THEN
INTO large_freight_shipping VALUES
(time_id, cust_id, prod_id, prod_weight_class, sum_quantity_sold)
WHEN sum_amount_sold > 1000 AND sum_quantity_sold >=1 THEN
INTO express_shipping VALUES
(time_id, cust_id, prod_id, prod_weight_class,
sum_amount_sold, sum_quantity_sold)
WHEN (sum_quantity_sold >=1) THEN INTO default_shipping VALUES
(time_id, cust_id, prod_id, sum_quantity_sold)
ELSE INTO incorrect_sales_order VALUES (time_id, cust_id, prod_id)
SELECT s.time_id, s.cust_id, s.prod_id, p.prod_weight_class,
SUM(amount_sold) AS sum_amount_sold,
SUM(quantity_sold) AS sum_quantity_sold
FROM sales s, products p
WHERE s.prod_id = p.prod_id AND s.time_id = TRUNC(SYSDATE)
GROUP BY s.time_id, s.cust_id, s.prod_id, p.prod_weight_class;


Example Mixed Conditional and Unconditional Insert
The following example inserts new customers into the customers table and stores all
new customers with cust_credit_limit higher then 4500 in an additional,
separate table for further promotions.

INSERT FIRST WHEN cust_credit_limit >= 4500 THEN INTO customers
INTO customers_special VALUES (cust_id, cust_credit_limit)
ELSE INTO customers


Transforming Data Using PL/SQL

In a data warehouse environment, you can use procedural languages such as PL/SQL
to implement complex transformations in the Oracle Database. Whereas CTAS
operates on entire tables and emphasizes parallelism, PL/SQL provides a row-based
approached and can accommodate very sophisticated transformation rules. For
example, a PL/SQL procedure could open multiple cursors and read data from
multiple source tables, combine this data using complex business rules, and finally
insert the transformed data into one or more target table. It would be difficult or
impossible to express the same sequence of operations using standard SQL statements.

Using a procedural language, a specific transformation (or number of transformation
steps) within a complex ETL processing can be encapsulated, reading data from an
intermediate staging area and generating a new table object as output. A previously
generated transformation input table and a subsequent transformation will consume
the table generated by this specific transformation. Alternatively, these encapsulated transformation steps within the complete ETL process can be integrated seamlessly, thus streaming sets of rows between each other without the necessity of intermediate staging. You can use table functions to implement such behavior.


Transforming Data Using Table Functions

Table functions provide the support for pipelined and parallel execution of
transformations implemented in PL/SQL, C, or Java. Scenarios as mentioned earlier
can be done without requiring the use of intermediate staging tables, which interrupt
the data flow through various transformations steps.

What is a Table Function?

A table function is defined as a function that can produce a set of rows as output.
Additionally, table functions can take a set of rows as input. Prior to Oracle9i, PL/SQL
functions:
■ Could not take cursors as input.
■ Could not be parallelized or pipelined.
Now, functions are not limited in these ways. Table functions extend database
functionality by allowing:
■ Multiple rows to be returned from a function.
■ Results of SQL subqueries (that select multiple rows) to be passed directly to
functions.
■ Functions take cursors as input.
■ Functions can be parallelized.
■ Returning result sets incrementally for further processing as soon as they are
created. This is called incremental pipelining
Table functions can be defined in PL/SQL using a native PL/SQL interface, or in Java
or C using the Oracle Data Cartridge Interface (ODCI).

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home