Informatica Tutorials

Big Data Analytics

Partitioning for Refreshing Data Warehouses

Refresh Scenario 1
Data is loaded daily. However, the data warehouse contains two years of data, so that
partitioning by day might not be desired.

The solution is to partition by week or month (as appropriate). Use INSERT to add the
new data to an existing partition. The INSERT operation only affects a single partition, so the benefits described previously remain intact. The INSERT operation could occur while the partition remains a part of the table. Inserts into a single partition can be parallelized:

INSERT /*+ APPEND*/ INTO sales PARTITION (sales_01_2001)
SELECT * FROM new_sales;

The indexes of this sales partition will be maintained in parallel as well. An
alternative is to use the EXCHANGE operation. You can do this by exchanging the
sales_01_2001 partition of the sales table and then using an INSERT operation.
You might prefer this technique when dropping and rebuilding indexes is more
efficient than maintaining them.

Refresh Scenario 2
New data feeds, although consisting primarily of data for the most recent day, week,
and month, also contain some data from previous time periods.

Solution 1 Use parallel SQL operations (such as CREATE TABLE ... AS SELECT) to
separate the new data from the data in previous time periods. Process the old data
separately using other techniques.

New data feeds are not solely time based. You can also feed new data into a data
warehouse with data from multiple operational systems on a business need basis. For example, the sales data from direct channels may come into the data warehouse
separately from the data from indirect channels. For business reasons, it may
furthermore make sense to keep the direct and indirect data in separate partitions.

Solution 2 Oracle supports composite range-list partitioning. The primary partitioning strategy of the sales table could be range partitioning based on time_id as shown in the example. However, the subpartitioning is a list based on the channel attribute. Each subpartition can now be loaded independently of each other (for each distinct channel) and added in a rolling window operation as discussed before. The
partitioning strategy addresses the business needs in the most optimal manner.

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati