Informatica Tutorials


Performance Issues for Range, List, Hash, and Composite Partitioning

When to Use Range Partitioning
Range partitioning is a convenient method for partitioning historical data. The
boundaries of range partitions define the ordering of the partitions in the tables or
indexes.
Range partitioning is usually used to organize data by time intervals on a column of
type DATE. Thus, most SQL statements accessing range partitions focus on timeframes.
An example of this is a SQL statement similar to "select data from a particular period in time." In such a scenario, if each partition represents data for one month, the query "find data of month 98-DEC" needs to access only the December 1998 partition. This reduces the amount of data scanned to a fraction of the total data available, an optimization method called partition pruning.

Range partitioning is also ideal when you periodically load new data and purge old data, because it is easy to add or drop partitions. It is common to keep a rolling window of data, for example keeping the past 36 months' worth of data online. Range partitioning simplifies this process. To add data from a new month, you load it into a separate table, clean it, and index it. You then add a partition for that month to the range-partitioned table and swap the loaded table into it with the EXCHANGE PARTITION clause of ALTER TABLE, all while the original table remains online. Once you add the new partition, you can drop the trailing month with the DROP PARTITION clause. As an alternative to dropping it, you can archive the trailing partition and make its tablespace read-only, but this works only when your partitions are in separate tablespaces.
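The rolling-window cycle just described can be sketched as a small Python model. It is not Oracle code: each monthly partition is represented only by a hypothetical (year, month) tuple, the append stands in for adding and exchanging the new partition, and each removal from the low end stands in for DROP PARTITION.

```python
from __future__ import annotations

from collections import deque

# Hypothetical model: each monthly partition is just a (year, month) tuple.


def next_month(year: int, month: int) -> tuple[int, int]:
    """Return the (year, month) that follows the given month."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def build_window(start: tuple[int, int], months: int) -> deque:
    """Create a window of consecutive monthly partitions, oldest first."""
    window = deque()
    current = start
    for _ in range(months):
        window.append(current)
        current = next_month(*current)
    return window


def roll_window(window: deque, keep: int = 36) -> list[tuple[int, int]]:
    """Add the next month at the high end, then drop the trailing months.

    The append stands in for ADD PARTITION + EXCHANGE PARTITION, and each
    popleft stands in for DROP PARTITION. Returns the dropped months.
    """
    window.append(next_month(*window[-1]))
    dropped = []
    while len(window) > keep:
        dropped.append(window.popleft())
    return dropped


window = build_window((1998, 1), 36)   # January 1998 through December 2000
print(roll_window(window))             # drops January 1998
print(window[0], window[-1])           # window now runs Feb 1998 to Jan 2001
```

The window size stays fixed at 36 partitions; only the newest partition is loaded and only the oldest is removed, which is why the rest of the table can stay online.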

In conclusion, consider using range partitioning when:
■ Very large tables are frequently scanned by a range predicate on a good
partitioning column, such as ORDER_DATE or PURCHASE_DATE. Partitioning the
table on that column enables partition pruning.
■ You want to maintain a rolling window of data.
■ You cannot complete administrative operations, such as backup and restore, on
large tables in an allotted time frame, but you can divide them into smaller logical
pieces based on the partition range column.
The following example creates the table salestable for a period of two years, 1999
and 2000, and partitions it by range according to the column s_saledate to
separate the data into eight quarters, each corresponding to a partition.

CREATE TABLE salestable
(s_productid NUMBER,
s_saledate DATE,
s_custid NUMBER,
s_totalprice NUMBER)
PARTITION BY RANGE(s_saledate)
(PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
PARTITION sal99q3 VALUES LESS THAN (TO_DATE('01-OCT-1999', 'DD-MON-YYYY')),
PARTITION sal99q4 VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY')),
PARTITION sal00q1 VALUES LESS THAN (TO_DATE('01-APR-2000', 'DD-MON-YYYY')),
PARTITION sal00q2 VALUES LESS THAN (TO_DATE('01-JUL-2000', 'DD-MON-YYYY')),
PARTITION sal00q3 VALUES LESS THAN (TO_DATE('01-OCT-2000', 'DD-MON-YYYY')),
PARTITION sal00q4 VALUES LESS THAN (TO_DATE('01-JAN-2001', 'DD-MON-YYYY')));
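To make the routing and pruning rules concrete, here is a minimal Python model of the salestable bounds above (an illustration only, not how the Oracle optimizer is implemented). A row goes to the first partition whose VALUES LESS THAN bound is greater than its s_saledate, and a predicate lo <= s_saledate < hi needs only the partitions whose ranges overlap it. The function names partition_for and pruned_partitions are hypothetical.

```python
import bisect
from datetime import date

# Partition names and exclusive upper bounds copied from the
# VALUES LESS THAN clauses of salestable.
PARTITIONS = ["sal99q1", "sal99q2", "sal99q3", "sal99q4",
              "sal00q1", "sal00q2", "sal00q3", "sal00q4"]
UPPER_BOUNDS = [date(1999, 4, 1), date(1999, 7, 1), date(1999, 10, 1),
                date(2000, 1, 1), date(2000, 4, 1), date(2000, 7, 1),
                date(2000, 10, 1), date(2001, 1, 1)]


def partition_for(saledate: date) -> str:
    """Return the partition that stores a row with this s_saledate."""
    # bisect_right skips a bound equal to saledate, because VALUES LESS THAN
    # excludes the bound itself (01-APR-1999 belongs to sal99q2).
    i = bisect.bisect_right(UPPER_BOUNDS, saledate)
    if i == len(UPPER_BOUNDS):
        raise ValueError(f"{saledate} is not below the highest partition bound")
    return PARTITIONS[i]


def pruned_partitions(lo: date, hi: date) -> list:
    """Partitions that a predicate lo <= s_saledate < hi must scan."""
    # First partition whose upper bound is above lo.
    first = bisect.bisect_right(UPPER_BOUNDS, lo)
    # Last partition whose lower bound is below hi.
    last = min(bisect.bisect_left(UPPER_BOUNDS, hi), len(PARTITIONS) - 1)
    return PARTITIONS[first:last + 1]


# Q4 1999 only: one partition out of eight is scanned.
print(pruned_partitions(date(1999, 10, 1), date(2000, 1, 1)))
```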


When to Use Hash Partitioning
The way Oracle Database distributes data in hash partitions does not correspond to a
business or a logical view of the data, as it does in range partitioning. Consequently, hash partitioning is not an effective way to manage historical data. Hash partitions do share some performance characteristics with range partitions: you can use partition pruning, partition-wise joins, parallel index access, and parallel DML. Unlike range partitioning, though, partition pruning is limited to equality and IN-list predicates on the partitioning key.
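The following Python sketch shows why hash pruning works only for equality and IN-list predicates. It assumes a stand-in hash (the key modulo the number of partitions); Oracle's internal hash function is different, but the effect is the same: adjacent key values scatter across partitions, so a range predicate cannot rule any partition out. The names hash_partition and prune_in_list are hypothetical.

```python
N_PARTITIONS = 4  # as in the sales_hash example in this section


def hash_partition(key: int, n: int = N_PARTITIONS) -> int:
    """Stand-in hash (assumption): Oracle's real hash function is internal."""
    return key % n


def prune_in_list(keys) -> list:
    """Partitions needed for `key = v` or `key IN (v1, v2, ...)`."""
    return sorted({hash_partition(k) for k in keys})


print(prune_in_list([42]))        # a single partition
print(prune_in_list([1, 5, 9]))   # these three keys happen to share one partition
# Adjacent keys land in different partitions, so even a narrow range
# predicate such as key BETWEEN 100 AND 103 touches every partition.
print([hash_partition(k) for k in range(100, 104)])
```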

As a general rule, use hash partitioning for the following purposes:
■ To improve the availability and manageability of large tables.
■ To avoid data skew among partitions. Hash partitioning is an effective means of
distributing data because Oracle hashes the data into a number of partitions, each
of which can reside on a separate device. Thus, data is evenly spread over a
sufficient number of devices to maximize I/O throughput. Similarly, you can use
hash partitioning to distribute data evenly among the nodes of an MPP platform
that uses Oracle Real Application Clusters.
■ To use partition pruning and partition-wise joins on a partitioning key that is
mostly constrained by a single value or a list of values.

If you add or coalesce hash partitions, Oracle automatically rearranges the rows to
reflect the change in the number of partitions and subpartitions. The hash function
that Oracle uses is specifically designed to limit the cost of this reorganization. Instead of reshuffling all the rows in the table, Oracle uses "add partition" logic that splits one and only one of the existing hash partitions. Conversely, Oracle coalesces a partition by merging two existing hash partitions.

Although the hash function's use of "add partition" logic dramatically improves the
manageability of hash partitioned tables, it means that the hash function can cause
skew if the number of partitions of a hash partitioned table, or the number of
subpartitions in each partition of a composite table, is not a power of two. In the worst case, the largest partition can be twice the size of the smallest. For optimal performance, therefore, make the number of partitions, and the number of subpartitions per partition, a power of two: 2, 4, 8, 16, 32, 64, 128, and so on.
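Both the split-one-partition behavior and the power-of-two rule can be reproduced with textbook linear hashing. The Python sketch below rests on assumptions: it uses the classic linear-hashing scheme, not Oracle's undocumented hash function, and it treats the integers 0 to 7999 as uniformly distributed hash values. The function names are hypothetical.

```python
from collections import Counter


def linear_hash_partition(h: int, n: int) -> int:
    """Map a non-negative hash value h to one of n partitions (linear hashing)."""
    if n < 1:
        raise ValueError("need at least one partition")
    low = 1 << (n.bit_length() - 1)   # largest power of two <= n
    b = h % (2 * low)
    return b if b < n else b - low    # buckets not yet split fold back


def partition_sizes(n: int, rows: int = 8000) -> list:
    """Row count per partition for hash values 0..rows-1."""
    counts = Counter(linear_hash_partition(h, n) for h in range(rows))
    return [counts[p] for p in range(n)]


def moved_rows(n_before: int, n_after: int, rows: int = 8000) -> dict:
    """{(old, new): count} for rows that change partition when n changes."""
    moved = Counter()
    for h in range(rows):
        old = linear_hash_partition(h, n_before)
        new = linear_hash_partition(h, n_after)
        if old != new:
            moved[(old, new)] += 1
    return dict(moved)


print(moved_rows(4, 5))   # adding a 5th partition splits only partition 0
print(moved_rows(5, 4))   # coalescing merges partition 4 back into partition 0
for n in range(4, 9):
    sizes = partition_sizes(n)
    print(n, sizes, "largest/smallest =", max(sizes) / min(sizes))
```

With 4 or 8 partitions every partition holds the same number of rows; with 5, 6, or 7 the partitions that have already been split hold half as many rows as those that have not, which is the two-to-one worst case described above.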
The following example creates four hashed partitions for the table sales_hash using
the column s_productid as the partition key:

CREATE TABLE sales_hash
(s_productid NUMBER,
s_saledate DATE,
s_custid NUMBER,
s_totalprice NUMBER)
PARTITION BY HASH(s_productid)
PARTITIONS 4;

Specify partition names if you want to choose the names of the partitions. Otherwise,
Oracle automatically generates internal names for the partitions. Also, you can use the STORE IN clause to assign hash partitions to tablespaces in a round-robin manner.
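Round-robin placement with STORE IN (for example, PARTITIONS 8 STORE IN (tbs1, tbs2, tbs3)) can be pictured as cycling through the tablespace list. In this Python sketch the partition names p1, p2, ... and the tablespace names are hypothetical placeholders; Oracle would generate its own partition names unless you specify them.

```python
def store_in(n_partitions: int, tablespaces: list) -> dict:
    """Assign hash partitions to tablespaces in round-robin order."""
    return {f"p{i + 1}": tablespaces[i % len(tablespaces)]
            for i in range(n_partitions)}


# Hypothetical names: 8 partitions spread over 3 tablespaces.
print(store_in(8, ["tbs1", "tbs2", "tbs3"]))
```

When the number of partitions is not a multiple of the number of tablespaces, the first tablespaces in the list receive one extra partition each.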


When to Use List Partitioning
You should use list partitioning when you want to specifically map rows to partitions
based on discrete values. Unlike range and hash partitioning, multicolumn partition keys are not supported for list partitioning in the Oracle releases this guide describes: if a table is partitioned by list, the partitioning key can consist of only a single column of the table. (Oracle Database 12c Release 2 and later do support multicolumn list partitioning.)
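A list-partitioned table behaves like a lookup from each listed value to its partition. The Python model below is hypothetical (the function names are invented, and the partition names and values are borrowed from the regional example later in this article). It reflects two rules Oracle enforces: a value may be listed for only one partition, and a row whose key is not listed is rejected unless a DEFAULT partition exists.

```python
def build_list_map(partitions: dict) -> dict:
    """Invert {partition: [values]} into {value: partition}.

    Raises ValueError if a value is listed for more than one partition,
    a definition Oracle would also reject.
    """
    value_map = {}
    for name, values in partitions.items():
        for value in values:
            if value in value_map:
                raise ValueError(
                    f"{value!r} is listed for both {value_map[value]} and {name}")
            value_map[value] = name
    return value_map


def list_partition(key, value_map: dict, default=None) -> str:
    """Route a row by its single-column list partitioning key."""
    if key in value_map:
        return value_map[key]
    if default is not None:
        return default          # models a DEFAULT partition
    raise ValueError(f"key {key!r} does not map to any partition")


regions = build_list_map({
    "northwest": ["OR", "WA"],
    "southwest": ["AZ", "UT", "NM"],
    "southcentral": ["OK", "TX"],
})
print(list_partition("TX", regions))                   # southcentral
print(list_partition("CA", regions, default="other"))  # other
```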


When to Use Composite Range-Hash Partitioning

Composite range-hash partitioning offers the benefits of both range and hash
partitioning. With composite range-hash partitioning, Oracle first partitions by range. Then, within each range, Oracle creates subpartitions and distributes data within them using the same hashing algorithm it uses for hash partitioned tables.
Data placed in composite partitions is logically ordered only by the boundaries that
define the range-level partitions. The partitioning of data within each partition has no logical organization beyond the identity of the partition to which the subpartitions belong. Consequently, tables and local indexes partitioned using the composite range-hash method:
■ Support historical data at the partition level.
■ Support the use of subpartitions as units of parallelism for parallel operations such as parallel DML (PDML), space management, and backup and recovery.
■ Are eligible for partition pruning and partition-wise joins on the range and hash
partitions.


Using Composite Range-Hash Partitioning
Use the composite range-hash partitioning method for tables and local indexes if:
■ Partitions must have a logical meaning to efficiently support historical data
■ The contents of a partition can be spread across multiple tablespaces, devices, or
nodes (of an MPP system)
■ You require both partition pruning and partition-wise joins even when the
pruning and join predicates use different columns of the partitioned table
■ You require a degree of parallelism that is greater than the number of partitions
for backup, recovery, and parallel operations
Most large tables in a data warehouse should use range partitioning. Composite
partitioning should be used for very large tables or for data warehouses with a
well-defined need for these conditions. When using the composite method, Oracle
stores each subpartition in a separate segment. Thus, the subpartitions may have
properties that differ from the properties of the table or from the partition to which the subpartitions belong.
The following example partitions the table sales_range_hash by range on the
column s_saledate to create four partitions that order data by time. Then, within
each range partition, the data is further subdivided into eight subpartitions by hash on
the column s_productid:
CREATE TABLE sales_range_hash(
s_productid NUMBER,
s_saledate DATE,
s_custid NUMBER,
s_totalprice NUMBER)
PARTITION BY RANGE (s_saledate)
SUBPARTITION BY HASH (s_productid) SUBPARTITIONS 8
(PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
PARTITION sal99q3 VALUES LESS THAN (TO_DATE('01-OCT-1999', 'DD-MON-YYYY')),
PARTITION sal99q4 VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY')));
Each hashed subpartition contains sales data for a single quarter, restricted to the
product IDs that hash to that subpartition. The total number of subpartitions is 4 x 8, or 32.
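A Python model of where a row lands in sales_range_hash: the range bound picks the quarter, then a hash of s_productid picks one of the eight subpartitions. The modulo hash is a stand-in assumption for Oracle's internal function, and locate is a hypothetical name. A query constrained on both a single quarter and a single product ID needs only one of the 32 subpartitions.

```python
import bisect
from datetime import date, timedelta

RANGE_PARTITIONS = ["sal99q1", "sal99q2", "sal99q3", "sal99q4"]
UPPER_BOUNDS = [date(1999, 4, 1), date(1999, 7, 1),
                date(1999, 10, 1), date(2000, 1, 1)]
N_SUBPARTITIONS = 8   # SUBPARTITIONS 8 in the statement above


def locate(saledate: date, productid: int) -> tuple:
    """Return (range partition, hash subpartition number) for a row."""
    i = bisect.bisect_right(UPPER_BOUNDS, saledate)
    if i == len(UPPER_BOUNDS):
        raise ValueError(f"{saledate} is not below the highest partition bound")
    # Stand-in hash (assumption); Oracle's internal hash function differs.
    return RANGE_PARTITIONS[i], productid % N_SUBPARTITIONS


# Every day of 1999 for 100 products touches all 4 x 8 subpartitions ...
days = [date(1999, 1, 1) + timedelta(days=d) for d in range(365)]
print(len({locate(day, pid) for day in days for pid in range(100)}))  # 32
# ... but rows for one quarter and one product sit in a single subpartition.
print(locate(date(1999, 5, 1), 42))   # ('sal99q2', 2)
```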
In addition to specifying a subpartition count as in the previous statement, you can
create subpartitions by using a subpartition template. A template makes it easier to
name the subpartitions and to control the tablespaces in which they are stored. The following statement illustrates this:
CREATE TABLE sales_range_hash(
s_productid NUMBER,
s_saledate DATE,
s_custid NUMBER,
s_totalprice NUMBER)
PARTITION BY RANGE (s_saledate)
SUBPARTITION BY HASH (s_productid)
SUBPARTITION TEMPLATE(
SUBPARTITION sp1 TABLESPACE tbs1,
SUBPARTITION sp2 TABLESPACE tbs2,
SUBPARTITION sp3 TABLESPACE tbs3,
SUBPARTITION sp4 TABLESPACE tbs4,
SUBPARTITION sp5 TABLESPACE tbs5,
SUBPARTITION sp6 TABLESPACE tbs6,
SUBPARTITION sp7 TABLESPACE tbs7,
SUBPARTITION sp8 TABLESPACE tbs8)
(PARTITION sal99q1 VALUES LESS THAN (TO_DATE('01-APR-1999', 'DD-MON-YYYY')),
PARTITION sal99q2 VALUES LESS THAN (TO_DATE('01-JUL-1999', 'DD-MON-YYYY')),
PARTITION sal99q3 VALUES LESS THAN (TO_DATE('01-OCT-1999', 'DD-MON-YYYY')),
PARTITION sal99q4 VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY')));
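To see what the template produces, this Python sketch expands it for the four quarterly partitions. It assumes Oracle's naming convention of joining the partition name and the template subpartition name with an underscore (for example, sal99q1_sp1); query the USER_TAB_SUBPARTITIONS view on your own system to confirm the generated names. The function expand_template is hypothetical.

```python
PARTITIONS = ["sal99q1", "sal99q2", "sal99q3", "sal99q4"]
# (subpartition name, tablespace) pairs from the SUBPARTITION TEMPLATE.
TEMPLATE = [(f"sp{i}", f"tbs{i}") for i in range(1, 9)]


def expand_template(partitions: list, template: list) -> list:
    """Apply the same subpartition template to every range partition.

    Assumes generated names of the form <partition>_<template name>.
    """
    return [(f"{part}_{sub}", tablespace)
            for part in partitions
            for sub, tablespace in template]


subpartitions = expand_template(PARTITIONS, TEMPLATE)
print(len(subpartitions))   # 32
print(subpartitions[:2])    # [('sal99q1_sp1', 'tbs1'), ('sal99q1_sp2', 'tbs2')]
```

Each tablespace ends up holding exactly one subpartition from every quarter, so a scan of one quarter is spread over all eight tablespaces.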



When to Use Composite Range-List Partitioning
Composite range-list partitioning offers the benefits of both range and list partitioning. With composite range-list partitioning, Oracle first partitions by range. Then, within each range, Oracle creates subpartitions and distributes data among them according to the discrete values listed for each subpartition, so that related sets of data are grouped in a natural way.

Data placed in composite partitions is logically ordered only by the boundaries that
define the range-level partitions.


Using Composite Range-List Partitioning
Use the composite range-list partitioning method for tables and local indexes if:
■ Subpartitions have a logical grouping defined by the user.
■ The contents of a partition can be spread across multiple tablespaces, devices, or
nodes (of an MPP system).
■ You require both partition pruning and partition-wise joins even when the
pruning and join predicates use different columns of the partitioned table.
■ You require a degree of parallelism that is greater than the number of partitions
for backup, recovery, and parallel operations.
Most large tables in a data warehouse should use range partitioning. Composite
partitioning should be used for very large tables or for data warehouses with a
well-defined need for these conditions. When using the composite method, Oracle
stores each subpartition in a separate segment. Thus, the subpartitions may have
properties that differ from the properties of the table or from the partition to which the subpartitions belong.
This statement creates a table quarterly_regional_sales that is range
partitioned on the txn_date column and list subpartitioned on the state column.

CREATE TABLE quarterly_regional_sales
(deptno NUMBER, item_no VARCHAR2(20),
txn_date DATE, txn_amount NUMBER, state VARCHAR2(2))
PARTITION BY RANGE (txn_date)
SUBPARTITION BY LIST (state)
(
PARTITION q1_1999 VALUES LESS THAN(TO_DATE('1-APR-1999','DD-MON-YYYY'))
(SUBPARTITION q1_1999_northwest VALUES ('OR', 'WA'),
SUBPARTITION q1_1999_southwest VALUES ('AZ', 'UT', 'NM'),
SUBPARTITION q1_1999_northeast VALUES ('NY', 'VT', 'NJ'),
SUBPARTITION q1_1999_southeast VALUES ('FL', 'GA'),
SUBPARTITION q1_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q1_1999_southcentral VALUES ('OK', 'TX')),
PARTITION q2_1999 VALUES LESS THAN(TO_DATE('1-JUL-1999','DD-MON-YYYY'))
(SUBPARTITION q2_1999_northwest VALUES ('OR', 'WA'),
SUBPARTITION q2_1999_southwest VALUES ('AZ', 'UT', 'NM'),
SUBPARTITION q2_1999_northeast VALUES ('NY', 'VT', 'NJ'),
SUBPARTITION q2_1999_southeast VALUES ('FL', 'GA'),
SUBPARTITION q2_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q2_1999_southcentral VALUES ('OK', 'TX')),
PARTITION q3_1999 VALUES LESS THAN (TO_DATE('1-OCT-1999','DD-MON-YYYY'))
(SUBPARTITION q3_1999_northwest VALUES ('OR', 'WA'),
SUBPARTITION q3_1999_southwest VALUES ('AZ', 'UT', 'NM'),
SUBPARTITION q3_1999_northeast VALUES ('NY', 'VT', 'NJ'),
SUBPARTITION q3_1999_southeast VALUES ('FL', 'GA'),
SUBPARTITION q3_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q3_1999_southcentral VALUES ('OK', 'TX')),
PARTITION q4_1999 VALUES LESS THAN (TO_DATE('1-JAN-2000','DD-MON-YYYY'))
(SUBPARTITION q4_1999_northwest VALUES('OR', 'WA'),
SUBPARTITION q4_1999_southwest VALUES('AZ', 'UT', 'NM'),
SUBPARTITION q4_1999_northeast VALUES('NY', 'VT', 'NJ'),
SUBPARTITION q4_1999_southeast VALUES('FL', 'GA'),
SUBPARTITION q4_1999_northcentral VALUES ('SD', 'WI'),
SUBPARTITION q4_1999_southcentral VALUES ('OK', 'TX')));
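The same routing logic for quarterly_regional_sales, sketched in Python: the txn_date bound selects the quarter, and the state value selects the regional subpartition. The state lists are assumed to contain each two-letter code exactly once (for example, 'VT' for Vermont, and 'OK' and 'TX' for the south-central region), because Oracle does not allow a value to appear in two subpartitions of the same partition. The function locate is hypothetical.

```python
import bisect
from datetime import date

QUARTERS = ["q1_1999", "q2_1999", "q3_1999", "q4_1999"]
UPPER_BOUNDS = [date(1999, 4, 1), date(1999, 7, 1),
                date(1999, 10, 1), date(2000, 1, 1)]
# Assumed state lists: each code appears exactly once.
REGIONS = {
    "northwest": ["OR", "WA"],
    "southwest": ["AZ", "UT", "NM"],
    "northeast": ["NY", "VT", "NJ"],
    "southeast": ["FL", "GA"],
    "northcentral": ["SD", "WI"],
    "southcentral": ["OK", "TX"],
}
STATE_TO_REGION = {s: r for r, states in REGIONS.items() for s in states}
# A duplicated state would silently overwrite an entry in the dictionary
# above; Oracle rejects such a definition, so check for it explicitly.
if len(STATE_TO_REGION) != sum(len(v) for v in REGIONS.values()):
    raise ValueError("a state is listed in more than one subpartition")


def locate(txn_date: date, state: str) -> str:
    """Return the subpartition name for a row of quarterly_regional_sales."""
    i = bisect.bisect_right(UPPER_BOUNDS, txn_date)
    if i == len(UPPER_BOUNDS):
        raise ValueError(f"{txn_date} is not below the highest partition bound")
    if state not in STATE_TO_REGION:
        raise ValueError(f"state {state!r} does not map to any subpartition")
    return f"{QUARTERS[i]}_{STATE_TO_REGION[state]}"


print(locate(date(1999, 8, 15), "TX"))   # q3_1999_southcentral
```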
You can create subpartitions in a composite partitioned table using a subpartition
template. A subpartition template simplifies the specification of subpartitions by not
requiring that a subpartition descriptor be specified for every partition in the table.
Instead, you describe subpartitions only once in a template, then apply that
subpartition template to every partition in the table. The following statement
illustrates an example where you can choose the subpartition name and tablespace
locations:
CREATE TABLE quarterly_regional_sales
(deptno NUMBER, item_no VARCHAR2(20),
txn_date DATE, txn_amount NUMBER, state VARCHAR2(2))
PARTITION BY RANGE (txn_date)
SUBPARTITION BY LIST (state)
SUBPARTITION TEMPLATE(
SUBPARTITION northwest VALUES ('OR', 'WA') TABLESPACE ts1,
SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM') TABLESPACE ts2,
SUBPARTITION northeast VALUES ('NY', 'VT', 'NJ') TABLESPACE ts3,
SUBPARTITION southeast VALUES ('FL', 'GA') TABLESPACE ts4,
SUBPARTITION northcentral VALUES ('SD', 'WI') TABLESPACE ts5,
SUBPARTITION southcentral VALUES ('OK', 'TX') TABLESPACE ts6)
(
PARTITION q1_1999 VALUES LESS THAN(TO_DATE('1-APR-1999','DD-MON-YYYY')),
PARTITION q2_1999 VALUES LESS THAN(TO_DATE('1-JUL-1999','DD-MON-YYYY')),
PARTITION q3_1999 VALUES LESS THAN(TO_DATE('1-OCT-1999','DD-MON-YYYY')),
PARTITION q4_1999 VALUES LESS THAN(TO_DATE('1-JAN-2000','DD-MON-YYYY')));
