Informatica Tutorials

Big Data Analytics

Partition-Wise Joins

Partition-wise joins reduce query response time by minimizing the amount of data
exchanged among parallel execution servers when joins execute in parallel. This
significantly reduces response time and improves the use of both CPU and memory
resources. In Oracle Real Application Clusters (RAC) environments, partition-wise
joins also avoid or at least limit the data traffic over the interconnect, which is the key to achieving good scalability for massive join operations.

Partition-wise joins can be full or partial. Oracle decides which type of join to us

Full Partition-Wise Joins
A full partition-wise join divides a large join into smaller joins between a pair of
partitions from the two joined tables. To use this feature, you must equipartition both tables on their join keys. For example, consider a large join between a sales table and a customer table on the column customerid. The query "find the records of all customers who bought more than 100 articles in Quarter 3 of 1999" is a typical
example of a SQL statement performing such a join. The following is an example of
this:

SELECT c.cust_last_name, COUNT(*)
FROM sales s, customers c
WHERE s.cust_id = c.cust_id AND
s.time_id BETWEEN TO_DATE('01-JUL-1999', 'DD-MON-YYYY') AND
(TO_DATE('01-OCT-1999', 'DD-MON-YYYY'))
GROUP BY c.cust_last_name HAVING COUNT(*) > 100;

This large join is typical in data warehousing environments. The entire customer table is joined with one quarter of the sales data. In large data warehouse applications, this might mean joining millions of rows. The join method to use in that case is obviously a hash join. You can reduce the processing time for this hash join even more if both tables are equipartitioned on the customerid column. This enables a full partition-wise join.

When you execute a full partition-wise join in parallel, the granule of parallelism is a partition. As a result, the degree of parallelism is limited to the number of partitions.
For example, you require at least 16 partitions to set the degree of parallelism of the query to 16.
You can use various partitioning methods to equipartition both tables on the column
customerid with 16 partitions. These methods are described in these subsections


Full Partition-Wise Joins: Hash-Hash
This is the simplest method: the customers and sales tables are both partitioned by
hash into 16 partitions, on the s_customerid and c_customerid columns.
This partitioning method enables full partition-wise join when the tables are joined on c_customerid and s_customerid, both representing the same customer identification
number. Because you are using the same hash function to distribute the same
information (customer ID) into the same number of hash partitions, you can join the
equivalent partitions. They are storing the same values.

In serial, this join is performed between pairs of matching hash partitions, one at a
time. When one partition pair has been joined, the join of another partition pair begins.
The join completes when the 16 partition pairs have been processed.

Parallel execution of a full partition-wise join is a straightforward parallelization of the serial execution. Instead of joining one partition pair at a time, 16 partition pairs are joined in parallel by the 16 query servers.


Following figure illustrates the parallel execution of a full partition-wise join.



Full Partition-Wise Joins: (Composite-Hash)-Hash
This method is a variation of the hash-hash method. The sales table is a typical
example of a table storing historical data. For all the reasons mentioned under the
heading When to Use Range Partitioning on page 5-5 range is the logical initial
partitioning method.
For example, assume you want to partition the sales table into eight partitions by
range on the column s_salesdate. Also assume you have two years and that each
partition represents a quarter. Instead of using range partitioning, you can use
composite partitioning to enable a full partition-wise join while preserving the
partitioning on s_salesdate. Partition the sales table by range on s_salesdate
and then subpartition each partition by hash on s_customerid using 16
subpartitions for each partition, for a total of 128 subpartitions. The customers table can still use hash partitioning with 16 partitions.

When you use the method just described, a full partition-wise join works similarly to
the one created by the hash-hash method. The join is still divided into 16 smaller joins between hash partition pairs from both tables. The difference is that now each hash partition in the sales table is composed of a set of 8 subpartitions, one from each range partition.

Hash partitions are implicit in a composite table. However, Oracle does not record
them in the data dictionary, and you cannot manipulate them with DDL commands as
you can range partitions.

Following figure shows Range and Hash Partitions of a Composite Table



(Composite-Hash)-Hash partitioning is effective because it lets you combine pruning
(on s_salesdate) with a full partition-wise join (on customerid). In the previous
example query, pruning is achieved by scanning only the subpartitions corresponding
to Q3 of 1999, in other words, row number 3

All characteristics of the hash-hash partition-wise join apply to the composite-hash
partition-wise join. In particular, for this example, these two points are common to
both methods:
■ The degree of parallelism for this full partition-wise join cannot exceed 16. Even
though the sales table has 128 subpartitions, it has only 16 hash partitions.
■ The rules for data placement on MPP systems apply here. The only difference is
that a hash partition is now a collection of subpartitions. You must ensure that all
these subpartitions are placed on the same node as the matching hash partition
from the other table


Full Partition-Wise Joins: (Composite-List)-List
The (Composite-List)-List method resembles that for (Composite-Hash)-Hash
partition-wise joins.

Full Partition-Wise Joins: (Composite-Composite (Hash/List Dimension)
If needed, you can also partition the customer table by the composite method. For
example, you partition it by range on a postal code column to enable pruning based on
postal code. You then subpartition it by hash on customerid using the same number
of partitions (16) to enable a partition-wise join on the hash dimension.


Full Partition-Wise Joins: Range-Range and List-List
You can also join range partitioned tables with range partitioned tables and list
partitioned tables with list partitioned tables in a partition-wise manner, but this is relatively uncommon. This is more complex to implement because you must know the
distribution of the data before performing the join. Furthermore, if you do not
correctly identify the partition bounds so that you have partitions of equal size, data skew during the execution may result.
The basic principle for using range-range and list-list is the same as for using
hash-hash: you must equipartition both tables. This means that the number of
partitions must be the same and the partition bounds must be identical. For example,
assume that you know in advance that you have 10 million customers, and that the
values for customerid vary from 1 to 10,000,000. In other words, you have 10 million
possible different values. To create 16 partitions, you can range partition both tables, sales on s_customerid and customers on c_customerid. You should define
partition bounds for both tables in order to generate partitions of the same size. In this example, partition bounds should be defined as 625001, 1250001, 1875001, ... 10000001, so that each partition contains 625000 rows.

Full Partition-Wise Joins: Range-Composite, Composite-Composite (Range
Dimension)

Finally, you can also subpartition one or both tables on another column. Therefore, the range-composite and composite-composite methods on the range dimension are also
valid for enabling a full partition-wise join on the range dimension.

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home