Informatica Tutorials

Big Data Analytics

Partial Partition-Wise Joins

Oracle can perform partial partition-wise joins only in parallel. Unlike full
partition-wise joins, partial partition-wise joins require you to partition only one table on the join key, not both tables. The partitioned table is referred to as the reference table. The other table may or may not be partitioned. Partial partition-wise joins are more common than full partition-wise joins.

To execute a partial partition-wise join, Oracle dynamically repartitions the other table based on the partitioning of the reference table. Once the other table is repartitioned,the execution is similar to a full partition-wise join.
The performance advantage that partial partition-wise joins have over joins in
non-partitioned tables is that the reference table is not moved during the join
operation. Parallel joins between non-partitioned tables require both input tables to be redistributed on the join key. This redistribution operation involves exchanging rows between parallel execution servers. This is a CPU-intensive operation that can lead to excessive interconnect traffic in Oracle Real Application Clusters environments.

Partitioning large tables on a join key, either a foreign or primary key, prevents this redistribution every time the table is joined on that key. Of course, if you choose a foreign key to partition the table, which is the most common scenario, select a foreignkey that is involved in many queries.

To illustrate partial partition-wise joins, consider the previous sales/customer
example. Assume that sales is not partitioned or is partitioned on a column other than s_customerid. Because sales is often joined with customers on customerid,
and because this join dominates our application workload, partition sales on s_
customerid to enable partial partition-wise join every time customers and sales
are joined. As in full partition-wise join, you have several alternatives:


Partial Partition-Wise Joins: Hash-List
The simplest method to enable a partial partition-wise join is to partition sales by
hash on s_customerid. The number of partitions determines the maximum degree
of parallelism, because the partition is the smallest granule of parallelism for partial partition-wise join operations.

The parallel execution of a partial partition-wise join is illustrated in Figure 5–3, which assumes that both the degree of parallelism and the number of partitions of sales are 16. The execution involves two sets of query servers: one set, labeled set 1 in following Figure, scans the customers table in parallel. The granule of parallelism for the scan operation is a range of blocks.

Rows from customers that are selected by the first set, in this case all rows, are
redistributed to the second set of query servers by hashing customerid. For example,
all rows in customers that could have matching rows in partition P1 of sales are
sent to query server 1 in the second set. Rows received by the second set of query
servers are joined with the rows from the corresponding partitions in sales. Query
server number 1 in the second set joins all customers rows that it receives with
partition P1 of sales.


Considerations for full partition-wise joins also apply to partial partition-wise joins:
■ The degree of parallelism does not need to equal the number of partitions. In
Figure, the query executes with two sets of 16 query servers. In this case,
Oracle assigns 1 partition to each query server of the second set. Again, the
number of partitions should always be a multiple of the degree of parallelism.
■ In Oracle Real Application Clusters environments on shared-nothing platforms
(MPPs), each hash partition of sales should preferably have affinity to only one
node in order to avoid remote I/Os. Also, spread partitions over all nodes to avoid
bottlenecks and use all CPU resources available on the system. A node can host
multiple partitions when there are more partitions than nodes.


Partial Partition-Wise Joins: Composite
As with full partition-wise joins, the prime partitioning method for the sales table is to use the range method on column s_salesdate. This is because sales is a typical
example of a table that stores historical data. To enable a partial partition-wise join while preserving this range partitioning, subpartition sales by hash on column s_customerid using 16 subpartitions for each partition. Pruning and partial
partition-wise joins can be used together if a query joins customers and sales and if
the query has a selection predicate on s_salesdate.

When sales is composite, the granule of parallelism for a partial partition-wise join is a hash partition and not a subpartition. Again, the number of hash partitions should be a multiple of the degree of parallelism. Also, on an MPP system, ensure that each hash partition has affinity to a single node. In the previous example, the eight subpartitions composing a hash partition should have affinity to the same node.


Partial Partition-Wise Joins: Range
Finally, you can use range partitioning on s_customerid to enable a partial
partition-wise join. This works similarly to the hash method, but a side effect of range partitioning is that the resulting data distribution could be skewed if the size of the partitions differs. Moreover, this method is more complex to implement because it requires prior knowledge of the values of the partitioning column that is also a join key.

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home