Informatica Tutorials

Big Data Analytics

Performance Considerations for Parallel Partition-Wise Joins

The optimizer weighs the advantages and disadvantages when deciding whether or
not to use partition-wise joins.
■ In range partitioning where partition sizes differ, data skew increases response
time; some parallel execution servers take longer than others to finish their joins.
Oracle recommends the use of hash (sub)partitioning to enable partition-wise joins
because hash partitioning, if the number of partitions is a power of two, limits the
risk of skew.
■ The number of partitions used for partition-wise joins should, if possible, be a
multiple of the number of query servers. With a degree of parallelism of 16, for
example, you can have 16, 32, or even 64 partitions. If there is an even number of
partitions, some parallel execution servers are used less than others. For example,
if there are 17 evenly distributed partition pairs, only one pair will work on the
last join, while the other pairs will have to wait. This is because, in the beginning
of the execution, each parallel execution server works on a different partition pair.
At the end of this first phase, only one pair is left. Thus, a single parallel execution
server joins this remaining pair while all other parallel execution servers are idle.
■ Sometimes, parallel joins can cause remote I/Os. For example, on Oracle Real
Application Clusters environments running on MPP configurations, if a pair of
matching partitions is not collocated on the same node, a partition-wise join
requires extra internode communication due to remote I/O. This is because Oracle
must transfer at least one partition to the node where the join is performed. In this
case, it is better to explicitly redistribute the data than to use a partition-wise join.

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati