Informatica Tutorials

Big Data Analytics

Types of Query Rewrite

Queries that have aggregates that require computations over a large number of rows
or joins between very large tables can be expensive and thus can take a long time to
return the results. Query rewrite transparently rewrites such queries using
materialized views that have pre-computed results, so that the queries can be
answered almost instantaneously. These materialized views can be broadly
categorized into two groups, namely materialized aggregate views and materialized
join views.

Materialized aggregate views are tables that have pre-computed aggregate
values for columns from original tables. Similarly, materialized join views are tables that have pre-computed joins between columns from original tables. Query rewrite transforms an incoming query to fetch the results from materialized view columns.

Since these columns contain already pre-computed results, the incoming query can be
answered almost instantaneously.

This section discusses the following methods that can be used to rewrite a query:

■ Text Match Rewrite
■ Join Back
■ Aggregate Computability
■ Aggregate Rollup
■ Rollup Using a Dimension
■ When Materialized Views Have Only a Subset of Data
■ Partition Change Tracking (PCT) Rewrite
■ Multiple Materialized Views



Text Match Rewrite
The query rewrite engine always initially tries to compare the text of incoming query
with the text of the definition of any potential materialized views to rewrite the query.
This is because the overhead of doing a simple text comparison is usually negligible
comparing to the cost of doing a complex analysis required for the general rewrite.
The query rewrite engine uses two text match methods, full text match rewrite and
partial text match rewrite. In full text match the entire text of a query is compared
against the entire text of a materialized view definition (that is, the entire SELECT
expression), ignoring the white space during text comparison. For example, assume
that we have the following materialized view, sum_sales_pscat_month_city_mv:

CREATE MATERIALIZED VIEW sum_sales_pscat_month_city_mv
ENABLE QUERY REWRITE AS
SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city,
SUM(s.amount_sold) AS sum_amount_sold,
COUNT(s.amount_sold) AS count_amount_sold
FROM sales s, products p, times t, customers c
WHERE s.time_id=t.time_id
AND s.prod_id=p.prod_id
AND s.cust_id=c.cust_id
GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;

Consider the following query:

SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city,
SUM(s.amount_sold) AS sum_amount_sold,
COUNT(s.amount_sold) AS count_amount_sold
FROM sales s, products p, times t, customers c
WHERE s.time_id=t.time_id
AND s.prod_id=p.prod_id
AND s.cust_id=c.cust_id
GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;

This query matches sum_sales_pscat_month_city_mv (white space excluded)
and is rewritten as:

SELECT mv.prod_subcategory, mv.calendar_month_desc, mv.cust_city,
mv.sum_amount_sold, mv.count_amount_sold
FROM sum_sales_pscat_month_city_mv;

When full text match fails, the optimizer then attempts a partial text match. In this
method, the text starting from the FROM clause of a query is compared against the text starting with the FROM clause of a materialized view definition. Therefore, the
following query can be rewritten:

SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city,
AVG(s.amount_sold)
FROM sales s, products p, times t, customers c
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id
AND s.cust_id=c.cust_id
GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;

This query is rewritten as:

SELECT mv.prod_subcategory, mv.calendar_month_desc, mv.cust_city,
mv.sum_amount_sold/mv.count_amount_sold
FROM sum_sales_pscat_month_city_mv mv;

Note that, under the partial text match rewrite method, the average of sales aggregate required by the query is computed using the sum of sales and count of sales
aggregates stored in the materialized view.

When neither text match succeeds, the optimizer uses a general query rewrite method.
Text match rewrite can distinguish contexts where the difference between uppercase
and lowercase is significant and where it is not. For example, the following statements are equivalent:

SELECT X, 'aBc' FROM Y
Select x, 'aBc' From y

Join Back

If some column data requested by a query cannot be obtained from a materialized
view, the optimizer further determines if it can be obtained based on a data
relationship called a functional dependency. When the data in a column can determine
data in another column, such a relationship is called a functional dependency or
functional determinance. For example, if a table contains a primary key column called
prod_id and another column called prod_name, then, given a prod_id value, it is
possible to look up the corresponding prod_name. The opposite is not true, which
means a prod_name value need not relate to a unique prod_id.

When the column data required by a query is not available from a materialized view,
such column data can still be obtained by joining the materialized view back to the
table that contains required column data provided the materialized view contains a
key that functionally determines the required column data. For example, consider the
following query:

SELECT p.prod_category, t.week_ending_day, SUM(s.amount_sold)
FROM sales s, products p, times t
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id AND p.prod_category='CD'
GROUP BY p.prod_category, t.week_ending_day;

The materialized view sum_sales_prod_week_mv contains p.prod_id, but not
p.prod_category. However, you can join sum_sales_prod_week_mv back to
products to retrieve prod_category because prod_id functionally determines
prod_category. The optimizer rewrites this query using sum_sales_prod_week_
mv as follows:

SELECT p.prod_category, mv.week_ending_day, SUM(mv.sum_amount_sold)
FROM sum_sales_prod_week_mv mv, products p
WHERE mv.prod_id=p.prod_id AND p.prod_category='CD'
GROUP BY p.prod_category, mv.week_ending_day;

Here the products table is called a joinback table because it was originally joined in the materialized view but joined again in the rewritten query.

You can declare functional dependency in two ways:

■ Using the primary key constraint (as shown in the previous example)
■ Using the DETERMINES clause of a dimension

The DETERMINES clause of a dimension definition might be the only way you could
declare functional dependency when the column that determines another column
cannot be a primary key. For example, the products table is a denormalized
dimension table that has columns prod_id, prod_name, and prod_subcategory
that functionally determines prod_subcat_desc and prod_category that
determines prod_cat_desc.

The first functional dependency can be established by declaring prod_id as the
primary key, but not the second functional dependency because the prod_
subcategory column contains duplicate values. In this situation, you can use the
DETERMINES clause of a dimension to declare the second functional dependency.

The following dimension definition illustrates how functional dependencies are
declared:

CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_subcategory)
LEVEL category IS (products.prod_category)
HIERARCHY prod_rollup (
product CHILD OF
subcategory CHILD OF
category
)
ATTRIBUTE product DETERMINES products.prod_name

ATTRIBUTE product DETERMINES products.prod_desc
ATTRIBUTE subcategory DETERMINES products.prod_subcat_desc
ATTRIBUTE category DETERMINES products.prod_cat_desc;

The hierarchy prod_rollup declares hierarchical relationships that are also 1:n
functional dependencies. The 1:1 functional dependencies are declared using the
DETERMINES clause, as seen when prod_subcategory functionally determines
prod_subcat_desc.

If the following materialized view is created:

CREATE MATERIALIZED VIEW sum_sales_pscat_week_mv
ENABLE QUERY REWRITE AS
SELECT p.prod_subcategory, t.week_ending_day,
SUM(s.amount_sold) AS sum_amount_sole
FROM sales s, products p, times t
WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id
GROUP BY p.prod_subcategory, t.week_ending_day;

Then consider the following query:

SELECT p.prod_subcategory_desc, t.week_ending_day, SUM(s.amount_sold)
FROM sales s, products p, times t
WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id
AND p.prod_subcat_desc LIKE '%Men'
GROUP BY p.prod_subcat_desc, t.week_ending_day;

This can be rewritten by joining sum_sales_pscat_week_mv to the products table
so that prod_subcat_desc is available to evaluate the predicate. However, the join
will be based on the prod_subcategory column, which is not a primary key in the
products table; therefore, it allows duplicates. This is accomplished by using an
inline view that selects distinct values and this view is joined to the materialized view as shown in the rewritten query.

SELECT iv.prod_subcat_desc, mv.week_ending_day, SUM(mv.sum_amount_sold)
FROM sum_sales_pscat_week_mv mv,
(SELECT DISTINCT prod_subcategory, prod_subcat_desc
FROM products) iv
WHERE mv.prod_subcategory=iv.prod_subcategory
AND iv.prod_subcat_desc LIKE '%Men'
GROUP BY iv.prod_subcat_desc, mv.week_ending_day;

This type of rewrite is possible because prod_subcategory functionally determines
prod_subcategory_desc as declared in the dimension.


Aggregate Computability
Query rewrite can also occur when the optimizer determines if the aggregates
requested by a query can be derived or computed from one or more aggregates stored
in a materialized view. For example, if a query requests AVG(X) and a materialized
view contains SUM(X) and COUNT(X), then AVG(X) can be computed as
SUM(X)/COUNT(X).
In addition, if it is determined that the rollup of aggregates stored in a materialized view is required, then, if it is possible, query rewrite also rolls up each aggregate requested by the query using aggregates in the materialized view.
For example, SUM(sales) at the city level can be rolled up to SUM(sales) at the
state level by summing all SUM(sales) aggregates in a group with the same state
value. However, AVG(sales) cannot be rolled up to a coarser level unless COUNT(sales) or SUM(sales) is also available in the materialized view. Similarly,
VARIANCE(sales) or STDDEV(sales) cannot be rolled up unless both
COUNT(sales) and SUM(sales) are also available in the materialized view. For
example, consider the following query:

ALTER TABLE times MODIFY CONSTRAINT time_pk RELY;
ALTER TABLE customers MODIFY CONSTRAINT customers_pk RELY;
ALTER TABLE sales MODIFY CONSTRAINT sales_time_pk RELY;
ALTER TABLE sales MODIFY CONSTRAINT sales_customer_fk RELY;
SELECT p.prod_subcategory, AVG(s.amount_sold) AS avg_sales
FROM sales s, products p WHERE s.prod_id = p.prod_id


This statement can be rewritten with materialized view sum_sales_pscat_month_
city_mv provided the join between sales and times and sales and customers
are lossless and non-duplicating. Further, the query groups by prod_subcategory
whereas the materialized view groups by prod_subcategory, calendar_month_
desc and cust_city, which means the aggregates stored in the materialized view
will have to be rolled up. The optimizer rewrites the query as the following:

SELECT mv.prod_subcategory, SUM(mv.sum_amount_sold)/COUNT(mv.count_amount_sold)
AS avg_sales
FROM sum_sales_pscat_month_city_mv mv
GROUP BY mv.prod_subcategory;

The argument of an aggregate such as SUM can be an arithmetic expression such as
A+B. The optimizer tries to match an aggregate SUM(A+B) in a query with an
aggregate SUM(A+B) or SUM(B+A) stored in a materialized view. In other words,
expression equivalence is used when matching the argument of an aggregate in a
query with the argument of a similar aggregate in a materialized view. To accomplish
this, Oracle converts the aggregate argument expression into a canonical form such
that two different but equivalent expressions convert into the same canonical form.
For example, A*(B-C), A*B-C*A, (B-C)*A, and -A*C+A*B all convert into the same
canonical form and, therefore, they are successfully matched.


Aggregate Rollup
If the grouping of data requested by a query is at a coarser level than the grouping of data stored in a materialized view, the optimizer can still use the materialized view to rewrite the query. For example, the materialized view sum_sales_pscat_week_mv
groups by prod_subcategory and week_ending_day. This query groups by
prod_subcategory, a coarser grouping granularity:

ALTER TABLE times MODIFY CONSTRAINT time_pk RELY;
ALTER TABLE sales MODIFY CONSTRAINT sales_time_fk RELY;
SELECT p.prod_subcategory, SUM(s.amount_sold) AS sum_amount
FROM sales s, products p
WHERE s.prod_id=p.prod_id
GROUP BY p.prod_subcategory;

Therefore, the optimizer will rewrite this query as:

SELECT mv.prod_subcategory, SUM(mv.sum_amount_sold)
FROM sum_sales_pscat_week_mv mv
GROUP BY mv.prod_subcategory;

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home