Queries that have aggregates that require computations over a large number of rows

or joins between very large tables can be expensive and thus can take a long time to

return the results. Query rewrite transparently rewrites such queries using

materialized views that have pre-computed results, so that the queries can be

answered almost instantaneously. These materialized views can be broadly

categorized into two groups, namely materialized aggregate views and materialized

join views.

Materialized aggregate views are tables that have pre-computed aggregate

values for columns from original tables. Similarly, materialized join views are tables that have pre-computed joins between columns from original tables. Query rewrite transforms an incoming query to fetch the results from materialized view columns.

Since these columns contain already pre-computed results, the incoming query can be

answered almost instantaneously.

This section discusses the following methods that can be used to rewrite a query:

■ Text Match Rewrite

■ Join Back

■ Aggregate Computability

■ Aggregate Rollup

■ Rollup Using a Dimension

■ When Materialized Views Have Only a Subset of Data

■ Partition Change Tracking (PCT) Rewrite

■ Multiple Materialized Views

Text Match Rewrite

The query rewrite engine always initially tries to compare the text of incoming query

with the text of the definition of any potential materialized views to rewrite the query.

This is because the overhead of doing a simple text comparison is usually negligible

comparing to the cost of doing a complex analysis required for the general rewrite.

The query rewrite engine uses two text match methods, full text match rewrite and

partial text match rewrite. In full text match the entire text of a query is compared

against the entire text of a materialized view definition (that is, the entire SELECT

expression), ignoring the white space during text comparison. For example, assume

that we have the following materialized view, sum_sales_pscat_month_city_mv:

CREATE MATERIALIZED VIEW sum_sales_pscat_month_city_mv

ENABLE QUERY REWRITE AS

SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city,

SUM(s.amount_sold) AS sum_amount_sold,

COUNT(s.amount_sold) AS count_amount_sold

FROM sales s, products p, times t, customers c

WHERE s.time_id=t.time_id

AND s.prod_id=p.prod_id

AND s.cust_id=c.cust_id

GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;

Consider the following query:

SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city,

SUM(s.amount_sold) AS sum_amount_sold,

COUNT(s.amount_sold) AS count_amount_sold

FROM sales s, products p, times t, customers c

WHERE s.time_id=t.time_id

AND s.prod_id=p.prod_id

AND s.cust_id=c.cust_id

GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;

This query matches sum_sales_pscat_month_city_mv (white space excluded)

and is rewritten as:

SELECT mv.prod_subcategory, mv.calendar_month_desc, mv.cust_city,

mv.sum_amount_sold, mv.count_amount_sold

FROM sum_sales_pscat_month_city_mv;

When full text match fails, the optimizer then attempts a partial text match. In this

method, the text starting from the FROM clause of a query is compared against the text starting with the FROM clause of a materialized view definition. Therefore, the

following query can be rewritten:

SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city,

AVG(s.amount_sold)

FROM sales s, products p, times t, customers c

WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id

AND s.cust_id=c.cust_id

GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;

This query is rewritten as:

SELECT mv.prod_subcategory, mv.calendar_month_desc, mv.cust_city,

mv.sum_amount_sold/mv.count_amount_sold

FROM sum_sales_pscat_month_city_mv mv;

Note that, under the partial text match rewrite method, the average of sales aggregate required by the query is computed using the sum of sales and count of sales

aggregates stored in the materialized view.

When neither text match succeeds, the optimizer uses a general query rewrite method.

Text match rewrite can distinguish contexts where the difference between uppercase

and lowercase is significant and where it is not. For example, the following statements are equivalent:

SELECT X, 'aBc' FROM Y

Select x, 'aBc' From y

Join Back

If some column data requested by a query cannot be obtained from a materialized

view, the optimizer further determines if it can be obtained based on a data

relationship called a functional dependency. When the data in a column can determine

data in another column, such a relationship is called a functional dependency or

functional determinance. For example, if a table contains a primary key column called

prod_id and another column called prod_name, then, given a prod_id value, it is

possible to look up the corresponding prod_name. The opposite is not true, which

means a prod_name value need not relate to a unique prod_id.

When the column data required by a query is not available from a materialized view,

such column data can still be obtained by joining the materialized view back to the

table that contains required column data provided the materialized view contains a

key that functionally determines the required column data. For example, consider the

following query:

SELECT p.prod_category, t.week_ending_day, SUM(s.amount_sold)

FROM sales s, products p, times t

WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id AND p.prod_category='CD'

GROUP BY p.prod_category, t.week_ending_day;

The materialized view sum_sales_prod_week_mv contains p.prod_id, but not

p.prod_category. However, you can join sum_sales_prod_week_mv back to

products to retrieve prod_category because prod_id functionally determines

prod_category. The optimizer rewrites this query using sum_sales_prod_week_

mv as follows:

SELECT p.prod_category, mv.week_ending_day, SUM(mv.sum_amount_sold)

FROM sum_sales_prod_week_mv mv, products p

WHERE mv.prod_id=p.prod_id AND p.prod_category='CD'

GROUP BY p.prod_category, mv.week_ending_day;

Here the products table is called a joinback table because it was originally joined in the materialized view but joined again in the rewritten query.

You can declare functional dependency in two ways:

■ Using the primary key constraint (as shown in the previous example)

■ Using the DETERMINES clause of a dimension

The DETERMINES clause of a dimension definition might be the only way you could

declare functional dependency when the column that determines another column

cannot be a primary key. For example, the products table is a denormalized

dimension table that has columns prod_id, prod_name, and prod_subcategory

that functionally determines prod_subcat_desc and prod_category that

determines prod_cat_desc.

The first functional dependency can be established by declaring prod_id as the

primary key, but not the second functional dependency because the prod_

subcategory column contains duplicate values. In this situation, you can use the

DETERMINES clause of a dimension to declare the second functional dependency.

The following dimension definition illustrates how functional dependencies are

declared:

CREATE DIMENSION products_dim

LEVEL product IS (products.prod_id)

LEVEL subcategory IS (products.prod_subcategory)

LEVEL category IS (products.prod_category)

HIERARCHY prod_rollup (

product CHILD OF

subcategory CHILD OF

category

)

ATTRIBUTE product DETERMINES products.prod_name

ATTRIBUTE product DETERMINES products.prod_desc

ATTRIBUTE subcategory DETERMINES products.prod_subcat_desc

ATTRIBUTE category DETERMINES products.prod_cat_desc;

The hierarchy prod_rollup declares hierarchical relationships that are also 1:n

functional dependencies. The 1:1 functional dependencies are declared using the

DETERMINES clause, as seen when prod_subcategory functionally determines

prod_subcat_desc.

If the following materialized view is created:

CREATE MATERIALIZED VIEW sum_sales_pscat_week_mv

ENABLE QUERY REWRITE AS

SELECT p.prod_subcategory, t.week_ending_day,

SUM(s.amount_sold) AS sum_amount_sole

FROM sales s, products p, times t

WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id

GROUP BY p.prod_subcategory, t.week_ending_day;

Then consider the following query:

SELECT p.prod_subcategory_desc, t.week_ending_day, SUM(s.amount_sold)

FROM sales s, products p, times t

WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id

AND p.prod_subcat_desc LIKE '%Men'

GROUP BY p.prod_subcat_desc, t.week_ending_day;

This can be rewritten by joining sum_sales_pscat_week_mv to the products table

so that prod_subcat_desc is available to evaluate the predicate. However, the join

will be based on the prod_subcategory column, which is not a primary key in the

products table; therefore, it allows duplicates. This is accomplished by using an

inline view that selects distinct values and this view is joined to the materialized view as shown in the rewritten query.

SELECT iv.prod_subcat_desc, mv.week_ending_day, SUM(mv.sum_amount_sold)

FROM sum_sales_pscat_week_mv mv,

(SELECT DISTINCT prod_subcategory, prod_subcat_desc

FROM products) iv

WHERE mv.prod_subcategory=iv.prod_subcategory

AND iv.prod_subcat_desc LIKE '%Men'

GROUP BY iv.prod_subcat_desc, mv.week_ending_day;

This type of rewrite is possible because prod_subcategory functionally determines

prod_subcategory_desc as declared in the dimension.

Aggregate Computability

Query rewrite can also occur when the optimizer determines if the aggregates

requested by a query can be derived or computed from one or more aggregates stored

in a materialized view. For example, if a query requests AVG(X) and a materialized

view contains SUM(X) and COUNT(X), then AVG(X) can be computed as

SUM(X)/COUNT(X).

In addition, if it is determined that the rollup of aggregates stored in a materialized view is required, then, if it is possible, query rewrite also rolls up each aggregate requested by the query using aggregates in the materialized view.

For example, SUM(sales) at the city level can be rolled up to SUM(sales) at the

state level by summing all SUM(sales) aggregates in a group with the same state

value. However, AVG(sales) cannot be rolled up to a coarser level unless COUNT(sales) or SUM(sales) is also available in the materialized view. Similarly,

VARIANCE(sales) or STDDEV(sales) cannot be rolled up unless both

COUNT(sales) and SUM(sales) are also available in the materialized view. For

example, consider the following query:

ALTER TABLE times MODIFY CONSTRAINT time_pk RELY;

ALTER TABLE customers MODIFY CONSTRAINT customers_pk RELY;

ALTER TABLE sales MODIFY CONSTRAINT sales_time_pk RELY;

ALTER TABLE sales MODIFY CONSTRAINT sales_customer_fk RELY;

SELECT p.prod_subcategory, AVG(s.amount_sold) AS avg_sales

FROM sales s, products p WHERE s.prod_id = p.prod_id

This statement can be rewritten with materialized view sum_sales_pscat_month_

city_mv provided the join between sales and times and sales and customers

are lossless and non-duplicating. Further, the query groups by prod_subcategory

whereas the materialized view groups by prod_subcategory, calendar_month_

desc and cust_city, which means the aggregates stored in the materialized view

will have to be rolled up. The optimizer rewrites the query as the following:

SELECT mv.prod_subcategory, SUM(mv.sum_amount_sold)/COUNT(mv.count_amount_sold)

AS avg_sales

FROM sum_sales_pscat_month_city_mv mv

GROUP BY mv.prod_subcategory;

The argument of an aggregate such as SUM can be an arithmetic expression such as

A+B. The optimizer tries to match an aggregate SUM(A+B) in a query with an

aggregate SUM(A+B) or SUM(B+A) stored in a materialized view. In other words,

expression equivalence is used when matching the argument of an aggregate in a

query with the argument of a similar aggregate in a materialized view. To accomplish

this, Oracle converts the aggregate argument expression into a canonical form such

that two different but equivalent expressions convert into the same canonical form.

For example, A*(B-C), A*B-C*A, (B-C)*A, and -A*C+A*B all convert into the same

canonical form and, therefore, they are successfully matched.

Aggregate Rollup

If the grouping of data requested by a query is at a coarser level than the grouping of data stored in a materialized view, the optimizer can still use the materialized view to rewrite the query. For example, the materialized view sum_sales_pscat_week_mv

groups by prod_subcategory and week_ending_day. This query groups by

prod_subcategory, a coarser grouping granularity:

ALTER TABLE times MODIFY CONSTRAINT time_pk RELY;

ALTER TABLE sales MODIFY CONSTRAINT sales_time_fk RELY;

SELECT p.prod_subcategory, SUM(s.amount_sold) AS sum_amount

FROM sales s, products p

WHERE s.prod_id=p.prod_id

GROUP BY p.prod_subcategory;

Therefore, the optimizer will rewrite this query as:

SELECT mv.prod_subcategory, SUM(mv.sum_amount_sold)

FROM sum_sales_pscat_week_mv mv

GROUP BY mv.prod_subcategory;

### Types of Query Rewrite

10:38 AM
divjeev