Altering Dimensions

You can modify a dimension using the ALTER DIMENSION statement. You can add or
drop a level, hierarchy, or attribute from the dimension using this command.

Referring to the time dimension mentioned earlier, you can remove the
attribute fis_year, drop the hierarchy fis_rollup, or remove the level
fis_year. In addition, you can add a new level called f_year, as in the following:

ALTER DIMENSION times_dim DROP ATTRIBUTE fis_year;
ALTER DIMENSION times_dim DROP HIERARCHY fis_rollup;
ALTER DIMENSION times_dim DROP LEVEL fis_year;
ALTER DIMENSION times_dim ADD LEVEL f_year IS times.fiscal_year;

If you used the extended_attribute_clause when creating the dimension, you
can drop one attribute column without dropping all attribute columns, as the
following statement shows:

ALTER DIMENSION product_dim
DROP ATTRIBUTE size LEVEL prod_type COLUMN Prod_TypeSize;

If you try to remove anything with further dependencies inside the dimension, Oracle
Database rejects the altering of the dimension. A dimension becomes invalid if you
change any schema object that the dimension is referencing. For example, if the table
on which the dimension is defined is altered, the dimension becomes invalid.

You can modify a dimension by adding a level containing a SKIP WHEN NULL clause,
as in the following statement:
ALTER DIMENSION times_dim
ADD LEVEL f_year IS times.fiscal_year SKIP WHEN NULL;

You cannot, however, modify a level that contains a SKIP WHEN NULL clause. Instead,
you need to drop the level and re-create it.
To check the status of a dimension, view the contents of the column invalid in the
ALL_DIMENSIONS data dictionary view. To revalidate the dimension, use the
COMPILE option as follows:

ALTER DIMENSION times_dim COMPILE;
Dimensions can also be modified or deleted using Oracle Enterprise Manager.
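As a quick status check before and after compiling, you can query the data
dictionary directly. The following is a minimal sketch; the dimension name is
taken from the example above:

SELECT owner, dimension_name, invalid
FROM   all_dimensions
WHERE  dimension_name = 'TIMES_DIM';

A value of Y in the invalid column indicates that the dimension needs to be
recompiled.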

Validating Dimensions

The information of a dimension object is declarative only and not enforced by the
database. If the relationships described by the dimensions are incorrect, incorrect
results could occur. Therefore, you should verify the relationships specified by
CREATE DIMENSION using the DBMS_DIMENSION.VALIDATE_DIMENSION
procedure periodically.

This procedure is easy to use and has only four parameters:

■ dimension: the owner and name.
■ incremental: set to TRUE to check only the new rows for tables of this
dimension.
■ check_nulls: set to TRUE to verify that all columns that are not in the levels
containing a SKIP WHEN NULL clause are not null.
■ statement_id: a user-supplied unique identifier to identify the result of each
run of the procedure.

The following example validates the dimension TIME_FN in the sh schema:
@utldim.sql
EXECUTE DBMS_DIMENSION.VALIDATE_DIMENSION ('SH.TIME_FN', FALSE, TRUE, 'my 1st example');
Before running the VALIDATE_DIMENSION procedure, you need to create a local
table, DIMENSION_EXCEPTIONS, by running the provided script utldim.sql. If the
VALIDATE_DIMENSION procedure encounters any errors, they are placed in this table.
Querying this table will identify the exceptions that were found. The following
illustrates a sample:
SELECT * FROM dimension_exceptions
WHERE statement_id = 'my 1st example';
STATEMENT_ID OWNER TABLE_NAME DIMENSION_NAME RELATIONSHIP BAD_ROWID
------------ ----- ---------- -------------- ------------ ---------
my 1st example SH MONTH TIME_FN FOREIGN KEY AAAAuwAAJAAAARwAAA

However, rather than query this table, it may be better to query the rowid of
the invalid row to retrieve the actual row that has violated the constraint. In
this example, the dimension TIME_FN is checking a table called month. It has
found a row that violates the constraint. Using the rowid, you can see exactly
which row in the month table is causing the problem, as in the following:
SELECT * FROM month
WHERE rowid IN (SELECT bad_rowid
FROM dimension_exceptions
WHERE statement_id = 'my 1st example');
MONTH QUARTER FISCAL_QTR YEAR FULL_MONTH_NAME MONTH_NUMB
------ ------- ---------- ---- --------------- ----------
199903 19981 19981 1998 March 3

Viewing Dimensions

Dimensions can be viewed through one of two methods:

■ Using Oracle Enterprise Manager
■ Using the DESCRIBE_DIMENSION Procedure

Using Oracle Enterprise Manager
All of the dimensions that exist in the data warehouse can be viewed using Oracle
Enterprise Manager. Select the Dimension object from within the Schema icon to
display all of the dimensions. Select a specific dimension to graphically display its
hierarchy, levels, and any attributes that have been defined.

Using the DESCRIBE_DIMENSION Procedure

To view the definition of a dimension, use the DESCRIBE_DIMENSION procedure in
the DBMS_DIMENSION package. For example, if a dimension is created in the sh
sample schema with the following statement:

CREATE DIMENSION channels_dim
LEVEL channel IS (channels.channel_id)
LEVEL channel_class IS (channels.channel_class)
HIERARCHY channel_rollup (
channel CHILD OF channel_class)
ATTRIBUTE channel DETERMINES (channel_desc)
ATTRIBUTE channel_class DETERMINES (channel_class);
Execute the DESCRIBE_DIMENSION procedure as follows:
SET SERVEROUTPUT ON FORMAT WRAPPED; --to improve the display of info
EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION('SH.CHANNELS_DIM');

You then see the following output results:

DIMENSION SH.CHANNELS_DIM
LEVEL CHANNEL IS SH.CHANNELS.CHANNEL_ID
LEVEL CHANNEL_CLASS IS SH.CHANNELS.CHANNEL_CLASS
HIERARCHY CHANNEL_ROLLUP (
CHANNEL CHILD OF
CHANNEL_CLASS)
ATTRIBUTE CHANNEL LEVEL CHANNEL DETERMINES
SH.CHANNELS.CHANNEL_DESC
ATTRIBUTE CHANNEL_CLASS LEVEL CHANNEL_CLASS DETERMINES
SH.CHANNELS.CHANNEL_CLASS

Using Dimensions with Constraints

Constraints play an important role with dimensions. Full referential integrity is
sometimes enabled in data warehouses, but not always. This is because operational
databases normally have full referential integrity and you can ensure that the data
flowing into your data warehouse never violates the already established integrity
rules.
It is recommended that constraints be enabled and, if validation time is a concern, then the NOVALIDATE clause should be used as follows:

ALTER TABLE time ENABLE NOVALIDATE CONSTRAINT pk_time;

Primary and foreign keys should also be implemented. Referential integrity
constraints and NOT NULL constraints on the fact tables provide information that
query rewrite can use to extend the usefulness of materialized views.

In addition, you should use the RELY clause to inform query rewrite that it can rely
upon the constraints being correct as follows:
ALTER TABLE time MODIFY CONSTRAINT pk_time RELY;
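Putting these recommendations together, a fact-table foreign key can be created
so that it is not validated yet can still be used by query rewrite. The following
is a minimal sketch only; the sales table, its time_id column, and the constraint
name are assumptions for illustration:

-- Assumed: sales(time_id) references time(time_id), the column covered by pk_time
ALTER TABLE sales
  ADD CONSTRAINT sales_time_fk FOREIGN KEY (time_id)
  REFERENCES time (time_id)
  RELY ENABLE NOVALIDATE;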

Creating Dimensions

Before you can create a dimension object, the dimension tables, possibly already
containing the dimension data, must exist in the database. For example, if you
create a customer dimension, one or more tables must exist that contain the
city, state, and country information. In a star schema data warehouse, these
dimension tables already exist, so it is a simple task to identify which ones
will be used.
Now you can draw the hierarchies of a dimension. For example, city is a child of
state (because you can aggregate city-level data up to state), and state is a
child of country. This hierarchical information is stored in the database object
dimension.
In the case of normalized or partially normalized dimension representation (a
dimension that is stored in more than one table), identify how these tables are joined.
Note whether the joins between the dimension tables can guarantee that each
child-side row joins with one and only one parent-side row. In the case of
denormalized dimensions, determine whether the child-side columns uniquely
determine the parent-side (or attribute) columns. If you use constraints to represent
these relationships, they can be enabled with the NOVALIDATE and RELY clauses if the
relationships represented by the constraints are guaranteed by other means.

You may want the capability to skip NULL levels in a dimension. An example of
this is with Puerto Rico. You may want Puerto Rico to be included within a region
of North America, but not included within the state category. If you want this
capability, use the SKIP WHEN NULL clause. See the sample dimension below for
more information, and see Oracle Database SQL Reference for syntax and
restrictions.
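As a minimal sketch of the clause (the customers table and its column names are
assumptions based on the sh sample schema, not taken from the text above), a
geographical dimension that tolerates NULL states could be declared as follows:

CREATE DIMENSION customers_geog_dim
  LEVEL customer IS (customers.cust_id)
  LEVEL city     IS (customers.cust_city)
  LEVEL state    IS (customers.cust_state_province) SKIP WHEN NULL
  LEVEL country  IS (customers.country_id)
  HIERARCHY geog_rollup (
    customer CHILD OF
    city     CHILD OF
    state    CHILD OF
    country);

Rows with a NULL cust_state_province (such as Puerto Rico in the example above)
are then skipped at the state level rather than invalidating the dimension.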

You create a dimension using either the CREATE DIMENSION statement or the
Dimension Wizard in Oracle Enterprise Manager. Within the CREATE DIMENSION
statement, use the LEVEL clause to identify the names of the dimension levels.

This customer dimension contains a single hierarchy with a geographical rollup,
with arrows drawn from the child level to the parent level. Each arrow indicates
that for any child there is one and only one parent.

For example, each city must be contained in exactly one state and each state must be
contained in exactly one country. States that belong to more than one country violate
hierarchical integrity. Also, you must use the SKIP WHEN NULL clause if you want to
include cities that do not belong to a state, such as Washington D.C. Hierarchical
integrity is necessary for the correct operation of management functions for
materialized views that include aggregates.

For example, you can declare a dimension products_dim, which contains levels
product, subcategory, and category:

CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_subcategory)
LEVEL category IS (products.prod_category) ...


Each level in the dimension must correspond to one or more columns in a table in the
database. Thus, level product is identified by the column prod_id in the products
table and level subcategory is identified by a column called prod_subcategory in
the same table.
In this example, the database tables are denormalized and all the columns exist in the
same table. However, this is not a prerequisite for creating dimensions. "Using
Normalized Dimension Tables" shows how to create a dimension customers_dim that
has a normalized schema design using the JOIN KEY clause.
The next step is to declare the relationship between the levels with the
HIERARCHY statement and give that hierarchy a name. A hierarchical relationship
is a functional dependency from one level of a hierarchy to the next level in
the hierarchy. Using the level names defined previously, the CHILD OF
relationship denotes that each child's level value is associated with one and
only one parent level value. The following statement declares a hierarchy
prod_rollup and defines the relationship between product, subcategory, and
category:

HIERARCHY prod_rollup (product CHILD OF subcategory CHILD OF category)
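Putting the LEVEL and HIERARCHY clauses together, the complete statement might
look like the following. This is a sketch only; the ATTRIBUTE clause and the
prod_name column are assumptions added for illustration:

CREATE DIMENSION products_dim
  LEVEL product     IS (products.prod_id)
  LEVEL subcategory IS (products.prod_subcategory)
  LEVEL category    IS (products.prod_category)
  HIERARCHY prod_rollup (
    product     CHILD OF
    subcategory CHILD OF
    category)
  ATTRIBUTE product DETERMINES (products.prod_name);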

What are Dimensions?

A dimension is a structure that categorizes data in order to enable users to answer
business questions. Commonly used dimensions are customers, products, and time.
For example, each sales channel of a clothing retailer might gather and store
data regarding sales and returns of its clothing assortment. The retail chain
management can build a data warehouse to analyze the sales of its products across all
stores over time and help answer questions such as:

■ What is the effect of promoting one product on the sale of a related product that is
not promoted?
■ What are the sales of a product before and after a promotion?
■ How does a promotion affect the various distribution channels?

The data in the retailer's data warehouse system has two important components:
dimensions and facts. The dimensions are products, customers, promotions, channels,
and time. One approach for identifying your dimensions is to review your reference
tables, such as a product table that contains everything about a product, or a
promotion table containing all information about promotions. The facts are sales
(units sold) and profits. A data warehouse contains facts about the sales of
each product on a daily basis.

A typical relational implementation for such a data warehouse is a star schema. The
fact information is stored in what is called a fact table, whereas the dimensional
information is stored in dimension tables. In our example, each sales
transaction record is uniquely defined by the combination of customer, product,
sales channel, promotion, and day (time).


In Oracle Database, the dimensional information itself is stored in a dimension table.
In addition, the database object dimension helps to organize and group dimensional
information into hierarchies. This represents natural 1:n relationships between
columns or column groups (the levels of a hierarchy) that cannot be represented with
constraint conditions. Going up a level in the hierarchy is called rolling up the data and going down a level in the hierarchy is called drilling down the data. In the retailer
example:
■ Within the time dimension, months roll up to quarters, quarters roll up to years,
and years roll up to all years.
■ Within the product dimension, products roll up to subcategories, subcategories
roll up to categories, and categories roll up to all products.
■ Within the customer dimension, customers roll up to city, cities roll up to
state, states roll up to country, countries roll up to subregion, and, finally,
subregions roll up to region.



Data analysis typically starts at higher levels in the dimensional hierarchy and
gradually drills down if the situation warrants such analysis.
Dimensions do not have to be defined. However, if your application uses
dimensional modeling, it is worth spending time creating them, because they can
yield significant benefits by helping query rewrite perform more complex types
of rewrite.

Dimensions are also beneficial to certain types of materialized view refresh operations and with the SQL Access Advisor. They are only mandatory if you use the SQL Access Advisor (a GUI tool for materialized view and index management) without a workload to recommend which materialized views and indexes to create, drop, or retain.

In spite of the benefits of dimensions, you must not create dimensions in any schema
that does not fully satisfy the dimensional relationships described in this chapter.
Incorrect results can be returned from queries otherwise.

Parallelization Rules for SQL Statements

A SQL statement can be parallelized if it includes a parallel hint or if the table or index being operated on has been declared PARALLEL with a CREATE or ALTER statement.
In addition, a DDL statement can be parallelized by using the PARALLEL clause.
However, not all of these methods apply to all types of SQL statements.

Parallelization has two components: the decision to parallelize and the DOP. These
components are determined differently for queries, DDL operations, and DML
operations.

To determine the DOP, Oracle looks at the reference objects:

■ Parallel query looks at each table and index, in the portion of the query being
parallelized, to determine which is the reference table. The basic rule is to pick the
table or index with the largest DOP.
■ For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference object
that determines the DOP is the table being modified by an insert, update, or delete
operation. Parallel DML also adds some limits to the DOP to prevent deadlock. If
the parallel DML statement includes a subquery, the subquery's DOP is the same
as the DML operation.

■ For parallel DDL, the reference object that determines the DOP is the table, index,
or partition being created, rebuilt, split, or moved. If the parallel DDL statement
includes a subquery, the subquery's DOP is the same as the DDL operation.

Rules for Parallelizing Queries

This section discusses some rules for parallelizing queries.

Decision to Parallelize

A SELECT statement can be parallelized only if the following conditions are
satisfied:
■ The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX)
or the schema objects referred to in the query have a PARALLEL declaration
associated with them.
■ At least one of the tables specified in the query requires one of the following:
■ A full table scan
■ An index range scan spanning multiple partitions
■ No scalar subqueries are in the SELECT list.
Degree of Parallelism

The DOP for a query is determined by the following rules:
■ The query uses the maximum DOP taken from all of the table declarations
involved in the query and all of the potential indexes that are candidates to satisfy
the query (the reference objects). That is, the table or index that has the greatest
DOP determines the query's DOP (maximum query directive).
■ If a table has both a parallel hint specification in the query and a parallel
declaration in its table specification, the hint specification takes precedence over
parallel declaration specification.
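For example, a statement-level hint can request a specific DOP regardless of the
table declaration. The following is a hedged sketch; the sales table and its
columns are assumptions for illustration:

SELECT /*+ PARALLEL(s, 4) */ s.prod_id, SUM(s.amount_sold)
FROM   sales s
GROUP  BY s.prod_id;

Because the hint takes precedence over the table's PARALLEL declaration, this
query runs with a DOP of 4 even if sales was declared with a different degree.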

Rules for UPDATE, MERGE, and DELETE
UPDATE, MERGE, and DELETE operations are parallelized by partition or subpartition.
Update, merge, and delete parallelism are not possible within a partition, nor on a nonpartitioned table.


You have two ways to specify parallel directives for UPDATE, MERGE, and DELETE
operations (assuming that PARALLEL DML mode is enabled):

■ Use a parallel clause in the definition of the table being updated or deleted (the
reference object).
■ Use an update, merge, or delete parallel hint in the statement.

Parallel hints are placed immediately after the UPDATE, MERGE, or DELETE keywords
in UPDATE, MERGE, and DELETE statements. The hint also applies to the underlying
scan of the table being changed.
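For example, the following sketch (the sales table, columns, and DOP are
assumptions) enables parallel DML for the session and places the hint
immediately after the UPDATE keyword:

ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(sales, 4) */ sales
   SET amount_sold = amount_sold * 1.1
 WHERE promo_id = 999;

COMMIT;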

You can use the ALTER SESSION FORCE PARALLEL DML statement to override parallel
clauses for subsequent UPDATE, MERGE, and DELETE statements in a session. Parallel
hints in UPDATE, MERGE, and DELETE statements override the ALTER SESSION FORCE
PARALLEL DML statement.
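A session-level override might look like this (a sketch; the DOP of 8 is
arbitrary):

ALTER SESSION FORCE PARALLEL DML PARALLEL 8;

Subsequent UPDATE, MERGE, and DELETE statements in the session then run with a
DOP of 8 unless a statement-level parallel hint specifies otherwise.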

Decision to Parallelize

The following rule determines whether the UPDATE, MERGE, or DELETE operation
should be parallelized:

The UPDATE, MERGE, or DELETE operation is parallelized if and only if at least
one of the following is true:

■ The table being updated or deleted has a PARALLEL specification.
■ The PARALLEL hint is specified in the DML statement.
■ An ALTER SESSION FORCE PARALLEL DML statement has been issued previously
during the session.

If the statement contains subqueries or updatable views, then they may have their own
separate parallel hints or clauses. However, these parallel directives do not affect the decision to parallelize the UPDATE, MERGE, or DELETE.

The parallel hint or clause on the tables is used by both the query and the
UPDATE, MERGE, or DELETE portions to determine parallelism; however, the
decision to parallelize the UPDATE, MERGE, or DELETE portion is made
independently of the query portion, and vice versa.

Degree of Parallelism

The DOP is determined by the same rules as for queries. Note that in the case of
UPDATE and DELETE operations, only the target table to be modified (the only
reference object) is involved. Thus, the UPDATE or DELETE parallel hint
specification takes precedence over the parallel declaration specification of
the target table. In other words, the precedence order is: MERGE, UPDATE, or
DELETE hint > session > parallel declaration specification of the target table.

A parallel execution server can update or merge into, or delete from multiple
partitions, but each partition can only be updated or deleted by one parallel execution server.

If the DOP is less than the number of partitions, then the first process to finish work on one partition continues working on another partition, and so on until the work is finished on all partitions. If the DOP is greater than the number of partitions involved in the operation, then the excess parallel execution servers will have no work to do.

Types of Parallelism

The following types of parallelism are discussed in this post:
■ Parallel Query
■ Parallel DDL
■ Parallel DML
■ Parallel Execution of Functions
■ Other Types of Parallelism


Parallel Query
You can parallelize queries and subqueries in SELECT statements. You can also
parallelize the query portions of DDL statements and DML statements (INSERT,
UPDATE, and DELETE). You can also query external tables in parallel.


Parallel Queries on Index-Organized Tables
The following parallel scan methods are supported on index-organized tables:
■ Parallel fast full scan of a nonpartitioned index-organized table
■ Parallel fast full scan of a partitioned index-organized table
■ Parallel index range scan of a partitioned index-organized table

These scan methods can be used for index-organized tables with overflow areas and
for index-organized tables that contain LOBs.

Nonpartitioned Index-Organized Tables
Parallel query on a nonpartitioned index-organized table uses parallel fast full scan. The DOP is determined, in decreasing order of priority, by:

1. A PARALLEL hint (if present)
2. An ALTER SESSION FORCE PARALLEL QUERY statement
3. The parallel degree associated with the table, if the parallel degree is specified in the CREATE TABLE or ALTER TABLE statement

The allocation of work is done by dividing the index segment into a sufficiently large number of block ranges and then assigning the block ranges to parallel execution servers in a demand-driven manner. The overflow blocks corresponding to any row are accessed in a demand-driven manner only by the process which owns that row.


Partitioned Index-Organized Tables
Both index range scan and fast full scan can be performed in parallel. For
parallel fast full scan, parallelization is exactly the same as for
nonpartitioned index-organized tables. For parallel index range scan on
partitioned index-organized tables, the DOP is the minimum of the degree picked
up from the previous priority list (as in parallel fast full scan) and the
number of partitions in the index-organized table.

Depending on the DOP, each parallel execution server gets one or more partitions
(assigned in a demand-driven manner), each of which contains the primary key
index segment and the associated overflow segment, if any.


Parallel Queries on Object Types
Parallel queries can be performed on object type tables and tables containing object
type columns. Parallel query for object types supports all of the features that are
available for sequential queries on object types, including:
■ Methods on object types
■ Attribute access of object types
■ Constructors to create object type instances
■ Object views
■ PL/SQL and OCI queries for object types
There are no limitations on the size of the object types for parallel queries.
The following restrictions apply to using parallel query for object types:
■ A MAP function is needed to parallelize queries involving joins and sorts (through
ORDER BY, GROUP BY, or set operations). In the absence of a MAP function, the
query will automatically be executed serially.
■ Parallel DML and parallel DDL are not supported with object types. DML and
DDL statements are always performed serially.

In all cases where the query cannot execute in parallel because of any of these
restrictions, the whole query executes serially without giving an error message.
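As an illustration of the MAP function requirement, the following minimal sketch
(the type name and ordering logic are assumptions) defines an object type whose
instances can be compared, which is what allows joins and sorts involving the
type to run in parallel:

CREATE TYPE point_typ AS OBJECT (
  x NUMBER,
  y NUMBER,
  -- the MAP function maps each instance to a scalar used for comparisons
  MAP MEMBER FUNCTION sort_key RETURN NUMBER
);
/
CREATE TYPE BODY point_typ AS
  MAP MEMBER FUNCTION sort_key RETURN NUMBER IS
  BEGIN
    RETURN SQRT(x * x + y * y);
  END;
END;
/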


Parallel DDL
This section includes the following topics on parallelism for DDL statements:
■ DDL Statements That Can Be Parallelized
■ CREATE TABLE ... AS SELECT in Parallel
■ Recoverability and Parallel DDL
■ Space Management for Parallel DDL

DDL Statements That Can Be Parallelized
You can parallelize DDL statements for tables and indexes that are nonpartitioned or
partitioned.

The parallel DDL statements for nonpartitioned tables and indexes are:
■ CREATE INDEX
■ CREATE TABLE ... AS SELECT
■ ALTER INDEX ... REBUILD
The parallel DDL statements for partitioned tables and indexes are:
■ CREATE INDEX
■ CREATE TABLE ... AS SELECT
■ ALTER TABLE ... [MOVE|SPLIT|COALESCE] PARTITION
■ ALTER INDEX ... [REBUILD|SPLIT] PARTITION
  This statement can be executed in parallel only if the (global) index
  partition being split is usable.

All of these DDL operations can be performed in no-logging mode for either parallel
or serial execution.
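For example, an index rebuild can combine both options (a sketch; the index name
is an assumption):

ALTER INDEX sales_prod_idx REBUILD PARALLEL 4 NOLOGGING;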

CREATE TABLE for an index-organized table can be parallelized either with or
without an AS SELECT clause. Different parallelism is used for different
operations.

Parallel CREATE TABLE ... AS SELECT statements on partitioned tables and
parallel CREATE INDEX statements on partitioned indexes execute with a DOP equal
to the number of partitions.
Partition parallel analyze table is made less necessary by the ANALYZE {TABLE,
INDEX} PARTITION statements, since parallel analyze of an entire partitioned table
can be constructed with multiple user sessions.
Parallel DDL cannot occur on tables with object columns. Parallel DDL cannot occur
on non-partitioned tables with LOB columns.


CREATE TABLE ... AS SELECT in Parallel
For performance reasons, decision support applications often require large amounts of
data to be summarized or rolled up into smaller tables for use with ad hoc, decision
support queries. Rollup occurs regularly (such as nightly or weekly) during a short
period of system inactively.
Parallel execution lets you parallelize the query and create operations of creating a
table as a subquery from another table or set of tables.
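A typical parallel, no-logging rollup might be sketched as follows (the summary
table name and the source columns are assumptions for illustration):

CREATE TABLE sales_summary
  PARALLEL 4 NOLOGGING
  AS SELECT prod_id, SUM(amount_sold) AS total_sold
     FROM   sales
     GROUP  BY prod_id;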

Clustered tables cannot be created and populated in parallel.

Parallel DML
Parallel DML (PARALLEL INSERT, UPDATE, DELETE, and MERGE) uses parallel
execution mechanisms to speed up or scale up large DML operations against large
database tables and indexes.
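For example, a direct-path parallel insert might be sketched as follows (the
sales_history table, the source query, and the DOP are assumptions):

-- assumes the session is allowed to run parallel DML
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(sales_history, 4) */ INTO sales_history
SELECT * FROM sales
WHERE  time_id < DATE '2000-01-01';

COMMIT;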


Other Types of Parallelism
In addition to parallel SQL execution, Oracle can use parallelism for the following
types of operations:
■ Parallel recovery
■ Parallel propagation (replication)
■ Parallel load (the SQL*Loader utility)

Like parallel SQL, parallel recovery and propagation are performed by a parallel
execution coordinator and multiple parallel execution servers. Parallel load, however, uses a different mechanism.

The behavior of the parallel execution coordinator and parallel execution servers may
differ, depending on what kind of operation they perform (SQL, recovery, or propagation). For example, if all parallel execution servers in the pool are occupied
and the maximum number of parallel execution servers has been started:
■ In parallel SQL, the parallel execution coordinator switches to serial processing.
■ In parallel propagation, the parallel execution coordinator returns an error.
For a given session, the parallel execution coordinator coordinates only one kind of
operation. A parallel execution coordinator cannot coordinate, for example, parallel
SQL and parallel recovery or propagation at the same time.
