Informatica Tutorials

Big Data Analytics

Schema Modeling Techniques

The following topics provide information about schemas in a data warehouse:

■ Schemas in Data Warehouses
■ Third Normal Form
■ Star Schemas
■ Optimizing Star Queries


Schemas in Data Warehouses
A schema is a collection of database objects, including tables, views, indexes, and
synonyms.
There is a variety of ways of arranging schema objects in the schema models designed
for data warehousing. One data warehouse schema model is a star schema. The sh
sample schema (the basis for most of the examples in this book) uses a star schema.
However, there are other schema models that are commonly used for data warehouses. The most prevalent of these schema models is the third normal form (3NF) schema. Additionally, some data warehouse schemas are neither star schemas nor 3NF schemas, but instead share characteristics of both schemas; these are referred to as hybrid schema models.

The Oracle Database is designed to support all data warehouse schemas. Some
features may be specific to one schema model . However, the vast majority of Oracle's data warehousing features are equally applicable to star schemas, 3NF schemas, and hybrid schemas. Key data warehousing capabilities such as partitioning (including the rolling window load technique), parallelism, materialized views, and analytic SQL are implemented in all schema models.

The determination of which schema model should be used for a data warehouse
should be based upon the requirements and preferences of the data warehouse project
team. Comparing the merits of the alternative schema models is outside of the scope of this book; instead, this chapter will briefly introduce each schema model and suggest how Oracle can be optimized for those environments.


Third Normal Form
Although this guide primarily uses star schemas in its examples, you can also use the
third normal form for your data warehouse implementation.Third normal form modeling is a classical relational-database modeling technique that minimizes data redundancy through normalization. When compared to a star schema, a 3NF schema typically has a larger number of tables due to this normalization process.

3NF schemas are typically chosen for large data warehouses, especially environments
with significant data-loading requirements that are used to feed data marts and
execute long-running queries.
The main advantages of 3NF schemas are that they:
■ Provide a neutral schema design, independent of any application or data-usage
considerations
■ May require less data-transformation than more normalized schemas such as star
schemas


Optimizing Third Normal Form Queries
Queries on 3NF schemas are often very complex and involve a large number of tables.
The performance of joins between large tables is thus a primary consideration when
using 3NF schemas.
One particularly important feature for 3NF schemas is partition-wise joins. The largest tables in a 3NF schema should be partitioned to enable partition-wise joins. The most common partitioning technique in these environments is composite range-hash
partitioning for the largest tables, with the most-common join key chosen as the
hash-partitioning key.
Parallelism is often heavily utilized in 3NF environments, and parallelism should
typically be enabled in these environments.




Star Schemas
The star schema is perhaps the simplest data warehouse schema. It is called a star
schema because the entity-relationship diagram of this schema resembles a star, with
points radiating from a central table. The center of the star consists of a large fact table and the points of the star are the dimension tables.

A star query is a join between a fact table and a number of dimension tables. Each
dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The optimizer recognizes star
queries and generates efficient execution plans for them.

A typical fact table contains keys and measures. For example, in the sh sample
schema, the fact table, sales, contain the measures quantity_sold, amount, and
cost, and the keys cust_id, time_id, prod_id, channel_id, and promo_id. The
dimension tables are customers, times, products, channels, and promotions.
The products dimension table, for example, contains information about each product
number that appears in the fact table.

A star join is a primary key to foreign key join of the dimension tables to a fact table.

The main advantages of star schemas are that they:
■ Provide a direct and intuitive mapping between the business entities being
analyzed by end users and the schema design.
■ Provide highly optimized performance for typical star queries.
■ Are widely supported by a large number of business intelligence tools, which may
anticipate or even require that the data warehouse schema contain dimension
tables.





Snowflake Schemas
The snowflake schema is a more complex data warehouse model than a star schema,
and is a type of star schema. It is called a snowflake schema because the diagram of
the schema resembles a snowflake.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the
dimension data has been grouped into multiple tables instead of one large table. For
example, a product dimension table in a star schema might be normalized into a
products table, a product_category table, and a product_manufacturer table in
a snowflake schema. While this saves space, it increases the number of dimension
tables and requires more foreign key joins. The result is more complex queries and
reduced query performance

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home