Informatica Tutorials

Big Data Analytics

Using Bitmap Indexes in Data Warehouses

Bitmap indexes are widely used in data warehousing environments. The
environments typically have large amounts of data and ad hoc queries, but a low level
of concurrent DML transactions. For such applications, bitmap indexing provides:
■ Reduced response time for large classes of ad hoc queries.
■ Reduced storage requirements compared to other indexing techniques.
■ Dramatic performance gains even on hardware with a relatively small number of
CPUs or a small amount of memory.
■ Efficient maintenance during parallel DML and loads.

Fully indexing a large table with a traditional B-tree index can be prohibitively
expensive in terms of disk space because the indexes can be several times larger than
the data in the table. Bitmap indexes are typically only a fraction of the size of the indexed data in the table.

An index provides pointers to the rows in a table that contain a given key value. A
regular index stores a list of rowids for each key corresponding to the rows with that key value. In a bitmap index, a bitmap for each key value replaces a list of rowids.

Each bit in the bitmap corresponds to a possible rowid, and if the bit is set, it means that the row with the corresponding rowid contains the key value. A mapping
function converts the bit position to an actual rowid, so that the bitmap index provides the same functionality as a regular index. Bitmap indexes store the bitmaps in a compressed way. If the number of distinct key values is small, bitmap indexes
compress better and the space saving benefit compared to a B-tree index becomes even
better.

Bitmap indexes are most effective for queries that contain multiple conditions in the
WHERE clause. Rows that satisfy some, but not all, conditions are filtered out before the unsure of which indexes to create, the SQL Access Advisor can generate
recommendations on what to create. As the bitmaps from bitmap indexes can be
combined quickly, it is usually best to use single-column bitmap indexes.
When creating bitmap indexes, you should use NOLOGGING and COMPUTE
STATISTICS. In addition, you should keep in mind that bitmap indexes are usually
easier to destroy and re-create than to maintain.

Benefits for using Bit Map Index in Data Warehousing Applications

Bitmap indexes are primarily intended for data warehousing applications where users
query the data rather than update it. They are not suitable for OLTP applications with large numbers of concurrent transactions modifying the data.
Parallel query and parallel DML work with bitmap indexes. Bitmap indexing also
supports parallel create indexes and concatenated indexes

Cardinality

The advantages of using bitmap indexes are greatest for columns in which the ratio of
the number of distinct values to the number of rows in the table is small. We refer to this ratio as the degree of cardinality. A gender column, which has only two distinct values (male and female), is optimal for a bitmap index. However, data warehouse administrators also build bitmap indexes on columns with higher cardinalities.
For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. A bitmap index on this column can outperform a
B-tree index, particularly when this column is often queried in conjunction with other indexed columns. In fact, in a typical data warehouse environments, a bitmap index can be considered for any non-unique column.
B-tree indexes are most effective for high-cardinality data: that is, for data with many possible values, such as customer_name or phone_number. In a data warehouse,
B-tree indexes should be used only for unique columns or other columns with very
high cardinalities (that is, columns that are almost unique). The majority of indexes in a data warehouse should be bitmap indexes.
In ad hoc queries and similar situations, bitmap indexes can dramatically improve
query performance. AND and OR conditions in the WHERE clause of a query can be
resolved quickly by performing the corresponding Boolean operations directly on the
bitmaps before converting the resulting bitmap to rowids. If the resulting number of
rows is small, the query can be answered quickly without resorting to a full table scan.
Example 6–1 Bitmap Index
The following shows a portion of a company's customers table.
SELECT cust_id, cust_gender, cust_marital_status, cust_income_level
FROM customers;
CUST_ID C CUST_MARITAL_STATUS CUST_INCOME_LEVEL
---------- - -------------------- ---------------------
...
70 F D: 70,000 - 89,999
80 F married H: 150,000 - 169,999
Indexes 6-3
90 M single H: 150,000 - 169,999
100 F I: 170,000 - 189,999
110 F married C: 50,000 - 69,999
120 M single F: 110,000 - 129,999
130 M J: 190,000 - 249,999
140 M married G: 130,000 - 149,999
...
Because cust_gender, cust_marital_status, and cust_income_level are all
low-cardinality columns (there are only three possible values for marital status, two
possible values for gender, and 12 for income level), bitmap indexes are ideal for these
columns. Do not create a bitmap index on cust_id because this is a unique column.
Instead, a unique B-tree index on this column provides the most efficient
representation and retrieval.

Following Table illustrates the bitmap index for the cust_gender column in this example. It consists of two separate bitmaps, one for gender.



Each entry (or bit) in the bitmap corresponds to a single row of the customers table.
The value of each bit depends upon the values of the corresponding row in the table.
For example, the bitmap cust_gender='F' contains a one as its first bit because the
gender is F in the first row of the customers table. The bitmap cust_gender='F'
has a zero for its third bit because the gender of the third row is not F.
An analyst investigating demographic trends of the company's customers might ask,
"How many of our married customers have an income level of G or H?" This
corresponds to the following query:
SELECT COUNT(*) FROM customers
WHERE cust_marital_status = 'married'
AND cust_income_level IN ('H: 150,000 - 169,999', 'G: 130,000 - 149,999');
Bitmap indexes can efficiently process this query by merely counting the number of
ones in the bitmap illustrated in Following Figure. The result set will be found by using bitmap OR merge operations without the necessity of a conversion to rowids. To identify additional specific customer attributes that satisfy the criteria, use the
resulting bitmap to access the table after a bitmap to rowid conversion.



Bitmap Indexes and Nulls

Unlike most other types of indexes, bitmap indexes include rows that have NULL
values. Indexing of nulls can be useful for some types of SQL statements, such as
queries with the aggregate function COUNT.

Example 6–2 Bitmap Index
SELECT COUNT(*) FROM customers WHERE cust_marital_status IS NULL;
This query uses a bitmap index on cust_marital_status. Note that this query
would not be able to use a B-tree index, because B-tree indexes do not store the NULL
values.
SELECT COUNT(*) FROM customers;
Any bitmap index can be used for this query because all table rows are indexed,
including those that have NULL data. If nulls were not indexed, the optimizer would
be able to use indexes only on columns with NOT NULL constraints.

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home