Informatica Tutorials

Big Data Analytics

Change Data Capture

Often, data warehousing involves the extraction and transportation of relational data
from one or more production databases into a data warehouse for analysis. Change
Data Capture quickly identifies and processes only the data that has changed and
makes the change data available for further use.

Capturing Change Data Without Change Data Capture
Prior to the introduction of Change Data Capture, there were a number of ways that
users could capture change data, including table differencing and change-value
selection. Table differencing involves transporting a copy of an entire table from the source (production) database to the staging database (where the change data is captured),where an older version of the table already exists. Using the SQL MINUS operator, you can obtain the inserted and new versions of updated rows with the following query:

SELECT * FROM new_version
MINUS SELECT * FROM old_version;

Moreover, you can obtain the deleted rows and old versions of updated rows with the
following query:

SELECT * FROM old_version
MINUS SELECT * FROM new_version;

However, there are several problems with this method:

■ It requires that the new version of the entire table be transported to the staging
database, not just the change data, thereby greatly increasing transport costs.
■ The computational cost of performing the two MINUS operations on the staging
database can be very high.
■ Table differencing cannot capture data that have reverted to their old values. For
example, suppose the price of a product changes several times between the old
version and the new version of the product's table. If the price in the new version
ends up being the same as the old, table differencing cannot detect that the price
has fluctuated. Moreover, any intermediate price values between the old and new
versions of the product's table cannot be captured using table differencing.
■ There is no way to determine which changes were made as part of the same
transaction. For example, suppose a sales manager creates a special discount to
close a deal. The fact that the creation of the discount and the creation of the sale
occurred as part of the same transaction cannot be captured, unless the source
database is specifically designed to do so.

Change-value selection involves capturing the data on the source database by
selecting the new and changed data from the source tables based on the value of a
specific column. For example, suppose the source table has a LAST_UPDATE_DATE
column. To capture changes, you base your selection from the source table on the
LAST_UPDATE_DATE column value.

However, there are also several problems with this method:

■ The overhead of capturing the change data must be borne on the source database,
and you must run potentially expensive queries against the source table on the
source database. The need for these queries may force you to add indexes that
would otherwise be unneeded. There is no way to offload this overhead to the
staging database.
■ This method is no better at capturing intermediate values than the table
differencing method. If the price in the product's table fluctuates, you will not be
able to capture all the intermediate values, or even tell if the price had changed, if
the ending value is the same as it was the last time that you captured change data.
■ This method is also no better than the table differencing method at capturing
which data changes were made together in the same transaction. If you need to
capture information concerning which changes occurred together in the same
transaction, you must include specific designs for this purpose in your source
database.
■ The granularity of the change-value column may not be fine enough to uniquely
identify the new and changed rows. For example, suppose the following:


– You capture data changes using change-value selection on a date column such
as LAST_UPDATE_DATE.

– The capture happens at a particular instant in time, 14-FEB-2003 17:10:00.
– Additional updates occur to the table during the same second that you
performed your capture.

When you next capture data changes, you will select rows with a LAST_UPDATE_
DATE strictly after 14-FEB-2003 17:10:00, and thereby miss the changes that
occurred during the remainder of that second.

To use change-value selection, you either have to accept that anomaly, add an
artificial change-value column with the granularity you need, or lock out changes
to the source table during the capture process, thereby further burdening the
performance of the source database.

■ You have to design your source database in advance with this capture mechanism
in mind – all tables from which you wish to capture change data must have a
change-value column. If you want to build a data warehouse with data sources
from legacy systems, those legacy systems may not supply the necessary
change-value columns you need.

Change Data Capture does not depend on expensive and cumbersome table
differencing or change-value selection mechanisms. Instead, it captures the change
data resulting from INSERT, UPDATE, and DELETE operations made to user tables.
The change data is then stored in a relational table called a change table, and the
change data is made available to applications or individuals in a controlled way.


Capturing Change Data with Change Data Capture
Change Data Capture can capture and publish committed change data in either of the
following modes:

■ Synchronous
Triggers on the source database allow change data to be captured immediately, as
each SQL statement that performs a data manipulation language (DML) operation
(INSERT, UPDATE, or DELETE) is made. In this mode, change data is captured as
part of the transaction modifying the source table. Synchronous Change Data
Capture is available with Oracle Standard Edition and Enterprise Edition.

■ Asynchronous
By taking advantage of the data sent to the redo log files, change data is captured
after a SQL statement that performs a DML operation is committed. In this mode,
change data is not captured as part of the transaction that is modifying the source
table, and therefore has no effect on that transaction. Asynchronous Change Data
Capture is available with Oracle Enterprise Edition only.

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home