Informatica Tutorials

Big Data Analytics

Error Logging and Handling Mechanisms

Dirty data is very common when loading and transforming data, especially when
dealing with data coming from a variety of sources, including external ones. If
this dirty data causes a long-running load or transformation operation to abort,
a lot of time and resources are wasted. The following sections discuss the two
main causes of errors and how to address them:

■ Business Rule Violations
■ Data Rule Violations (Data Errors)

Business Rule Violations
Data that is logically not clean violates business rules that are known prior to any data consumption. Most of the time, handling these kinds of errors is incorporated into the loading or transformation process. However, in situations where identifying errors for every record would become too expensive, and the business rule can instead be enforced as a data rule violation (for example, testing hundreds of columns to see if they are NOT NULL), programmers often choose to handle even known logical error cases more generically.

Incorporating logical rules can be as easy as applying filter conditions on the data
input stream or as complex as feeding the dirty data into a different transformation
workflow. Some examples are as follows:

■ Filtering of logical data errors using SQL. Data that does not adhere to certain
conditions will be filtered out prior to being processed.
■ Identifying and separating logical data errors.
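Both approaches can be sketched as follows. The sales_activity_direct source and sales target tables come from the examples later in this section; the minimum-quantity condition is a hypothetical business rule, and the sales_logical_errors table is a hypothetical destination for the separated rows:

```sql
-- Filtering logical data errors with SQL: rows violating the
-- business rule are filtered out before being processed.
INSERT INTO sales
SELECT product_id, customer_id, TRUNC(sales_date), 3,
       promotion_id, quantity, amount
FROM sales_activity_direct
WHERE quantity > 0;

-- Identifying and separating logical data errors: a multitable
-- INSERT routes clean rows to the target and dirty rows to a
-- separate (hypothetical) error table in a single pass.
INSERT FIRST
  WHEN quantity > 0 THEN
    INTO sales VALUES (product_id, customer_id, sd, 3,
                       promotion_id, quantity, amount)
  ELSE
    INTO sales_logical_errors VALUES (product_id, customer_id, sd,
                                      promotion_id, quantity, amount)
SELECT product_id, customer_id, TRUNC(sales_date) sd,
       promotion_id, quantity, amount
FROM sales_activity_direct;
```

The multitable INSERT variant is usually preferable to two separate scans, because the source data is read only once.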

Data Rule Violations (Data Errors)
Unlike logical errors, data rule violations are not usually anticipated by the load or transformation process. Such unexpected data rule violations (also known as data errors) that are not handled by an operation cause the operation to fail. Data rule violations are error conditions that occur inside the database and cause a statement to fail, for example, data type conversion errors or constraint violations.

Prior to Oracle Database 10g Release 2, SQL did not offer a way to handle data errors at the row level as part of its bulk processing; the only way to handle data errors inside the database was to use PL/SQL. Oracle Database 10g Release 2, however, enables you to log data errors into a special error table while the DML operation continues.

Handling Data Errors in PL/SQL
The following statement is an example of how error handling can be done using
PL/SQL. Note that you have to use procedural record-level processing to catch
data errors on a per-row basis:

DECLARE
  errm VARCHAR2(4000);
BEGIN
  FOR crec IN (SELECT product_id, customer_id, TRUNC(sales_date) sd,
                      promotion_id, quantity, amount
               FROM sales_activity_direct) LOOP
    BEGIN
      INSERT INTO sales VALUES (crec.product_id, crec.customer_id, crec.sd, 3,
                                crec.promotion_id, crec.quantity, crec.amount);
    EXCEPTION
      WHEN OTHERS THEN
        errm := SQLERRM;
        INSERT INTO sales_activity_error
        VALUES (errm, crec.product_id, crec.customer_id, crec.sd,
                crec.promotion_id, crec.quantity, crec.amount);
    END;
  END LOOP;
END;

Handling Data Errors with an Error Logging Table
DML error logging extends existing DML functionality by enabling you to specify the
name of an error logging table into which Oracle should record errors encountered
during DML operations. This enables you to complete the DML operation in spite of
any errors, and to take corrective action on the erroneous rows at a later time.

This DML error logging table consists of several mandatory control columns and a set
of user-defined columns that represent either all or a subset of the columns of the
target table of the DML operation, using a data type that is capable of storing the
values that caused the errors. For example, you need a VARCHAR2 column in the error
logging table to store failed TO_NUMBER conversions for a NUMBER column in the
target table.

You should use the DBMS_ERRLOG package to create DML error logging tables. See the
Oracle Database PL/SQL Packages and Types Reference for more information about this
package and the structure of the logging table.
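A minimal sketch of creating such a table with DBMS_ERRLOG, using the table names from the examples in this section:

```sql
-- Create an error logging table for the SALES target table.
-- DBMS_ERRLOG mirrors the target columns using error-tolerant data
-- types (e.g. VARCHAR2 for NUMBER columns) and adds the mandatory
-- control columns (ORA_ERR_NUMBER$, ORA_ERR_MESG$, ORA_ERR_TAG$, ...).
BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG(
    dml_table_name     => 'SALES',
    err_log_table_name => 'SALES_ACTIVITY_ERRORS');
END;
/
```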
The column name mapping between the DML target table and an error logging table
determines which columns, besides the control columns, are logged for a DML
operation. The following statement loads the sales table and logs any data errors:

INSERT INTO sales SELECT product_id, customer_id, TRUNC(sales_date), 3,
  promotion_id, quantity, amount
FROM sales_activity_direct
LOG ERRORS INTO sales_activity_errors('load_20040802')
REJECT LIMIT UNLIMITED;

All data errors will be logged into table sales_activity_errors, identified by the
optional tag load_20040802. The INSERT statement will succeed even in the
presence of data errors. Note that you have to create the DML error logging table prior to using this statement.
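After the load completes, the rejected rows can be inspected through the control columns that DBMS_ERRLOG adds to every error logging table:

```sql
-- Inspect the errors captured for this particular load, via the tag
SELECT ora_err_number$, ora_err_mesg$, product_id, customer_id
FROM sales_activity_errors
WHERE ora_err_tag$ = 'load_20040802';
```

Because the user-defined columns store the offending values in error-tolerant data types, the erroneous rows can be corrected and reloaded from here.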

If REJECT LIMIT X had been specified, the statement would have failed as soon as
error X+1 was encountered; the exact error message can differ for different reject
limits. When a statement fails this way, only the DML statement is rolled back, not
the insertions into the DML error logging table, so the error logging table will
contain the X+1 rows rejected before the failure.
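For example, a load that should tolerate at most ten bad rows could be written as follows (the limit of 10 is hypothetical):

```sql
INSERT INTO sales SELECT product_id, customer_id, TRUNC(sales_date), 3,
  promotion_id, quantity, amount
FROM sales_activity_direct
LOG ERRORS INTO sales_activity_errors('load_20040802')
REJECT LIMIT 10;  -- statement fails on the 11th data error
```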

A DML error logging table can be in a different schema from that of the executing
user, but you must fully qualify the table name in that case. Optionally, the name
of the DML error logging table can be omitted; Oracle then assumes the default
name for the table, as generated by the DBMS_ERRLOG package.
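The default name generated by DBMS_ERRLOG is the target table name prefixed with ERR$_, so relying on the default can be sketched as:

```sql
-- Assuming the default error logging table was created beforehand:
--   EXEC DBMS_ERRLOG.CREATE_ERROR_LOG('SALES');   -- creates ERR$_SALES
INSERT INTO sales SELECT product_id, customer_id, TRUNC(sales_date), 3,
  promotion_id, quantity, amount
FROM sales_activity_direct
LOG ERRORS ('load_20040802') REJECT LIMIT UNLIMITED;  -- logs into ERR$_SALES
```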

Oracle logs the following errors during DML operations:
■ Column values that are too large.
■ Constraint violations (NOT NULL, unique, referential, and check constraints).
■ Errors raised during trigger execution.
■ Errors resulting from type conversion between a column in a subquery and the
corresponding column of the table.
■ Partition mapping errors.

The following conditions cause the statement to fail and roll back without invoking the
error logging capability:
■ Violated deferred constraints.
■ Out of space errors.
■ Any direct-path INSERT operation (INSERT or MERGE) that raises a unique
constraint or index violation.
■ Any UPDATE operation (UPDATE or MERGE) that raises a unique constraint or
index violation.
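The last two restrictions mean that the same logical load behaves differently depending on the insert mode. A sketch, reusing the tables from the preceding examples and assuming sales has a unique constraint that some source rows violate:

```sql
-- Conventional-path INSERT: unique constraint violations are logged
-- into the error table and the statement continues.
INSERT INTO sales SELECT product_id, customer_id, TRUNC(sales_date), 3,
  promotion_id, quantity, amount
FROM sales_activity_direct
LOG ERRORS INTO sales_activity_errors('load_20040802') REJECT LIMIT UNLIMITED;

-- Direct-path INSERT: a unique constraint or index violation still
-- fails and rolls back the whole statement, despite LOG ERRORS.
INSERT /*+ APPEND */ INTO sales SELECT product_id, customer_id, TRUNC(sales_date), 3,
  promotion_id, quantity, amount
FROM sales_activity_direct
LOG ERRORS INTO sales_activity_errors('load_20040802') REJECT LIMIT UNLIMITED;
```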
