Informatica Tutorials

Big Data Analytics

Loading Mechanisms

You can use the following mechanisms for loading a data warehouse:

■ Loading a Data Warehouse with SQL*Loader
■ Loading a Data Warehouse with External Tables
■ Loading a Data Warehouse with OCI and Direct-Path APIs
■ Loading a Data Warehouse with Export/Import


Loading a Data Warehouse with SQL*Loader

Before any data transformations can occur within the database, the raw data must
become accessible for the database. One approach is to load it into the database.
Chapter 13, "Transportation in Data Warehouses", discusses several techniques for
transporting data to an Oracle data warehouse. Perhaps the most common technique
for transporting data is by way of flat files.

SQL*Loader is used to move data from flat files into an Oracle data warehouse.
During this data load, SQL*Loader can also be used to implement basic data
transformations. When using direct-path SQL*Loader, basic data manipulation, such
as datatype conversion and simple NULL handling, can be automatically resolved
during the data load. Most data warehouses use direct-path loading for performance
reasons.

The conventional-path loader provides broader capabilities for data transformation
than a direct-path loader: SQL functions can be applied to any column as those values
are being loaded. This provides a rich capability for transformations during the data
load. However, the conventional-path loader is slower than direct-path loader. For
these reasons, the conventional-path loader should be considered primarily for
loading and transforming smaller amounts of data.

The following is a simple example of a SQL*Loader controlfile to load data into the
sales table of the sh sample schema from an external file sh_sales.dat. The
external flat file sh_sales.dat consists of sales transaction data, aggregated on a
daily level. Not all columns of this external file are loaded into sales. This external file will also be used as source for loading the second fact table of the sh sample schema, which is done using an external table:

The following shows the control file (sh_sales.ctl) loading the sales table:

LOAD DATA INFILE sh_sales.dat APPEND INTO TABLE sales
FIELDS TERMINATED BY "|"
(PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD)

It can be loaded with the following command:
$ sqlldr sh/sh control=sh_sales.ctl direct=true


Loading a Data Warehouse with External Tables

Another approach for handling external data sources is using external tables. Oracle's external table feature enables you to use external data as a virtual table that can be queried and joined directly and in parallel without requiring the external data to be first loaded in the database. You can then use SQL, PL/SQL, and Java to access theexternal data.

External tables enable the pipelining of the loading phase with the transformation
phase. The transformation process can be merged with the loading process without
any interruption of the data streaming. It is no longer necessary to stage the data
inside the database for further processing inside the database, such as comparison or
transformation. For example, the conversion functionality of a conventional load can
be used for a direct-path INSERT AS SELECT statement in conjunction with the
SELECT from an external table.

The main difference between external tables and regular tables is that externally
organized tables are read-only. No DML operations (UPDATE/INSERT/DELETE) are
possible and no indexes can be created on them.

External tables are a mostly compliant to the existing SQL*Loader functionality and
provide superior functionality in most cases. External tables are especially useful for environments where the complete external source has to be joined with existing
database objects or when the data has to be transformed in a complex manner. For
example, unlike SQL*Loader, you can apply any arbitrary SQL transformation and use
the direct path insert method.

You can create an external table named sales_transactions_ext, representing
the structure of the complete sales transaction data, represented in the external file sh_sales.dat. The product department is especially interested in a cost analysis on product and time. We thus create a fact table named cost in the sales history schema. The operational source data is the same as for the sales fact table. However, because we are not investigating every dimensional information that is provided, the data in the cost fact table has a coarser granularity than in the sales fact table, for example, all different distribution channels are aggregated.
We cannot load the data into the cost fact table without applying the previously
mentioned aggregation of the detailed information, due to the suppression of some of
the dimensions.

The external table framework offers a solution to solve this. Unlike SQL*Loader,
where you would have to load the data before applying the aggregation, you can
combine the loading and transformation within a single SQL DML statement, as
shown in the following. You do not have to stage the data temporarily before inserting into the target table.

The object directories must already exist, and point to the directory containing the sh_ sales.dat file as well as the directory containing the bad and log files.

CREATE TABLE sales_transactions_ext
(PROD_ID NUMBER, CUST_ID NUMBER,
TIME_ID DATE, CHANNEL_ID NUMBER,
PROMO_ID NUMBER, QUANTITY_SOLD NUMBER,
AMOUNT_SOLD NUMBER(10,2), UNIT_COST NUMBER(10,2),
UNIT_PRICE NUMBER(10,2))
ORGANIZATION external (TYPE oracle_loader
DEFAULT DIRECTORY data_file_dir ACCESS PARAMETERS
(RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE log_file_dir:'sh_sales.bad_xt'
LOGFILE log_file_dir:'sh_sales.log_xt'
FIELDS TERMINATED BY "|" LDRTRIM
( PROD_ID, CUST_ID,
TIME_ID DATE(10) "YYYY-MM-DD",
CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD,
UNIT_COST, UNIT_PRICE))
location ('sh_sales.dat')
)REJECT LIMIT UNLIMITED;

The external table can now be used from within the database, accessing some columns
of the external data only, grouping the data, and inserting it into the costs fact table:

INSERT /*+ APPEND */ INTO COSTS
(TIME_ID, PROD_ID, UNIT_COST, UNIT_PRICE)
SELECT TIME_ID, PROD_ID, AVG(UNIT_COST), AVG(amount_sold/quantity_sold)
FROM sales_transactions_ext GROUP BY time_id, prod_id;


Loading a Data Warehouse with OCI and Direct-Path APIs
OCI and direct-path APIs are frequently used when the transformation and
computation are done outside the database and there is no need for flat file staging.


Loading a Data Warehouse with Export/Import
Export and import are used when the data is inserted as is into the target system. No
complex extractions are possible

Related Posts Plugin for WordPress, Blogger...

Please Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More

 
Follow TutorialBlogs
Share on Facebook
Tweet this Blog
Add Blog to Technorati
Home