Datawarehouse

Oracle9i
Data Warehousing Guide
Release 2 (9.2)
March 2002
Part No. A96520-01

Oracle9i Data Warehousing Guide, Release 2 (9.2)
Part No. A96520-01
Copyright © 1996, 2002 Oracle Corporation. All rights reserved.
Primary Author: Paul Lane
Contributing Authors: Viv Schupmann (Change Data Capture)
Contributors: Patrick Amor, Hermann Baer, Subhransu Basu, Srikanth Bellamkonda, Randy Bello,
Tolga Bozkaya, Benoit Dageville, John Haydu, Lilian Hobbs, Hakan Jakobsson, George Lumpkin, Cetin
Ozbutun, Jack Raitto, Ray Roccaforte, Sankar Subramanian, Gregory Smith, Ashish Thusoo,
Jean-Francois Verrier, Gary Vincent, Andy Witkowski, Zia Ziauddin
Graphic Designer: Valarie Moore
The Programs (which include both the software and documentation) contain proprietary information of
Oracle Corporation; they are provided under a license agreement containing restrictions on use and
disclosure and are also protected by copyright, patent and other intellectual and industrial property
laws. Reverse engineering, disassembly or decompilation of the Programs, except to the extent required
to obtain interoperability with other independently created software or as specified by law, is prohibited.
The information contained in this document is subject to change without notice. If you find any problems
in the documentation, please report them to us in writing. Oracle Corporation does not warrant that this
document is error-free. Except as may be expressly permitted in your license agreement for these
Programs, no part of these Programs may be reproduced or transmitted in any form or by any means,
electronic or mechanical, for any purpose, without the express written permission of Oracle Corporation.
If the Programs are delivered to the U.S. Government or anyone licensing or using the programs on
behalf of the U.S. Government, the following notice is applicable:
Restricted Rights Notice: Programs delivered subject to the DOD FAR Supplement are "commercial
computer software" and use, duplication, and disclosure of the Programs, including documentation,
shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement.
Otherwise, Programs delivered subject to the Federal Acquisition Regulations are "restricted computer
software" and use, duplication, and disclosure of the Programs shall be subject to the restrictions in FAR
52.227-19, Commercial Computer Software - Restricted Rights (June, 1987). Oracle Corporation, 500
Oracle Parkway, Redwood City, CA 94065.
The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently
dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup,
redundancy, and other measures to ensure the safe use of such applications if the Programs are used for
such purposes, and Oracle Corporation disclaims liability for any damages caused by such use of the
Programs.
Oracle is a registered trademark, and Express, Oracle Expert, Oracle Store, Oracle7, Oracle8, Oracle8i,
Oracle9i, PL/SQL, Pro*C, and SQL*Plus are trademarks or registered trademarks of Oracle
Corporation. Other names may be trademarks of their respective owners.

Contents
Send Us Your Comments
Preface
What’s New in Data Warehousing?
Part I Concepts
1 Data Warehousing Concepts
What is a Data Warehouse?
Subject Oriented
Integrated
Nonvolatile
Time Variant
Contrasting OLTP and Data Warehousing Environments
Data Warehouse Architectures
Data Warehouse Architecture (Basic)
Data Warehouse Architecture (with a Staging Area)
Data Warehouse Architecture (with a Staging Area and Data Marts)
Part II Logical Design
2 Logical Design in Data Warehouses
Logical Versus Physical Design in Data Warehouses
Creating a Logical Design
Data Warehousing Schemas
Star Schemas
Other Schemas
Data Warehousing Objects
Fact Tables
Dimension Tables
Unique Identifiers
Relationships
Example of Data Warehousing Objects and Their Relationships
Part III Physical Design
3 Physical Design in Data Warehouses
Moving from Logical to Physical Design
Physical Design
Physical Design Structures
Tablespaces
Tables and Partitioned Tables
Views
Integrity Constraints
Indexes and Partitioned Indexes
Materialized Views
Dimensions
4 Hardware and I/O Considerations in Data Warehouses
Overview of Hardware and I/O Considerations in Data Warehouses
Why Stripe the Data?
Automatic Striping
Manual Striping
Local and Global Striping
Analyzing Striping
RAID Configurations
RAID 0 (Striping)
RAID 1 (Mirroring)
RAID 0+1 (Striping and Mirroring)
Striping, Mirroring, and Media Recovery
RAID 5
The Importance of Specific Analysis
5 Parallelism and Partitioning in Data Warehouses
Overview of Parallel Execution
When to Implement Parallel Execution
Granules of Parallelism
Block Range Granules
Partition Granules
Partitioning Design Considerations
Types of Partitioning
Partitioning and Data Segment Compression
Partition Pruning
Partition-Wise Joins
Miscellaneous Partition Operations
Adding Partitions
Dropping Partitions
Exchanging Partitions
Moving Partitions
Splitting and Merging Partitions
Truncating Partitions
Coalescing Partitions
6 Indexes
Bitmap Indexes
Bitmap Join Indexes
B-tree Indexes
Local Indexes Versus Global Indexes
7 Integrity Constraints
Why Integrity Constraints are Useful in a Data Warehouse
Overview of Constraint States
Typical Data Warehouse Integrity Constraints
UNIQUE Constraints in a Data Warehouse
FOREIGN KEY Constraints in a Data Warehouse
RELY Constraints
Integrity Constraints and Parallelism
Integrity Constraints and Partitioning
View Constraints
8 Materialized Views
Overview of Data Warehousing with Materialized Views
Materialized Views for Data Warehouses
Materialized Views for Distributed Computing
Materialized Views for Mobile Computing
The Need for Materialized Views
Components of Summary Management
Data Warehousing Terminology
Materialized View Schema Design
Loading Data
Overview of Materialized View Management Tasks
Types of Materialized Views
Materialized Views with Aggregates
Materialized Views Containing Only Joins
Nested Materialized Views
Creating Materialized Views
Naming Materialized Views
Storage And Data Segment Compression
Build Methods
Enabling Query Rewrite
Query Rewrite Restrictions
Refresh Options
ORDER BY Clause
Materialized View Logs
Using Oracle Enterprise Manager
Using Materialized Views with NLS Parameters
Registering Existing Materialized Views
Partitioning and Materialized Views
Partition Change Tracking
Partitioning a Materialized View
Partitioning a Prebuilt Table
Rolling Materialized Views
Materialized Views in OLAP Environments
OLAP Cubes
Specifying OLAP Cubes in SQL
Querying OLAP Cubes in SQL
Partitioning Materialized Views for OLAP
Compressing Materialized Views for OLAP
Materialized Views with Set Operators
Choosing Indexes for Materialized Views
Invalidating Materialized Views
Security Issues with Materialized Views
Altering Materialized Views
Dropping Materialized Views
Analyzing Materialized View Capabilities
Using the DBMS_MVIEW.EXPLAIN_MVIEW Procedure
MV_CAPABILITIES_TABLE.CAPABILITY_NAME Details
MV_CAPABILITIES_TABLE Column Details
9 Dimensions
What are Dimensions?
Creating Dimensions
Multiple Hierarchies
Using Normalized Dimension Tables
Viewing Dimensions
Using The DEMO_DIM Package
Using Oracle Enterprise Manager
Using Dimensions with Constraints
Validating Dimensions
Altering Dimensions
Deleting Dimensions
Using the Dimension Wizard
Managing the Dimension Object
Creating a Dimension
Part IV Managing the Warehouse Environment
10 Overview of Extraction, Transformation, and Loading
Overview of ETL
ETL Tools
Daily Operations
Evolution of the Data Warehouse
11 Extraction in Data Warehouses
Overview of Extraction in Data Warehouses
Introduction to Extraction Methods in Data Warehouses
Logical Extraction Methods
Physical Extraction Methods
Change Data Capture
Data Warehousing Extraction Examples
Extraction Using Data Files
Extraction Via Distributed Operations
12 Transportation in Data Warehouses
Overview of Transportation in Data Warehouses
Introduction to Transportation Mechanisms in Data Warehouses
Transportation Using Flat Files
Transportation Through Distributed Operations
Transportation Using Transportable Tablespaces
13 Loading and Transformation
Overview of Loading and Transformation in Data Warehouses
Transformation Flow
Loading Mechanisms
SQL*Loader
External Tables
OCI and Direct-Path APIs
Export/Import
Transformation Mechanisms
Transformation Using SQL
Transformation Using PL/SQL
Transformation Using Table Functions
Loading and Transformation Scenarios
Parallel Load Scenario
Key Lookup Scenario
Exception Handling Scenario
Pivoting Scenarios
14 Maintaining the Data Warehouse
Using Partitioning to Improve Data Warehouse Refresh
Refresh Scenarios
Scenarios for Using Partitioning for Refreshing Data Warehouses
Optimizing DML Operations During Refresh
Implementing an Efficient MERGE Operation
Maintaining Referential Integrity
Purging Data
Refreshing Materialized Views
Complete Refresh
Fast Refresh
ON COMMIT Refresh
Manual Refresh Using the DBMS_MVIEW Package
Refresh Specific Materialized Views with REFRESH
Refresh All Materialized Views with REFRESH_ALL_MVIEWS
Refresh Dependent Materialized Views with REFRESH_DEPENDENT
Using Job Queues for Refresh
When Refresh is Possible
Recommended Initialization Parameters for Parallelism
Monitoring a Refresh
Checking the Status of a Materialized View
Tips for Refreshing Materialized Views with Aggregates
Tips for Refreshing Materialized Views Without Aggregates
Tips for Refreshing Nested Materialized Views
Tips for Fast Refresh with UNION ALL
Tips After Refreshing Materialized Views
Using Materialized Views with Partitioned Tables
Fast Refresh with Partition Change Tracking
Fast Refresh with CONSIDER FRESH
15 Change Data Capture
About Change Data Capture
Publish and Subscribe Model
Example of a Change Data Capture System
Components and Terminology for Synchronous Change Data Capture
Installation and Implementation
Change Data Capture Restriction on Direct-Path INSERT
Security
Columns in a Change Table
Change Data Capture Views
Synchronous Mode of Data Capture
Publishing Change Data
Step 1: Decide which Oracle Instance will be the Source System
Step 2: Create the Change Tables that will Contain the Changes
Managing Change Tables and Subscriptions
Subscribing to Change Data
Steps Required to Subscribe to Change Data
What Happens to Subscriptions when the Publisher Makes Changes
Export and Import Considerations
16 Summary Advisor
Overview of the Summary Advisor in the DBMS_OLAP Package
Using the Summary Advisor
Identifier Numbers
Workload Management
Loading a User-Defined Workload
Loading a Trace Workload

xi
Loading a SQL Cache Workload
Validating a Workload
Removing a Workload
Using Filters with the Summary Advisor
Removing a Filter
Recommending Materialized Views
SQL Script Generation
Summary Data Report
When Recommendations are No Longer Required
Stopping the Recommendation Process
Summary Advisor Sample Sessions
Summary Advisor and Missing Statistics
Summary Advisor Privileges and ORA-30446
Estimating Materialized View Size
ESTIMATE_MVIEW_SIZE Parameters
Is a Materialized View Being Used?
DBMS_OLAP.EVALUATE_MVIEW_STRATEGY Procedure
Summary Advisor Wizard
Summary Advisor Steps
Part V Warehouse Performance
17 Schema Modeling Techniques
Schemas in Data Warehouses
Third Normal Form
Optimizing Third Normal Form Queries
Star Schemas
Snowflake Schemas
Optimizing Star Queries
Tuning Star Queries
Using Star Transformation
18 SQL for Aggregation in Data Warehouses
Overview of SQL for Aggregation in Data Warehouses
Analyzing Across Multiple Dimensions
Optimized Performance
An Aggregate Scenario
Interpreting NULLs in Examples
ROLLUP Extension to GROUP BY
When to Use ROLLUP
ROLLUP Syntax
Partial Rollup
CUBE Extension to GROUP BY
When to Use CUBE
CUBE Syntax
Partial CUBE
Calculating Subtotals Without CUBE
GROUPING Functions
GROUPING Function
When to Use GROUPING
GROUPING_ID Function
GROUP_ID Function
GROUPING SETS Expression
Composite Columns
Concatenated Groupings
Concatenated Groupings and Hierarchical Data Cubes
Considerations when Using Aggregation
Hierarchy Handling in ROLLUP and CUBE
Column Capacity in ROLLUP and CUBE
HAVING Clause Used with GROUP BY Extensions
ORDER BY Clause Used with GROUP BY Extensions
Using Other Aggregate Functions with ROLLUP and CUBE
Computation Using the WITH Clause
19 SQL for Analysis in Data Warehouses
Overview of SQL for Analysis in Data Warehouses
Ranking Functions
RANK and DENSE_RANK
Top N Ranking
Bottom N Ranking
CUME_DIST
PERCENT_RANK
NTILE
ROW_NUMBER
Windowing Aggregate Functions
Treatment of NULLs as Input to Window Functions
Windowing Functions with Logical Offset
Cumulative Aggregate Function Example
Moving Aggregate Function Example
Centered Aggregate Function
Windowing Aggregate Functions in the Presence of Duplicates
Varying Window Size for Each Row
Windowing Aggregate Functions with Physical Offsets
FIRST_VALUE and LAST_VALUE
Reporting Aggregate Functions
Reporting Aggregate Example
RATIO_TO_REPORT
LAG/LEAD Functions
LAG/LEAD Syntax
FIRST/LAST Functions
FIRST/LAST Syntax
FIRST/LAST As Regular Aggregates
FIRST/LAST As Reporting Aggregates
Linear Regression Functions
REGR_COUNT
REGR_AVGY and REGR_AVGX
REGR_SLOPE and REGR_INTERCEPT
REGR_R2
REGR_SXX, REGR_SYY, and REGR_SXY
Linear Regression Statistics Examples
Sample Linear Regression Calculation
Inverse Percentile Functions
Normal Aggregate Syntax
Inverse Percentile Restrictions
Hypothetical Rank and Distribution Functions
Hypothetical Rank and Distribution Syntax
WIDTH_BUCKET Function
WIDTH_BUCKET Syntax
User-Defined Aggregate Functions
CASE Expressions
CASE Example
Creating Histograms With User-Defined Buckets
20 OLAP and Data Mining
OLAP
Benefits of OLAP and RDBMS Integration
Data Mining
Enabling Data Mining Applications
Predictions and Insights
Mining Within the Database Architecture
Java API
21 Using Parallel Execution
Introduction to Parallel Execution Tuning
When to Implement Parallel Execution
Operations That Can Be Parallelized
The Parallel Execution Server Pool
How Parallel Execution Servers Communicate
Parallelizing SQL Statements
Types of Parallelism
Parallel Query
Parallel DDL
Parallel DML
Parallel Execution of Functions
Other Types of Parallelism
Initializing and Tuning Parameters for Parallel Execution
Selecting Automated or Manual Tuning of Parallel Execution
Using Automatically Derived Parameter Settings
Setting the Degree of Parallelism
How Oracle Determines the Degree of Parallelism for Operations
Balancing the Workload
Parallelization Rules for SQL Statements
Enabling Parallelism for Tables and Queries
Degree of Parallelism and Adaptive Multiuser: How They Interact
Forcing Parallel Execution for a Session
Controlling Performance with the Degree of Parallelism
Tuning General Parameters for Parallel Execution
Parameters Establishing Resource Limits for Parallel Operations
Parameters Affecting Resource Consumption
Parameters Related to I/O
Monitoring and Diagnosing Parallel Execution Performance
Is There Regression?
Is There a Plan Change?
Is There a Parallel Plan?
Is There a Serial Plan?
Is There Parallel Execution?
Is the Workload Evenly Distributed?
Monitoring Parallel Execution Performance with Dynamic Performance Views
Monitoring Session Statistics
Monitoring System Statistics
Monitoring Operating System Statistics
Affinity and Parallel Operations
Affinity and Parallel Queries
Affinity and Parallel DML
Miscellaneous Parallel Execution Tuning Tips
Setting Buffer Cache Size for Parallel Operations
Overriding the Default Degree of Parallelism
Rewriting SQL Statements
Creating and Populating Tables in Parallel
Creating Temporary Tablespaces for Parallel Sort and Hash Join
Executing Parallel SQL Statements
Using EXPLAIN PLAN to Show Parallel Operations Plans
Additional Considerations for Parallel DML
Creating Indexes in Parallel
Parallel DML Tips
Incremental Data Loading in Parallel
Using Hints with Cost-Based Optimization
FIRST_ROWS(n) Hint
Enabling Dynamic Statistic Sampling
22 Query Rewrite
Overview of Query Rewrite
Cost-Based Rewrite
When Does Oracle Rewrite a Query?
Enabling Query Rewrite
Initialization Parameters for Query Rewrite
Controlling Query Rewrite
Privileges for Enabling Query Rewrite
Accuracy of Query Rewrite
How Oracle Rewrites Queries
Text Match Rewrite Methods
General Query Rewrite Methods
When are Constraints and Dimensions Needed?
Special Cases for Query Rewrite
Query Rewrite Using Partially Stale Materialized Views
Query Rewrite Using Complex Materialized Views
Query Rewrite Using Nested Materialized Views
Query Rewrite When Using GROUP BY Extensions
Did Query Rewrite Occur?
Explain Plan
DBMS_MVIEW.EXPLAIN_REWRITE Procedure
Design Considerations for Improving Query Rewrite Capabilities
Query Rewrite Considerations: Constraints
Query Rewrite Considerations: Dimensions
Query Rewrite Considerations: Outer Joins
Query Rewrite Considerations: Text Match
Query Rewrite Considerations: Aggregates
Query Rewrite Considerations: Grouping Conditions
Query Rewrite Considerations: Expression Matching
Query Rewrite Considerations: Date Folding
Query Rewrite Considerations: Statistics
Glossary
Index

Send Us Your Comments
Oracle9i Data Warehousing Guide, Release 2 (9.2)
Part No. A96520-01
Oracle Corporation welcomes your comments and suggestions on the quality and usefulness of this
document. Your input is an important part of the information used for revision.
• Did you find any errors?
• Is the information clearly presented?
• Do you need more information? If so, where?
• Are the examples correct? Do you need more examples?
• What features did you like most?
If you find any errors or have any other suggestions for improvement, please indicate the document
title and part number, and the chapter, section, and page number (if available). You can send
comments to us in the following ways:
• Electronic mail: infodev_us@oracle.com
• FAX: (650) 506-7227 Attn: Server Technologies Documentation Manager
• Postal service:
Oracle Corporation
Server Technologies Documentation
500 Oracle Parkway, Mailstop 4op11
Redwood Shores, CA 94065
USA
If you would like a reply, please give your name, address, telephone number, and (optionally)
electronic mail address.
If you have problems with the software, please contact your local Oracle Support Services.

Preface
This manual provides information about Oracle9i’s data warehousing capabilities.
This preface contains these topics:
• Audience
• Organization
• Related Documentation
• Conventions
• Documentation Accessibility

Audience
Oracle9i Data Warehousing Guide is intended for database administrators, system
administrators, and database application developers who design, maintain, and use
data warehouses.
To use this document, you need to be familiar with relational database concepts,
basic Oracle server concepts, and the operating system environment under which
you are running Oracle.
Organization
This document contains:
Part 1: Concepts
Chapter 1, Data Warehousing Concepts
This chapter contains an overview of data warehousing concepts.
Part 2: Logical Design
Chapter 2, Logical Design in Data Warehouses
This chapter discusses the logical design of a data warehouse.
Part 3: Physical Design
Chapter 3, Physical Design in Data Warehouses
This chapter discusses the physical design of a data warehouse.
Chapter 4, Hardware and I/O Considerations in Data Warehouses
This chapter describes some hardware and input-output issues.
Chapter 5, Parallelism and Partitioning in Data Warehouses
This chapter describes the basics of parallelism and partitioning in data
warehouses.
Chapter 6, Indexes
This chapter describes how to use indexes in data warehouses.

Chapter 7, Integrity Constraints
This chapter describes some issues involving constraints.
Chapter 8, Materialized Views
This chapter describes how to use materialized views in data warehouses.
Chapter 9, Dimensions
This chapter describes how to use dimensions in data warehouses.
Part 4: Managing the Warehouse Environment
Chapter 10, Overview of Extraction, Transformation, and Loading
This chapter is an overview of the ETL process.
Chapter 11, Extraction in Data Warehouses
This chapter describes extraction issues.
Chapter 12, Transportation in Data Warehouses
This chapter describes transporting data in data warehouses.
Chapter 13, Loading and Transformation
This chapter describes transforming data in data warehouses.
Chapter 14, Maintaining the Data Warehouse
This chapter describes how to refresh in a data warehousing environment.
Chapter 15, Change Data Capture
This chapter describes how to use Change Data Capture capabilities.
Chapter 16, Summary Advisor
This chapter describes how to use the Summary Advisor utility.

Part 5: Warehouse Performance
Chapter 17, Schema Modeling Techniques
This chapter describes the schemas useful in data warehousing environments.
Chapter 18, SQL for Aggregation in Data Warehouses
This chapter explains how to use SQL aggregation in data warehouses.
Chapter 19, SQL for Analysis in Data Warehouses
This chapter explains how to use analytic functions in data warehouses.
Chapter 20, OLAP and Data Mining
This chapter describes using analytic services in combination with Oracle9i.
Chapter 21, Using Parallel Execution
This chapter describes how to tune data warehouses using parallel execution.
Chapter 22, Query Rewrite
This chapter describes how to use query rewrite.
Glossary
Related Documentation
For more information, see these Oracle resources:
• Oracle9i Database Performance Tuning Guide and Reference
Many of the examples in this book use the sample schemas of the seed database,
which is installed by default when you install Oracle. Refer to Oracle9i Sample
Schemas for information on how these schemas were created and how you can use
them yourself.
In North America, printed documentation is available for sale in the Oracle Store at
http://oraclestore.oracle.com/

Customers in Europe, the Middle East, and Africa (EMEA) can purchase
documentation from
http://www.oraclebookshop.com/
Other customers can contact their Oracle representative to purchase printed
documentation.
To download free release notes, installation documentation, white papers, or other
collateral, please visit the Oracle Technology Network (OTN). You must register
online before using OTN; registration is free and can be done at
http://otn.oracle.com/admin/account/membership.html
If you already have a username and password for OTN, then you can go directly to
the documentation section of the OTN Web site at
http://otn.oracle.com/docs/index.htm
To access the database documentation search engine directly, please visit
http://tahiti.oracle.com
For additional information, see:
- The Data Warehouse Toolkit by Ralph Kimball (John Wiley and Sons, 1996)
- Building the Data Warehouse by William Inmon (John Wiley and Sons, 1996)
Conventions
This section describes the conventions used in the text and code examples of this
documentation set. It describes:
- Conventions in Text
- Conventions in Code Examples
- Conventions for Windows Operating Systems

Conventions in Text
We use various conventions in text to help you more quickly identify special terms.
The following table describes those conventions and provides examples of their use.
Convention: Bold
Meaning: Bold typeface indicates terms that are defined in the text or terms that
appear in a glossary, or both.
Example: When you specify this clause, you create an index-organized table.

Convention: Italics
Meaning: Italic typeface indicates book titles or emphasis.
Examples:
  Oracle9i Database Concepts
  Ensure that the recovery catalog and target database do not reside on the same
  disk.

Convention: UPPERCASE monospace (fixed-width) font
Meaning: Uppercase monospace typeface indicates elements supplied by the system.
Such elements include parameters, privileges, datatypes, RMAN keywords, SQL
keywords, SQL*Plus or utility commands, packages and methods, as well as
system-supplied column names, database objects and structures, usernames, and
roles.
Examples:
  You can specify this clause only for a NUMBER column.
  You can back up the database by using the BACKUP command.
  Query the TABLE_NAME column in the USER_TABLES data dictionary view.
  Use the DBMS_STATS.GENERATE_STATS procedure.

Convention: lowercase monospace (fixed-width) font
Meaning: Lowercase monospace typeface indicates executables, filenames, directory
names, and sample user-supplied elements. Such elements include computer and
database names, net service names, and connect identifiers, as well as
user-supplied database objects and structures, column names, packages and
classes, usernames and roles, program units, and parameter values.
Note: Some programmatic elements use a mixture of UPPERCASE and lowercase.
Enter these elements as shown.
Examples:
  Enter sqlplus to open SQL*Plus.
  The password is specified in the orapwd file.
  Back up the datafiles and control files in the /disk1/oracle/dbs directory.
  The department_id, department_name, and location_id columns are in the
  hr.departments table.
  Set the QUERY_REWRITE_ENABLED initialization parameter to true.
  Connect as oe user.
  The JRepUtil class implements these methods.

Convention: lowercase italic monospace (fixed-width) font
Meaning: Lowercase italic monospace font represents placeholders or variables.
Examples:
  You can specify the parallel_clause.
  Run Uold_release.SQL where old_release refers to the release you installed
  prior to upgrading.

Conventions in Code Examples
Code examples illustrate SQL, PL/SQL, SQL*Plus, or other command-line
statements. They are displayed in a monospace (fixed-width) font and separated
from normal text as shown in this example:
SELECT username FROM dba_users WHERE username = 'MIGRATE';
The following table describes typographic conventions used in code examples and
provides examples of their use.
Convention: [ ]
Meaning: Brackets enclose one or more optional items. Do not enter the brackets.
Example: DECIMAL (digits [ , precision ])

Convention: { }
Meaning: Braces enclose two or more items, one of which is required. Do not enter
the braces.
Example: {ENABLE | DISABLE}

Convention: |
Meaning: A vertical bar represents a choice of two or more options within brackets
or braces. Enter one of the options. Do not enter the vertical bar.
Examples:
  {ENABLE | DISABLE}
  [COMPRESS | NOCOMPRESS]

Convention: ...
Meaning: Horizontal ellipsis points indicate either:
  - That we have omitted parts of the code that are not directly related to the
    example
  - That you can repeat a portion of the code
Examples:
  CREATE TABLE ... AS subquery;
  SELECT col1, col2, ... , coln FROM employees;

Convention: .
            .
            .
Meaning: Vertical ellipsis points indicate that we have omitted several lines of
code not directly related to the example.
Example:
  SQL> SELECT NAME FROM V$DATAFILE;
  NAME
  ------------------------------------
  /fs1/dbs/tbs_01.dbf
  /fs1/dbs/tbs_02.dbf
  .
  .
  .
  /fs1/dbs/tbs_09.dbf
  9 rows selected.

Convention: Other notation
Meaning: You must enter symbols other than brackets, braces, vertical bars, and
ellipsis points as shown.
Examples:
  acctbal NUMBER(11,2);
  acct CONSTANT NUMBER(4) := 3;

Convention: Italics
Meaning: Italicized text indicates placeholders or variables for which you must
supply particular values.
Examples:
  CONNECT SYSTEM/system_password
  DB_NAME = database_name

Convention: UPPERCASE
Meaning: Uppercase typeface indicates elements supplied by the system. We show
these terms in uppercase in order to distinguish them from terms you define.
Unless terms appear in brackets, enter them in the order and with the spelling
shown. However, because these terms are not case sensitive, you can enter them
in lowercase.
Examples:
  SELECT last_name, employee_id FROM employees;
  SELECT * FROM USER_TABLES;
  DROP TABLE hr.employees;

Convention: lowercase
Meaning: Lowercase typeface indicates programmatic elements that you supply. For
example, lowercase indicates names of tables, columns, or files.
Note: Some programmatic elements use a mixture of UPPERCASE and lowercase.
Enter these elements as shown.
Examples:
  SELECT last_name, employee_id FROM employees;
  sqlplus hr/hr
  CREATE USER mjones IDENTIFIED BY ty3MU9;

Conventions for Windows Operating Systems
The following table describes conventions for Windows operating systems and
provides examples of their use.

Convention: Choose Start >
Meaning: How to start a program.
Example: To start the Database Configuration Assistant, choose Start > Programs >
Oracle - HOME_NAME > Configuration and Migration Tools > Database Configuration
Assistant.

Convention: File and directory names
Meaning: File and directory names are not case sensitive. The following special
characters are not allowed: left angle bracket (<), right angle bracket (>),
colon (:), double quotation marks ("), slash (/), pipe (|), and dash (-). The
special character backslash (\) is treated as an element separator, even when it
appears in quotes. If the file name begins with \\, then Windows assumes it uses
the Universal Naming Convention.
Example: c:\winnt"\"system32 is the same as C:\WINNT\SYSTEM32

Convention: C:\>
Meaning: Represents the Windows command prompt of the current hard disk drive.
The escape character in a command prompt is the caret (^). Your prompt reflects
the subdirectory in which you are working. Referred to as the command prompt in
this manual.
Example: C:\oracle\oradata>

Convention: Special characters
Meaning: The backslash (\) special character is sometimes required as an escape
character for the double quotation mark (") special character at the Windows
command prompt. Parentheses and the single quotation mark (') do not require an
escape character. Refer to your Windows operating system documentation for more
information on escape and special characters.
Examples:
  C:\>exp scott/tiger TABLES=emp QUERY=\"WHERE job='SALESMAN' and sal<1600\"
  C:\>imp SYSTEM/password FROMUSER=scott TABLES=(emp, dept)

Convention: HOME_NAME
Meaning: Represents the Oracle home name. The home name can be up to 16
alphanumeric characters. The only special character allowed in the home name is
the underscore.
Example: C:\> net start OracleHOME_NAMETNSListener

Convention: ORACLE_HOME and ORACLE_BASE
Meaning: In releases prior to Oracle8i release 8.1.3, when you installed Oracle
components, all subdirectories were located under a top level ORACLE_HOME
directory that by default used one of the following names:
  - C:\orant for Windows NT
  - C:\orawin98 for Windows 98
This release complies with Optimal Flexible Architecture (OFA) guidelines. Not
all subdirectories are under a top level ORACLE_HOME directory. There is a top
level directory called ORACLE_BASE that by default is C:\oracle. If you install
the latest Oracle release on a computer with no other Oracle software installed,
then the default setting for the first Oracle home directory is C:\oracle\orann,
where nn is the latest release number. The Oracle home directory is located
directly under ORACLE_BASE.
All directory path examples in this guide follow OFA conventions.
Refer to Oracle9i Database Getting Started for Windows for additional information
about OFA compliances and for information about installing Oracle products in
non-OFA compliant directories.
Example: Go to the ORACLE_BASE\ORACLE_HOME\rdbms\admin directory.

Documentation Accessibility
Our goal is to make Oracle products, services, and supporting documentation
accessible, with good usability, to the disabled community. To that end, our
documentation includes features that make information available to users of
assistive technology. This documentation is available in HTML format, and contains
markup to facilitate access by the disabled community. Standards will continue to
evolve over time, and Oracle Corporation is actively engaged with other
market-leading technology vendors to address technical obstacles so that our
documentation can be accessible to all of our customers. For additional information,
visit the Oracle Accessibility Program Web site at
http://www.oracle.com/accessibility/
Accessibility of Code Examples in Documentation
JAWS, a Windows screen
reader, may not always correctly read the code examples in this document. The
conventions for writing code require that closing braces should appear on an
otherwise empty line; however, JAWS may not always read a line of text that
consists solely of a bracket or brace.
Accessibility of Links to External Web Sites in Documentation
This
documentation may contain links to Web sites of other companies or organizations
that Oracle Corporation does not own or control. Oracle Corporation neither
evaluates nor makes any representations regarding the accessibility of these Web
sites.

What’s New in Data Warehousing?
This section describes new features of Oracle9i release 2 (9.2) and provides pointers
to additional information. New features information from previous releases is also
retained to help those users migrating to the current release.
The following sections describe the new features in Oracle Data Warehousing:
- Oracle9i Release 2 (9.2) New Features in Data Warehousing
- Oracle9i Release 1 (9.0.1) New Features in Data Warehousing

Oracle9i Release 2 (9.2) New Features in Data Warehousing
- Data Segment Compression
You can compress data segments in heap-organized tables. Partitioned tables are
a typical example of heap-organized tables that you should consider for data
segment compression. Data segment compression is also useful for highly
redundant data, such as tables with many foreign keys and materialized views
created with the ROLLUP clause. You should avoid compression on tables with
many updates or other DML.
See Also: Chapter 8, "Materialized Views"
- Materialized View Enhancements
You can now nest materialized views when the materialized view contains joins
and aggregates. Fast refresh is now possible on materialized views containing
the UNION ALL operator. Various restrictions were removed, and the situations in
which materialized views can be used effectively were expanded. In particular,
using materialized views in an OLAP environment has been improved.
See Also: "Overview of Data Warehousing with Materialized
Views" on page 8-2 and "Materialized Views in OLAP
Environments" on page 8-41, and Chapter 14, "Maintaining the
Data Warehouse"
- Parallel DML on Non-Partitioned Tables
You can now use parallel DML on non-partitioned tables.
See Also: Chapter 21, "Using Parallel Execution"
- Partitioning Enhancements
You can now simplify SQL syntax by using a DEFAULT partition or a
subpartition template. You can implement SPLIT operations more easily.
See Also: "Partitioning Methods" on page 5-5, Chapter 5,
"Parallelism and Partitioning in Data Warehouses", and Oracle9i
Database Administrator’s Guide

- Query Rewrite Enhancements
Text match processing and join equivalence recognition have been improved.
Materialized views containing the UNION ALL operator can now use query
rewrite.
See Also: Chapter 22, "Query Rewrite"
- Range-List Partitioning
You can now subpartition range-partitioned tables by list.
See Also: "Types of Partitioning" on page 5-4
- Summary Advisor Enhancements
The Summary Advisor tool and its related DBMS_OLAP package were improved
so you can restrict workloads to a specific schema.
See Also: Chapter 16, "Summary Advisor"
Oracle9i Release 1 (9.0.1) New Features in Data Warehousing
- Analytic Functions
Oracle’s analytic capabilities have been improved through the addition of
inverse percentile, hypothetical distribution, and first/last analytic functions.
See Also: Chapter 19, "SQL for Analysis in Data Warehouses"
- Bitmap Join Index
A bitmap join index spans multiple tables and improves the performance of
joins of those tables.
See Also: "Bitmap Indexes" on page 6-2
- ETL Enhancements
Oracle’s extraction, transformation, and loading capabilities have been
improved with a MERGE statement, multi-table inserts, and table functions.
See Also: Chapter 10, "Overview of Extraction, Transformation,
and Loading"

- Full Outer Joins
Oracle added full support for full outer joins so that you can more easily
express certain complex queries.
See Also: Oracle9i Database Performance Tuning Guide and Reference
- Grouping Sets
You can now selectively specify the set of groups that you want to create using
a GROUPING SETS expression within a GROUP BY clause. This allows precise
specification across multiple dimensions without computing the whole CUBE.
See Also: Chapter 18, "SQL for Aggregation in Data Warehouses"
- List Partitioning
List partitioning offers you precise control over which data belongs in a
particular partition.
See Also: "Partitioning Design Considerations" on page 5-4 and
Oracle9i Database Concepts, and Oracle9i Database Administrator’s
Guide
- Materialized View Enhancements
Various restrictions were removed, and the situations in which materialized
views can be used effectively were expanded.
See Also: "Overview of Data Warehousing with Materialized
Views" on page 8-2
- Query Rewrite Enhancements
The query rewrite feature, which allows many SQL statements to use
materialized views and thereby improves performance significantly, was
enhanced. Text match processing and join equivalence recognition have been
improved.
See Also: Chapter 22, "Query Rewrite"

- Summary Advisor Enhancements
The Summary Advisor tool and its related DBMS_OLAP package were improved
so you can specify workloads. In addition, a broader class of schemas is now
supported.
See Also: Chapter 16, "Summary Advisor"
- WITH Clause
The WITH clause enables you to reuse a query block in a SELECT statement
when it occurs more than once within a complex query.
See Also: "Computation Using the WITH Clause" on page 18-30
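As an illustration of the WITH clause described in the list above, the following
sketch names a query block once and then uses it twice in the same statement. It
assumes the sales and channels tables of the sample schemas, with the columns
channel_id, channel_desc, and amount_sold; the block name channel_totals is
invented for this example.

```sql
-- channel_totals is computed once and referenced twice: in the main
-- query and in the subquery that derives the average.
WITH channel_totals AS (
  SELECT c.channel_desc,
         SUM(s.amount_sold) AS total_sold
  FROM   sales s, channels c
  WHERE  s.channel_id = c.channel_id
  GROUP  BY c.channel_desc
)
SELECT channel_desc, total_sold
FROM   channel_totals
WHERE  total_sold > (SELECT AVG(total_sold) FROM channel_totals);
```

The statement returns only those channels whose total sales exceed the average
total across all channels.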

Part I
Concepts
This section introduces basic data warehousing concepts.
It contains the following chapter:
- Data Warehousing Concepts

1
Data Warehousing Concepts
This chapter provides an overview of the Oracle data warehousing implementation.
It includes:
- What is a Data Warehouse?
- Data Warehouse Architectures
Note that this book is meant as a supplement to standard texts about data
warehousing. This book focuses on Oracle-specific material and does not reproduce
in detail material of a general nature. Two standard texts are:
- The Data Warehouse Toolkit by Ralph Kimball (John Wiley and Sons, 1996)
- Building the Data Warehouse by William Inmon (John Wiley and Sons, 1996)

What is a Data Warehouse?
A data warehouse is a relational database that is designed for query and analysis
rather than for transaction processing. It usually contains historical data derived
from transaction data, but it can include data from other sources. It separates
analysis workload from transaction workload and enables an organization to
consolidate data from several sources.
In addition to a relational database, a data warehouse environment includes an
extraction, transportation, transformation, and loading (ETL) solution, an online
analytical processing (OLAP) engine, client analysis tools, and other applications
that manage the process of gathering data and delivering it to business users.
A common way of introducing data warehousing is to refer to the characteristics of
a data warehouse as set forth by William Inmon:
- Subject Oriented
- Integrated
- Nonvolatile
- Time Variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more
about your company’s sales data, you can build a warehouse that concentrates on
sales. Using this warehouse, you can answer questions like "Who was our best
customer for this item last year?" This ability to define a data warehouse by subject
matter, sales in this case, makes the data warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data
from disparate sources into a consistent format. They must resolve such problems
as naming conflicts and inconsistencies among units of measure. When they achieve
this, they are said to be integrated.
See Also: Chapter 10, "Overview of Extraction, Transformation,
and Loading"
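The kind of reconciliation described under Integrated can be sketched in SQL. This
is an illustration only: the source tables orders_eu and orders_us and their
columns are hypothetical and are not part of the sample schemas.

```sql
-- Hypothetical sources: orders_eu stores weight in kilograms (weight_kg),
-- orders_us stores weight in pounds (wt_lbs).
-- The query resolves the naming conflict (one output column name) and the
-- unit-of-measure inconsistency (all weights expressed in kilograms).
SELECT order_id,
       weight_kg                     AS shipping_weight_kg
FROM   orders_eu
UNION ALL
SELECT order_id,
       ROUND(wt_lbs * 0.45359237, 3) AS shipping_weight_kg
FROM   orders_us;
```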

Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change.
This is logical because the purpose of a warehouse is to enable you to analyze what
has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is
very much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A
data warehouse’s focus on change over time is what is meant by the term time
variant.
Contrasting OLTP and Data Warehousing Environments
Figure 1–1 illustrates key differences between an OLTP system and a data
warehouse.
Figure 1–1 Contrasting OLTP and Data Warehousing Environments
                             OLTP                       Data Warehouse
Data structures              Complex (3NF databases)    Multidimensional
Indexes                      Few                        Many
Joins                        Many                       Some
Duplicated data              Normalized DBMS            Denormalized DBMS
Derived data and aggregates  Rare                       Common
One major difference between the types of system is that data warehouses are not
usually in third normal form (3NF), a type of data normalization common in OLTP
environments.

Data warehouses and OLTP systems have very different requirements. Here are
some examples of differences between typical data warehouses and OLTP systems:
- Workload
Data warehouses are designed to accommodate ad hoc queries. You might not
know the workload of your data warehouse in advance, so a data warehouse
should be optimized to perform well for a wide variety of possible query
operations.
OLTP systems support only predefined operations. Your applications might be
specifically tuned or designed to support only these operations.
- Data modifications
A data warehouse is updated on a regular basis by the ETL process (run nightly
or weekly) using bulk data modification techniques. The end users of a data
warehouse do not directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification
statements to the database. The OLTP database is always up to date, and reflects
the current state of each business transaction.
- Schema design
Data warehouses often use denormalized or partially denormalized schemas
(such as a star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize
update/insert/delete performance, and to guarantee data consistency.
- Typical operations
A typical data warehouse query scans thousands or millions of rows. For
example, "Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example,
"Retrieve the current order for this customer."
- Historical data
Data warehouses usually store many months or years of data. This is to support
historical analysis.
OLTP systems usually store data from only a few weeks or months. The OLTP
system stores historical data only as needed to successfully meet the
requirements of the current transaction.
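The two example operations quoted in the list above might look as follows in SQL.
This is a sketch only. It assumes the sales and times tables of the sh sample
schema and the orders table of the oe sample schema; the month value '2000-12'
and the customer number 101 are arbitrary illustrations.

```sql
-- Data warehouse style: scan and aggregate every sales row for one month.
SELECT SUM(s.amount_sold) AS total_sales
FROM   sales s, times t
WHERE  s.time_id = t.time_id
AND    t.calendar_month_desc = '2000-12';

-- OLTP style: fetch the few order rows that belong to a single customer,
-- most recent first.
SELECT order_id, order_date, order_status, order_total
FROM   orders
WHERE  customer_id = 101
ORDER  BY order_date DESC;
```

The first statement reads a large share of the fact table and returns one row.
The second typically touches only a handful of rows, usually through an index on
the customer column.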

Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the specifics of an
organization's situation. Three common architectures are:
- Data Warehouse Architecture (Basic)
- Data Warehouse Architecture (with a Staging Area)
- Data Warehouse Architecture (with a Staging Area and Data Marts)
Data Warehouse Architecture (Basic)
Figure 1–2 shows a simple architecture for a data warehouse. End users directly
access data derived from several source systems through the data warehouse.
Figure 1–2 Architecture of a Data Warehouse
In Figure 1–2, the metadata and raw data of a traditional OLTP system are present,
as is an additional type of data, summary data. Summaries are very valuable in data
warehouses because they precompute long operations. For example, a typical data
warehouse query is to retrieve something like August sales. A summary in Oracle
is called a materialized view.
[Figure 1–2 shows data sources (operational systems and flat files) feeding the
warehouse, which holds metadata, raw data, and summary data. Users access the
warehouse for analysis, reporting, and mining.]
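A summary of the kind described above could be defined as a materialized view
along the following lines. This is a minimal sketch. It assumes the sales and
times tables of the sample schemas; the name monthly_sales_mv and the chosen
build and refresh options are illustrative, not recommendations.

```sql
-- Precompute sales totals by month so that a query such as
-- "August sales" does not have to scan the detail rows.
CREATE MATERIALIZED VIEW monthly_sales_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT t.calendar_month_desc,
       SUM(s.amount_sold) AS total_sold
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP  BY t.calendar_month_desc;
```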

Data Warehouse Architecture (with a Staging Area)
In Figure 1–2, you need to clean and process your operational data before putting it
into the warehouse. You can do this programmatically, although most data
warehouses use a staging area instead. A staging area simplifies building
summaries and general warehouse management. Figure 1–3 illustrates this typical
architecture.
Figure 1–3 Architecture of a Data Warehouse with a Staging Area
[Figure 1–3 shows data sources (operational systems and flat files) feeding a
staging area, which in turn feeds the warehouse (metadata, raw data, and summary
data). Users access the warehouse for analysis, reporting, and mining.]
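One simple way to realize the staging area described above is a set of tables
that hold extracted rows until they have been cleaned. The following sketch
assumes a hypothetical staging table sales_stage and assumes that the target
sales table has the columns shown; the cleaning rules are examples only.

```sql
-- Hypothetical staging table: holds raw rows as they were extracted.
CREATE TABLE sales_stage (
  prod_id        NUMBER,
  cust_id        NUMBER,
  sale_date_txt  VARCHAR2(10),   -- date arrives as text, e.g. '2002-03-15'
  channel_id     NUMBER,
  promo_id       NUMBER,
  quantity_sold  NUMBER,
  amount_sold    NUMBER
);

-- Clean while moving rows from the staging area into the warehouse:
-- convert the text date and discard rows that fail basic checks.
INSERT /*+ APPEND */ INTO sales
  (prod_id, cust_id, time_id, channel_id, promo_id,
   quantity_sold, amount_sold)
SELECT prod_id, cust_id, TO_DATE(sale_date_txt, 'YYYY-MM-DD'),
       channel_id, promo_id, quantity_sold, amount_sold
FROM   sales_stage
WHERE  amount_sold IS NOT NULL
AND    quantity_sold > 0;

COMMIT;
```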

Data Warehouse Architecture (with a Staging Area and Data Marts)
Although the architecture in Figure 1–3 is quite common, you may want to
customize your warehouse’s architecture for different groups within your
organization. You can do this by adding data marts, which are systems designed for
a particular line of business. Figure 1–4 illustrates an example where purchasing,
sales, and inventories are separated. In this example, a financial analyst might want
to analyze historical data for purchases and sales.
Figure 1–4 Architecture of a Data Warehouse with a Staging Area and Data Marts
Note: Data marts are an important part of many warehouses, but
they are not the focus of this book.
See Also: Data Mart Suites documentation for further information
regarding data marts
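[Figure 1–4 shows data sources (operational systems and flat files) feeding a
staging area, which feeds the warehouse (metadata, raw data, and summary data).
The warehouse feeds separate data marts for purchasing, sales, and inventory,
which users access for analysis, reporting, and mining.]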

Part II
Logical Design
This section deals with the issues in logical design in a data warehouse.
It contains the following chapter:
- Logical Design in Data Warehouses

2
Logical Design in Data Warehouses
This chapter tells you how to design a data warehousing environment and includes
the following topics:
- Logical Versus Physical Design in Data Warehouses
- Creating a Logical Design
- Data Warehousing Schemas
- Data Warehousing Objects

Logical Versus Physical Design in Data Warehouses
Your organization has decided to build a data warehouse. You have defined the
business requirements and agreed upon the scope of your application, and created a
conceptual design. Now you need to translate your requirements into a system
deliverable. To do so, you create the logical and physical design for the data
warehouse. You then define:
- The specific data content
- Relationships within and between groups of data
- The system environment supporting your data warehouse
- The data transformations required
- The frequency with which data is refreshed
The logical design is more conceptual and abstract than the physical design. In the
logical design, you look at the logical relationships among the objects. In the
physical design, you look at the most effective way of storing and retrieving the
objects as well as handling them from a transportation and backup/recovery
perspective.
Orient your design toward the needs of the end users. End users typically want to
perform analysis and look at aggregated data, rather than at individual
transactions. However, end users might not know what they need until they see it.
In addition, a well-planned design allows for growth and changes as the needs of
users change and evolve.
By beginning with the logical design, you focus on the information requirements
and save the implementation details for later.
Creating a Logical Design
A logical design is conceptual and abstract. You do not deal with the physical
implementation details yet. You deal only with defining the types of information
that you need.
One technique you can use to model your organization's logical information
requirements is entity-relationship modeling. Entity-relationship modeling involves
identifying the things of importance (entities), the properties of these things
(attributes), and how they are related to one another (relationships).
The process of logical design involves arranging data into a series of logical
relationships called entities and attributes. An entity represents a chunk of
information. In relational databases, an entity often maps to a table. An attribute is
a component of an entity that helps define the uniqueness of the entity. In relational
databases, an attribute maps to a column.
To be sure that your data is consistent, you need to use unique identifiers. A unique
identifier is something you add to tables so that you can differentiate between the
same item when it appears in different places. In a physical design, this is usually a
primary key.
While entity-relationship diagramming has traditionally been associated with
highly normalized models such as OLTP applications, the technique is still useful
for data warehouse design in the form of dimensional modeling. In dimensional
modeling, instead of seeking to discover atomic units of information (such as
entities and attributes) and all of the relationships between them, you identify
which information belongs to a central fact table and which information belongs to
its associated dimension tables. You identify business subjects or fields of data,
define relationships between business subjects, and name the attributes for each
subject.
See Also: Chapter 9, "Dimensions" for further information
regarding dimensions
Your logical design should result in (1) a set of entities and attributes corresponding
to fact tables and dimension tables and (2) a model of operational data from your
source into subject-oriented information in your target data warehouse schema.
You can create the logical design using a pen and paper, or you can use a design
tool such as Oracle Warehouse Builder (specifically designed to support modeling
the ETL process) or Oracle Designer (a general purpose modeling tool).
See Also: Oracle Designer and Oracle Warehouse Builder
documentation sets
Data Warehousing Schemas
A schema is a collection of database objects, including tables, views, indexes, and
synonyms. You can arrange schema objects in the schema models designed for data
warehousing in a variety of ways. Most data warehouses use a dimensional model.
The model of your source data and the requirements of your users help you design
the data warehouse schema. You can sometimes get the source model from your
company's enterprise data model and reverse-engineer the logical data model for
the data warehouse from this. The physical implementation of the logical data
warehouse model may require some changes to adapt it to your system
parameters: size of machine, number of users, storage capacity, type of network,
and software.
Star Schemas
The star schema is the simplest data warehouse schema. It is called a star schema
because the diagram resembles a star, with points radiating from a center. The
center of the star consists of one or more fact tables and the points of the star are the
dimension tables, as shown in Figure 2–1.
Figure 2–1 Star Schema
The most natural way to model a data warehouse is as a star schema. Only one join
establishes the relationship between the fact table and any one of the dimension
tables.
A star schema optimizes performance by keeping queries simple and providing fast
response time. All the information about each level is stored in one row.
Note: Oracle Corporation recommends that you choose a star
schema unless you have a clear reason not to.
[Figure 2–1 shows the fact table sales (amount_sold, quantity_sold) at the center
of the star, surrounded by the dimension tables customers, products, channels,
and times.]
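A query against the star schema described above joins the fact table to each
dimension table it needs with exactly one join condition per dimension. The
following sketch assumes the sales, times, and channels tables of the sample
schemas and the column names shown; the year 2000 is an arbitrary filter value.

```sql
-- One join condition per dimension table; the fact table sits in the middle.
SELECT ch.channel_desc,
       t.calendar_quarter_desc,
       SUM(s.amount_sold)   AS total_amount,
       SUM(s.quantity_sold) AS total_quantity
FROM   sales s, times t, channels ch
WHERE  s.time_id    = t.time_id
AND    s.channel_id = ch.channel_id
AND    t.calendar_year = 2000
GROUP  BY ch.channel_desc, t.calendar_quarter_desc;
```

Dimension columns (channel_desc, calendar_quarter_desc) constrain and label the
result, while the fact columns supply the numbers being aggregated.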

Other Schemas
Some schemas in data warehousing environments use third normal form rather
than star schemas. Another schema that is sometimes useful is the snowflake
schema, which is a star schema with normalized dimensions in a tree structure.
See Also: Chapter 17, "Schema Modeling Techniques" for further
information regarding star and snowflake schemas in data
warehouses and Oracle9i Database Concepts for further conceptual
material
Data Warehousing Objects
Fact tables and dimension tables are the two types of objects commonly used in
dimensional data warehouse schemas.
Fact tables are the large tables in your warehouse schema that store business
measurements. Fact tables typically contain facts and foreign keys to the dimension
tables. Fact tables represent data, usually numeric and additive, that can be
analyzed and examined. Examples include sales, cost, and profit.
Dimension tables, also known as lookup or reference tables, contain the relatively
static data in the warehouse. Dimension tables store the information you normally
use to constrain queries. Dimension tables are usually textual and descriptive and
you can use them as the row headers of the result set. Examples are customers or
products.
Fact Tables
A fact table typically has two types of columns: those that contain numeric facts
(often called measurements), and those that are foreign keys to dimension tables. A
fact table contains either detail-level facts or facts that have been aggregated. Fact
tables that contain aggregated facts are often called summary tables. A fact table
usually contains facts with the same level of aggregation. Though most facts are
additive, they can also be semi-additive or non-additive. Additive facts can be
aggregated by simple arithmetical addition. A common example of this is sales.
Non-additive facts cannot be added at all. An example of this is averages.
Semi-additive facts can be aggregated along some of the dimensions and not along
others. An example of this is inventory levels: you can add them across products
or warehouses at a single point in time, but adding them across time periods does
not produce a meaningful total.
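The difference between additive and semi-additive facts described under Fact
Tables can be illustrated with a short sketch. The table inventory_fact and its
columns are hypothetical and are not part of the sample schemas.

```sql
-- Hypothetical fact table: one row per day, product, and warehouse,
-- recording the quantity on hand at the end of that day.

-- Meaningful: add inventory levels across warehouses for each day.
SELECT time_id, prod_id, SUM(quantity_on_hand) AS qty_all_warehouses
FROM   inventory_fact
GROUP  BY time_id, prod_id;

-- Across time, a sum of daily levels has no business meaning.
-- Average the daily totals instead.
SELECT prod_id, AVG(daily_qty) AS avg_daily_qty
FROM   (SELECT time_id, prod_id, SUM(quantity_on_hand) AS daily_qty
        FROM   inventory_fact
        GROUP  BY time_id, prod_id)
GROUP  BY prod_id;
```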

Creating a New Fact Table
You must define a fact table for each star schema. From a modeling standpoint, the
primary key of the fact table is usually a composite key that is made up of all of its
foreign keys.
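A fact table whose primary key is composed of its foreign keys might be declared
as in the following sketch. The table name sales_fact and the constraint names
are hypothetical. The referenced dimension tables are assumed to exist with
primary keys on the referenced columns, as in the sample schemas.

```sql
CREATE TABLE sales_fact (
  prod_id        NUMBER NOT NULL,
  cust_id        NUMBER NOT NULL,
  time_id        DATE   NOT NULL,
  channel_id     NUMBER NOT NULL,
  quantity_sold  NUMBER(10,2),   -- numeric fact
  amount_sold    NUMBER(10,2),   -- numeric fact
  -- Composite primary key made up of all of the foreign keys
  CONSTRAINT sales_fact_pk
    PRIMARY KEY (prod_id, cust_id, time_id, channel_id),
  CONSTRAINT sales_fact_prod_fk
    FOREIGN KEY (prod_id)    REFERENCES products (prod_id),
  CONSTRAINT sales_fact_cust_fk
    FOREIGN KEY (cust_id)    REFERENCES customers (cust_id),
  CONSTRAINT sales_fact_time_fk
    FOREIGN KEY (time_id)    REFERENCES times (time_id),
  CONSTRAINT sales_fact_chan_fk
    FOREIGN KEY (channel_id) REFERENCES channels (channel_id)
);
```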
Dimension Tables
A dimension is a structure, often composed of one or more hierarchies, that
categorizes data. Dimensional attributes help to describe the dimensional value.
They are normally descriptive, textual values. Several distinct dimensions,
combined with facts, enable you to answer business questions. Commonly used
dimensions are customers, products, and time.
Dimension data is typically collected at the lowest level of detail and then
aggregated into higher level totals that are more useful for analysis. These natural
rollups or aggregations within a dimension table are called hierarchies.
Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing
data. A hierarchy can be used to define data aggregation. For example, in a time
dimension, a hierarchy might aggregate data from the month level to the quarter
level to the year level. A hierarchy can also be used to define a navigational drill
path and to establish a family structure.
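The month, quarter, and year rollup described above can be sketched in Python as follows. The sales figures and the helper names roll_up and quarter_of are hypothetical and are not part of any Oracle interface.

```python
from collections import defaultdict

# Hypothetical facts at the month level: (year, month) -> sales amount
monthly_sales = {
    (2001, 11): 100,
    (2001, 12): 150,
    (2002, 1): 80,
    (2002, 2): 120,
    (2002, 4): 200,
}

def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1

def roll_up(facts, parent_of):
    """Aggregate facts to the next level, given a child-to-parent mapping."""
    totals = defaultdict(int)
    for key, amount in facts.items():
        totals[parent_of(key)] += amount
    return dict(totals)

# Month level -> quarter level -> year level
quarterly = roll_up(monthly_sales, lambda k: (k[0], quarter_of(k[1])))
yearly = roll_up(quarterly, lambda k: k[0])

print(quarterly)
print(yearly)
```

Each call to roll_up moves the data up one level, so the same function serves every step of the hierarchy.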
Within a hierarchy, each level is logically connected to the levels above and below it.
Data values at lower levels aggregate into the data values at higher levels. A
dimension can be composed of more than one hierarchy. For example, in the
product dimension, there might be two hierarchies—one for product categories
and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use
hierarchies to enable you to drill down into your data to view different levels of
granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business
structures. For example, consider a divisional, multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level
value, a value at the next higher level is its parent, and values at the next lower level
are its children. These familial relationships enable analysts to access data quickly.

Levels
A level represents a position in a hierarchy. For example, a time dimension
might have a hierarchy that represents data at the month, quarter, and year
levels. Levels range from general to specific, with the root level as the highest or
most general level. The levels in a dimension are organized into one or more
hierarchies.
Level Relationships
Level relationships specify top-to-bottom ordering of levels from
most general (the root) to most specific information. They define the parent-child
relationship between the levels in a hierarchy.
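The parent-child structure and the ordering of levels can be sketched in Python. The level names follow the customer hierarchy of Figure 2–2. The dimension values and the helper functions are hypothetical.

```python
# Levels ordered from the root (most general) to the most specific.
LEVELS = ["region", "subregion", "country_name", "customer"]

# Hypothetical dimension values: child value -> parent value
PARENT = {
    "Western Europe": "Europe",
    "Northern Europe": "Europe",
    "France": "Western Europe",
    "Germany": "Western Europe",
    "Sweden": "Northern Europe",
    "Customer 101": "France",
    "Customer 102": "France",
    "Customer 103": "Sweden",
}

def children(value):
    """Return the values one level below the given value (drill down)."""
    return sorted(c for c, p in PARENT.items() if p == value)

def drill_path(value):
    """Return the path from the root level down to the given value."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))

def level_of(value):
    """Return the name of the level at which the value sits."""
    return LEVELS[len(drill_path(value)) - 1]

print(children("Europe"))
print(drill_path("Customer 103"))
print(level_of("France"))
```

Storing only the child-to-parent mapping is enough here because each value has exactly one parent within a single hierarchy.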
Hierarchies are also essential components in enabling more complex query rewrites.
For example, the database can aggregate existing quarterly sales revenue into a
yearly aggregation when the dimensional dependencies between quarter and year
are known.
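The idea behind such a rewrite can be sketched by hand in Python with an in-memory SQLite database. SQLite does not perform query rewrite, so the sketch simply writes out both forms of the query and compares them. All table names, column names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE times (month TEXT PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE sales (month TEXT REFERENCES times(month), amount REAL);

INSERT INTO times VALUES
    ('2001-11', '2001-Q4', 2001), ('2001-12', '2001-Q4', 2001),
    ('2002-01', '2002-Q1', 2002), ('2002-04', '2002-Q2', 2002);
INSERT INTO sales VALUES
    ('2001-11', 100), ('2001-12', 150),
    ('2002-01', 80),  ('2002-01', 120), ('2002-04', 200);

-- Existing summary: sales revenue already aggregated by quarter.
CREATE TABLE sales_by_quarter AS
    SELECT t.quarter AS quarter, SUM(s.amount) AS amount
    FROM sales s JOIN times t ON s.month = t.month
    GROUP BY t.quarter;
""")

# Yearly revenue computed from the detail rows.
from_detail = conn.execute("""
    SELECT t.year, SUM(s.amount)
    FROM sales s JOIN times t ON s.month = t.month
    GROUP BY t.year ORDER BY t.year
""").fetchall()

# The same answer computed from the quarterly summary. This is valid only
# because each quarter belongs to exactly one year.
from_summary = conn.execute("""
    SELECT q.year, SUM(sq.amount)
    FROM sales_by_quarter sq
    JOIN (SELECT DISTINCT quarter, year FROM times) q
      ON sq.quarter = q.quarter
    GROUP BY q.year ORDER BY q.year
""").fetchall()

print(from_detail)
print(from_summary)
```

The second query reads far fewer rows than the first. The dependency of quarter on year is what makes the two results equivalent.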
Typical Dimension Hierarchy
Figure 2–2 illustrates a dimension hierarchy based on customers.
Figure 2–2 Typical Levels in a Dimension Hierarchy
[Figure: hierarchy levels, from most general to most specific: region, subregion, country_name, customer]
See Also: Chapter 9, "Dimensions" and Chapter 22, "Query
Rewrite" for further information regarding hierarchies
