2. Dimensional Models
A denormalized relational model
Made up of tables with attributes
Relationships defined by keys and foreign keys
Organized for understandability and ease of reporting
rather than update
Queried and maintained by SQL or special purpose
management tools.
2
3. From Relational to Dimensional
Relational Model
Designed from the
perspective of process
efficiency
Dimensional Model
Sales
Marketing
Sales
Customers
“Normalised” data
structures
Entity Relationship Model
Used for transactional, or
operational systems
Based on data that is
Current
Non Redundant
“De-normalised” data
structures in blatant
violation of normalisation
Used for analysis of
aggregated data
OLAP : OnLine Analytical
Processing
OLTP : OnLine Transaction
Processing
Designed from the
perspective of subject
Based on data that is
Historical
May be redundant
3
4. ER vs. Dimensional Models
One table per entity
Minimize data
redundancy
Optimize update
The Transaction
Processing Model
One fact table for
data organization
Maximize
understandability
Optimized for
retrieval
The data
warehousing model
4
5. Strengths of the Dimensional Model
Predictable, standard
framework
Respond well to changes in
user reporting needs
Relatively easy to add data
without reloading tables
Standard design approaches
have been developed
There exist a number of
products supporting the
dimensional model
“The Data Warehouse
Toolkit” by Ralph
Kimball & Margy Ross
“The Data Warehouse
Lifecycle Toolkit” by
Ralph Kimball & Margy
Ross
5
9. Facts & Dimensions
• There are two main types of objects in a dimensional
model
– Facts are quantitative measures that we wish to analyse and
report on.
– Dimensions contain textual descriptors of the business. They
provide context for the facts.
9
10. Fact & Dimension Tables
FACTS
DIMENSIONS
Contains two or more
foreign keys
Contain text and
descriptive information
Tend to have huge
numbers of records
1 in a 1-M relationship
Useful facts tend to be
numeric and additive
Generally the source of
interesting constraints
Typically contain the
attributes for the SQL
answer set.
10
11. Fact Table
Measurements associated with a specific business
process
Grain: level of detail of the table
Process events produce fact records
Facts (attributes) are usually
Numeric
Additive
Derived facts included
Foreign (surrogate) keys refer to dimension tables
(entities)
Classification values help define subsets
11
12. Dimension Tables
Entities describing the objects of the process
Conformed dimensions cross processes
Attributes are descriptive
Text
Numeric
Surrogate keys
Less volatile than facts (1:m with the fact table)
Null entries
Date dimensions
Produce “by” questions
12
14. Business Model
As always in life, there are some disadvantages
to 3NF:
Performance can be truly awful. Most of the
work that is performed on denormalizing a data
model is an attempt to reach performance
objectives.
The structure can be overwhelmingly complex.
We may wind up creating many small relations
which the user might think of as a single
relation or group of data.
14
15. The 4 Step Design Process
Choose the Data Mart
Declare the Grain
Choose the Dimensions
Choose the Facts
15
16. Structural Dimensions
The first step is the development of the
structural dimensions. This step corresponds
very closely to what we normally do in a
relational database.
The star architecture that we will develop here
depends upon taking the central intersection
entities as the fact tables and building the
foreign key => primary key relations as
dimensions.
16
17. Steps in dimensional modeling
Select an associative entity for a fact table
Determine granularity
Replace operational keys with surrogate keys
Promote the keys from all hierarchies to the fact table
Add date dimension
Split all compound attributes
Add necessary categorical dimensions
Fact (varies with time) / Attribute (constant)
17
18. The Big Picture
Customer ID
Cust Name
Cust Address
Order ID
Customer ID (FK)
Date
Order ID (FK)
Item ID
Product ID (FK)
Quantity
Value
Product ID
Product Name
Product Desc
Unit Price
OLTP
OLAP
Customer ID
Cust Name
Cust Address
Transaction ID
Product ID (FK)
Client ID (FK)
Date
Quantity
Value
Product ID
Product Name
Product Desc
Unit Price
18
19. Converting an E-R Diagram
Determine the purpose of the mart
Identify an association table as the central fact
table
Determine facts to be included
Replace all keys with surrogate keys
Promote foreign keys in related tables to the
fact table
Add time dimension
Refine the dimension tables
19
20. Fact Tables
Represent a process or reporting environment that is of
value to the organization
It is important to determine the identity of the fact table
and specify exactly what it represents.
Typically correspond to an associative entity in the E-R
model
20
21. Grain (unit of analysis)
The grain determines what each fact record represents:
the level of detail.
For example
Individual transactions
Snapshots (points in time)
Line items on a document
Generally better to focus on the smallest grain
21
22. Facts
Measurements associated with fact table records at fact
table granularity
Normally numeric and additive
Non-key attributes in the fact table
Attributes in dimension tables are constants. Facts vary with
the granularity of the fact table
22
23. Dimensions
A table (or hierarchy of tables) connected with the
fact table with keys and foreign keys
Preferably single valued for each fact record
(1:m)
Connected with surrogate (generated) keys, not
operational keys
Dimension tables contain text or numeric
attributes
23
26. Date Dimensions
Fiscal Year
Calendar Year
Fiscal Quarter
Calendar
Quarter
Fiscal Month
Calendar
Month
Fiscal Week
Calendar
Week
Type of Day
Day of Week
Day
Holiday
26
27. Attribute Name
Attribute Description
Day
The specific day that an activity took
place.
Day of Week
The specific name of the day.
Holiday
Identifies that this day is a holiday.
Type of Day
Indicates whether or not this day is
a weekday or a weekend day.
Calendar Week
The week ending date, always a
Saturday. Note that WE denotes
Calendar Month
The calendar month.
Calendar Quarter
Calendar Year
Fiscal Week
Fiscal Month
Fiscal Quarter
Fiscal Year
Sample Values
06/04/1998; 06/05/1998
Monday; Tuesday
Easter; Thanksgiving
Weekend; Weekday
WE 06/06/1998;
WE 06/13/1998
January,1998; February,
1998
The calendar quarter.
1998Q1; 1998Q4
The calendar year.
1998
The week that represents the
F Week 1 1998;
corporate calendar. Note that the F F Week 46 1998
The fiscal period comprised of 4 or 5 F January, 1998;
weeks. Note that the F in the data
F February, 1998
The grouping of 3 fiscal months.
F 1998Q1; F1998Q2
The grouping of 52 fiscal weeks / 12 F 1998; F 1999
fiscal months that comprise the
financial year.
27
31. Slowly Changing Dimensions
(Addresses, Managers, etc.)
Type 1: Store only the current value, overwrite
previous value
Type 2: Create a dimension record for each value (with
or without date stamps)
Type 3: Create an attribute in the dimension record for
previous value
31
33. Type 1 Slowly Changing Dimension
The simplest form
Only updates existing records
Overwrites history
33
34. Type 1 Slowly Changing Dimension
CustomerID
Code
Name
State Gender
1
K001
Miranda Kerr
VIC
NSW
F
34
35. Type 2 Slowly Changing Dimension
Allows the recording of changes of state over time
Generates a new record each time the state changes
Usually requires the use of effective dates when joining
to facts.
35
36. Type 2 Slowly Changing Dimension
CustomerID
Code
Name
State Gender Start
End
1
K001
Miranda Kerr NSW
F
1/1/09
23/2/09
<NULL>
2
K001
Miranda Kerr VIC
F
24/2/09
<NULL>
36
37. Type 3 Slowly Changing Dimension
De-normalized change tracking
Only keeps a limited history
Stores changes in separate columns
37
38. Type 3 Slowly Changing Dimension
CustomerID Code Name
1
K001
Miranda Kerr
Current Gender Prev
State
State
NSW
F
<NULL>
VIC
38
Notes de l'éditeur
A simplistic transactional schema showing 7 tables relating to sales orders
This is a star schema, (later on we will discuss snowflake schemas.) showing 4 tables that relate to the previous transactional schema
State and Country have been denormalized under Customer
Dimensions are in Blue
These are the things that we analyse “by” (eg. By Time, By Customer, By Region)
Fact is yellow
These are ususally quantitative things that we are interested in
We already have the data in a data model – why create another data model…? Well…
What is currently called “Data Warehousing” or “Business Intelligence” was originally often called “Decision Support Systems”
We already have all the data in the OLTP system, why replicate it in a dimensional model?
Atomic - Summary
Supports Transaction throughput – Supports Aggregate queries
Current - Historic
Facts work best if they are additive
Dimensions allow us to “slice & dice” the facts into meaningful groups. The provide context
There are some changes where it is valid to overwrite history. When someone gets married and changes their name, they may want to carry the history of their previous purchases over to their new name rather than see a split history.
This makes inserts into your fact table more expensive as you always need to match on the effective dates as well as the business key. Sometimes people kept a “Current” flag. Another approach rather than putting nulls in the End date is to put an arbitrary date well in the future, this can make the join logic a bit simpler.
This type of change tracking is more useful when there is a once off change like a change in sales regions where you want to see history re-cast into the new regions, but may also want to compare the old and new regions.