1. AN INTRODUCTION TO DIMENSIONAL DATA
MODELLING
Ashish Chandwani
Intern – Nationwide Insurance
School – University of Maryland, College Park
2. CONTENTS
Overview of Data Warehouse
Introduction to Dimensional Modelling
Elements of Dimensional Model
Designing a Dimensional Data Model
Types of Schema
Dimensional Data Model vs Relational Data Model
3. Data Warehouse
Central repository of integrated data
from one or more diverse sources.
Stores current and historical data.
Sometimes referred to as an Enterprise Data
Warehouse.
Data is often collected from multiple
sources within and outside the
organization, and processes are deployed
to cleanse it and enforce data integrity.
The DW is used for reporting and analysis.
4. Introduction to Dimensional Modelling
Dimensional Modelling is a technique
for database design.
Important for supporting end-user
queries relating to business
transactions.
Intended to support analysis and
reporting. Contains business attribute
tables (dimensions) and business
transaction tables (facts/measures).
Used as the basis for OLAP (Online
Analytical Processing) cubes.
5. Elements of Dimensional Data Model
Dimension Table:
Collection of reference information about a
business. E.g. Location, Product and Date are
dimensions for certain metrics of
organizations like Nationwide Insurance.
Each dimension table contains attributes
which describe the details of the dimension.
E.g. the Product dimension can contain product
name, type and price.
Each dimension table may also contain
hierarchies. E.g. the Location dimension can
contain location name, location city, location
state and location country.
Fact Table:
Measurable events for which dimension
table data is collected, used for
analysis and reporting.
Fact tables could contain information
like sales against a set of dimensions like
Location, Product and Date.
Primary keys of the dimension tables are
mapped as foreign keys in the fact
table.
Usually these keys are surrogate keys.
Dimensions contain the context for the business problems and facts are
the measures for those contexts.
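A minimal sketch of how dimension and fact tables relate, using Python's built-in sqlite3 module. The table names, column names and sample rows (dim_product, dim_location, fact_sales, the branches and the amounts) are assumptions made up for illustration; they do not come from any real Nationwide system.

```python
import sqlite3

# Hypothetical schema: names and values are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT,
    product_type TEXT
);
CREATE TABLE dim_location (
    location_key  INTEGER PRIMARY KEY,  -- surrogate key
    location_name TEXT,
    location_city TEXT
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    sales_amount REAL                   -- the measure
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Nationwide Personal", "PL"),
                  (2, "Nationwide Commercial", "CL")])
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                 [(1, "Branch A", "City X"), (2, "Branch B", "City Y")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0)])

# The fact table supplies the measure; the dimension supplies the context.
sales_by_type = conn.execute("""
    SELECT p.product_type, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_type
    ORDER BY p.product_type
""").fetchall()
print(sales_by_type)  # [('CL', 75.0), ('PL', 150.0)]
```

The fact rows hold only keys and the measure; every descriptive attribute is reached through a join on a dimension key.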
6. Surrogate Keys
Before moving on to how to design dimensional models, it’s important to
understand the concept of surrogate keys.
A surrogate key is an unintelligent ("dumb") key: unlike a natural key, it is not derived from
application data.
A surrogate key is artificially generated to cover regular changes within the fact and dimension
tables.
It is usually an incremental key with values from 1 to N, one for each row in a data
warehouse table.
7. Why Surrogate Keys
Avoid key conflicts between backend applications.
Consistency among dimension keys, as different backend applications may use
different columns as keys.
Shields the data warehouse from changes in the backend application data.
Implement history of slowly changing dimensions.
Surrogate keys are usually integers rather than character strings.
Surrogate keys are also used for recycling data as per business requirements.
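The first two points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name SurrogateKeyMap, the source system names policy_app and claims_app, and the natural key "P-100" are all hypothetical.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out incremental integer keys, one per (source system, natural key)."""

    def __init__(self):
        self._next_key = count(1)  # yields 1, 2, 3, ... N
        self._keys = {}

    def key_for(self, source_system, natural_key):
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


product_keys = SurrogateKeyMap()
# Two hypothetical backend applications reuse the same natural key "P-100".
k1 = product_keys.key_for("policy_app", "P-100")
k2 = product_keys.key_for("claims_app", "P-100")
k3 = product_keys.key_for("policy_app", "P-100")  # already seen: same key again
print(k1, k2, k3)  # 1 2 1
```

Because the warehouse key is independent of the source value, the two applications can keep their own key conventions without colliding in the dimension table.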
8. Designing a Dimensional Model
Understanding the business problem is the most important step.
While designing a data model solution you should be able to
answer: Why, How Much, When/Where/Who, What.
Designing dimensional models typically involves the following steps:
Choose the Business Process • Why
Declare the Grain • How Much
Identify the Dimensions • 3Ws (When/Where/Who)
Identify the Facts • What
9. Choose the Business Process
The actual business processes the data warehouse should cover.
Describe the problem for which the models should be
built.
This is the “why” of building a data model.
Here is a sample business process:
The Senior Executives at Nationwide want to determine
the sales for certain products in different locations for a particular
time period.
10. Declare the Grain
The Grain describes the level of detail needed for the business
problem/solution.
Lowest level of information stored in any table.
This is the “How much” of building a data model.
Sample Grain:
The Senior Executives at Nationwide want to determine the sales
for certain products in different locations for every week.
So the grain is “by product by location by week”.
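As a rough sketch of what declaring this grain means in practice, the snippet below rolls transaction-level rows up to one row per product, location and ISO week. The transactions, branch names and amounts are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level rows: (product, location, sale date, amount).
transactions = [
    ("Nationwide Personal", "Branch A", date(2015, 8, 3), 10.0),
    ("Nationwide Personal", "Branch A", date(2015, 8, 5), 10.0),
    ("Nationwide Personal", "Branch A", date(2015, 8, 12), 10.0),
    ("Nationwide Pet", "Branch B", date(2015, 8, 4), 35.0),
]

# Grain: one entry per product, per location, per (ISO year, ISO week).
weekly_sales = defaultdict(float)
for product, location, sold_on, amount in transactions:
    iso_year, iso_week, _weekday = sold_on.isocalendar()
    weekly_sales[(product, location, iso_year, iso_week)] += amount

for grain_key, total in sorted(weekly_sales.items()):
    print(grain_key, total)
```

The first two sales fall in the same week and collapse into one row; anything finer than a week can no longer be recovered from the result, which is why the grain has to be chosen before the facts are loaded.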
11. Identify the Dimension
Dimensions are the reference information for the business.
This step identifies the dimension tables with their attributes (columns) and
hierarchies.
This is the “When, Where and Who” of building a data model.
Sample Dimensions:
The Senior Executives at Nationwide want to determine
the sales for certain products in different locations for a
particular time period.
Dimensions here are: Product, Location and Time.
Dimension attributes: for Product, product key (surrogate key),
product name, product specs, product type.
Dimension hierarchies: for Location, location country, location
city, location street, location name.
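A hierarchy lets the same facts be summarized at different levels. The sketch below assumes a tiny location dimension and fact table (all names and amounts are hypothetical) and rolls sales up by city and by country.

```python
from collections import defaultdict

# Hypothetical location dimension rows, keyed by surrogate key.
dim_location = {
    1: {"name": "Branch A", "city": "City X", "country": "Country P"},
    2: {"name": "Branch B", "city": "City X", "country": "Country P"},
    3: {"name": "Branch C", "city": "City Y", "country": "Country P"},
}
# Hypothetical fact rows: (location_key, sales_amount).
fact_sales = [(1, 100.0), (2, 50.0), (3, 25.0)]


def roll_up(level):
    """Total the sales at one level of the location hierarchy."""
    totals = defaultdict(float)
    for location_key, amount in fact_sales:
        totals[dim_location[location_key][level]] += amount
    return dict(totals)


by_city = roll_up("city")
by_country = roll_up("country")
print(by_city)     # {'City X': 150.0, 'City Y': 25.0}
print(by_country)  # {'Country P': 175.0}
```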
12. Identify the Fact
Measurable events recorded against the dimensions.
This is the “What” of building a data model.
Sample Facts:
The Senior Executives at Nationwide want to determine
the sales for certain products in certain locations for a particular
time period.
The fact here is: sum of sales by product by location by time.
13. Types of Dimensional Model Schemas
A star schema is one in which a
central fact table is surrounded by
denormalized dimension tables. A
star schema can be simple or
complex. A simple star schema
consists of one fact table, whereas a
complex star schema has more
than one fact table.
A snowflake schema is an
extension of the star schema in which
dimensions are normalized into
additional dimension tables. Snowflake
schemas are useful when there
are low-cardinality attributes in the
dimensions.
[Figures: Star Schema; Snowflake Schema]
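To make the difference concrete, here is a small sketch that "snowflakes" a denormalized product dimension by moving a low-cardinality attribute into its own table. The function name snowflake and the generated product_type_key column are assumptions for illustration.

```python
# Denormalized (star) product dimension; rows are illustrative.
star_dim_product = [
    {"product_key": 1, "product_name": "Nationwide Personal", "product_type": "PL"},
    {"product_key": 2, "product_name": "Nationwide Commercial", "product_type": "CL"},
    {"product_key": 3, "product_name": "Nationwide Pet", "product_type": "PL"},
]


def snowflake(rows, attribute):
    """Move a repeated attribute into its own lookup (sub-dimension) table."""
    lookup = {}     # attribute value -> key in the new sub-dimension
    slim_rows = []
    for row in rows:
        key = lookup.setdefault(row[attribute], len(lookup) + 1)
        slim = {k: v for k, v in row.items() if k != attribute}
        slim[attribute + "_key"] = key
        slim_rows.append(slim)
    sub_dimension = [{attribute + "_key": k, attribute: v} for v, k in lookup.items()]
    return slim_rows, sub_dimension


dim_product, dim_product_type = snowflake(star_dim_product, "product_type")
print(dim_product_type)
# [{'product_type_key': 1, 'product_type': 'PL'}, {'product_type_key': 2, 'product_type': 'CL'}]
```

The value "PL" is now stored once instead of once per product, at the cost of an extra join whenever a query needs the product type.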
14. Differences between Star and Snowflake
Schema
Property | Star Schema | Snowflake Schema
Ease of maintenance / change | Harder to maintain due to high redundancy. | Easier to maintain due to low redundancy.
Fact and dimension properties | Dimension tables and fact tables are denormalized. | Dimension tables are normalized, fact tables are denormalized.
Ease of use | Easier to understand due to simple queries. | Harder to understand due to more complex queries.
Query performance | Good, less complexity (fewer foreign keys). | Poor, due to increased complexity in joins (more foreign keys).
Type of data warehouse | Simple relations (one to one / one to many). | Complex relations (many to many).
When to use | Smaller dimension tables. | Larger dimension tables; a snowflake schema helps reduce space.
15. Slowly Changing Dimensions
Sometimes the attribute information within a dimension has to be
altered to reflect business decisions or rules.
Such changes to dimension information have to be
accounted for in the data model.
The changes are unpredictable rather than
occurring on a fixed schedule.
These are Slowly Changing Dimensions.
16. Illustration of Slowly Changing Dimensions - I
Let's consider our example:
The Senior Executives at Nationwide want to determine the sales for
certain products in certain locations for a particular time period.
Consider the Product Dimension:
Product Key | Product Name | Product Type | Product Price
1 | Nationwide Personal | PL | $10
2 | Nationwide Commercial | CL | $25
3 | Nationwide Pet | PL | $35
Suppose the company decides tomorrow that Nationwide Pet should be classified as
Others instead of PL, or decides to change the price of Nationwide Personal from $10 to
$15.
How will that affect analysis and reporting, and how do we account for such changes?
Do we keep the old historical data, or do we overwrite it with the new data?
17. Methodologies for Handling Slowly
Changing Dimensions
Type 1 – No need to track historical data; simply overwrite
the existing data with the new data. (No history)
Type 2 – Historical data should be tracked. Create a
new row for the natural key but with a different surrogate
key. (Full history)
Type 3 – Historical data should be partially tracked.
Between Type 1 and Type 2. Add extra columns
to track the current and previous state of the changing attribute.
(Partial history)
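The three approaches can be sketched as small Python functions over a list of dimension rows. This is one possible reading, not a definitive implementation: the row layout, the natural key value "PET", the high date 9999-12-31 for open-ended rows and the "_old" column suffix are all assumptions.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "no expiry yet"


def scd_type1(rows, natural_key, attribute, new_value):
    """Type 1: overwrite in place; the old value is lost."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row[attribute] = new_value


def scd_type2(rows, natural_key, attribute, new_value, change_date):
    """Type 2: expire the current row, then add a row with a new surrogate key."""
    current = next(r for r in rows if r["natural_key"] == natural_key and r["latest"])
    current["expiry_date"] = change_date - timedelta(days=1)
    current["latest"] = False
    new_row = dict(current, **{attribute: new_value})
    new_row.update(surrogate_key=max(r["surrogate_key"] for r in rows) + 1,
                   effective_date=change_date, expiry_date=HIGH_DATE, latest=True)
    rows.append(new_row)


def scd_type3(rows, natural_key, attribute, new_value):
    """Type 3: keep only the previous value, in an extra column."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row[attribute + "_old"] = row[attribute]
            row[attribute] = new_value


def pet_row():
    # Hypothetical starting state of the Nationwide Pet row.
    return [{"surrogate_key": 3, "natural_key": "PET", "product_name": "Nationwide Pet",
             "product_type": "PL", "effective_date": date(2000, 1, 1),
             "expiry_date": HIGH_DATE, "latest": True}]


t1, t2, t3 = pet_row(), pet_row(), pet_row()
scd_type1(t1, "PET", "product_type", "Others")
scd_type2(t2, "PET", "product_type", "Others", date(2015, 8, 11))
scd_type3(t3, "PET", "product_type", "Others")
print(len(t1), len(t2), len(t3))  # 1 2 1
```

Only Type 2 adds a row, which is why it depends on surrogate keys: both versions share the natural key and differ only in surrogate key and validity dates.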
18. Illustrations of SCD Handling Methodologies
Let's take the product dimension. The Product Type changes for Nationwide Pet from PL to
Others.
Type 1 (row before the change, then after the overwrite):
Product Key | Product Name | Product Type | Product Price
3 | Nationwide Pet | PL | $35
Product Key | Product Name | Product Type | Product Price
3 | Nationwide Pet | Others | $35
Type 2:
Product Key | Product Name | Product Type | Product Price | Effective Date | Expiry Date | Latest_Ind
3 | Nationwide Pet | PL | $35 | 01-01-2000 | 08-10-2015 | N
4 | Nationwide Pet | Others | $35 | 08-11-2015 | 12-31-9999 | Y
Type 3:
Product Key | Product Name | Product Type_Old | Product Price | Product Type_New
3 | Nationwide Pet | PL | $35 | Others
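One way the Type 2 effective and expiry dates might be used when loading facts is to look up the surrogate key that was valid on the transaction date. The sketch below reuses the two Type 2 rows above, reading the dates as MM-DD-YYYY; the function name key_as_of and the field names are assumptions.

```python
from datetime import date

# The two Type 2 versions of Nationwide Pet from the illustration.
product_versions = [
    {"product_key": 3, "product_name": "Nationwide Pet", "product_type": "PL",
     "effective": date(2000, 1, 1), "expiry": date(2015, 8, 10)},
    {"product_key": 4, "product_name": "Nationwide Pet", "product_type": "Others",
     "effective": date(2015, 8, 11), "expiry": date(9999, 12, 31)},
]


def key_as_of(versions, product_name, on_date):
    """Return the surrogate key whose validity window contains on_date."""
    for row in versions:
        if (row["product_name"] == product_name
                and row["effective"] <= on_date <= row["expiry"]):
            return row["product_key"]
    return None  # no version was valid on that date


before = key_as_of(product_versions, "Nationwide Pet", date(2015, 7, 1))
after = key_as_of(product_versions, "Nationwide Pet", date(2015, 9, 1))
print(before, after)  # 3 4
```

Sales recorded before the change stay attached to key 3 (type PL) and later sales to key 4 (type Others), so historical reports keep the classification that applied at the time.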
19. Relational vs Dimensional Models
Relational Data Models | Dimensional Data Models
Units of storage are tables. | Units of storage are cubes.
Data is normalized. | Data is denormalized.
Detailed level of transactions. | Aggregates and measures used for business.
Volatile and time variant. | Non-volatile and time invariant.
Used for OLTP. | Used for OLAP cubes.
Normal reports. | Interactive, user-friendly reports.