3. 9/23/2010 3
Star Schema
Ø Fact table
Ø Dimensions
Ø Drilling Down & Roll up
Ø Slicing & Dicing
4. 9/23/2010 4
Fact
• Definition: Facts are numeric measurements (values) that
represent a specific business activity.
Facts are stored in a fact table, i.e. the center of the
star schema.
Facts commonly used in business data analysis are units,
costs, prices, and revenues.
• Example: sales figures are numeric measurements that
represent product and/or service sales.
5. 9/23/2010 5
Fact Table
Central table
– Mostly raw numeric items
– Narrow rows, a few columns at most
– Large number of rows (millions to a billion)
– Access via dimensions
6. 9/23/2010 6
Fact Table
Definition: The central table in a star schema is called the
fact table. It contains the facts and is connected to the
dimensions. A fact table typically has two types of
columns:
Ø Columns that contain facts, and
Ø Foreign keys to dimension tables.
The primary key of a fact table is usually a
composite key made up of all of its foreign
keys.
A fact table might contain either detail-level
facts or facts that have been aggregated (fact tables
that contain aggregated facts are often called
summary tables instead). A fact table usually contains
facts at the same level of aggregation.
7. 9/23/2010 7
Dimension
• Definition: Qualifying characteristics that provide
additional perspective on a given fact.
• Example: sales might be compared by product, from
region to region, and from one time period to the
next.
Here sales have product, location, and time dimensions.
Such dimensions are stored in dimension tables.
8. 9/23/2010 8
Dimension Tables
• Definition: The dimensions of the fact table are
further described with dimension tables
• Fact table:
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
• Dimension Tables:
Market (Market_Id, City, State, Region)
Product (Product_Id, Name, Category, Price)
Time (Time_Id, Week, Month, Quarter)
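The four tables above can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the column types, the composite primary key on Sales, and the sample rows (Austin, Widget, and so on) are assumptions made for illustration and are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE Market  (Market_Id  TEXT PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Product (Product_Id TEXT PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
CREATE TABLE Time    (Time_Id    TEXT PRIMARY KEY, Week INTEGER, Month INTEGER, Quarter INTEGER);
-- Fact table: one foreign key per dimension plus the numeric measure.
CREATE TABLE Sales (
    Market_Id  TEXT REFERENCES Market(Market_Id),
    Product_Id TEXT REFERENCES Product(Product_Id),
    Time_Id    TEXT REFERENCES Time(Time_Id),
    Sales_Amt  REAL,
    PRIMARY KEY (Market_Id, Product_Id, Time_Id)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Market VALUES ('M1', 'Austin', 'Texas', 'South')")
conn.execute("INSERT INTO Product VALUES ('P1', 'Widget', 'Hardware', 9.5)")
conn.execute("INSERT INTO Time VALUES ('T1', 1, 1, 1)")
conn.execute("INSERT INTO Sales VALUES ('M1', 'P1', 'T1', 100.0)")

def try_insert(row):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

orphan_ok = try_insert(("M9", "P1", "T1", 50.0))     # M9 is not in Market
duplicate_ok = try_insert(("M1", "P1", "T1", 75.0))  # same composite key as the first row
```

Both extra inserts are rejected: the first violates the foreign key to Market, the second repeats an existing composite key.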
9. 9/23/2010 9
What is Star Schema?
Definition: A star schema is a relational database schema for
representing multidimensional data. It is the simplest form
of data warehouse schema and consists of one or more fact
tables and the dimension tables they reference.
• It is called a star schema because the entity-relationship
diagram between dimensions and fact tables
resembles a star, where one fact table is connected to
multiple dimensions.
• The center of the star schema is a large
fact table that points towards the dimension tables.
• The advantages of a star schema are easy slicing of data,
better query performance, and easy understanding of the data.
10. 9/23/2010 10
Steps in designing Star Schema
Ø Identify a business process for analysis (like sales).
Ø Identify measures or facts (sales dollar).
Ø Identify dimensions for facts (product dimension, location
dimension, time dimension, organization dimension).
Ø List the columns that describe each dimension (region name,
branch name).
Ø Determine the lowest level of summary in a fact table (sales
dollar).
Ø In a star schema, every dimension has a primary key.
Ø In a star schema, a dimension table does not have any parent
table.
• In a snowflake schema, by contrast, a dimension table has
one or more parent tables.
Ø In a star schema, hierarchies for the dimensions are stored in
the dimension table itself.
Ø In a snowflake schema, hierarchies are broken into separate
tables. These hierarchies help to drill down the data
from the topmost level to the lowermost level.
11. 9/23/2010 11
Attributes
• Each dimension table contains attributes.
• Attributes are used to search, filter, and classify facts.
• Example: for Sales, we can identify some attributes for
each dimension:
– Product Dimension: product ID, description, product
type.
– Location Dimension: region, state, city.
– Time Dimension: year, quarter, month, week, and date.
12. 9/23/2010 12
Attributes Hierarchy
• Definition: An attribute hierarchy (AH) provides a top-down data organization.
• Used for aggregation and drill-down/roll-up data
analysis.
• Example: location dimension attributes can be organized in a
hierarchy by region, state, and city.
• The AH provides the capability to perform drill-down and roll-up
searches.
• It allows the DW and OLAP systems to have a defined path.
13. 9/23/2010 13
A Concept Hierarchy: Dimension (location)
[Figure: tree of the location dimension, one level per row]
all:     all
region:  Europe, North_America, ...
country: Germany, Spain, Canada, Mexico, ...
city:    Frankfurt, Vancouver, Toronto, ...
office:  L. Chan, M. Wind, ...
14. 9/23/2010 14
A Concept Hierarchy: Dimension (product)
Product_Line->Product_Family->Product_Category->Product_Name
Product Line   Product Family   Product Category   Product Name
Books          Audiobooks       Fiction            The Adventures of Huckleberry Finn
Books          Audiobooks       Childrens          Winnie The Pooh
Books          Audiobooks       Childrens          The Hobbit
Books          Audiobooks       Biographies        Wild Swans: Three Daughters of China
Books          Arts and Music   Architecture       High Top Almonds
15. 9/23/2010 15
Multidimensional Data
• Sales volume as a function of product,
month, and region
[Figure: cube with axes Product, Region, and Month]
Dimensions: Product, Location, Time
Hierarchical summarization paths
Product:  Industry -> Category -> Product
Location: Region -> Country -> City -> Office
Time:     Year -> Quarter -> Month / Week -> Day
16. 9/23/2010 16
A Sample Data Cube
[Figure: data cube with axes Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.]
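The "sum" entries on each axis of the cube are aggregates over that dimension. A minimal sketch of how such a cube could be filled, assuming a flat sales table with hypothetical column names and made-up amounts, is to run one GROUP BY per subset of the dimensions and label every aggregated-away dimension "sum":

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, country TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("TV", "U.S.A", "1Qtr", 10), ("TV", "U.S.A", "2Qtr", 20),
    ("TV", "Canada", "1Qtr", 5), ("PC", "U.S.A", "1Qtr", 7),
    ("VCR", "Mexico", "4Qtr", 3),
])

DIMS = ("product", "country", "quarter")
cube = {}
# One GROUP BY per subset of dimensions; dimensions left out are rolled up into "sum".
for k in range(len(DIMS) + 1):
    for kept in combinations(DIMS, k):
        cols = ", ".join(kept)
        sql = "SELECT " + (cols + ", " if kept else "") + "SUM(amount) FROM sales"
        if kept:
            sql += " GROUP BY " + cols
        for row in conn.execute(sql):
            values = dict(zip(kept, row[:-1]))
            key = tuple(values.get(d, "sum") for d in DIMS)
            cube[key] = row[-1]

# Total annual sales of TV in U.S.A. is the cell with quarter rolled up.
tv_usa_year = cube[("TV", "U.S.A", "sum")]
```

With three dimensions this runs eight queries, one per face, edge, and corner of the cube plus the base cells.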
17. 9/23/2010 17
A Sample Data Cube
[Figure: the same data cube with axes Date, Product, and Country, drilled down from country to state (Illinois, Ohio, Texas, California, New York), shown next to an Essbase grid of quarterly sales (Qtr1 to Qtr4) by state for the sales managers John and Mac.]
18. 9/23/2010 18
Star Schema
• A single fact table and,
for each dimension,
one dimension table
• Does not capture
hierarchies directly
20. 9/23/2010 20
In the example, the sales fact table is connected to
the dimensions location, product, time, and organization.
The data can be sliced along any one
dimension and aggregated across several dimensions at once. "Sales
dollar" in the sales fact table can be calculated across
the dimensions independently or in combination,
for example:
Ø Sales dollar value for a particular product
Ø Sales dollar value for a product in a location
Ø Sales dollar value for a product in a year within a
location
Ø Sales dollar value for a product in a year within a
location sold or serviced by an employee
21. 9/23/2010 21
Example of Star Schema
Sales Fact Table (time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales)
Measures: units_sold, dollars_sold, avg_sales
Dimension tables:
time (time_key, day, day_of_the_week, month, quarter, year)
item (item_key, item_name, brand, type, supplier_type)
branch (branch_key, branch_name, branch_type)
location (location_key, street, city, province_or_state, country)
22. 9/23/2010 22
Aggregation
• Many OLAP queries involve aggregation of the data in
the fact table
• For example, to find the total sales (over time) of
each product in each market, we might use
    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
• The aggregation is over the entire time dimension and
thus produces a two-dimensional view of the data
23. 9/23/2010 23
Aggregation Over Time
The output of the previous query
SUM(Sales_Amt), rows by Product_Id, columns by Market_Id
Product_Id     M1      M2      M3      M4
P1           3003    1503       …
P2           6003    2402       …
P3           4503       3       …
P4           7503    7000       …
P5              …       …       …
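To see the two-dimensional view take shape, the aggregation query can be run against a few rows. This is a sketch only: the Sales rows below are invented for illustration and do not reproduce the figures in the slide's output table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt INTEGER)"
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("M1", "P1", "T1", 100), ("M1", "P1", "T2", 150),
    ("M2", "P1", "T1", 40),
    ("M1", "P2", "T1", 70), ("M2", "P2", "T2", 30), ("M2", "P2", "T3", 5),
])

# Time_Id is absent from GROUP BY, so the sum runs over the whole time dimension.
rows = conn.execute("""
    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
""").fetchall()

# Pivot the result into a product-by-market grid.
view = {}
for market, product, total in rows:
    view.setdefault(product, {})[market] = total
```

Each entry view[product][market] is one cell of the Product_Id by Market_Id grid.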
24. 9/23/2010 24
Typical OLAP Operations
• Roll up (drill-up): summarize data
– by climbing up hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
– from higher level summary to lower level summary or
detailed data, or introducing new dimensions
• Slice and dice:
– project and select
• Pivot (rotate):
– reorient the cube, visualization, 3D to series of 2D
planes.
• Other operations
– drill across: involving (across) more than one fact table
– drill through: through the bottom level of the cube to its
back-end relational tables (using SQL)
25. 9/23/2010 25
Drilling Down and Rolling Up
• Some dimension tables form an aggregation hierarchy
Market_Id -> City -> State -> Region
• Executing a series of queries that moves down a
hierarchy (e.g., from aggregation over regions to
that over states) is called drilling down
– Requires the use of the fact table or information
more specific than the requested aggregation (e.g.,
cities)
• Executing a series of queries that moves up the
hierarchy (e.g., from states to regions) is called
rolling up
26. 9/23/2010 26
Drilling Down
• Drilling down on market: from Region to State
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Market (Market_Id, City, State, Region)
– Aggregation by region:
  SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
  FROM Sales S, Market M
  WHERE M.Market_Id = S.Market_Id
  GROUP BY S.Product_Id, M.Region
– Drilled down to state:
  SELECT S.Product_Id, M.State, SUM(S.Sales_Amt)
  FROM Sales S, Market M
  WHERE M.Market_Id = S.Market_Id
  GROUP BY S.Product_Id, M.State
27. 9/23/2010 27
Rolling Up
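• Rolling up on market, from State to Region
– If we have already created a table, State_Sales, using
1. SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
   FROM Sales S, Market M
   WHERE M.Market_Id = S.Market_Id
   GROUP BY S.Product_Id, M.State
then we can roll up from there to:
2. SELECT T.Product_Id, R.Region, SUM(T.Sales_Amt)
   FROM State_Sales T,
        (SELECT DISTINCT State, Region FROM Market) R
   WHERE R.State = T.State
   GROUP BY T.Product_Id, R.Region
– Market holds one row per market, so each state is reduced to a
single (State, Region) row before the join. Otherwise a state with
several markets would be counted several times.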
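One point to watch when rolling up from State_Sales is that Market has one row per market, so a state with several markets appears in it several times. The sketch below, with invented cities, regions, and amounts, compares three results: aggregating by region straight from the fact table, rolling up State_Sales through a direct join with Market, and rolling up through the distinct (State, Region) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id TEXT, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt INTEGER);
INSERT INTO Market VALUES ('M1', 'Austin',   'Texas', 'South'),
                          ('M2', 'Dallas',   'Texas', 'South'),
                          ('M3', 'Columbus', 'Ohio',  'Midwest');
INSERT INTO Sales VALUES ('M1', 'P1', 'T1', 100), ('M2', 'P1', 'T1', 50), ('M3', 'P1', 'T1', 30);

-- Pre-aggregated state-level table (query 1 on the slide, with the sum named).
CREATE TABLE State_Sales AS
    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State;
""")

# Reference answer: aggregate by region directly from the fact table.
direct = sorted(conn.execute("""
    SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region
""").fetchall())

# Joining State_Sales to Market on State matches Texas twice (Austin and Dallas).
naive = sorted(conn.execute("""
    SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, Market M
    WHERE M.State = T.State
    GROUP BY T.Product_Id, M.Region
""").fetchall())

# Joining to the distinct (State, Region) pairs counts each state once.
rolled_up = sorted(conn.execute("""
    SELECT T.Product_Id, R.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) R
    WHERE R.State = T.State
    GROUP BY T.Product_Id, R.Region
""").fetchall())
```

Only the roll-up through the distinct pairs agrees with the direct aggregation; the direct join with Market doubles the Texas total.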
28. 9/23/2010 28
Roll-up and Drill Down
Roll Up (towards a higher level of aggregation)
Ø Sales Channel
Ø Region
Ø Country
Ø State
Ø Location Address
Ø Sales Representative
Drill-Down (towards low-level details)
29. 9/23/2010 29
“Slicing and Dicing”
[Figure: cube with axes Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special), and Regions (India, Far East, Europe). The Telecomm slice is highlighted.]
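In relational terms, a slice fixes one dimension to a single value and a dice restricts several dimensions to chosen subsets. The sketch below illustrates both on a flat table; the table layout and the amounts are assumptions made for the example, with dimension values borrowed from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, channel TEXT, region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Telecomm", "Retail", "India", 10), ("Telecomm", "Direct", "Europe", 20),
    ("Telecomm", "Retail", "Europe", 4), ("Video", "Retail", "India", 5),
    ("Audio", "Special", "Far East", 8), ("Household", "Direct", "India", 6),
])

# Slice: fix Product = 'Telecomm', leaving a Sales Channel by Region grid.
telecomm_slice = {
    (channel, region): total
    for channel, region, total in conn.execute("""
        SELECT channel, region, SUM(amount)
        FROM sales
        WHERE product = 'Telecomm'
        GROUP BY channel, region
    """)
}

# Dice: keep a sub-cube by restricting several dimensions at once.
dice = {
    (product, region): total
    for product, region, total in conn.execute("""
        SELECT product, region, SUM(amount)
        FROM sales
        WHERE product IN ('Telecomm', 'Video')
          AND region IN ('India', 'Europe')
          AND channel = 'Retail'
        GROUP BY product, region
    """)
}
```

The slice drops the product dimension entirely, while the dice keeps all three dimensions but with fewer values on each.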
30. 9/23/2010 30
Snowflake Schema
A snowflake schema is a star schema
structure that has been normalized
through the use of outrigger tables, i.e.
dimension table hierarchies are broken into
simpler tables. The star schema example had
four dimensions (location, product, time, and
organization) and a fact table (sales).
31. 9/23/2010 31
Snowflake schema
• Represent dimensional hierarchy directly by
normalizing tables.
• Easy to maintain and saves storage
33. 9/23/2010 33
Example of Snowflake Schema
Sales Fact Table (time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales)
Measures: units_sold, dollars_sold, avg_sales
Dimension tables:
time (time_key, day, day_of_the_week, month, quarter, year)
item (item_key, item_name, brand, type, supplier_key)
supplier (supplier_key, supplier_type)
branch (branch_key, branch_name, branch_type)
location (location_key, street, city_key)
city (city_key, city, province_or_state, country)
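The practical effect of moving supplier_type into its own table is one more join per query. The following sketch builds a cut-down item dimension both ways and checks that the two layouts give the same totals; the table names item_star and item_snow and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star layout: supplier_type is stored in the item dimension itself.
CREATE TABLE item_star (item_key INTEGER PRIMARY KEY, item_name TEXT, supplier_type TEXT);
-- Snowflake layout: supplier_type moves to a separate supplier table.
CREATE TABLE supplier  (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_snow (item_key INTEGER PRIMARY KEY, item_name TEXT,
                        supplier_key INTEGER REFERENCES supplier(supplier_key));
CREATE TABLE sales (item_key INTEGER, dollars_sold INTEGER);

INSERT INTO item_star VALUES (1, 'Pen', 'wholesale'), (2, 'Ink', 'wholesale'), (3, 'Desk', 'retail');
INSERT INTO supplier  VALUES (10, 'wholesale'), (20, 'retail');
INSERT INTO item_snow VALUES (1, 'Pen', 10), (2, 'Ink', 10), (3, 'Desk', 20);
INSERT INTO sales VALUES (1, 5), (2, 7), (3, 100), (1, 3);
""")

# Star: one join from the fact table to the dimension.
star_totals = dict(conn.execute("""
    SELECT i.supplier_type, SUM(f.dollars_sold)
    FROM sales f JOIN item_star i ON i.item_key = f.item_key
    GROUP BY i.supplier_type
"""))

# Snowflake: a second join is needed to reach supplier_type.
snow_totals = dict(conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales f
    JOIN item_snow i ON i.item_key = f.item_key
    JOIN supplier  s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
"""))
```

The snowflake layout stores each supplier_type value once, at the cost of the extra join.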