Lecture-12
Dimensional Modeling (DM)
By Mamuna Fatima
• Problems with early COBOLian data processing systems.
• Data redundancies.
• From flat file to table: each entity ultimately becomes a table in the physical schema.
• A simple O(n^2) join to work with tables.
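The O(n^2) figure above matches a naive nested-loop join, which compares every row of one table with every row of the other. The sketch below is only an illustration: the tables `customers` and `readings`, their contents, and the function `nested_loop_join` are hypothetical, and a real DBMS would usually rely on indexes or other join strategies.

```python
# Hypothetical tables: each row is a dict. Names and data are illustrative only.
customers = [
    {"cust_id": 1, "address": "12 Canal Road"},
    {"cust_id": 2, "address": "7 Mall Street"},
]
readings = [
    {"cust_id": 1, "month": "Jan", "units": 310},
    {"cust_id": 1, "month": "Feb", "units": 295},
    {"cust_id": 2, "month": "Jan", "units": 120},
]

def nested_loop_join(left, right, key):
    """Join two lists of rows on `key` by comparing every pair of rows."""
    joined, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons

rows, comparisons = nested_loop_join(customers, readings, "cust_id")
print(len(rows), "joined rows after", comparisons, "comparisons")
```

The number of comparisons is the product of the two table sizes, which is why this approach degrades quickly as tables grow.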
ER modeling:
◦ Coupled with normalization, it drives the redundancy out of the database.
◦ Data is changed (or added or deleted) at just one point.
◦ Can be used with indexing for very fast access.
◦ Resulted in the success of OLTP systems.
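The "change at just one point" property can be sketched with Python's built-in `sqlite3` module. The tables `customer` and `reading` and their rows are assumptions made up for this example; they are not taken from the lecture.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE reading  (cust_id INTEGER REFERENCES customer(cust_id),
                           month TEXT, units INTEGER);
    INSERT INTO customer VALUES (1, '12 Canal Road');
    INSERT INTO reading VALUES (1, 'Jan', 310), (1, 'Feb', 295);
""")

# The address is stored once, so one UPDATE corrects it for every reading.
con.execute("UPDATE customer SET address = '14 Canal Road' WHERE cust_id = 1")

result = con.execute("""
    SELECT r.month, c.address
    FROM reading AS r JOIN customer AS c ON c.cust_id = r.cust_id
    ORDER BY r.month
""").fetchall()
print(result)
```

Had the address been repeated on every reading row, the same correction would have touched every one of those rows.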
• Let's have a look at a typical ER data model first.
• Some observations:
◦ All tables look alike; as a consequence, it is difficult to identify:
  ▪ Which table is more important?
  ▪ Which is the largest?
  ▪ Which tables contain numerical measurements of the business?
  ▪ Which tables contain nearly static descriptive attributes?
◦ Many topologies for the same ER diagram, all appearing different.
  ▪ Very hard to visualize and remember.
  ▪ A large number of possible connections between any two (or more) tables.
[Figure: two drawings of the same graph with nodes numbered 1 to 12, laid out differently.]
• The paradox: trying to make information accessible using tables resulted in an inability to query them!
• ER modeling and normalization result in a large number of tables, which are:
◦ Hard for users (DB programmers) to understand.
◦ Hard for DBMS software to navigate optimally.
• The real value of ER is in using tables individually or in pairs.
• It is too complex for queries that span multiple tables with a large number of records.
ER vs. DM

ER: Constituted to optimize OLTP performance.
DM: Constituted to optimize DSS query performance.

ER: Models the micro relationships among data elements.
DM: Models the macro relationships among data elements with an overall deterministic strategy.

ER: A wild variability in the structure of ER models.
DM: All dimensions serve as equal entry points to the fact table.

ER: Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical.
DM: Changes in users' querying habits can be accommodated by automatic SQL generators.
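Because every dimension joins to the fact table in the same way, a query generator for a dimensional model can be quite small. The following is a rough sketch of that idea, not a description of any real tool: the metadata dictionary `STAR`, the table and column names, and the function `generate_sql` are all hypothetical.

```python
# Hypothetical star-schema metadata: which dimension table owns each attribute
# and which fact-table column joins to it. All names are illustrative.
STAR = {
    "fact": "sales_fact",
    "dimensions": {
        "product_dim":   {"key": "item_id",  "attrs": {"category", "dept"}},
        "geography_dim": {"key": "store_id", "attrs": {"city", "province"}},
        "time_dim":      {"key": "date_id",  "attrs": {"month", "year"}},
    },
}

def generate_sql(measure, group_by):
    """Build a star-join query that sums `measure` by the given attributes."""
    joins, columns = [], []
    for attr in group_by:
        # Find the dimension table that owns this attribute.
        table = next(t for t, d in STAR["dimensions"].items() if attr in d["attrs"])
        key = STAR["dimensions"][table]["key"]
        join = f"JOIN {table} ON {table}.{key} = {STAR['fact']}.{key}"
        if join not in joins:
            joins.append(join)
        columns.append(f"{table}.{attr}")
    select = ", ".join(columns + [f"SUM({STAR['fact']}.{measure}) AS total"])
    return (f"SELECT {select} FROM {STAR['fact']} "
            + " ".join(joins) + " GROUP BY " + ", ".join(columns))

print(generate_sql("sale_amount", ["category", "year"]))
print(generate_sql("sale_amount", ["province"]))
```

A change in querying habits, such as grouping by province instead of category, only changes the arguments; the join pattern stays the same.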
Two general methods:
◦ De-Normalization
◦ Dimensional Modeling (DM)
• A simpler logical model optimized for decision support.
• Inherently dimensional in nature, with a single central fact table and a set of smaller dimension tables.
• The fact table has a multi-part key.
• Dimension tables have a single-part primary key (PK).
• Keys are usually system generated.
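The key structure described above can be sketched as table definitions using Python's built-in `sqlite3` module. The table and column names (`product_dim`, `time_dim`, `sales_fact` and their columns) and the sample rows are assumptions chosen for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: one single-part, system-generated (surrogate) key each.
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY AUTOINCREMENT,
        item_no TEXT,            -- business key kept as an ordinary attribute
        category TEXT
    );
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY AUTOINCREMENT,
        sale_date TEXT,
        month TEXT
    );
    -- Fact table: multi-part key built from the dimension keys.
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim(product_key),
        time_key    INTEGER REFERENCES time_dim(time_key),
        sale_amount REAL,
        PRIMARY KEY (product_key, time_key)
    );
""")

# The DBMS assigns the surrogate keys; the loader never supplies them.
cur = con.execute("INSERT INTO product_dim (item_no, category) VALUES ('B-101', 'Books')")
product_key = cur.lastrowid
cur = con.execute("INSERT INTO time_dim (sale_date, month) VALUES ('2024-01-15', 'Jan')")
time_key = cur.lastrowid
con.execute("INSERT INTO sales_fact VALUES (?, ?, ?)", (product_key, time_key, 450.0))

# Column 5 of PRAGMA table_info is non-zero for primary-key columns.
pk_columns = [row[1] for row in con.execute("PRAGMA table_info(sales_fact)") if row[5] > 0]
print(product_key, time_key, pk_columns)
```

Keeping the business key (`item_no` here) as a plain attribute means it can change without touching the keys that tie facts to dimensions.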
[Figure: data cubes; a fact table linked to dimension tables.]
• Results in a star-like structure, called a star schema or star join.
◦ All relationships are mandatory many-to-one (M-1).
◦ There is a single path between any two levels.
• Supports ROLAP operations.
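A star join can be sketched as a query in which the fact table joins once to each dimension it needs. The example below uses Python's built-in `sqlite3` module; the tables, columns and rows are hypothetical and kept deliberately tiny.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales_fact  (product_key INTEGER, store_key INTEGER, sale_amount REAL);
    INSERT INTO product_dim VALUES (1, 'Books'), (2, 'Clothes');
    INSERT INTO store_dim   VALUES (1, 'Lahore'), (2, 'Karachi');
    INSERT INTO sales_fact  VALUES (1, 1, 100), (1, 2, 150), (2, 1, 200), (2, 2, 50);
""")

# Star join: the fact table joins once to each dimension it needs (M-1 links).
by_category_city = con.execute("""
    SELECT p.category, s.city, SUM(f.sale_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN store_dim   AS s ON s.store_key   = f.store_key
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()

# Rolling up to category alone just drops one dimension from the query.
by_category = con.execute("""
    SELECT p.category, SUM(f.sale_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(by_category_city)
print(by_category)
```

Both queries follow the single path from the fact table to a dimension, so an ad hoc query only has to choose which dimensions to include.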
[Figure: item hierarchy, level by level]
Level 1: Items
Level 2: Books, Clothes
Level 3: Fiction, Text, Men, Women
Level 4: Medical, Engg
Analysts tend to look at the data through a dimension at a particular "level" in the hierarchy.
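Looking at data at a particular level amounts to aggregating detailed values up the hierarchy. The sketch below assumes that Fiction and Text sit under Books and that Men and Women sit under Clothes; those parent links, the sales figures and the function `roll_up` are assumptions for illustration.

```python
from collections import defaultdict

# Assumed parent links for the item hierarchy (child -> parent).
PARENT = {
    "Fiction": "Books", "Text": "Books",
    "Men": "Clothes", "Women": "Clothes",
    "Books": "Items", "Clothes": "Items",
}

# Hypothetical sales recorded at the most detailed level used here.
sales = {"Fiction": 120, "Text": 80, "Men": 60, "Women": 90}

def roll_up(totals):
    """Aggregate totals one level up the hierarchy."""
    parent_totals = defaultdict(int)
    for member, amount in totals.items():
        parent_totals[PARENT[member]] += amount
    return dict(parent_totals)

by_type = roll_up(sales)    # Books / Clothes level
by_all = roll_up(by_type)   # Items level
print(by_type)
print(by_all)
```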
[Figure: star schema compared with snowflake schema.]
[Figure: normalized ER schema for retail sales, with one-to-many (1:M) links between the tables.]
Tables: year, quarter, month, week, sale_header, sale_detail, store, zone, district, division, item_x_cat, item_x_splir, cat_x_dept.
Attribute groups shown: (RECEIPT #, STORE #, DATE, ...); (ITEM #, RECEIPT #, ..., $); (STORE #, STREET, ZONE, ...); (ZONE, CITY); (CITY, DISTRICT); (DISTRICT, DIVISION); (DIVISION, PROVINCE); (DATE, WEEK); (WEEK, MONTH); (MONTH, QTR); (YEAR, QTR); (ITEM #, CATEGORY); (ITEM #, SUPPLIER); (DEPT, CATEGORY).
[Figure: star schema for retail sales. Each dimension joins to the fact table one-to-many (1:M).]
Fact Table: RECEIPT#, STORE#, DATE, ITEM#, Sale Rs., ... (facts)
Product Dim: ITEM#, CATEGORY, DEPT, SUPPLIER
Geography Dim: STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE
Time Dim: DATE, WEEK, MONTH, QUARTER, YEAR
Beauty lies in close correspondence
with the business, evident even to
business users.
• Dimensional hierarchies are collapsed into a single table for each dimension. Loss of information?
• A single fact table is created, with a single header, from the detail records, resulting in:
◦ A vastly simplified physical data model!
◦ Fewer tables (compared with thousands of tables in some ERP systems).
◦ Fewer joins, resulting in high performance.
◦ Some requirement for additional space.
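Collapsing a hierarchy into one dimension table can be sketched as a single join that is run once, at load time. The example uses Python's built-in `sqlite3` module with a shortened, hypothetical geography chain (store, zone, city, province); the table names and rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Normalized (snowflake-style) geography chain; names and rows are illustrative.
    CREATE TABLE store (store_no INTEGER PRIMARY KEY, zone TEXT);
    CREATE TABLE zone  (zone TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE city  (city TEXT PRIMARY KEY, province TEXT);
    INSERT INTO store VALUES (1, 'Z1'), (2, 'Z1'), (3, 'Z2');
    INSERT INTO zone  VALUES ('Z1', 'Lahore'), ('Z2', 'Karachi');
    INSERT INTO city  VALUES ('Lahore', 'Punjab'), ('Karachi', 'Sindh');

    -- Collapse the whole hierarchy into one dimension table.
    CREATE TABLE geography_dim AS
        SELECT s.store_no, s.zone, z.city, c.province
        FROM store AS s
        JOIN zone AS z ON z.zone = s.zone
        JOIN city AS c ON c.city = z.city;
""")

geo = con.execute("SELECT * FROM geography_dim ORDER BY store_no").fetchall()
print(geo)
```

Every level of the hierarchy is still present in `geography_dim`, so no information is lost; the cost is that values such as the city and province are repeated on each store row, which is the additional space mentioned above.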

Lecture on Dimensional Modeling for Data Warehouse Design


Editor's Notes

  1. Utility companies used to go house by house to collect information such as meter readings. The data was recorded in books and later entered into a computer at a central place. The address stays the same while the reading keeps changing, so the unchanged information becomes redundant, and any change to it has to be reflected in many places. The solution was normalization, which is based on ER modeling: the ER diagram was turned into tables, which were joined with other tables to collect the information. The problem was slow joins.
  2. If things were fine, why do we need DM? Look at a schema in third normal form (see the next slide). The slide lists some observations and questions about the ER diagram. Take an example from real life: if you go somewhere and want to know which person is the most important, it is the one with people around him listening to what he says. But can you tell which table is more important? Is it the one with the largest header and few rows, or vice versa? Numerical measurements are the factual data, e.g. sales data such as the number of items sold and the revenue. Descriptive attributes are the dimensional information. So what is the benefit of the simplicity if it raises more questions at every step?
  3. All the previous points lead to the demand for a new representation. This is explained using graph theory: an ER model can take a different shape depending on the designer, so every model looks different. The two graphs on the slide are the same graph in different representations, and the left one is more difficult to understand. This is the graph isomorphism problem: deciding whether two graphs are the same, which is computationally very tough. The same problem exists with ER diagrams, since models of the same problem can appear different. These complexities lead to the need for DM.
  4. Paradox: a conflict. For example, you go to a hospital and ask how the operation went, and they say the operation was successful but the patient died. What is the benefit of a successful operation that could not save the patient's life? That is a paradox. The problem here is complex because normalization produces so many tables; in an ERP system there may be thousands. The real value of ER modeling is that querying a single table or a few tables gives good performance. In DSS, however, we join many tables by default, so performance drops sharply. This is the paradox.
  5. A comparison of ER against DM. ER modeling is for OLTP and DM is for DSS. Suppose you have a bike and, while building a house, you decide to load the cement on it: the bike will be wrecked. If you load it on a truck, the truck will not be affected at all. So the problem is using the right thing for the wrong problem. In DSS we are concerned with higher-level, aggregated data, so we do not go into minor details. ER diagrams differ even for the same problem, so the systems built from them vary a lot, whereas in DSS the schema normally does not change. There are smart environments that generate SQL automatically, but they run into difficulty when optimizing if the schema keeps changing. An ER schema changes when the business changes, so SQL-generating tools face difficulty. With DM or a star schema the structure is predictable, so generating the SQL is not difficult, and the schema remains constant even when the business changes.
  6. An ER model can be simplified in two ways: de-normalization and DM.
  7. So what is DM, or how do we tell that a schema is optimized for the DSS environment? See the points on the slide. The key point is that it is simple, logical and intuitive. If it is easy for programmers to understand, it leads to a better solution. It has two kinds of tables: fact and dimension. Fact tables are large and dimension tables are small. Fact tables store numerical data, e.g. how much was sold and the sales revenue. A dimension table holds information about a dimension, e.g. time, geography, etc. Keys should be system-generated rather than business keys, so that if the business changes, the keys do not need maintenance.
  8. Map the business analyst's representation to the relational model: data cubes with dimensions and measures become a relational design with tables and 1-M relationships (FKs). Dimensions map to dimension tables; measures map to fact tables. Group the fact and dimension tables. Grain: the most detailed measure values stored.
  9. How do the fact and dimension tables connect? In a star topology, with the fact table at the center. DM is designed to support ROLAP operations, where we can run ad hoc queries on the go.
  10. Dimensions have hierarchies. For example, books are divided into fiction and text, and you cannot mix them. The benefit is that a decision maker can enter at any point in a hierarchy to see the details of other hierarchies.
  11. The above task can be done with two schemas: star and snowflake. A star is simple: whether you rotate, flip or reposition it, it does not change. If you do the same to a snowflake, you lose the entire meaning. A star schema represents a complete business process, e.g. sales, purchases, inventory, etc. For each business process we will have a different star.
  12. This is the star schema for the previous slide, and things become simpler. We create fact tables that hold real (physical) records, so we do not run the joins at run time. This is why in Pivot4J we analyze a real, physical star by placing the dimensions we need, and the MDX is generated automatically. Once a star is created, it does not matter how you analyze it. Suppose there are a hundred records in each table and a query needs to join 4 tables, and the join returns 40 rows. To retrieve these 40 rows, a naive join computes 100 x 100 x 100 x 100 = 100,000,000 steps. If these 40 records are instead stored in a table (the fact table) with 1,000 rows in total, then in the worst case we get the correct output in 1,000 steps in the star instead of 100,000,000 steps. So ultimately we achieve an enormous performance gain.
  13. When we build the star schema, we collapse the hierarchies into a single table. For example, time is now in a single table, which avoids sub-tables linked by PK-FK relations. A column name, say city, is now used in the dimension table instead of an FK. This can cause a loss of information: previously every city carried a province FK, but now we cannot tell which province a city depends on just by looking at the diagram. The disadvantage is that you cannot tell which element is a subset of which, or at what level of the hierarchy an element sits. So information is lost. The benefit is a simple schema with a few tables, compared with hundreds of tables previously. Another disadvantage is the additional space. A simple example could be on the next slide.
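The collapsing of a hierarchy described in note 13 can be sketched in a few lines. This is only a minimal illustration using Python's built-in sqlite3 module, not the schema from the slides: the table names (province, city, dim_geography, fact_sales), the column names and the sample rows are all hypothetical. It builds a normalized city-to-province hierarchy, collapses it into one dimension table, and checks that both designs give the same totals while the collapsed one repeats the province name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# All table names, column names and rows below are hypothetical examples.
cur.executescript("""
-- Normalized hierarchy: each city points to its province through an FK.
CREATE TABLE province (province_id INTEGER PRIMARY KEY, province_name TEXT);
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT,
                   province_id INTEGER REFERENCES province(province_id));
-- Collapsed dimension: the province is a plain column, not an FK.
CREATE TABLE dim_geography (geo_key INTEGER PRIMARY KEY,
                            city_name TEXT, province_name TEXT);
-- Fact table: numerical measurements plus a key into the dimension.
CREATE TABLE fact_sales (geo_key INTEGER REFERENCES dim_geography(geo_key),
                         units_sold INTEGER, revenue REAL);
""")

cur.executemany("INSERT INTO province VALUES (?, ?)",
                [(1, "Punjab"), (2, "Sindh")])
cur.executemany("INSERT INTO city VALUES (?, ?, ?)",
                [(1, "Lahore", 1), (2, "Multan", 1), (3, "Karachi", 2)])

# Collapse the hierarchy. For brevity this toy example reuses city_id as
# the dimension key; a real design would generate its own surrogate key.
cur.execute("""
INSERT INTO dim_geography (geo_key, city_name, province_name)
SELECT c.city_id, c.city_name, p.province_name
FROM city c JOIN province p ON p.province_id = c.province_id
""")

cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 100.0), (2, 5, 50.0), (3, 8, 80.0), (1, 2, 20.0)])

# Star: one join from the fact table to the flat dimension.
star_rows = cur.execute("""
SELECT g.province_name, SUM(f.units_sold), SUM(f.revenue)
FROM fact_sales f JOIN dim_geography g ON g.geo_key = f.geo_key
GROUP BY g.province_name ORDER BY g.province_name
""").fetchall()

# Normalized: the same question needs one more join to walk the hierarchy.
snowflake_rows = cur.execute("""
SELECT p.province_name, SUM(f.units_sold), SUM(f.revenue)
FROM fact_sales f
JOIN city c ON c.city_id = f.geo_key
JOIN province p ON p.province_id = c.province_id
GROUP BY p.province_name ORDER BY p.province_name
""").fetchall()

# The price of collapsing: the province name is stored once per city.
punjab_copies = cur.execute(
    "SELECT COUNT(*) FROM dim_geography WHERE province_name = 'Punjab'"
).fetchone()[0]

print(star_rows)
print(snowflake_rows)
print(punjab_copies)
```

Both queries return the same totals per province, but the star version needs one join fewer. The repeated province name in dim_geography is the additional space mentioned above, and nothing in that table's definition says that a city belongs to a province, which is the lost hierarchy information.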