Data modeling
Dimensions
Structure of dimension tables I
•Must have a primary key
•Use surrogate keys that are meaningless
integers or identifiers
•Joins between facts and dimensions perform
best on single integer fields
•Natural key versus surrogate key: the natural
key is the more meaningful column
•When dimensions are static, there is a 1-1
mapping between natural key and surrogate key
•Descriptive components of the dimension table
Structure of dimension tables II
•Efficient generation and maintenance of the
surrogate keys is important for success
•Surrogate keys should not be
smart/intelligent/meaningful because
–by definition they are meaningless, so they do
not have to change over time
–joins on a single integer perform better than
joins on concatenated natural keys
–data type mismatches are avoided: no alphanumeric
values are allowed, and the data type is always
integer/number
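A minimal sketch of surrogate key assignment, assuming an in-memory lookup from natural key to surrogate key. The class name SurrogateKeyGenerator and the sample natural keys are hypothetical and are not taken from the slides.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns meaningless, sequential integer keys to natural keys."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._lookup = {}  # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if the natural key was seen before;
        # otherwise assign the next integer in the sequence.
        if natural_key not in self._lookup:
            self._lookup[natural_key] = next(self._next_key)
        return self._lookup[natural_key]


# Hypothetical natural keys from a source system.
keys = SurrogateKeyGenerator()
store_a = keys.key_for("STORE-NY-001")
store_b = keys.key_for("STORE-CA-017")
store_a_again = keys.key_for("STORE-NY-001")
```

The integer carries no meaning of its own, so the natural key can be reformatted in the source system without forcing a change to the key that fact tables join on.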
Creating dimension tables I
•Some dimensions, such as lookup tables, may be
created internally by the ETL system
•Others can be extracted from data sources
•Data cleaning is important for big, complex
dimensions
•Conforming consists of aligning the content
from different parts of the data warehouse
•The data delivery module looks after SCDs
Flat and snowflake dimensions I
•Dimensions are flat, denormalized tables
•Should have small cardinality
•Limited values
•If the staging data is in 3NF, these flattened
2NF dimensions are easily produced with a
simple query on the 3NF source
•The only requirement is that every attribute is
single-valued (atomic) for a given primary key
•Disadvantage of a snowflake schema: the many-to-1
relations produce complex schemas that are hard
to read
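The "simple query on the 3NF source" can be sketched as below, using Python's built-in sqlite3 module. All table and column names (stg_product, stg_brand, stg_category, dim_product) are hypothetical; the slides do not name a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical 3NF staging tables
    CREATE TABLE stg_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE stg_brand (brand_id INTEGER PRIMARY KEY, brand_name TEXT,
                            category_id INTEGER REFERENCES stg_category (category_id));
    CREATE TABLE stg_product (product_code TEXT PRIMARY KEY, product_name TEXT,
                              brand_id INTEGER REFERENCES stg_brand (brand_id));
    INSERT INTO stg_category VALUES (1, 'Beverages');
    INSERT INTO stg_brand VALUES (10, 'Acme Cola', 1);
    INSERT INTO stg_product VALUES ('P-100', 'Cola 330ml', 10), ('P-101', 'Cola 1l', 10);

    -- Flat dimension: one row per product, hierarchy attributes repeated on each row
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY AUTOINCREMENT,
        product_code TEXT, product_name TEXT, brand_name TEXT, category_name TEXT
    );
    INSERT INTO dim_product (product_code, product_name, brand_name, category_name)
    SELECT p.product_code, p.product_name, b.brand_name, c.category_name
    FROM stg_product p
    JOIN stg_brand b ON b.brand_id = p.brand_id
    JOIN stg_category c ON c.category_id = b.category_id
    ORDER BY p.product_code;
""")

rows = conn.execute(
    "SELECT product_key, product_code, brand_name, category_name "
    "FROM dim_product ORDER BY product_key"
).fetchall()
```

In a snowflake design the brand and category would stay in their own tables, and every query on the dimension would repeat the two joins shown in the SELECT.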
Flat and snowflake dimensions II
Flat and snowflake dimensions III
•If an attribute takes multiple values in the
presence of a primary key, then it cannot be
included in the dimension.
•E.g. the cash register ID in a retail store
dimension whose grain is the individual store:
one store has many registers
•For every new dimension record, a fresh
surrogate key should be assigned
Some typical dimensions I
•Date and time dimensions:
a huge table with a primary key and all
possible fields of date, day, month, year,
timestamp, fiscal year, quarter etc.
•Small dimensions:
Mainly used for lookups
•Big dimensions:
require merging and de-duplicating
the attributes of the same
dimension from different data
sources.
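A date dimension is usually generated rather than extracted. The following is a small sketch of such a generator; the column names, the sequential date_key, and the April fiscal year start are assumptions chosen for illustration.

```python
from datetime import date, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def build_date_dimension(start, end, fiscal_year_start_month=4):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        # Assumption: a fiscal year is labeled by the calendar year in which it starts.
        if current.month >= fiscal_year_start_month:
            fiscal_year = current.year
        else:
            fiscal_year = current.year - 1
        rows.append({
            "date_key": len(rows) + 1,  # meaningless sequential surrogate key
            "full_date": current,
            "day": current.day,
            "day_name": DAY_NAMES[current.weekday()],
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,
            "year": current.year,
            "fiscal_year": fiscal_year,
        })
        current += timedelta(days=1)
    return rows


date_rows = build_date_dimension(date(2024, 3, 30), date(2024, 4, 2))
```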
Dimension table structures I
•Dimensions can be modeled as flat tables or as
snowflakes
•Flat dimensions are denormalized tables
•Dimension tables should be modeled so that
they have small cardinality
•Every dimension table should have limited
values
•Populating the data from staging to the data
warehouse:
–If the staging data is in 3NF, the flattened 2NF
dimensions are easily produced with a simple query
Dimension table structures II
•Snowflaking is defined as creating
sub-dimensions, that is, dimensions of other
dimensions. This makes the schema cleaner
•If an attribute takes multiple values in the
presence of a primary key, then it cannot be
included in the dimension
•For every new dimension record, a fresh
surrogate key should be assigned
•A good design practice: identify the correlation
Roles, sub-dimensions and empty
dimensions I
•Roles:
–The same dimension table is attached to a fact
table multiple times
–E.g. two roles of the employee dimension: manager
and employee
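A sketch of the employee/manager example using sqlite3: one physical dimension table is joined to the fact table twice under different aliases. The fact table fact_expense and all column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, full_name TEXT);
    -- Hypothetical fact table: one row per approved expense claim
    CREATE TABLE fact_expense (
        employee_key INTEGER REFERENCES dim_employee (employee_key),
        manager_key  INTEGER REFERENCES dim_employee (employee_key),
        amount REAL
    );
    INSERT INTO dim_employee VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO fact_expense VALUES (2, 1, 120.0), (3, 1, 80.0);
""")

# The same table plays two roles: alias e is the employee, alias m is the manager.
rows = conn.execute("""
    SELECT e.full_name AS employee, m.full_name AS manager, f.amount
    FROM fact_expense f
    JOIN dim_employee e ON e.employee_key = f.employee_key
    JOIN dim_employee m ON m.employee_key = f.manager_key
    ORDER BY f.amount DESC
""").fetchall()
```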
Roles, sub-dimensions and empty
dimensions II
•Sub-dimensions:
–Sub-dimensions are defined using foreign keys in
the parent dimension table
–They are also called outriggers
Roles, sub-dimensions and empty
dimensions III
•Degenerate dimensions:
–Problem: when a parent-child data relationship is
fitted into the dimensional framework, the natural
key of the parent is usually left as an orphan
–Solution: the natural key of the parent is given a
special status, called an empty or degenerate
dimension; this arises in every parent-child relation
–E.g. order number, shipment number, billing
number etc.
–They often play an integral role in the fact table’s
primary key.
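A sketch of an order number kept as a degenerate dimension: it lives in the fact table and is part of its primary key, but there is no order dimension table to join to. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_order_line (
        order_number TEXT,      -- degenerate dimension: no dim_order table exists
        line_number INTEGER,
        product_key INTEGER REFERENCES dim_product (product_key),
        quantity INTEGER,
        PRIMARY KEY (order_number, line_number)
    );
    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
    INSERT INTO fact_order_line VALUES
        ('ORD-1001', 1, 1, 10), ('ORD-1001', 2, 2, 3), ('ORD-1002', 1, 1, 5);
""")

# The degenerate dimension still groups the child rows of one parent order.
rows = conn.execute("""
    SELECT order_number, COUNT(*) AS line_count, SUM(quantity) AS total_quantity
    FROM fact_order_line
    GROUP BY order_number
    ORDER BY order_number
""").fetchall()
```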
Roles, sub-dimensions and empty
dimensions II
•The ETL dimensional delivery module must convert
selected fields in the input data for the
dimension to foreign key references
•Multi-valued dimensions and bridge tables
–A multi-valued dimension may be linked to a fact
table via a bridge table
–The bridge avoids many-to-many joins by “creating
a group entity”
–Time-varying bridge tables are seen in the case of
Type 2 SCDs
–Bridge tables add performance overhead for updates
and queries
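A sketch of the "group entity" idea using sqlite3, assuming a hypothetical patient-visit fact in which one visit can carry several diagnoses. The fact row stores a single group key, and the bridge table lists the members of each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_diagnosis (diagnosis_key INTEGER PRIMARY KEY, diagnosis_name TEXT);
    -- Bridge table: one row per (group, member) pair
    CREATE TABLE bridge_diagnosis_group (
        diagnosis_group_key INTEGER,
        diagnosis_key INTEGER REFERENCES dim_diagnosis (diagnosis_key),
        PRIMARY KEY (diagnosis_group_key, diagnosis_key)
    );
    CREATE TABLE fact_visit (visit_id INTEGER, diagnosis_group_key INTEGER, charge REAL);

    INSERT INTO dim_diagnosis VALUES (1, 'Asthma'), (2, 'Diabetes'), (3, 'Fracture');
    INSERT INTO bridge_diagnosis_group VALUES (100, 1), (100, 2), (101, 3);
    INSERT INTO fact_visit VALUES (5001, 100, 250.0), (5002, 101, 400.0);
""")

rows = conn.execute("""
    SELECT f.visit_id, d.diagnosis_name
    FROM fact_visit f
    JOIN bridge_diagnosis_group b ON b.diagnosis_group_key = f.diagnosis_group_key
    JOIN dim_diagnosis d ON d.diagnosis_key = b.diagnosis_key
    ORDER BY f.visit_id, d.diagnosis_name
""").fetchall()
```

Joining through the bridge repeats a fact row once per group member, so summing charge over this result would count visit 5001 twice. That extra join and the care it demands are part of the query overhead noted above.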
Roles, sub-dimensions and empty
dimensions III
Roles, sub-dimensions and empty
dimensions IV
•Ragged hierarchies
–Arise from the hierarchies seen in the
organization
–May be related to people, roles, products, billing
information etc.
–Predominant characteristic: the parent of at
least one member of a dimension is not in the
level immediately above that member
•A ragged hierarchy can be implemented as a recursive
pointer (a field that points to another row within the
same dimension table) or as a hierarchy bridge table
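A sketch of the recursive-pointer option in plain Python. The organization below is hypothetical; the Analyst reports straight to the CEO, so its parent is not in the level immediately above it.

```python
# Each member stores a pointer (parent_key) to another row of the same table.
org = {
    1: {"name": "CEO", "parent_key": None},
    2: {"name": "VP Sales", "parent_key": 1},
    3: {"name": "Sales Manager", "parent_key": 2},
    4: {"name": "Sales Rep", "parent_key": 3},
    5: {"name": "Analyst", "parent_key": 1},  # skips the intermediate levels
}


def path_to_root(member_key, members):
    """Follow parent pointers from a member up to the top of the hierarchy."""
    path = []
    seen = set()
    key = member_key
    while key is not None:
        if key in seen:
            raise ValueError(f"cycle detected at member {key}")
        seen.add(key)
        path.append(members[key]["name"])
        key = members[key]["parent_key"]
    return path


rep_path = path_to_root(4, org)
analyst_path = path_to_root(5, org)
```

The two paths have different lengths, which is why a fixed set of level columns fits such a hierarchy poorly.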
SCDs - recap I
•Type 0
–Passive
–Data never changes
•Type 1
–Overwrites data
–Uses update or insert
functionality
–May cause performance
problems
–May need rollback support
•Type 2
–Also called partitioning history
–Every time a change happens, a new surrogate
primary key is assigned, and fact rows from then
on use this new key as their foreign key
–A cube-form representation with time as one of
the dimensions
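A minimal sketch of a Type 2 change under the same list-of-dictionaries assumption. The valid_from, valid_to and is_current columns are a common convention assumed here; the slides do not mention them.

```python
from datetime import date

META_COLUMNS = ("customer_key", "valid_from", "valid_to", "is_current")


def apply_type2(dimension, natural_key, changes, effective_date):
    """Expire the current row and add a new row with a fresh surrogate key."""
    current = next(
        (r for r in dimension if r["customer_id"] == natural_key and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current.get(k) == v for k, v in changes.items()):
            return current  # nothing changed, keep the existing row
        current["is_current"] = False
        current["valid_to"] = effective_date

    # Carry over the unchanged attributes, then apply the new values.
    attributes = {k: v for k, v in (current or {}).items() if k not in META_COLUMNS}
    attributes["customer_id"] = natural_key
    attributes.update(changes)

    new_row = {
        "customer_key": max((r["customer_key"] for r in dimension), default=0) + 1,
        **attributes,
        "valid_from": effective_date,
        "valid_to": None,
        "is_current": True,
    }
    dimension.append(new_row)
    return new_row


dim_customer_history = []
first = apply_type2(dim_customer_history, "C-1", {"city": "Pune"}, date(2023, 1, 1))
second = apply_type2(dim_customer_history, "C-1", {"city": "Mumbai"}, date(2024, 6, 1))
```

Facts loaded before the change keep pointing at surrogate key 1, and facts loaded afterwards use key 2, which is how the history ends up partitioned.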
SCDs - recap III
•Type 3
–Also called alternate realities
–The old value of the attribute remains valid as a
second choice
–Adds a new column for the changed attribute
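A minimal sketch of a Type 3 change: the prior value moves into an extra column and the main column is overwritten. The previous_ column prefix and the sample row are assumptions.

```python
def apply_type3(row, attribute, new_value):
    """Keep the prior value in a 'previous_<attribute>' column, then overwrite."""
    row[f"previous_{attribute}"] = row.get(attribute)
    row[attribute] = new_value
    return row


product_row = {"product_key": 7, "category": "Snacks"}
apply_type3(product_row, "category", "Healthy Snacks")
```

Only one prior value survives per attribute in this sketch, so a second change would replace the stored old value.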
SCDs - recap IV
SCDs - recap V
•Hybrid slowly changing dimension
Handling late arriving data
•Required fixes
–insert a new record capturing the change
–scan the dimension forward from the date of the
change and destructively overwrite the changed
attribute
–update the fact tables that reference the modified
dimension
•In real-time systems, the dimension data often
arrives after the fact data.
•In such cases it is important to point the fact
table records to a special placeholder and then
Process of loading dimensions I
•Some dimensions are created internally and have
no external sources, e.g. lookup dimensions in
which an operational code is translated into words
•The remaining ones are extracted from outside
sources. They need special processing:
–Data cleaning: identifying and correcting or
removing inaccurate, incorrect or incomplete data
–Data conforming: aligning the content of some
fields in the dimension with similar fields
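A small sketch of cleaning and conforming a single field, assuming source systems spell the same country in different ways. The mapping table and field name are hypothetical.

```python
# Hypothetical mapping from source-specific spellings to one conformed value.
CONFORMED_COUNTRY = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "in": "India",
    "india": "India",
}


def clean_and_conform(records):
    """Split records into conformed rows and rows rejected for review."""
    conformed, rejected = [], []
    for record in records:
        raw = (record.get("country") or "").strip().lower()
        if raw in CONFORMED_COUNTRY:
            conformed.append({**record, "country": CONFORMED_COUNTRY[raw]})
        else:
            rejected.append(record)  # incomplete or unrecognized value
    return conformed, rejected


source_rows = [
    {"customer_id": "C-1", "country": " USA "},
    {"customer_id": "C-2", "country": "india"},
    {"customer_id": "C-3", "country": None},
]
conformed_rows, rejected_rows = clean_and_conform(source_rows)
```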
Process of loading dimensions II
•All dimensions are flattened out (denormalized)
•Create a snowflake dimension if it is not
possible to logically flatten out the dimension
•Identify the fields in every dimension table as
primary / surrogate key (joins with the foreign
key in the fact table), natural key (descriptor of
the data), or descriptive attributes (textual details)
