SlideShare a Scribd company logo
1 of 38
Download to read offline
Data Warehouse Design 
Best Practices
About me 
 Project Manager @ 
 12 years professional experience 
 .NET Web Development MCPD 
 SQL Server 2012 (MCSA) 
 Business Interests 
 Web Development, SOA, Integration 
 Security  Performance Optimization 
 Horizon2020, Open BIM, GIS, Mapping 
 Contact me 
 ivelin.andreev@icb.bg 
 www.linkedin.com/in/ivelin 
 www.slideshare.net/ivoandreev 
2 |
About me 
 Senior Developer @ 
 .NET Web Development MCPD 
 Business Interests 
 Web Development, WCF, Integration 
 SQL Server – Query Optimization and Tuning 
 Data Warehousing 
 Contact me 
 georgi.mishev@icb.bg 
 www.linkedin.com/in/georgimishev
Sponsors
Agenda 
 Why Data Warehouse 
 Main DW Architectures 
 Dimensional Modeling 
 Patterns  Practices 
 DW Maintenance 
 ETL Process 
 SSIS Demo
Lots of Data Everywhere 
 Can’t find data? 
 Data scattered over the network 
 Can’t get data? 
 Need an expert to get the data 
 Can’t understand data? 
 Data poorly documented 
 Can’t use data found? 
 Data needs to be transformed
Data Warehouse? 
Def: Central repository where data are organized, cleansed 
and in standardized format. 
 Integrated 
 Heterogeneous sources 
 Data clean and conversion ($, €, 元) 
 Focus on subject 
 i.e. Customer, Sale, Product 
 Time variant 
 Timestamp every key 
 Historical data (10+ years)
Different Problems - Different Solutions 
OLTP Database Data Warehouse 
Users Customer Knowledge worker 
Design Normalized, Data Integrity Denormalized 
Function Daily operation Decision making 
Data Current, Detailed Historical, Aggregated 
Usage Real time Ad-hoc 
Access Short R/W transactions Complex R/O queries 
Data accessed Comparatively lower Large Amounts 
# Records x100 x1’000’000 
# Users x1’000 x10 
DB Size x10 GB x100GB-TB
Different DW Architectures
B.Inmon Model 
Top-Down Approach 
 Warehouse (3NF) 
 Data Mart  OLAP (MD) 
http://sqlschoolgr.files.wordpress.com/2012/03/clip_image003_thumb.png?w=640h=368
R.Kimball Model 
Bottom-Up Approach 
 Data Marts (3NF or MD) 
 Warehouse  OLAP (MD) 
http://sqlschoolgr.files.wordpress.com/2012/03/clip_image005_thumb.png?w=640h=369
Data Vault (by Dan Linstedt) 
 Hubs 
 List of unique business keys 
 Links 
 Unique relationships between keys 
 Satellites 
 Hub and Link details and history
It is irrelevant which camp you belong… 
as far as you understand why!
Making Your Choice 
• Kimball (MD) 
+ Start small, scale big 
+ Faster ROI 
+ Analytical tools 
- Low reusability 
• Data Vault 
• Inmon (3NF) 
+ Structured 
+ Easy to maintain 
+ Easier data mining 
- Timely to build 
Backend Data Warehouse 
+ Multiple sources; Full history; Incremental build 
- Up-front work; Long-term payoff; Many joins
Dimensional modeling as de-facto standard
Dimensions 
Def: The object of BI interest 
 Keys 
 Surrogate key 
 Business key 
 Hierarchical attributes 
 Analysis and Drill Down 
 Member properties 
 Presentation labels 
 Auditing information (not for end users)
Slowly Changing Dimensions 
Def: Scheme for recording changes over time 
 Type 1 - Overwrite 
 Type 2 – Multiple Records
Facts 
Def: Measurement of a business process 
 Keys 
 FK from all dimensional tables (in the star) 
 PK - Composite (usually) or Surrogate 
 Measures 
 Numeric columns, that are of interest to the business 
 Additive, Non-additive, Semi-additive 
 Factless facts 
 Auditing information (optional)
Practices and Design Patterns
Data Warehouse Pitfalls 
 Admit it is not as it seems to be 
 You need education 
 Find what is of business value 
 Rather than focus on performance 
 Spend a lot of time in Extract-Transform-Load 
 Homogenize data from different sources 
 Find (and resolve) problems in source systems
Prepare your Sources 
 Data integrity 
 Avoid redundancy 
 Data quality 
 Master data source 
 Data validation 
 Auditing 
 CreatedDate / CreatedBy 
 ChangedDate / ChangedBy 
 Nightly jobs
Dimension Design 
 Business key with non-clustered index 
 Include date (if dimension has history) 
 Surrogate key 
 The smallest possible integer 
 Clustered index 
 FK constraints 
 Do not enforce (WITH NOCHECK) 
 Document the relation 
 Faster load 
 Data validation 
 Task for the Source system
Conformed Dimensions 
Def. Having the same meaning and content 
when referred from multiple fact tables 
 Date Dimension 
 Partitioning best candidate 
 Granularity 
 Do not store every hour, when reporting daily 
 Avoid surrogate keys 
 Saves lookup and joins 
 Integer representing date (yyyyMMdd, days after 1/1/1900)
Pre-join Hierarchies 
 Recursive relationships 
 Fast drill and report 
 Pre-computed aggregations 
Hierarchy Bridge 
 For each dimension row 
 1 association with self 
 1 row for each subordinate
Determine the Facts 
The center of a Star schema 
 Identify subject areas 
 Identify key business events 
 Identify dimensions 
 Start from OLTP logical model 
 Identify historical requirements 
 Identify attributes
The Grain 
Def: The level of detail of a fact table 
 What is the business objective? 
 Fine grain - behaviour and frequency analysis 
 Coarse grain - overall and trend analysis 
 Aggregates 
 DO NOT summarize prematurely 
 DO NOT mix detail and summary 
 DO use “summary tables”
C3-PO is fluent in 6M forms of communication. 
What about your customers?
Multinational DW 
 What parts need translation? 
 Where to store various language versions? 
 How to support future languages? 
 Dimensions 
 Add language attribute 
 Include text data in the dimension 
 Problem 1: The dimension key? 
 Replicate PK for every language 
Fact.DimId = Dim.Id AND Dim.Lang=[Lang] 
 Problem 2: Storage = [Dim] x [Lang] 
 Sub-dimension with language attributes 
TxtId Attr1 Attr2 LangId 
1 large Yes En 
2 small No En 
1 stor Ja No 
2 liten Nei No 
3 … … …
Data warehouse maintenance
How Large is “Large” 
Is big really big?
Partitioning 
 Why 
 Faster index maintenance 
 Faster load 
 Faster queries 
 When 
 Tables 10GB+ 
 How 
 Do not partition dimension tables 
 Partition by date (most analysis are time-based) 
 Eliminate partitions (WHERE [PartitionKey]=…) 
 Avoid split and merge of existing partitions 
 Can cause inefficient log generation
Columnstore Index 
 Non-clustered in SQL 2012 
 Clustered in SQL 2014 
 Pros 
 Better data compression 
 High performance on table scan 
 Clustered CSI Limitations 
 No other indexes allowed 
 Little advantage on seek operations 
 No XML, computed column or replication
Extract-Transform-Load 
 Extract data from OLTP 
 Data transformations 
 Data loads 
 DW maintenance
Efficient Load Process 
 Use simple recovery model during data load 
 Staging 
 Avoid indexing 
 Populate in parallel 
 Maintain DW 
 Disable indexes on load 
 Rebuild manually after load 
 Automatic stats update slow down SQL Server
To SSIS, or not to SSIS ? 
Pros 
 Minimum coding to none 
 Extensive support of various data sources 
 Parallel execution of migration tasks 
 Better organization of the ETL process 
Cons 
 Another way of thinking 
 Hidden options 
 T-SQL developer would do much faster 
 Auto-generated flows need optimization 
 Sometimes simply does not work (i.e. Sort by GUID)
Takeaways 
 Books 
 The Data Warehouse Toolkit (3rd ed), Ralph Kimball 
 Implementing DW with Microsoft SQL Server 2012 
 Data Warehousing Fundamentals, Paulraj Ponniah 
 Articles 
 Best Practices in Data Warehouse (Hanover Research Council) 
 http://www.kimballgroup.com/category/design-tips/ 
 http://sqlmag.com/business-intelligence 
 Resources 
 http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/ 
dimensional-modeling-techniques/ 
 http://www.databaseanswers.org/data_models/index.htm
Data Warehouse Design and Best Practices

More Related Content

What's hot

What's hot (20)

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaion
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 

Similar to Data Warehouse Design and Best Practices

AnalysisServices
AnalysisServicesAnalysisServices
AnalysisServices
webuploader
 

Similar to Data Warehouse Design and Best Practices (20)

Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
 
Business Intelligence with SQL Server
Business Intelligence with SQL ServerBusiness Intelligence with SQL Server
Business Intelligence with SQL Server
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
CV | Sham Sunder | Data | Database | Business Intelligence | .Net
CV | Sham Sunder | Data | Database | Business Intelligence | .NetCV | Sham Sunder | Data | Database | Business Intelligence | .Net
CV | Sham Sunder | Data | Database | Business Intelligence | .Net
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
 
AnalysisServices
AnalysisServicesAnalysisServices
AnalysisServices
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Arquitectura de Datos en Azure
Arquitectura de Datos en AzureArquitectura de Datos en Azure
Arquitectura de Datos en Azure
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligence
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 

More from Ivo Andreev

More from Ivo Andreev (20)

Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
 
OpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and MisconceptionsOpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and Misconceptions
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Collecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn DataCollecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn Data
 
Collecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure OrbitalCollecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure Orbital
 
Language Studio and Custom Models
Language Studio and Custom ModelsLanguage Studio and Custom Models
Language Studio and Custom Models
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT Scenarios
 
Forecasting time series powerful and simple
Forecasting time series powerful and simpleForecasting time series powerful and simple
Forecasting time series powerful and simple
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project Bonsai
 
Azure security guidelines for developers
Azure security guidelines for developers Azure security guidelines for developers
Azure security guidelines for developers
 
Autonomous Machines with Project Bonsai
Autonomous Machines with Project BonsaiAutonomous Machines with Project Bonsai
Autonomous Machines with Project Bonsai
 
Global azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure LighthouseGlobal azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure Lighthouse
 
Flux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSFlux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JS
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challenges
 
Industrial IoT on Azure
Industrial IoT on AzureIndustrial IoT on Azure
Industrial IoT on Azure
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 

Recently uploaded

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 

Recently uploaded (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 

Data Warehouse Design and Best Practices

  • 1. Data Warehouse Design Best Practices
  • 2. About me Project Manager @ 12 years professional experience .NET Web Development MCPD SQL Server 2012 (MCSA) Business Interests Web Development, SOA, Integration Security Performance Optimization Horizon2020, Open BIM, GIS, Mapping Contact me ivelin.andreev@icb.bg www.linkedin.com/in/ivelin www.slideshare.net/ivoandreev 2 |
  • 3. About me Senior Developer @ .NET Web Development MCPD Business Interests Web Development, WCF, Integration SQL Server – Query Optimization and Tuning Data Warehousing Contact me georgi.mishev@icb.bg www.linkedin.com/in/georgimishev
  • 5. Agenda Why Data Warehouse Main DW Architectures Dimensional Modeling Patterns Practices DW Maintenance ETL Process SSIS Demo
  • 6. Lots of Data Everywhere Can’t find data? Data scattered over the network Can’t get data? Need an expert to get the data Can’t understand data? Data poorly documented Can’t use data found? Data needs to be transformed
  • 7. Data Warehouse? Def: Central repository where data are organized, cleansed and in standardized format. Integrated Heterogeneous sources Data clean and conversion ($, €, 元) Focus on subject i.e. Customer, Sale, Product Time variant Timestamp every key Historical data (10+ years)
  • 8. Different Problems - Different Solutions OLTP Database Data Warehouse Users Customer Knowledge worker Design Normalized, Data Integrity Denormalized Function Daily operation Decision making Data Current, Detailed Historical, Aggregated Usage Real time Ad-hoc Access Short R/W transactions Complex R/O queries Data accessed Comparatively lower Large Amounts # Records x100 x1’000’000 # Users x1’000 x10 DB Size x10 GB x100GB-TB
  • 10. B.Inmon Model Top-Down Approach Warehouse (3NF) Data Mart OLAP (MD) http://sqlschoolgr.files.wordpress.com/2012/03/clip_image003_thumb.png?w=640h=368
  • 11. R.Kimball Model Bottom-Up Approach Data Marts (3NF or MD) Warehouse OLAP (MD) http://sqlschoolgr.files.wordpress.com/2012/03/clip_image005_thumb.png?w=640h=369
  • 12. Data Vault (by Dan Linstedt) Hubs List of unique business keys Links Unique relationships between keys Satellites Hub and Link details and history
  • 13. It is irrelevant which camp you belong… as far as you understand why!
  • 14. Making Your Choice • Kimball (MD) + Start small, scale big + Faster ROI + Analytical tools - Low reusability • Data Vault • Inmon (3NF) + Structured + Easy to maintain + Easier data mining - Timely to build Backend Data Warehouse + Multiple sources; Full history; Incremental build - Up-front work; Long-term payoff; Many joins
  • 15. Dimensional modeling as de-facto standard
  • 16. Dimensions Def: The object of BI interest Keys Surrogate key Business key Hierarchical attributes Analysis and Drill Down Member properties Presentation labels Auditing information (not for end users)
  • 17. Slowly Changing Dimensions Def: Scheme for recording changes over time Type 1 - Overwrite Type 2 – Multiple Records
  • 18. Facts Def: Measurement of a business process Keys FK from all dimensional tables (in the star) PK - Composite (usually) or Surrogate Measures Numeric columns, that are of interest to the business Additive, Non-additive, Semi-additive Factless facts Auditing information (optional)
  • 20. Data Warehouse Pitfalls Admit it is not as it seems to be You need education Find what is of business value Rather than focus on performance Spend a lot of time in Extract-Transform-Load Homogenize data from different sources Find (and resolve) problems in source systems
  • 21. Prepare your Sources Data integrity Avoid redundancy Data quality Master data source Data validation Auditing CreatedDate / CreatedBy ChangedDate / ChangedBy Nightly jobs
  • 22. Dimension Design Business key with non-clustered index Include date (if dimension has history) Surrogate key The smallest possible integer Clustered index FK constraints Do not enforce (WITH NOCHECK) Document the relation Faster load Data validation Task for the Source system
  • 23. Conformed Dimensions Def. Having the same meaning and content when referred from multiple fact tables Date Dimension Partitioning best candidate Granularity Do not store every hour, when reporting daily Avoid surrogate keys Saves lookup and joins Integer representing date (yyyyMMdd, days after 1/1/1900)
  • 24. Pre-join Hierarchies Recursive relationships Fast drill and report Pre-computed aggregations Hierarchy Bridge For each dimension row 1 association with self 1 row for each subordinate
  • 25. Determine the Facts The center of a Star schema Identify subject areas Identify key business events Identify dimensions Start from OLTP logical model Identify historical requirements Identify attributes
  • 26. The Grain Def: The level of detail of a fact table What is the business objective? Fine grain - behaviour and frequency analysis Coarse grain - overall and trend analysis Aggregates DO NOT summarize prematurely DO NOT mix detail and summary DO use “summary tables”
  • 27. C3-PO is fluent in 6M forms of communication. What about your customers?
  • 28. Multinational DW What parts need translation? Where to store various language versions? How to support future languages? Dimensions Add language attribute Include text data in the dimension Problem 1: The dimension key? Replicate PK for every language Fact.DimId = Dim.Id AND Dim.Lang=[Lang] Problem 2: Storage = [Dim] x [Lang] Sub-dimension with language attributes TxtId Attr1 Attr2 LangId 1 large Yes En 2 small No En 1 stor Ja No 2 liten Nei No 3 … … …
  • 30. How Large is “Large” Is big really big?
  • 31. Partitioning Why Faster index maintenance Faster load Faster queries When Tables 10GB+ How Do not partition dimension tables Partition by date (most analysis are time-based) Eliminate partitions (WHERE [PartitionKey]=…) Avoid split and merge of existing partitions Can cause inefficient log generation
  • 32. Columnstore Index Non-clustered in SQL 2012 Clustered in SQL 2014 Pros Better data compression High performance on table scan Clustered CSI Limitations No other indexes allowed Little advantage on seek operations No XML, computed column or replication
  • 33. Extract-Transform-Load Extract data from OLTP Data transformations Data loads DW maintenance
  • 34. Efficient Load Process Use simple recovery model during data load Staging Avoid indexing Populate in parallel Maintain DW Disable indexes on load Rebuild manually after load Automatic stats update slow down SQL Server
  • 35. To SSIS, or not to SSIS ? Pros Minimum coding to none Extensive support of various data sources Parallel execution of migration tasks Better organization of the ETL process Cons Another way of thinking Hidden options T-SQL developer would do much faster Auto-generated flows need optimization Sometimes simply does not work (i.e. Sort by GUID)
  • 36.
  • 37. Takeaways Books The Data Warehouse Toolkit (3rd ed), Ralph Kimball Implementing DW with Microsoft SQL Server 2012 Data Warehousing Fundamentals, Paulraj Ponniah Articles Best Practices in Data Warehouse (Hanover Research Council) http://www.kimballgroup.com/category/design-tips/ http://sqlmag.com/business-intelligence Resources http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/ dimensional-modeling-techniques/ http://www.databaseanswers.org/data_models/index.htm