Data Warehouse Design Considerations
Ram Kedem
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Slowly Changing Dimensions 
•Type 1 SCD 
•OLTP updates are moved into the DW 
•Any change overwrites the current DW data 
•Past actual data history is lost 
•Historical data may be changed only if it doesn’t contain important business details (such as store location)
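A Type 1 change can be sketched in T-SQL as a plain in-place overwrite. The table dbo.DimCustomer, its columns, and the sample values are hypothetical and are used only for illustration.

```sql
-- Hypothetical dimension table; all names are illustrative assumptions.
CREATE TABLE dbo.DimCustomer (
    CustomerID   INT          NOT NULL PRIMARY KEY,  -- business key from the OLTP system
    CustomerName NVARCHAR(50) NOT NULL,
    Phone        VARCHAR(20)  NULL
);

INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, Phone)
VALUES (1, N'Alice', '555-0100');

-- Type 1: overwrite in place. After this statement the old phone
-- number can no longer be recovered from the dimension.
UPDATE dbo.DimCustomer
SET    Phone = '555-0199'
WHERE  CustomerID = 1;
```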
Slowly Changing Dimensions 
•Type 2 SCD 
•Data is not overwritten in the DW 
•A new row for the customer must be inserted 
•This usually creates primary key issues 
•For example, if a customer's details change, this approach has you insert another row in the dimension for the same customer 
•You must add a surrogate key (DWH key) 
•An incrementing number assigned to each new row version; the same idea as a primary key that consists of two columns. 
•You must also add another column or two 
•To flag the current value 
•To provide date / time perspective
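The Type 2 pattern above can be sketched as follows. This is one common layout rather than the only one; the table dbo.DimCustomerHistory, the column names (CustomerKey, ValidFrom, ValidTo, IsCurrent), and the sample values are assumptions made for this example.

```sql
-- Hypothetical Type 2 dimension; all names are illustrative assumptions.
CREATE TABLE dbo.DimCustomerHistory (
    CustomerKey INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate (DWH) key
    CustomerID  INT          NOT NULL,  -- business key, repeated for every version
    City        NVARCHAR(50) NOT NULL,
    ValidFrom   DATETIME     NOT NULL,  -- date / time perspective
    ValidTo     DATETIME     NULL,      -- NULL while the row is the current version
    IsCurrent   BIT          NOT NULL   -- flags the current value
);

INSERT INTO dbo.DimCustomerHistory (CustomerID, City, ValidFrom, ValidTo, IsCurrent)
VALUES (1, N'Haifa', '20140101', NULL, 1);

-- Customer 1 moves: close the current row, then insert a new version.
DECLARE @ChangeDate DATETIME = '20140601';

BEGIN TRANSACTION;

UPDATE dbo.DimCustomerHistory
SET    ValidTo = @ChangeDate,
       IsCurrent = 0
WHERE  CustomerID = 1
  AND  IsCurrent = 1;

INSERT INTO dbo.DimCustomerHistory (CustomerID, City, ValidFrom, ValidTo, IsCurrent)
VALUES (1, N'Tel Aviv', @ChangeDate, NULL, 1);

COMMIT TRANSACTION;
```

Both versions of customer 1 now exist, each with its own CustomerKey, so fact rows loaded before and after the change can point at the version that was valid at the time.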
Understanding Indexing 
•Indexing affects how data is stored and managed in SQL Server 
•There are four main indexing options in SQL Server 
•Clustered Index 
•Non Clustered Index 
•Filtered Non Clustered Index 
•Columnstore Index
Understanding Indexing 
•Clustered Index 
•Determines the physical storage order of the data 
•There can be only one clustered index on a table 
•Non Clustered Index 
•Sorts data in a column or columns and stores pointers to the actual data row 
•We can have up to 999 non clustered indexes on a table
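As a minimal sketch of the two index types, the statements below create one of each on a hypothetical table. The table dbo.DimProduct and the index names are assumptions for illustration.

```sql
-- Hypothetical table; all names are illustrative assumptions.
CREATE TABLE dbo.DimProduct (
    ProductKey  INT          NOT NULL,
    ProductName NVARCHAR(50) NOT NULL,
    Category    NVARCHAR(30) NOT NULL
);

-- Only one clustered index is allowed per table:
-- it defines the storage order of the rows themselves.
CREATE UNIQUE CLUSTERED INDEX cix_DimProduct_ProductKey
    ON dbo.DimProduct (ProductKey);

-- A nonclustered index is a separate sorted structure
-- whose entries point back to the data rows.
CREATE NONCLUSTERED INDEX ix_DimProduct_Category
    ON dbo.DimProduct (Category);
```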
Understanding Indexing 
•Filtered Non Clustered Index 
•Creates a nonclustered index on a subset of rows, selected by a filter on column values 
•Columnstore Index 
•A nonclustered index placed on one or more columns 
•Each column is stored and searched separately from the data row 
•In SQL Server 2012, adding a columnstore index to a table makes the table read-only 
•https://www.simple-talk.com/sql/database-administration/columnstore-indexes-in-sql-server-2012/
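The filtered nonclustered index described on this slide can be sketched as follows. The table dbo.FactOrders, its columns, and the choice of predicate are hypothetical.

```sql
-- Hypothetical table; all names are illustrative assumptions.
CREATE TABLE dbo.FactOrders (
    OrderKey   INT  NOT NULL,
    OrderDate  DATE NOT NULL,
    IsReturned BIT  NOT NULL
);

-- Index only the rows that match the WHERE predicate (here: returned orders).
-- Rows with IsReturned = 0 are not stored in this index at all.
CREATE NONCLUSTERED INDEX ix_FactOrders_Returned
    ON dbo.FactOrders (OrderDate)
    WHERE IsReturned = 1;
```

A filter like this tends to pay off when queries repeatedly target a small, well-defined slice of a large table.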
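Columnstore Index 
-- Build a nonclustered columnstore index over three columns of dbo.products
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_products 
ON dbo.products 
(productName, UnitPrice, unitsinstock); 
-- A query that touches only these columns can be answered from the columnstore index
SELECT productName, UnitPrice, unitsinstock 
FROM dbo.products;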
Indexing the Data Warehouse 
•Indexing in the Data Warehouse can be tricky 
•Too few indexes allow data loads to be quick, but query response time will be slow 
•Too many indexes slow down loads and increase storage requirements, but query response is good
Indexing the Data Warehouse 
•General rule of thumb 
•Dimension tables 
•Place a clustered index on the surrogate key 
•If the table has a lot of columns, create non-clustered indexes on the most frequently queried columns 
•Fact tables 
•Place a non-clustered index on the single-column foreign keys to the dimension tables 
•If the primary key is a composite of all the dimension foreign keys, make it a non-unique clustered index.
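Applied to a small star schema, the rule of thumb above might look like the sketch below. The tables dbo.DimStore and dbo.FactSales, their columns, and the index names are hypothetical.

```sql
-- Hypothetical dimension table: clustered index on the surrogate key,
-- nonclustered index on a frequently queried attribute.
CREATE TABLE dbo.DimStore (
    StoreKey  INT IDENTITY(1,1) NOT NULL,
    StoreName NVARCHAR(50)      NOT NULL,
    City      NVARCHAR(50)      NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX cix_DimStore_StoreKey
    ON dbo.DimStore (StoreKey);

CREATE NONCLUSTERED INDEX ix_DimStore_City
    ON dbo.DimStore (City);

-- Hypothetical fact table keyed by its dimension foreign keys.
CREATE TABLE dbo.FactSales (
    DateKey     INT   NOT NULL,
    StoreKey    INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);

-- Composite of the dimension keys as a non-unique clustered index.
CREATE CLUSTERED INDEX cix_FactSales_Keys
    ON dbo.FactSales (DateKey, StoreKey);

-- Single-column nonclustered indexes on the foreign keys.
CREATE NONCLUSTERED INDEX ix_FactSales_DateKey
    ON dbo.FactSales (DateKey);

CREATE NONCLUSTERED INDEX ix_FactSales_StoreKey
    ON dbo.FactSales (StoreKey);
```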
Understanding Indexed Views 
•What is a view 
•A result set of a query that is a virtual table 
•The virtual table is not stored permanently in the database. 
•The view can be referenced like a table in T-SQL 
•Indexing a view 
•You can create a unique clustered index on a view 
•The view's result set gets stored in the database, just like a regular table with a clustered index.
Understanding Indexed Views 
•Advantages 
•Improve the performance of joins and aggregations that process many rows
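An indexed view over an aggregation can be sketched as follows. The table dbo.FactSalesDetail and the view dbo.vSalesByProduct are hypothetical. GO is the batch separator used by tools such as SSMS and sqlcmd, and the sketch assumes the default session SET options those tools apply.

```sql
-- Hypothetical base table; all names are illustrative assumptions.
CREATE TABLE dbo.FactSalesDetail (
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);
GO

-- SCHEMABINDING is required before a view can be indexed,
-- and the base table must be referenced with a two-part name.
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductKey,
       SUM(SalesAmount) AS TotalSales,
       COUNT_BIG(*)     AS RowCnt   -- required when an indexed view uses GROUP BY
FROM   dbo.FactSalesDetail
GROUP BY ProductKey;
GO

-- The unique clustered index is what stores the view's result set.
CREATE UNIQUE CLUSTERED INDEX cix_vSalesByProduct
    ON dbo.vSalesByProduct (ProductKey);
```

Once the index exists, the aggregated rows are maintained as the base table changes, which is why this tends to suit tables that are loaded in bulk rather than updated constantly.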
Understanding Data Compression 
•SQL Server 2012 supports data compression 
•Data compression reduces the size of the database 
•Packs more data onto fewer data pages 
•Fewer data page reads are required to satisfy queries 
•Lower IO means faster response; lower processing load on the server 
•Extra CPU resources are required for data compression / decompression 
•A DWH usually doesn’t have many updates (other than bulk loading), so the extra CPU cost is usually acceptable
Understanding Data Compression 
•SQL Server 2012 supports three compression types 
•Page compression 
•Focuses on duplicated values within the data page 
•Stores the value once and places a pointer at every other location 
•Row Compression 
•Removes unused bytes in fixed-length data types 
•For example, a CHAR(25) column holding a shorter value 
•Unicode compression 
•Reduces storage space for Unicode data that doesn’t require that space
Understanding Data Compression 
•Which compression should you use 
•Page compression 
•Page compression automatically includes row compression 
•If you choose row compression, page compression is not applied (each table, index, or partition has a single compression setting) 
•Fact tables usually benefit the most from compression 
•In SQL Server 2012, compression is only available in Enterprise Edition.
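Enabling page compression on a fact table can be sketched as follows. The table dbo.FactInventory is hypothetical; sp_estimate_data_compression_savings is the system procedure for previewing the effect before a rebuild.

```sql
-- Hypothetical fact table; all names are illustrative assumptions.
CREATE TABLE dbo.FactInventory (
    DateKey    INT NOT NULL,
    ProductKey INT NOT NULL,
    Quantity   INT NOT NULL
);

-- Estimate the savings before committing to a rebuild.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactInventory',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Rebuild the table with page compression (which includes row compression).
ALTER TABLE dbo.FactInventory
REBUILD WITH (DATA_COMPRESSION = PAGE);
```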
Data Lineage 
•What is data lineage 
•Data origination and flow details 
•Where it is from, where it is going, how it is transformed in the process 
•Same concept as comments in programming
Data Lineage 
•Why do we need Data Lineage 
•To provide meta-data context in the DWH 
•Future business rules may change, affecting some data 
•Making it invalid 
•Making it suspect 
•Making it more important 
•Data lineage allows us to identify this data
Data Lineage 
•Two main options for adding Data Lineage 
•SSIS system variables 
•T-SQL system functions
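Data Lineage using T-SQL 
-- System functions that can be captured as lineage metadata during a load
SELECT 
APP_NAME() AS ApplicationName, 
DATABASE_PRINCIPAL_ID() AS DatabasePrincipalId, 
USER_NAME() AS DatabaseUser, 
SUSER_NAME() AS LoginName, 
GETDATE() AS LoadTime, 
CURRENT_TIMESTAMP AS LoadTimestamp, 
CONNECTIONPROPERTY('client_net_address') AS ClientAddress;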
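One way to persist these values, sketched here under assumed names, is to add lineage columns with defaults so that every inserted row records when, by whom, and from which application it was loaded. The table dbo.FactLoadAudit and its columns are hypothetical.

```sql
-- Hypothetical table with lineage columns; all names are illustrative assumptions.
CREATE TABLE dbo.FactLoadAudit (
    SalesKey      INT           NOT NULL,
    Amount        MONEY         NOT NULL,
    LoadedAt      DATETIME      NOT NULL DEFAULT (GETDATE()),    -- when the row arrived
    LoadedBy      NVARCHAR(128) NOT NULL DEFAULT (SUSER_NAME()), -- login that loaded it
    LoadedFromApp NVARCHAR(128) NULL     DEFAULT (APP_NAME())    -- client application name
);

-- The load only supplies business columns; the defaults fill in the lineage.
INSERT INTO dbo.FactLoadAudit (SalesKey, Amount)
VALUES (1, 10.00);

SELECT SalesKey, Amount, LoadedAt, LoadedBy, LoadedFromApp
FROM   dbo.FactLoadAudit;
```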
Using Partitions 
•Fact tables become very large tables over time 
•Very large database tables present serious challenges 
•What if you need to delete a large portion of the data? 
•The TRUNCATE TABLE command deletes with minimal logging, but it removes every row in the table. 
•Large data inserts become time consuming 
•Index maintenance and storage can become problematic 
•Table partitions address all of these issues
Using Partitions 
•What is a table partition 
•A large table is stored in multiple files 
•Divided by rows (based on condition) 
•Usually date / time 
•SQL Server 2012 allows up to 15,000 partitions on a single table 
•Partitions and data are managed in the background
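A date-based partitioned table can be sketched as follows. The names pfOrderYear, psOrderYear, and dbo.FactOrdersPartitioned, as well as the boundary dates, are assumptions for illustration. All partitions are mapped to the PRIMARY filegroup to keep the sketch short, whereas a production design would often spread them across filegroups.

```sql
-- One boundary per year; RANGE RIGHT places each boundary date
-- in the partition to its right.
CREATE PARTITION FUNCTION pfOrderYear (DATE)
AS RANGE RIGHT FOR VALUES ('20130101', '20140101');

-- Map every partition to PRIMARY for simplicity.
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

-- Hypothetical fact table stored on the partition scheme,
-- divided by the OrderDate column.
CREATE TABLE dbo.FactOrdersPartitioned (
    OrderDate DATE  NOT NULL,
    Amount    MONEY NOT NULL
) ON psOrderYear (OrderDate);

-- Show which partition a given date falls into.
SELECT $PARTITION.pfOrderYear('20130615') AS PartitionNumber;
```

With this layout, removing a whole year of data is typically done by switching that partition out with ALTER TABLE ... SWITCH and truncating the target, which avoids a large logged DELETE.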
Identifying our Dimensions / Fact Tables

Contenu connexe

Tendances (20)

How to build a data dictionary
How to build a data dictionaryHow to build a data dictionary
How to build a data dictionary
 
Types of Database Models
Types of Database ModelsTypes of Database Models
Types of Database Models
 
CS8080 IRT UNIT - III SLIDES IN PDF.pdf
CS8080  IRT UNIT - III  SLIDES IN PDF.pdfCS8080  IRT UNIT - III  SLIDES IN PDF.pdf
CS8080 IRT UNIT - III SLIDES IN PDF.pdf
 
FILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMS
 
Database systems introduction
Database systems introductionDatabase systems introduction
Database systems introduction
 
RDBMS
RDBMSRDBMS
RDBMS
 
Database systems
Database systemsDatabase systems
Database systems
 
Lecture2 oracle ppt
Lecture2 oracle pptLecture2 oracle ppt
Lecture2 oracle ppt
 
Introduction to Databases
Introduction to DatabasesIntroduction to Databases
Introduction to Databases
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
File system structure
File system structureFile system structure
File system structure
 
Deductive databases
Deductive databasesDeductive databases
Deductive databases
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
 
Fundamentals of Database system
Fundamentals of Database systemFundamentals of Database system
Fundamentals of Database system
 
Database architecture
Database architectureDatabase architecture
Database architecture
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Transaction Processing in DBMS.pptx
Transaction Processing in DBMS.pptxTransaction Processing in DBMS.pptx
Transaction Processing in DBMS.pptx
 
Information retrieval (introduction)
Information  retrieval (introduction) Information  retrieval (introduction)
Information retrieval (introduction)
 
SQL Commands
SQL Commands SQL Commands
SQL Commands
 
Data Dictionary
Data DictionaryData Dictionary
Data Dictionary
 

Similaire à Data Warehouse Design Considerations

Data Warehouse Basics
Data Warehouse BasicsData Warehouse Basics
Data Warehouse BasicsRam Kedem
 
Managing and Configuring Databases
Managing and Configuring DatabasesManaging and Configuring Databases
Managing and Configuring DatabasesRam Kedem
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Mrunal Shridhar
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server DatabasesColdFusionConference
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hivevshreepadma
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Ashnikbiz
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Amazon Web Services
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceAmazon Web Services
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQLRam Kedem
 
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle Ashnikbiz
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyIDERA Software
 
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?FlyData Inc.
 

Similaire à Data Warehouse Design Considerations (20)

Data Warehouse Basics
Data Warehouse BasicsData Warehouse Basics
Data Warehouse Basics
 
Managing and Configuring Databases
Managing and Configuring DatabasesManaging and Configuring Databases
Managing and Configuring Databases
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hive
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak Performance
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQL
 
In Memory Cahce Structure
In Memory Cahce StructureIn Memory Cahce Structure
In Memory Cahce Structure
 
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
 
Redis meetup
Redis meetupRedis meetup
Redis meetup
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
Redis - Partitioning
Redis - PartitioningRedis - Partitioning
Redis - Partitioning
 
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
 
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
 
Revision
RevisionRevision
Revision
 

Plus de Ram Kedem

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edgeRam Kedem
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL WebinarRam Kedem
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database InstanceRam Kedem
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power ViewRam Kedem
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSASRam Kedem
 
Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSASRam Kedem
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - OracleRam Kedem
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS AttributesRam Kedem
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)Ram Kedem
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)Ram Kedem
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Ram Kedem
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Ram Kedem
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML dataRam Kedem
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesRam Kedem
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic ParametersRam Kedem
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional FormattingRam Kedem
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated FieldsRam Kedem
 

Plus de Ram Kedem (20)

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edge
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL Webinar
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database Instance
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power View
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSAS
 
Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSAS
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - Oracle
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS Attributes
 
SSRS Matrix
SSRS MatrixSSRS Matrix
SSRS Matrix
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML data
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & Hierarchies
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic Parameters
 
SSRS Gauges
SSRS GaugesSSRS Gauges
SSRS Gauges
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional Formatting
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated Fields
 
SSRS Groups
SSRS GroupsSSRS Groups
SSRS Groups
 

Dernier

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Dernier (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Data Warehouse Design Considerations

  • 2. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Slowly Changing Dimensions •Type 1 SCD •OLTP updates are moved into the DW •Any changes overwrites the current DW data •Past actual data history is lost •Historical data may be change if it doesn’t contain important business details (such as store location)
  • 3. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Slowly Changing Dimensions •Type 2 SCD •Data is not overwritten in the DW •A new row for the customer must be inserted •Usually created Primary Key Issues •For example –if customer details got changed, this approach suggest you insert another row in the Dimension for the same customer •You must add a Surrogate Key (DWH Key) •Incremented number for each update, same idea as Primary Key that consists from two columns. •You must also add another column or two •To flag the current value •To provide date / time perspective
  • 4. Slowly Changing Dimensions
       • Type 1 SCD
  • 5. Slowly Changing Dimensions
       • Type 2 SCD
  • 6. Understanding Indexing
       • Indexing affects how data is stored and managed in SQL Server
       • There are four main indexing options in SQL Server
         • Clustered Index
         • Non-clustered Index
         • Filtered Non-clustered Index
         • Columnstore Index (include)
  • 7. Understanding Indexing
       • Clustered Index
         • Determines the physical storage order of the data
         • There can be only one clustered index on a table
       • Non-clustered Index
         • Sorts data in a column or columns and stores pointers to the actual data row
         • We can have up to 999 non-clustered indexes on a table
  • 8. Understanding Indexing
       • Filtered Non-clustered Index
         • Creates a non-clustered index on a subset of values in a column
       • Columnstore Index
         • A non-clustered index placed on one or more columns
         • Each column is stored and searched separately from the data row
         • In SQL Server 2012, adding a columnstore index makes the table read-only until the index is dropped or disabled
         • https://www.simple-talk.com/sql/database-administration/columnstore-indexes-in-sql-server-2012/
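A filtered non-clustered index might look like the following sketch. The table mirrors the `dbo.products` columns used in the deck's columnstore example, but the `CREATE TABLE` itself, the index name, and the choice of predicate are assumptions made for illustration.

```sql
-- Minimal stand-in for the products table (skip if it already exists).
CREATE TABLE dbo.products (
    productID    INT           NOT NULL PRIMARY KEY,
    productName  NVARCHAR(40)  NOT NULL,
    UnitPrice    MONEY         NULL,
    unitsinstock SMALLINT      NULL
);

-- Index only the rows that are in stock: the WHERE clause is what makes
-- this a filtered index, so it is smaller than an index over every row.
CREATE NONCLUSTERED INDEX ix_products_instock
ON dbo.products (productName)
WHERE unitsinstock > 0;
```

Queries whose predicate matches the filter (for example `WHERE unitsinstock > 0`) are the ones that can benefit from such an index.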
  • 9. Columnstore Index

       CREATE NONCLUSTERED COLUMNSTORE INDEX csi_products
       ON dbo.products (productName, UnitPrice, unitsinstock);

       SELECT productName, UnitPrice, unitsinstock
       FROM products;
  • 10. Indexing the Data Warehouse
       • Indexing in the Data Warehouse can be tricky
       • Too few indexes: data loads are quick, but query response time is slow
       • Too many indexes: loads slow down and storage requirements go up, but query response is good
  • 11. Indexing the Data Warehouse
       • General rule of thumb
         • Dimension tables
           • Place a clustered index on the surrogate key
           • If the table has a lot of columns, create non-clustered indexes on the most popular columns
         • Fact tables
           • Place a non-clustered index on the single-column foreign keys to the dimension tables
           • If the primary key is a composite of all the dimension foreign keys, make it a non-unique clustered index
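As a rough sketch of this rule of thumb, the statements below index one dimension table and one fact table. The tables `dbo.DimProduct` and `dbo.FactSales`, their columns, and the index names are hypothetical; the tables are created as heaps (no primary key constraint) so that the clustered indexes can be added explicitly.

```sql
-- Hypothetical star-schema tables, created without constraints (heaps).
CREATE TABLE dbo.DimProduct (
    ProductKey  INT          NOT NULL,  -- surrogate key
    ProductName NVARCHAR(40) NOT NULL,
    Category    NVARCHAR(30) NOT NULL
);

CREATE TABLE dbo.FactSales (
    DateKey     INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    StoreKey    INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);

-- Dimension: clustered index on the surrogate key,
-- non-clustered index on a frequently filtered column.
CREATE UNIQUE CLUSTERED INDEX cix_DimProduct
    ON dbo.DimProduct (ProductKey);
CREATE NONCLUSTERED INDEX ix_DimProduct_Category
    ON dbo.DimProduct (Category);

-- Fact: non-unique clustered index on the composite of dimension keys,
-- plus non-clustered indexes on the single-column foreign keys.
CREATE CLUSTERED INDEX cix_FactSales
    ON dbo.FactSales (DateKey, ProductKey, StoreKey);
CREATE NONCLUSTERED INDEX ix_FactSales_ProductKey
    ON dbo.FactSales (ProductKey);
CREATE NONCLUSTERED INDEX ix_FactSales_StoreKey
    ON dbo.FactSales (StoreKey);
```

Which dimension columns count as "most popular" depends on the actual query workload, so `Category` here is only a placeholder.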
  • 12. Understanding Indexed Views
       • What is a view
         • A result set of a query that is a virtual table
         • The virtual table is not stored permanently in the database
         • The view can be referenced like a table in T-SQL
       • Indexing a view
         • You can create a unique clustered index on a view
         • The view's result set is then stored in the database, just like a regular table with a clustered index
  • 13. Understanding Indexed Views
       • Advantages
         • Improve the performance of joins and aggregations that process many rows
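An indexed view over an aggregation could be sketched as follows. The table `dbo.FactOrders`, the view, and the index name are hypothetical. The sketch relies on the documented requirements that the view is created `WITH SCHEMABINDING`, uses two-part object names, includes `COUNT_BIG(*)` when it has a `GROUP BY`, and aggregates a non-nullable expression; it also assumes the session's default SET options (such as `ANSI_NULLS` and `QUOTED_IDENTIFIER` being ON). `GO` is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical fact table; OrderAmount is NOT NULL so SUM is allowed in the view.
CREATE TABLE dbo.FactOrders (
    OrderKey    INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    OrderAmount MONEY NOT NULL
);
GO

-- The view must be schema-bound before it can be indexed.
CREATE VIEW dbo.vOrdersByProduct
WITH SCHEMABINDING
AS
SELECT ProductKey,
       SUM(OrderAmount) AS TotalAmount,
       COUNT_BIG(*)     AS RowCnt       -- required when the view has GROUP BY
FROM   dbo.FactOrders
GROUP  BY ProductKey;
GO

-- The unique clustered index materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX cix_vOrdersByProduct
    ON dbo.vOrdersByProduct (ProductKey);
```

After this, the aggregated rows are stored and maintained as the base table changes, which is where the gain for joins and aggregations over many rows comes from, at the cost of extra work on every load.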
  • 14. Understanding Data Compression
       • SQL Server 2012 supports data compression
       • Data compression reduces the size of the database
         • Packs more data onto fewer data pages
         • Fewer data page reads are required to satisfy queries
         • Lower IO means faster response and a lower IO load on the server
       • Extra CPU resources are required for data compression / decompression
         • A DWH usually doesn’t have many updates (other than bulk loading), so this cost is paid mostly at load time
  • 15. Understanding Data Compression
       • SQL Server 2012 supports three compression types
         • Page compression
           • Focuses on duplicated values within the data page
           • Stores one value and places a pointer at all other locations
         • Row compression
           • Removes any unused bytes in a fixed-length data type
           • For example, a CHAR(25) column holding a shorter value
         • Unicode compression
           • Reduces storage space for Unicode data that doesn’t require that space
  • 16. Understanding Data Compression
       • Which compression should you use
         • Page compression
           • Row compression is applied automatically when page compression is used
           • If you choose row compression, page compression is not applied
       • Fact tables usually benefit the most from compression
       • In SQL Server 2012, compression is only available in Enterprise Edition
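Choosing and applying a compression setting might look like the sketch below. The table `dbo.FactShipments` is hypothetical; `sp_estimate_data_compression_savings` and the `DATA_COMPRESSION` rebuild option are standard SQL Server features, though the savings they report depend entirely on the actual data.

```sql
-- Hypothetical fact table used only for this illustration.
CREATE TABLE dbo.FactShipments (
    ShipmentKey INT      NOT NULL,
    DateKey     INT      NOT NULL,
    Carrier     CHAR(25) NOT NULL,   -- fixed-length column: a row-compression candidate
    Weight      DECIMAL(10, 2) NOT NULL
);

-- Estimate how much space page compression would save before committing to it.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactShipments',
     @index_id         = NULL,      -- all indexes (and the heap)
     @partition_number = NULL,      -- all partitions
     @data_compression = 'PAGE';

-- Apply page compression (which includes row compression) by rebuilding the table.
ALTER TABLE dbo.FactShipments
REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Replacing `PAGE` with `ROW` in both statements gives the row-only variant for comparison.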
  • 17. Data Lineage
       • What is data lineage
         • Data origination and flow details
         • Where it is from, where it is going, how it is transformed in the process
         • Same concept as comments in programming
  • 18. Data Lineage
       • Why do we need Data Lineage
         • To provide meta-data context in the DWH
         • Future business rules may change, affecting some data
           • Making it invalid
           • Making it suspect
           • Making it more important
         • Data lineage allows us to identify this data
  • 19. Data Lineage
       • Two main options for adding Data Lineage
         • SSIS system variables
         • T-SQL system functions
  • 20. Data Lineage using T-SQL

       SELECT APP_NAME(),                -- application that opened the session
              DATABASE_PRINCIPAL_ID(),   -- ID of the current database user
              USER_NAME(),               -- current database user name
              SUSER_NAME(),              -- current login name
              GETDATE(),                 -- current date and time
              CURRENT_TIMESTAMP,         -- ANSI equivalent of GETDATE(); takes no parentheses
              CONNECTIONPROPERTY('client_net_address');  -- client's network address
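One way to put these system functions to work is to record them as lineage columns with default values, so every loaded row carries its own origin. This is a sketch, not the deck's method: the table `dbo.FactSalesLineage` and its column names are hypothetical.

```sql
-- Hypothetical fact table with lineage columns filled in by defaults.
CREATE TABLE dbo.FactSalesLineage (
    SalesKey      INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SalesAmount   MONEY         NOT NULL,
    LoadedAt      DATETIME      NOT NULL DEFAULT GETDATE(),    -- when the row arrived
    LoadedBy      NVARCHAR(128) NOT NULL DEFAULT SUSER_NAME(), -- login that loaded it
    LoadedFromApp NVARCHAR(128) NULL     DEFAULT APP_NAME()    -- loading application
);

-- The load supplies only business data; lineage is captured automatically.
INSERT INTO dbo.FactSalesLineage (SalesAmount)
VALUES (19.99);

SELECT SalesKey, SalesAmount, LoadedAt, LoadedBy, LoadedFromApp
FROM   dbo.FactSalesLineage;
```

If a business rule changes later, rows can be identified by these columns, for example everything loaded before a given date or by a given application.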
  • 21. Using Partitions
       • Fact tables become very large tables over time
       • Very large database tables present serious challenges
         • What if you need to delete a large portion of the data?
           • The TRUNCATE TABLE command deletes with minimal logging, but it deletes the entire table
         • Large data inserts become time consuming
         • Index maintenance and storage can become problematic
       • Table partitions deal with all these issues
  • 22. Using Partitions
       • What is a table partition
         • A large table is stored in multiple files
         • Divided by rows (based on a condition)
           • Usually date / time
         • SQL Server 2012 allows up to 15,000 partitions on a single table
         • Partitions and data are managed in the background
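A date-based partitioned table can be sketched as below. The function, scheme, and table names and the boundary dates are hypothetical, and for simplicity every partition is mapped to the PRIMARY filegroup, whereas a real design would typically spread partitions across several filegroups.

```sql
-- Partition function: three boundary values give four partitions.
-- RANGE RIGHT means each boundary date belongs to the partition on its right.
CREATE PARTITION FUNCTION pf_SalesByYear (DATE)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

-- Partition scheme: maps the partitions to filegroups (all to PRIMARY here).
CREATE PARTITION SCHEME ps_SalesByYear
AS PARTITION pf_SalesByYear ALL TO ([PRIMARY]);

-- The table is created on the scheme, partitioned by its date column.
CREATE TABLE dbo.FactSalesPartitioned (
    SaleDate    DATE  NOT NULL,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
) ON ps_SalesByYear (SaleDate);

-- Ask which partition a given date maps to.
-- 2013-06-15 falls between the 2013 and 2014 boundaries: partition 3.
SELECT $PARTITION.pf_SalesByYear('2013-06-15') AS PartitionNumber;
```

With this layout, a whole year of old data can be removed by switching its partition out to a separate table (`ALTER TABLE ... SWITCH PARTITION`) and truncating that table, instead of running a large logged `DELETE`.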
  • 23. Using Partitions
  • 24. Identifying our Dimensions / Fact Tables