SlideShare une entreprise Scribd logo
1  sur  25
The Application of Data Vault to DW2.0 © Dan Linstedt, 2011-2012 all rights reserved
A bit about me… 2 Author, Inventor, Speaker – and part time photographer… 25+ years in the IT industry Worked in DoD, US Gov’t, Fortune 50, and so on… Find out more about the Data Vault: http://www.youtube.com/LearnDataVault http://LearnDataVault.com Full profile on http://www.LinkedIn.com/dlinstedt
Agenda Defining The Needs for the Data Vault DW2.0 Architecture DW2.0 Drivers for Data Modeling Divergence of Data Models over Time Data Vault in DW2.0 Defining the Data Vault What does one look like? Modeling in DW2.0 Applying Data Vault to Global DW2.0 Applying Data Vault to Time-Value DW2.0 Compliance in DW2.0 Applying Data Vault to System of Record The Paradox of DW2.0 Volume, Latency, Complexity,Normalization andTransformation ability 10/5/2011 Do Not Duplicate Without Written Permission 3
DW2.0 Architecture 10/5/2011 Do Not Duplicate Without Written Permission 4 Enterprise Service Bus ESB Connectivity: ,[object Object]
EII
ETL / ELT
Web ServicesCube  Processing Temporal Indexing Semantic Management Active  Data Mining Transformation Active Cleansing Unstructured Data: ,[object Object]
Plain Text
Word Docs
ImagesM E T A D A T A Interactive Tactical Data Models Must be consistently applied throughout all layers. Integrated Strategic ESB Management: ,[object Object]
Email
Spread Sheets
Transaction
Structured InformationNear-Line Extended Archival Historical Enterprise Data Warehouse
DW2.0 Drivers for Data Modeling 10/5/2011 Do Not Duplicate Without Written Permission 5 Technical Drivers Business Drivers Flexibility Compliance Volume Frequency Data Model Data Model Understandability Granularity Data Models are one of the main integration points between Technical and Business drivers. Business Keys drive understandability, and granularity Normalization drives flexibility, and frequency of load Raw data sets in the EDW/ADW drive compliance and volume
Divergence of Data Models over Time Data models (both logical and physical) have diverged from business drivers and direction over time. The Data Models have driven towards physical improvements instead of towards business improvements. The Data Vault Architecture drives data modeling back to the business sides of the house. 10/5/2011 Do Not Duplicate Without Written Permission 6
Agenda Defining The Needs for the Data Vault DW2.0 Architecture DW2.0 Drivers for Data Modeling Divergence of Data Models over Time Data Vault in DW2.0 Defining the Data Vault What does one look like? Modeling in DW2.0 Applying Data Vault to Global DW2.0 Applying Data Vault to Time-Value DW2.0 Compliance in DW2.0 Applying Data Vault to System of Record The Paradox of DW2.0 Volume, Latency, Complexity,Normalization andTransformation ability 10/5/2011 Do Not Duplicate Without Written Permission 7 Image is from - What The Bleep Do We Know?
Defining the Data Vault 10/5/2011 Do Not Duplicate Without Written Permission 8 The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.  It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses. Defining the Data Vault TDAN.com Article
What Does One Look Like? 10/5/2011 Do Not Duplicate Without Written Permission 9 Records a history of the interaction Account Information Sat Sat Sat Link Account F(x) F(x) Sat Sat Invoice ID Sat F(x) Sat Invoice / Billing Information Customer Information Sat Elements: ,[object Object]
Link
SatelliteSat Customer F(x) Sat The impact of linking disparate systems together, is inside the shaded area.
Modeling in DW2.0 Bill Says: DW2.0 must be brought down to a very finite level of detail. The starting point for DW2.0 is the modeling process. The data model applies to the integrated sector, the near line sector, and the archival sector. The way that data warehouses are built is in an incremental manner The Data Vault specializes in: Providing finite grain at the lowest level possible, Mapping business process models to data models Existing in all sectors simultaneously without changes. Flexibility and managing change so that impacts are not a mile-wide and 10 miles deep. 10/5/2011 Do Not Duplicate Without Written Permission 10
Elements in a Data Vault Hub Unique List of Business Keys, tracked by the first time the warehouse saw them appear. Link Relationships between business keys, also representing a grain shift, or a hierarchical roll-up. Satellite Data over time, granular, and descriptive about the business key.  Also setup according to type of information, and rate of change. 10/5/2011 Do Not Duplicate Without Written Permission 11
Applying the Data Vault to Global DW2.0 10/5/2011 Do Not Duplicate Without Written Permission 12 Manufacturing EDW  in China Planning in Brazil Hub Hub Link Sat Sat Link Sat Sat Link Hub Link Hub Hub Sat Sat Sat Sat Sat Sat Sat Sat Base EDW Created in Corporate Financials in USA
Applying the Data Vault to Time-Value DW2.0 10/5/2011 Do Not Duplicate Without Written Permission 13 Satellite Data Over Time Row 1 Row 2 Row 3 Row 4 Satellite entities in the Data Vault house data over time.  They are split by type of information and rate of change.  This is an example set of data for a customer name satellite.

Contenu connexe

Tendances

(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault ModelingKent Graziano
 
Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Empowered Holdings, LLC
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudMichael Rainey
 
Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011Hans Hultgren
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Agile BI via Data Vault and Modelstorming
Agile BI via Data Vault and ModelstormingAgile BI via Data Vault and Modelstorming
Agile BI via Data Vault and ModelstormingDaniel Upton
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Empowered Holdings, LLC
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptxchennakesava44
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data EngineeringHarald Erb
 

Tendances (20)

Operational Data Vault
Operational Data VaultOperational Data Vault
Operational Data Vault
 
Data Vault Introduction
Data Vault IntroductionData Vault Introduction
Data Vault Introduction
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
 
Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Présentation data vault et bi v20120508
Présentation data vault et bi v20120508
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the Cloud
 
Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Agile BI via Data Vault and Modelstorming
Agile BI via Data Vault and ModelstormingAgile BI via Data Vault and Modelstorming
Agile BI via Data Vault and Modelstorming
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 

En vedette

IRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And MethodologyIRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And MethodologyEmpowered Holdings, LLC
 
Best Practices: Data Admin & Data Management
Best Practices: Data Admin & Data ManagementBest Practices: Data Admin & Data Management
Best Practices: Data Admin & Data ManagementEmpowered Holdings, LLC
 
Oracle Database Vault
Oracle Database VaultOracle Database Vault
Oracle Database VaultKhalid ALLILI
 
Data vault seminar May 5-6 Dommel - The factory and the workshop
Data vault seminar May 5-6 Dommel - The factory and the workshopData vault seminar May 5-6 Dommel - The factory and the workshop
Data vault seminar May 5-6 Dommel - The factory and the workshopjohannesvdb
 
Atul Randive CV_IKnowSolutions_ENv2
Atul Randive CV_IKnowSolutions_ENv2Atul Randive CV_IKnowSolutions_ENv2
Atul Randive CV_IKnowSolutions_ENv2atul randive
 
Data Vault ReConnect Speed Presenting PM Part Four
Data Vault ReConnect Speed Presenting PM Part FourData Vault ReConnect Speed Presenting PM Part Four
Data Vault ReConnect Speed Presenting PM Part FourHans Hultgren
 
Lean Data Warehouse via Data Vault
Lean Data Warehouse via Data VaultLean Data Warehouse via Data Vault
Lean Data Warehouse via Data VaultDaniel Upton
 
Data Vault ReConnect Speed Presenting AM Part One
Data Vault ReConnect Speed Presenting AM Part OneData Vault ReConnect Speed Presenting AM Part One
Data Vault ReConnect Speed Presenting AM Part OneHans Hultgren
 
Data Vault ReConnect Speed Presenting AM Part Two
Data Vault ReConnect Speed Presenting AM Part TwoData Vault ReConnect Speed Presenting AM Part Two
Data Vault ReConnect Speed Presenting AM Part TwoHans Hultgren
 
Data Vault ReConnect Speed Presenting PM Part Three
Data Vault ReConnect Speed Presenting PM Part ThreeData Vault ReConnect Speed Presenting PM Part Three
Data Vault ReConnect Speed Presenting PM Part ThreeHans Hultgren
 
Guru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesGuru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesCGI
 
Metadaten und Data Vault (Meta Vault)
Metadaten und Data Vault (Meta Vault)Metadaten und Data Vault (Meta Vault)
Metadaten und Data Vault (Meta Vault)Andreas Buckenhofer
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieAndreas Buckenhofer
 
Data Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileData Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileDaniel Upton
 
Agile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationAgile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationVishal Kumar
 

En vedette (19)

Data vault what's Next: Part 2
Data vault what's Next: Part 2Data vault what's Next: Part 2
Data vault what's Next: Part 2
 
IRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And MethodologyIRM UK - 2009: DV Modeling And Methodology
IRM UK - 2009: DV Modeling And Methodology
 
Data vault: What's Next
Data vault: What's NextData vault: What's Next
Data vault: What's Next
 
Best Practices: Data Admin & Data Management
Best Practices: Data Admin & Data ManagementBest Practices: Data Admin & Data Management
Best Practices: Data Admin & Data Management
 
Visual Data Vault
Visual Data VaultVisual Data Vault
Visual Data Vault
 
Oracle Database Vault
Oracle Database VaultOracle Database Vault
Oracle Database Vault
 
Data vault seminar May 5-6 Dommel - The factory and the workshop
Data vault seminar May 5-6 Dommel - The factory and the workshopData vault seminar May 5-6 Dommel - The factory and the workshop
Data vault seminar May 5-6 Dommel - The factory and the workshop
 
Atul Randive CV_IKnowSolutions_ENv2
Atul Randive CV_IKnowSolutions_ENv2Atul Randive CV_IKnowSolutions_ENv2
Atul Randive CV_IKnowSolutions_ENv2
 
Data Vault ReConnect Speed Presenting PM Part Four
Data Vault ReConnect Speed Presenting PM Part FourData Vault ReConnect Speed Presenting PM Part Four
Data Vault ReConnect Speed Presenting PM Part Four
 
Lean Data Warehouse via Data Vault
Lean Data Warehouse via Data VaultLean Data Warehouse via Data Vault
Lean Data Warehouse via Data Vault
 
Data Vault ReConnect Speed Presenting AM Part One
Data Vault ReConnect Speed Presenting AM Part OneData Vault ReConnect Speed Presenting AM Part One
Data Vault ReConnect Speed Presenting AM Part One
 
Data Vault ReConnect Speed Presenting AM Part Two
Data Vault ReConnect Speed Presenting AM Part TwoData Vault ReConnect Speed Presenting AM Part Two
Data Vault ReConnect Speed Presenting AM Part Two
 
Data Vault ReConnect Speed Presenting PM Part Three
Data Vault ReConnect Speed Presenting PM Part ThreeData Vault ReConnect Speed Presenting PM Part Three
Data Vault ReConnect Speed Presenting PM Part Three
 
Guru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best PracticesGuru4Pro Data Vault Best Practices
Guru4Pro Data Vault Best Practices
 
Metadaten und Data Vault (Meta Vault)
Metadaten und Data Vault (Meta Vault)Metadaten und Data Vault (Meta Vault)
Metadaten und Data Vault (Meta Vault)
 
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der AutomobilindustrieCDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
CDC und Data Vault für den Aufbau eines DWH in der Automobilindustrie
 
Data Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileData Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes Agile
 
Big Data Modeling
Big Data ModelingBig Data Modeling
Big Data Modeling
 
Agile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationAgile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data Presentation
 

Similaire à Data Vault and DW2.0

Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroDenodo
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Denodo
 
Introduction to data vault ilja dmitrijev
Introduction to data vault   ilja dmitrijevIntroduction to data vault   ilja dmitrijev
Introduction to data vault ilja dmitrijevIlja Dmitrijevs
 
Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)Denodo
 
Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...
Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...
Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...Denodo
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesDenodo
 
Data API as a Foundation for Systems of Engagement
Data API as a Foundation for Systems of EngagementData API as a Foundation for Systems of Engagement
Data API as a Foundation for Systems of EngagementVictor Olex
 
Data warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and visionData warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and visionKlaudiia Jacome
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?Denodo
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lakeCapgemini
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionDenodo
 
Data Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast TourData Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast TourWhereScape
 
Sql server briefing sept
Sql server briefing septSql server briefing sept
Sql server briefing septMark Kromer
 
An Overview of Data Lake
An Overview of Data LakeAn Overview of Data Lake
An Overview of Data LakeIRJET Journal
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 

Similaire à Data Vault and DW2.0 (20)

Data vault
Data vaultData vault
Data vault
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes
 
Introduction to data vault ilja dmitrijev
Introduction to data vault   ilja dmitrijevIntroduction to data vault   ilja dmitrijev
Introduction to data vault ilja dmitrijev
 
Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)Introduction to Modern Data Virtualization 2021 (APAC)
Introduction to Modern Data Virtualization 2021 (APAC)
 
Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...
Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...
Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
 
Data API as a Foundation for Systems of Engagement
Data API as a Foundation for Systems of EngagementData API as a Foundation for Systems of Engagement
Data API as a Foundation for Systems of Engagement
 
Data warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and visionData warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and vision
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Data Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast TourData Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast Tour
 
Sql server briefing sept
Sql server briefing septSql server briefing sept
Sql server briefing sept
 
An Overview of Data Lake
An Overview of Data LakeAn Overview of Data Lake
An Overview of Data Lake
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 

Dernier

Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Centuryrwgiffor
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...Aggregage
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 
A305_A2_file_Batkhuu progress report.pdf
A305_A2_file_Batkhuu progress report.pdfA305_A2_file_Batkhuu progress report.pdf
A305_A2_file_Batkhuu progress report.pdftbatkhuu1
 
Unlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfUnlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfOnline Income Engine
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLSeo
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communicationskarancommunications
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒anilsa9823
 
Cracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxCracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxWorkforce Group
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesDipal Arora
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
Understanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key InsightsUnderstanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key Insightsseri bangash
 
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxB.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxpriyanshujha201
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 

Dernier (20)

Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Century
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 
A305_A2_file_Batkhuu progress report.pdf
A305_A2_file_Batkhuu progress report.pdfA305_A2_file_Batkhuu progress report.pdf
A305_A2_file_Batkhuu progress report.pdf
 
Unlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdfUnlocking the Secrets of Affiliate Marketing.pdf
Unlocking the Secrets of Affiliate Marketing.pdf
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
 
Cracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxCracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptx
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
Understanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key InsightsUnderstanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key Insights
 
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxB.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 

Data Vault and DW2.0

  • 1. The Application of Data Vault to DW2.0 © Dan Linstedt, 2011-2012 all rights reserved
  • 2. A bit about me… 2 Author, Inventor, Speaker – and part time photographer… 25+ years in the IT industry Worked in DoD, US Gov’t, Fortune 50, and so on… Find out more about the Data Vault: http://www.youtube.com/LearnDataVault http://LearnDataVault.com Full profile on http://www.LinkedIn.com/dlinstedt
  • 3. Agenda Defining The Needs for the Data Vault DW2.0 Architecture DW2.0 Drivers for Data Modeling Divergence of Data Models over Time Data Vault in DW2.0 Defining the Data Vault What does one look like? Modeling in DW2.0 Applying Data Vault to Global DW2.0 Applying Data Vault to Time-Value DW2.0 Compliance in DW2.0 Applying Data Vault to System of Record The Paradox of DW2.0 Volume, Latency, Complexity,Normalization andTransformation ability 10/5/2011 Do Not Duplicate Without Written Permission 3
  • 4.
  • 5. EII
  • 7.
  • 10.
  • 11. Email
  • 14. Structured InformationNear-Line Extended Archival Historical Enterprise Data Warehouse
  • 15. DW2.0 Drivers for Data Modeling 10/5/2011 Do Not Duplicate Without Written Permission 5 Technical Drivers Business Drivers Flexibility Compliance Volume Frequency Data Model Data Model Understandability Granularity Data Models are one of the main integration points between Technical and Business drivers. Business Keys drive understandability, and granularity Normalization drives flexibility, and frequency of load Raw data sets in the EDW/ADW drive compliance and volume
  • 16. Divergence of Data Models over Time Data models (both logical and physical) have diverged from business drivers and direction over time. The Data Models have driven towards physical improvements instead of towards business improvements. The Data Vault Architecture drives data modeling back to the business sides of the house. 10/5/2011 Do Not Duplicate Without Written Permission 6
  • 17. Agenda Defining The Needs for the Data Vault DW2.0 Architecture DW2.0 Drivers for Data Modeling Divergence of Data Models over Time Data Vault in DW2.0 Defining the Data Vault What does one look like? Modeling in DW2.0 Applying Data Vault to Global DW2.0 Applying Data Vault to Time-Value DW2.0 Compliance in DW2.0 Applying Data Vault to System of Record The Paradox of DW2.0 Volume, Latency, Complexity,Normalization andTransformation ability 10/5/2011 Do Not Duplicate Without Written Permission 7 Image is from - What The Bleep Do We Know?
  • 18. Defining the Data Vault 10/5/2011 Do Not Duplicate Without Written Permission 8 The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses. Defining the Data Vault TDAN.com Article
  • 19.
  • 20. Link
  • 21. SatelliteSat Customer F(x) Sat The impact of linking disparate systems together, is inside the shaded area.
  • 22. Modeling in DW2.0 Bill Says: DW2.0 must be brought down to a very finite level of detail. The starting point for DW2.0 is the modeling process. The data model applies to the integrated sector, the near line sector, and the archival sector. The way that data warehouses are built is in an incremental manner The Data Vault specializes in: Providing finite grain at the lowest level possible, Mapping business process models to data models Existing in all sectors simultaneously without changes. Flexibility and managing change so that impacts are not a mile-wide and 10 miles deep. 10/5/2011 Do Not Duplicate Without Written Permission 10
  • 23. Elements in a Data Vault Hub Unique List of Business Keys, tracked by the first time the warehouse saw them appear. Link Relationships between business keys, also representing a grain shift, or a hierarchical roll-up. Satellite Data over time, granular, and descriptive about the business key. Also setup according to type of information, and rate of change. 10/5/2011 Do Not Duplicate Without Written Permission 11
  • 24. Applying the Data Vault to Global DW2.0 10/5/2011 Do Not Duplicate Without Written Permission 12 Manufacturing EDW in China Planning in Brazil Hub Hub Link Sat Sat Link Sat Sat Link Hub Link Hub Hub Sat Sat Sat Sat Sat Sat Sat Sat Base EDW Created in Corporate Financials in USA
  • 25. Applying the Data Vault to Time-Value DW2.0 10/5/2011 Do Not Duplicate Without Written Permission 13 Satellite Data Over Time Row 1 Row 2 Row 3 Row 4 Satellite entities in the Data Vault house data over time. They are split by type of information and rate of change. This is an example set of data for a customer name satellite.
  • 26. Batch and Real-Time Data Arrival 10/5/2011 Do Not Duplicate Without Written Permission 14 All Inserts All the time Transaction ID Date Stamp Customer Account # Amount Sat Transaction Type Hub Customer Link Transaction Hub Acct Sat Customer Sat Acct 3, 6 or 12 Hr Load Window Batch Load Customer Info Acct Data
  • 27. Star Schema Real-Time Data Issues 10/5/2011 Do Not Duplicate Without Written Permission 15 Updates are REQUIRED! Transaction ID Date Stamp Customer Account # Amount Type 3, 6 or 12 Hr Load Window Dimension Customer Fact Transaction Dimension Account Batch Load Customer Info Acct Data Cleansing & Quality must occur before the data can reach the target tables, cleansing and quality introduce unwanted latency!
  • 28. Compliance in DW2.0 10/5/2011 Do Not Duplicate Without Written Permission 16 Changes to Source Information Source Systems EDW / ADW Data Vault Data Marts Data Delivery Raw Detail = auditable Loads in Real-Time or in Batch Integrated by Business Key Flexible, allows business changes (with little to no impact) No delay in loading data Data type conformity Semantic Integration True Marts Raw Integration Business Rules User or Auditor Continuous Data Improvement Error Mart Quality Direction of Information Flow Master Data (Operational)
  • 29. Applying the Data Vault to System Of Record 10/5/2011 Do Not Duplicate Without Written Permission 17 Master Data or Conformed Dimensions Normalized EDW Source Systems SOR Definition 2 SOR Definition 3 SOR Definition 1 SOR 1 Data Capture, Data Produced by system algorithms SOR 2 Raw Detailed Integrated Data over time, Integrated by Horizontal (functional) Business Key. Auditable. SOR 3 Current view of the business, merged, quality cleansed, single copy, single source, feeds operational systems.
  • 30. DW2.0 Paradoxes DW2.0 incorporates: Unstructured, Semi-Structured, Real-Time, and Batch Data Global views All of which drive volumes of data. Volume causes latency in transformation. Volume is directly proportional to transformation complexity. Real-Time data arrival is inversely proportional to complexity and volume. Time for “quality, cleansing, and transformation” on the way in to the EDW diminishes as near-real-time is approached, or massive volumes of batch data are found within a shrinking batch window. Transformation can destroy data audit ability and compliance of the EDW / ADW. 10/5/2011 Do Not Duplicate Without Written Permission 18
  • 31. DW2.0 Paradoxes - Imagery 10/5/2011 Do Not Duplicate Without Written Permission 19 Drives DW2.0 Real-Time Transactions Unstructured Data Low-Level Grain Pushes Increases Low Latency Volume Fights Requires Merging, Quality, Cleansing Fights Data Model Denormalization Fights Data Model Normalization & Raw Details Inhibits Requires Inhibits Auditability & Compliance Provides
  • 32. DW2.0 Paradox Hypothesis As we reach near-real time, the ability to transform data and “wait” for parent dependencies directly decreases, the data decay rates increase, and therefore can cause data death if not processed in time. Normalization of the data model increases flexibility, and scalability. The closer we get to near-real-time, the more normalized the data model in the EDW/ADW must become. In order to process high volumes of batch data extremely fast, the “business transformations” must be removed from the load stream of the EDW. 10/5/2011 Do Not Duplicate Without Written Permission 20
  • 33. Data Vault Volumetrics 10/5/2011 Do Not Duplicate Without Written Permission 21 Volumetrics (10% null Data) Upon Initial Investigation, the 12 month growth rate for new customers is 197.4 MB per year…. Now let’s factor in the DELTA’s.
  • 34. Data Vault Growth 10/5/2011 Do Not Duplicate Without Written Permission 22 Volumetrics (10% null Data) – Delta Growth Only Original Dimension: 497.16 MB per Year New Data Vault:317.03 MB Per Year
  • 35. Data Vault VS Dimension Growth 10/5/2011 Do Not Duplicate Without Written Permission 23 How does the extensive growth rate affect queries?
  • 36. Summarization Business: Lack of a single view of a customer, product, service, etc... Lack of visibility into ALL information across the enterprise. Competition does it better, faster, cheaper. Unable to identify and forecast business trends and their impacts. WHERE’S THE KNOWLEDGE? OR IS IT JUST ALL DATA? 10/5/2011 Do Not Duplicate Without Written Permission 24 Technical: Near-Real-Time (Active) Huge Data Volumes Massive Data Dis-Integration Spread-Marts Convergence of Operational and Strategic Questions Duplication of data in the ODS, Warehouse, and Data Marts! Dimension-itis!! ODS Ulcer! Fact Table Granularity JUNK tables, Helper Tables
  • 37. Where To Learn More The Technical Modeling Book: http://LearnDataVault.com The Discussion Forums: & eventshttp://LinkedIn.com – Data Vault Discussions Contact me:http://DanLinstedt.com - web siteDanLinstedt@gmail.com - email World wide User Group (Free)http://dvusergroup.com 25