Data Vault Model & Methodology © Dan Linstedt, 2011-2012, all rights reserved 1
Agenda 2
Introduction – why are you here?
What is a Data Vault? Where does it come from?
Star Schema, 3NF, and Data Vault pros and cons as an EDW solution
When is a Data Vault a good fit?
Benefits of Data Vault Modeling & Methodology
<BREAK>
When to NOT use a Data Vault
Fundamental Paradigm Shift
Business Keys & Business Processes
Technical Review
Query Performance (PIT & Bridge)
What wasn’t covered in this presentation…
A bit about me… 3
Author, Inventor, Speaker – and part-time photographer…
25+ years in the IT industry
Worked in DoD, US Gov’t, Fortune 50, and so on…
Find out more about the Data Vault:
http://www.youtube.com/LearnDataVault
http://LearnDataVault.com
Full profile on http://www.LinkedIn.com/dlinstedt
Why Are YOU Here? 4 Your Expectations? Your Questions? Your Background? Areas of Interest? Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
What is it? Where did it come from? Defining the Data Vault Space 5
Data Vault Time Line 6
Timeline axis: 1960, 1970, 1980, 1990, 2000
Mid 60’s – Dimension & Fact Modeling presented by General Mills and Dartmouth University
E.F. Codd invented relational modeling
Early 70’s – Bill Inmon began discussing Data Warehousing
Mid 70’s – AC Nielsen popularized Dimension & Fact terms
1976 – Dr Peter Chen created E-R Diagramming
Chris Date and Hugh Darwen maintained and refined modeling
Mid 80’s – Bill Inmon popularizes Data Warehousing
Mid – Late 80’s – Dr Kimball popularizes Star Schema
Late 80’s – Barry Devlin and Dr Kimball release “Business Data Warehouse”
1990 – Dan Linstedt begins R&D on Data Vault Modeling
2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling
Data Vault Modeling… Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable 7
What IS a Data Vault? (Business Definition) 8
Data Vault Model
Detail oriented
Historical traceability
Uniquely linked set of normalized tables
Supports one or more functional areas of business

CMMI, Project Plan
Risk, Governance, Versioning
Peer Reviews, Release Cycles
Repeatable, Consistent, Optimized
Complete with Best Practices for BI/DW

Business Keys Span / Cross Lines of Business
Functional Area: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement
The Data Vault Model 9
The Data Vault model is a data modeling approach
…so it fits into the family of modeling approaches: 3rd Normal Form, Data Vault, Star Schema
…and Star Schema is optimal for OLAP Delivery / Data Marts
…the Data Vault is optimal for the Data Warehouse (EDW)
Supply Chain Analogy 10
Source Systems → Data Vault (EDW) → Data Marts
What Does One Look Like? 11
[Diagram: Customer, Product, and Order, each with Satellites (Sat), connected through a Link that records a history of the interaction]
Elements:
Hub
Link
Satellite
Hub = List of Unique Business Keys
Link = List of Relationships, Associations
Satellites = Descriptive Data
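The three element types above can be sketched as tables. This is a minimal illustration only, using Python's built-in sqlite3 module; the table names, column names, surrogate keys, and the load-date column are assumptions made for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical names throughout: the slides define only the three roles
# (Hub, Link, Satellite), not a concrete schema.
DDL = """
CREATE TABLE hub_customer (              -- Hub: list of unique business keys
    customer_id INTEGER PRIMARY KEY,     -- surrogate key (assumed)
    customer_bk TEXT NOT NULL UNIQUE,    -- the business key itself
    load_dts    TEXT NOT NULL            -- load date (assumed)
);
CREATE TABLE hub_product (
    product_id  INTEGER PRIMARY KEY,
    product_bk  TEXT NOT NULL UNIQUE,
    load_dts    TEXT NOT NULL
);
CREATE TABLE link_customer_product (     -- Link: relationship between hubs
    link_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES hub_product(product_id),
    load_dts    TEXT NOT NULL,
    UNIQUE (customer_id, product_id)
);
CREATE TABLE sat_customer (              -- Satellite: descriptive data over time
    customer_id INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    load_dts    TEXT NOT NULL,
    name        TEXT,
    city        TEXT,
    PRIMARY KEY (customer_id, load_dts)
);
"""


def build_demo_vault() -> sqlite3.Connection:
    """Create the sketch schema in memory and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO hub_customer VALUES (1, 'CUST-001', '2012-01-01')")
    conn.execute("INSERT INTO hub_product VALUES (1, 'PROD-42', '2012-01-01')")
    conn.execute("INSERT INTO link_customer_product VALUES (1, 1, 1, '2012-01-02')")
    # Two satellite rows for one hub key: a change is a new row, not an update.
    conn.executemany(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
        [(1, "2012-01-01", "Acme", "Denver"), (1, "2012-03-15", "Acme", "Boulder")],
    )
    return conn


def customer_history(conn: sqlite3.Connection, business_key: str) -> list:
    """Return (load_dts, city) rows for one business key, oldest first."""
    return conn.execute(
        "SELECT s.load_dts, s.city FROM hub_customer h "
        "JOIN sat_customer s ON s.customer_id = h.customer_id "
        "WHERE h.customer_bk = ? ORDER BY s.load_dts",
        (business_key,),
    ).fetchall()
```

Calling `customer_history(build_demo_vault(), "CUST-001")` returns both versions of the customer's descriptive data, which is one way to read "historical traceability": the hub row never changes, and each change lands as a new satellite row.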
Colorized Perspective… 12
[Diagram labels: Data Vault; 3rd NF & Star Schema (separation); Business Keys, Associations, Details; HUB, LINK, Satellite]
The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links), and both of these from the Details that describe them and provide context (Satellites).
(Colors Concept Originated By: Hans Hultgren)
Star Schemas, 3NF, Data Vault: Pros & Cons 13
Defining the Data Vault Space
Why NOT use Star Schemas as an EDW?
Why NOT use 3NF as an EDW?
Why NOT use Data Vault as a Data Delivery Model?
Star Schema Pros/Cons as an EDW 14
PROS
Good for multi-dimensional analysis
Subject oriented answers
Excellent for aggregation points
Rapid development / deployment
Great for some historical storage
CONS
Not cross-business functional
Use of junk / helper tables
Trouble with VLDW
Unable to provide integrated enterprise information
Can’t handle ODS or exploration warehouse requirements
Trouble with data explosion in near-real-time environments
Trouble with updates to type 2 dimension primary keys
Trouble with late arriving data in dimensions to support real-time arriving transactions
Not granular enough information to support real-time data integration
3NF Pros/Cons as an EDW 15
PROS
Many to many linkages
Handle lots of information
Tightly integrated information
Highly structured
Conducive to near-real time loads
Relatively easy to extend
CONS
Time driven PK issues
Parent-child complexities
Cascading change impacts
Difficult to load
Not conducive to BI tools
Not conducive to drill-down
Difficult to architect for an enterprise
Not conducive to spiral/scope controlled implementation
Physical design usually doesn’t follow business processes
Data Vault Pros/Cons as an EDW 16
PROS
Supports near-real time and batch feeds
Supports functional business linking
Extensible / flexible
Provides rapid build / delivery of star schemas
Supports VLDB / VLDW
Designed for EDW
Supports data mining and AI
Provides granular detail
Incrementally built
CONS
Not conducive to OLAP processing
Requires business analysis to be firm
Introduces many join operations
Analogy: The Porsche, the SUV and the Big Rig Which would you use to win a race? Which would you use to move a house? Would you adapt the truck and enter a race with Porsches and expect to win? 17
A Quick Look at Methodology Issues Business Rule Processing, Lack of Agility, and  Future proofing your new solution 18
EDW Architecture: Generation 1 19
[Diagram labels: Enterprise BI Solution; Sales, Finance, Contracts (batch); Staging; Complex Business Rules + Dependencies; Staging + History; (EDW) Star Schemas; Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts; Complex Business Rules #2]
Cross-system dependencies
Source data filtering
In-process data manipulation
High risk of incorrect data aggregation
Larger system = increased impact
Often re-engineered at the SOURCE
History can be destroyed (completely re-computed)
Re-Engineering 21
[Diagram labels: Business Rules; Data Flow (Mapping); Current Sources: Sales, Finance; Customer Source Join; Customer; Customer Transactions; Customer Purchases; ** NEW SYSTEM **; IMPACT!!]
Federated Star Schema Inhibiting Agility 22
[Chart: Effort & Cost (Low to High) over Time, from Start through Data Mart 1, Data Mart 2, and Data Mart 3, marking where the Maintenance Cycle Begins]
Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
RESULT: Business builds their own Data Marts!
The main driver for this is the maintenance costs, and re-engineering of the existing system which occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
EDW Architecture: Generation 2 23
[Diagram labels: SOA; Enterprise BI Solution; Sales, Finance, Contracts; Staging (batch); EDW (Data Vault) (batch); Star Schemas (real-time); Error Marts; Report Collections; Complex Business Rules; Unstructured Data]
FUNDAMENTAL GOALS
Consistent
Fault-tolerant
Supports phased release
Scalable
Auditable
The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW).
NO Re-Engineering 24
[Diagram labels: Current Sources: Sales, Finance; Stage Copy (one per source); Data Vault: Hub Customer, Hub Acct, Hub Product, Link Transaction; Customer; Customer Transactions; Customer Purchases; ** NEW SYSTEM **; IMPACT!!]
NO IMPACT!!! NO RE-ENGINEERING!
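The "no re-engineering" idea on this slide can be sketched as a purely additive schema change: a new source brings new tables, and the definitions of existing tables are left untouched. The sketch below uses Python's built-in sqlite3 module. The hub and link names echo the slide's labels, but the columns, the `link_purchase` table, and the choice of which tables the new system introduces are assumptions made for illustration.

```python
import sqlite3

# Tables assumed to exist before the new source arrives (columns are hypothetical).
EXISTING = """
CREATE TABLE hub_customer (customer_id INTEGER PRIMARY KEY, customer_bk TEXT NOT NULL UNIQUE);
CREATE TABLE hub_acct (acct_id INTEGER PRIMARY KEY, acct_bk TEXT NOT NULL UNIQUE);
CREATE TABLE link_transaction (
    link_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    acct_id     INTEGER NOT NULL REFERENCES hub_acct(acct_id)
);
"""

# Tables assumed to be introduced by the new source system: only CREATE
# statements, no ALTER or DROP against anything that already exists.
NEW_SOURCE = """
CREATE TABLE hub_product (product_id INTEGER PRIMARY KEY, product_bk TEXT NOT NULL UNIQUE);
CREATE TABLE link_purchase (
    link_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES hub_product(product_id)
);
"""


def schema_snapshot(conn: sqlite3.Connection) -> dict:
    """Map each table name to the CREATE statement SQLite stored for it."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return dict(rows)


def add_new_source() -> tuple:
    """Build the existing model, add the new source, return both snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(EXISTING)
    before = schema_snapshot(conn)
    conn.executescript(NEW_SOURCE)
    after = schema_snapshot(conn)
    return before, after


def unchanged_tables(before: dict, after: dict) -> bool:
    """True when every pre-existing table definition is identical afterwards."""
    return all(after.get(name) == sql for name, sql in before.items())
```

Comparing the two snapshots from `add_new_source()` with `unchanged_tables` checks that the existing hubs and link were not modified. The new link simply references the existing `hub_customer` key.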
Progressive Agility and Responsiveness of IT 25
[Chart: Effort & Cost (Low to High) over Time, from Start through Initial DV Build Out, Foundational Base Built, and New Functional Areas Added, marking where the Maintenance Cycle Begins]
Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down, and maintenance easy. It also reduces complexity of the existing architecture.
What’s Wrong With the OLD METHODOLOGY? Using Star Schemas as your Data Warehouse leads to…. 26
Dimensionitis 27
DimensionItis: Incurable Disease, the symptoms are the creation of new dimensions because the cost and time to conform existing dimensions with new attributes rises beyond the business ability to pay…
Business Says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department… What can it hurt?
Deformed Dimensions 28
Deformity: The URGE to continue “slamming data” into an existing conformed dimension until it simply cannot sustain any further changes; the result: a deformed dimension and a HUGE re-engineering cost / nightmare.
Business Wants a Change! Business said: Just add that to the existing Dimension, it will be easy, right?
[Diagram: Business Change V1, V2, V3, each with a Complex Load]
Successive changes: 90 days, $125k; 120 days, $200k; 180 days, $275k
Re-Engineering the Load Processes EACH TIME!
Silo Building / IT Non-Agility 29
Business Says: Take the dimension you have, copy it, and change it… This should be cheap, and easy, right?
Business Change: To Modify Existing Star = 180 days, $275k
First Star: Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone; Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT
SALES: We built our own because IT costs too much…
FINANCE: We built our own because IT took too long…
MARKETING: We built our own because we needed customized dimension data…
Copied dimensions (repeated per department): Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Phone, Customer_Type
Why is Data Vault a Good Fit? 30
What are the top business obstacles in your data warehouse today? 31
Poor Agility Inconsistent Answer Sets Needs Accountability Demands Auditability Desires IT Transparency Are you feeling Pinned Down? 32
What are the top technology obstacles in your data warehouse today? 33
Complex Systems Real-Time Data Arrival Unimaginable Data Growth Master Data Alignment Bad Data Quality Late Delivery/Over Budget Are your systems CRUMBLING? 34
Existing Solutions Have led you down a painful path… [Image: Yugo, World’s Worst Car] 35
Projects Cancelled & Restarted Re-engineering required to absorb new systems Complexity drives maintenance cost Sky high Disparate Silo Solutions provide inaccurate answers! Severe lack of Accountability 36
How can you overcome these obstacles? There must be a better way… There IS a better way! 37
It’s Called the Data Vault Model and Methodology 38
What is it? It’s a simple Easy-to-use Plan To build your  valuable Data Warehouse! 39
What’s the Value? Painless Auditability  Understandable Standards Rapid Adaptability Simple Build-out Uncomplicated Design Effortless Scalability Pursue Your Goals! 40
Why Bother With Something New? Old Chinese proverb:  'Unless you change direction, you're apt to end up where you're headed.' 41
What Are the Issues? This is NOT what you want happening to your project! Business… Changes Frequently IT…. Needs Accountability Takes Too Long Demands Auditability Is Over-budget Has No Visibility Too Complex Wants More Control Can’t Sustain Growth THE GAP!! 42
What Are the Foundational Keys? Flexibility Scalability Productivity 43
Key: Flexibility Enabling rapid change on a massive scale without downstream impacts! 44
Key: Scalability Providing no foreseeable barrier to increased size and scope People, Process, & Architecture! 45
Key: Productivity Enabling low complexity systems with high value output at a rapid pace 46
< BREAK TIME > 47
How does it work? Bringing the Data Vault to Your Project 48
Key: Flexibility 49
No Re-Engineering! Adding new components to the EDW has NEAR ZERO impact to:
Existing Data Model
Existing Reporting & BI Functions
Existing Source Systems
Existing Star Schemas and Data Marts
Case In Point: Result of flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA! 50
Key: Scalability in Architecture 51
Scaling is easy; it’s based on the following principles:
MPP Shared-Nothing Architecture
Scale Free Networks
Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! 52
Key: Scalability in Team Size You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! 53
Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault 54
Key: Productivity 55
Increasing Productivity requires a reduction in complexity. The Data Vault Model simplifies all of the following:
Real-Time Ingestion of Data
Data Modeling for the EDW
Enhancing and Adapting for Change to the Model
Ease of Monitoring, managing and optimizing processes
Case in Point: 56
Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports. These individuals generated:
100% of the Staging Data Model
75% of the finished EDW data Model
75% of the star schema data model
The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system) Our total cost?  $30k and 2 weeks! 57

Data Vault Overview

  • 1. Data Vault Model & Methodology © Dan Linstedt, 2011-2012 all rights reserved
  • 2. Agenda: Introduction – why are you here? What is a Data Vault? Where does it come from? Star Schema, 3NF, and Data Vault pros and cons AS AN EDW solution; When is a Data Vault a good fit? Benefits of Data Vault Modeling & Methodology; <BREAK>; When to NOT use a Data Vault; Fundamental Paradigm Shift; Business Keys & Business Processes; Technical Review; Query Performance (PIT & Bridge); What wasn’t covered in this presentation…
  • 3. A bit about me… Author, Inventor, Speaker – and part time photographer… 25+ years in the IT industry. Worked in DoD, US Gov’t, Fortune 50, and so on… Find out more about the Data Vault: http://www.youtube.com/LearnDataVault http://LearnDataVault.com Full profile on http://www.LinkedIn.com/dlinstedt
  • 4. Why Are YOU Here? Your Expectations? Your Questions? Your Background? Areas of Interest? Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
  • 5. What is it? Where did it come from? Defining the Data Vault Space
  • 6. Data Vault Time Line (axis: 1960, 1970, 1980, 1990, 2000): E.F. Codd invented relational modeling; 1976 – Dr Peter Chen created E-R Diagramming; 1990 – Dan Linstedt begins R&D on Data Vault Modeling; Chris Date and Hugh Darwen maintained and refined modeling; Mid 70’s – AC Nielsen popularized Dimension & Fact terms; Late 80’s – Barry Devlin and Dr Kimball release “Business Data Warehouse”; Early 70’s – Bill Inmon began discussing Data Warehousing; Mid 80’s – Bill Inmon popularizes Data Warehousing; Mid 60’s – Dimension & Fact Modeling presented by General Mills and Dartmouth University; 2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling; Mid – Late 80’s – Dr Kimball popularizes Star Schema
  • 7. Data Vault Modeling… Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable
  • 8.
  • 13. Complete with Best Practices for BI/DW. Business Keys Span / Cross Lines of Business: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement (Functional Area)
  • 14.
  • 15. Supply Chain Analogy: Source Systems, Data Vault (EDW), Data Marts
  • 16.
  • 17. Link
  • 18. Satellite. Hub = List of Unique Business Keys; Link = List of Relationships, Associations; Satellites = Descriptive Data
  • 19. Colorized Perspective… Data Vault; 3rd NF & Star Schema (separation). Business Keys (HUB), Associations (LINK), Details (Satellite). The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links) and both of these from the Details that describe them and provide context (Satellites). (Colors Concept Originated By: Hans Hultgren)
  • 20. Star Schemas, 3NF, Data Vault: Pros & Cons. Defining the Data Vault Space. Why NOT use Star Schemas as an EDW? Why NOT use 3NF as an EDW? Why NOT use Data Vault as a Data Delivery Model?
  • 21. Star Schema Pros/Cons as an EDW. PROS: Good for multi-dimensional analysis; Subject oriented answers; Excellent for aggregation points; Rapid development / deployment; Great for some historical storage. CONS: Not cross-business functional; Use of junk / helper tables; Trouble with VLDW; Unable to provide integrated enterprise information; Can’t handle ODS or exploration warehouse requirements; Trouble with data explosion in near-real-time environments; Trouble with updates to type 2 dimension primary keys; Trouble with late arriving data in dimensions to support real-time arriving transactions; Not granular enough information to support real-time data integration
  • 22. 3NF Pros/Cons as an EDW. PROS: Many to many linkages; Handle lots of information; Tightly integrated information; Highly structured; Conducive to near-real time loads; Relatively easy to extend. CONS: Time driven PK issues; Parent-child complexities; Cascading change impacts; Difficult to load; Not conducive to BI tools; Not conducive to drill-down; Difficult to architect for an enterprise; Not conducive to spiral/scope controlled implementation; Physical design usually doesn’t follow business processes
  • 23. Data Vault Pros/Cons as an EDW. CONS: Not conducive to OLAP processing; Requires business analysis to be firm; Introduces many join operations. PROS: Supports near-real time and batch feeds; Supports functional business linking; Extensible / flexible; Provides rapid build / delivery of star schemas; Supports VLDB / VLDW; Designed for EDW; Supports data mining and AI; Provides granular detail; Incrementally built
  • 24. Analogy: The Porsche, the SUV and the Big Rig. Which would you use to win a race? Which would you use to move a house? Would you adapt the truck and enter a race with Porsches and expect to win?
  • 25. A Quick Look at Methodology Issues: Business Rule Processing, Lack of Agility, and Future proofing your new solution
  • 26.
  • 30. High risk of incorrect data aggregation
  • 31. Larger system = increased impact
  • 33.
  • 34. Re-Engineering. [Diagram labels: Business Rules; Data Flow (Mapping); Current Sources: Sales Customer, Finance Customer, Customer Transactions, Customer Purchases; Source Join; ** NEW SYSTEM **; IMPACT!!]
  • 35. Federated Star Schema Inhibiting Agility. [Chart: Effort & Cost (Low to High) over Time; labels: Start, Data Mart 1, Data Mart 2, Data Mart 3, Maintenance Cycle Begins.] Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time. RESULT: Business builds their own Data Marts! The main driver for this is the maintenance costs, and re-engineering of the existing system which occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
  • 36.
  • 41. Auditable. The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW)
  • 42. NO Re-Engineering. [Diagram labels: Current Sources: Sales Customer, Finance Customer, Customer Transactions, Customer Purchases; Stage Copy (one per source); Data Vault: Hub Customer, Hub Acct, Hub Product, Link Transaction; ** NEW SYSTEM **; IMPACT!!] NO IMPACT!!! NO RE-ENGINEERING!
  • 43. Progressive Agility and Responsiveness of IT. [Chart: Effort & Cost (Low to High) over Time; labels: Start, Initial DV Build Out, Foundational Base Built, New Functional Areas Added, Maintenance Cycle Begins.] Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down, and maintenance easy. It also reduces complexity of the existing architecture.
  • 44. What’s Wrong With the OLD METHODOLOGY? Using Star Schemas as your Data Warehouse leads to…. 26
  • 45. Dimensionitis: an incurable disease whose symptom is the creation of new dimensions, because the cost and time to conform existing dimensions with new attributes rise beyond the business’s ability to pay… Business Says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department… What can it hurt?
  • 46. Deformed Dimensions. Deformity: The URGE to continue “slamming data” into an existing conformed dimension until it simply cannot sustain any further changes; the result: a deformed dimension and a HUGE re-engineering cost / nightmare. Business Wants a Change! Business said: Just add that to the existing Dimension, it will be easy right? Successive business changes (V1, V2, V3) each require a Complex Load: 90 days, $125k; 120 days, $200k; 180 days, $275k. Re-Engineering the Load Processes EACH TIME!
  • 47. Silo Building / IT Non-Agility. Business Says: Take the dimension you have, copy it, and change it… This should be cheap, and easy right? Business Change: To Modify Existing Star = 180 days, $275k. First Star: Customer dimension (Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone) with Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT. SALES: “We built our own because IT costs too much…” FINANCE: “We built our own because IT took too long…” MARKETING: “We built our own because we needed customized dimension data…” The departmental copies of the Customer dimension add Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Phone, Customer_Type.
  • 48. Why is Data Vault a Good Fit? 30
  • 49. What are the top business obstacles in your data warehouse today?
  • 50. Poor Agility. Inconsistent Answer Sets. Needs Accountability. Demands Auditability. Desires IT Transparency. Are you feeling Pinned Down?
  • 51. What are the top technology obstacles in your data warehouse today?
  • 52. Complex Systems. Real-Time Data Arrival. Unimaginable Data Growth. Master Data Alignment. Bad Data Quality. Late Delivery/Over Budget. Are your systems CRUMBLING?
  • 53. Existing Solutions Have led you down a painful path… (Yugo: World’s Worst Car)
  • 54. Projects Cancelled & Restarted. Re-engineering required to absorb new systems. Complexity drives maintenance cost Sky high. Disparate Silo Solutions provide inaccurate answers! Severe lack of Accountability.
  • 55. How can you overcome these obstacles? There must be a better way… There IS a better way!
  • 56. It’s Called the Data Vault Model and Methodology
  • 57. What is it? It’s a simple, Easy-to-use Plan To build your valuable Data Warehouse!
  • 58. What’s the Value? Painless Auditability, Understandable Standards, Rapid Adaptability, Simple Build-out, Uncomplicated Design, Effortless Scalability. Pursue Your Goals!
  • 59. Why Bother With Something New? Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'
  • 60. What Are the Issues? This is NOT what you want happening to your project! Business… Changes Frequently; Needs Accountability; Demands Auditability; Has No Visibility; Wants More Control. IT… Takes Too Long; Is Over-budget; Too Complex; Can’t Sustain Growth. THE GAP!!
  • 61. What Are the Foundational Keys? Flexibility, Scalability, Productivity
  • 62. Key: Flexibility Enabling rapid change on a massive scale without downstream impacts! 44
  • 63. Key: Scalability Providing no foreseeable barrier to increased size and scope People, Process, & Architecture! 45
  • 64. Key: Productivity Enabling low complexity systems with high value output at a rapid pace 46
  • 65. < BREAK TIME > 47
  • 66. How does it work? Bringing the Data Vault to Your Project 48
  • 67.
  • 69. Existing Reporting & BI Functions
  • 71. Existing Star Schemas and Data Marts
  • 72. Case In Point: Result of flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA! 50
  • 73.
  • 76. Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! 52
  • 77. Key: Scalability in Team Size You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! 53
  • 78. Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault 54
  • 79.
  • 82. Enhancing and Adapting for Change to the Model
  • 83. Ease of Monitoring, managing and optimizing processes
  • 84.
  • 85. 100% of the Staging Data Model
  • 86. 75% of the finished EDW data Model
  • 87. 75% of the star schema data model
  • 88. The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system) Our total cost? $30k and 2 weeks! 57
  • 89. Results? Changing the direction of the river takes less effort than stopping the flow of water 58
  • 90. When NOT to use the Data Vault Model & Methodology 59
  • 91. When NOT to Use the Data Vault. You have: a small set of point solution requirements; a very short time-frame for delivery; to use the data one-time, then throw it away; a single source system, single source application; a single business analyst in the entire company. You do NOT have: audit requirements forcing you to keep history; multiple data center consolidation efforts; near-real-time to worry about; massive batch data to integrate; external data feeds outside your control; requirements to do trend analysis of all your data; pain that forces you to reengineer every time you ask for a change to your current data warehousing systems
  • 92. Fundamental Paradigm Shift Exploring differences in the architecture, implementation, and process design. 61
  • 93. It’s Not Just a Data Model… Model Methodology SUCCESS! 62
  • 94. Different From ANYTHING ELSE! The Business Rules go after the Data Warehouse! Data is interpreted on the way OUT! Hold on… We do distinguish between HARD and SOFT business rules… Ok, now tell me WHY this is important?
  • 95. EDW: The Old Way of Loading. Corporate Fraud Accountability: Title XI consists of seven sections. Section 1101 recommends a name for this title as “Corporate Fraud Accountability Act of 2002”. It identifies corporate fraud and records tampering as criminal offenses and joins those offenses to specific penalties. It also revises sentencing guidelines and strengthens their penalties. This enables the SEC to temporarily freeze large or unusual payments. [Diagram labels: Source 1, Source 2, Source 3; Staging; Business Rules Change Data!; HR Mart, Sales Mart, Finance Mart.] Are changes to data ON THE WAY IN to the EDW equivalent to records tampering?
  • 96. EDW: The New Compliant Way. Implement a Raw Data Vault Data Warehouse. Move the business rules “downstream”.
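
The idea of loading raw data and moving business rules "downstream" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the methodology's prescribed implementation: `RawRow`, `load_raw`, `standardize_acct`, and `build_mart` are hypothetical names, and the punctuation-stripping rule is an invented example of a soft business rule. The sample keys are taken from the HUB_CUST_ACCT example later in the deck.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RawRow:
    cust_acct: str   # business key exactly as the source delivered it
    load_dts: date
    record_src: str


def load_raw(vault: list, rows: list) -> None:
    """Load step: append source rows unchanged (no soft rules on the way IN)."""
    vault.extend(rows)


def standardize_acct(raw_key: str) -> str:
    """Hypothetical soft rule: keep only letters and digits, upper-cased."""
    return "".join(ch for ch in raw_key if ch.isalnum()).upper()


def build_mart(vault: list) -> list:
    """Delivery step: the rule is applied on the way OUT; the raw key stays visible."""
    return [
        {
            "cust_acct": standardize_acct(row.cust_acct),
            "raw_cust_acct": row.cust_acct,
            "record_src": row.record_src,
        }
        for row in vault
    ]


vault = []
load_raw(vault, [
    RawRow("ABC123", date(2000, 10, 14), "SALES"),
    RawRow("*ABC-123", date(2000, 10, 14), "FINANCE"),
])
mart = build_mart(vault)
```

Because the vault keeps what the sources delivered, a changed rule only means rebuilding the mart; the stored history is not rewritten.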
  • 97. Business Keys & Business Processes 66
  • 98. Business Keys & Business Processes. [Diagram labels: Excel Spreadsheet; keys SLS123, *P123MFG; Procurement; Sales; Manual Process; NO VISIBILITY!; Customer Contact; $$ Revenue; Time; Delivery; Sales; Contracts; Planning; Procurement; Manufacturing; Finance]
  • 99. Technical Review Hub, Link, Satellite - Definitions 68
  • 100. HUB Data Examples. Hub Structure: SEQUENCE, <BUSINESS KEY> (Unique Index), {LAST SEEN DATE} (Optional), <LOAD DATE>, <RECORD SOURCE>

      HUB_PART_NUMBER
      SQN  PART_NUM   LOAD_DTS    RECORD_SRC
      1    MFG-25862  10-14-2000  MANUFACT
      2    MFG*25266  10-14-2000  MANUFACT
      3    *P25862    10-14-2000  PLANNING
      4    MFG_25862  10-15-2000  DELIVERY
      5    CN*25266   10-16-2000  DELIVERY

      HUB_CUST_ACCT
      SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
      1    ABC123     10-14-2000  SALES
      2    ABC-123    10-14-2000  SALES
      3    *ABC-123   10-14-2000  FINANCE
      4    123,ABCD   10-15-2000  CONTRACTS
      5    PEF-2956   10-16-2000  CONTRACTS
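
A Hub load along these lines might look like the sketch below. It assumes SQLite (via Python's standard `sqlite3` module) purely for illustration; the `load_hub` helper, the ISO date strings, and the insert-or-ignore strategy are assumptions rather than anything the slides specify. The table and column names follow HUB_CUST_ACCT above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_cust_acct (
        sqn        INTEGER PRIMARY KEY,   -- surrogate sequence
        cust_acct  TEXT NOT NULL UNIQUE,  -- business key with a unique index
        load_dts   TEXT NOT NULL,         -- when the key was first seen
        record_src TEXT NOT NULL          -- which source delivered it first
    )
""")


def load_hub(conn, rows):
    """Insert only business keys not yet in the hub; known keys are left untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_cust_acct (cust_acct, load_dts, record_src) "
        "VALUES (?, ?, ?)",
        rows,
    )


# First feed: two keys from SALES.
load_hub(conn, [("ABC123", "2000-10-14", "SALES"),
                ("ABC-123", "2000-10-14", "SALES")])
# Later feeds: one key already known, one new.
load_hub(conn, [("ABC123", "2000-10-15", "FINANCE"),
                ("PEF-2956", "2000-10-16", "CONTRACTS")])

hub_rows = conn.execute(
    "SELECT sqn, cust_acct, load_dts, record_src FROM hub_cust_acct ORDER BY sqn"
).fetchall()
```

The hub ends up as a list of unique business keys: the repeated `ABC123` keeps its original load date and record source, while `ABC-123` stays a separate key because nothing is standardized on the way in.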
  • 101. Link Structures. Link Structure: SEQUENCE, <HUB KEY SQN 1>, <HUB KEY SQN 2>, <HUB KEY SQN N> (Unique Index), {LAST SEEN DATE}, {CONFIDENCE}, {STRENGTH} (Optional; Dynamic Link), <LOAD DATE>, <RECORD SOURCE>. Link_Product_Supplier: LPS_SQN, PRODUCT_SQN, SUPPLIER_SQN (Unique Index), LPS_LOAD_DTS, LPS_REC_SOURCE, LPS_ENCR_KEY. Link_Customer_Account_Employee: LCAE_SQN, CUSTOMER_SQN, ACCOUNT_SQN, EMPLOYEE_SQN (Unique Index), LCAE_LOAD_DTS, LCAE_REC_SOURCE.
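
The Link structure can be sketched the same way. The example below assumes SQLite again; `load_link` is a hypothetical helper, the hub sequence values are made up, and the optional LPS_ENCR_KEY column is left out for brevity. The remaining column names follow Link_Product_Supplier above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_product_supplier (
        lps_sqn        INTEGER PRIMARY KEY,       -- the link's own sequence
        product_sqn    INTEGER NOT NULL,          -- sequence from the product hub
        supplier_sqn   INTEGER NOT NULL,          -- sequence from the supplier hub
        lps_load_dts   TEXT NOT NULL,
        lps_rec_source TEXT NOT NULL,
        UNIQUE (product_sqn, supplier_sqn)        -- one row per distinct association
    )
""")


def load_link(conn, pairs, load_dts, rec_source):
    """Record each (product, supplier) association once, however often it arrives."""
    conn.executemany(
        "INSERT OR IGNORE INTO link_product_supplier "
        "(product_sqn, supplier_sqn, lps_load_dts, lps_rec_source) "
        "VALUES (?, ?, ?, ?)",
        [(p, s, load_dts, rec_source) for p, s in pairs],
    )


load_link(conn, [(1, 10), (1, 11)], "2000-10-14", "MANUFACT")
load_link(conn, [(1, 10), (2, 10)], "2000-10-15", "DELIVERY")  # (1, 10) already known

link_rows = conn.execute(
    "SELECT product_sqn, supplier_sqn, lps_rec_source "
    "FROM link_product_supplier ORDER BY lps_sqn"
).fetchall()
```

A link with three hubs, such as Link_Customer_Account_Employee, would follow the same pattern with a three-column unique index.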
  • 102. Satellites Split By Source System. Satellite Structure: PARENT SEQUENCE, LOAD DATE (Primary Key), <LOAD-END-DATE>, <RECORD-SOURCE>, {user defined descriptive data}, {or temporal based timelines}. SAT_FINANCE_CUST: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, Contact Name, Contact Email, Contact Phone Number. SAT_CONTRACTS_CUST: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, First Name, Last Name, Guardian Full Name, Co-Signer Full Name, Phone Number, Address, City, State/Province, Zip Code. SAT_SALES_CUST: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, Name, Phone Number, Best time of day to reach, Do Not Call Flag.
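
A Satellite load could be sketched as below, assuming SQLite and a simple compare-then-insert approach. The `load_satellite` helper, the reduced column set (only Name and Phone Number from SAT_SALES_CUST), the sample values, and the choice to end-date the previous row with the new row's load date are all illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sat_sales_cust (
        parent_sqn   INTEGER NOT NULL,   -- sequence of the parent hub row
        load_dts     TEXT NOT NULL,      -- ISO dates so text comparison sorts correctly
        load_end_dts TEXT,               -- NULL marks the current row
        record_src   TEXT NOT NULL,
        name         TEXT,
        phone        TEXT,
        PRIMARY KEY (parent_sqn, load_dts)
    )
""")


def load_satellite(conn, parent_sqn, load_dts, record_src, name, phone):
    """Insert a new row only when the descriptive data changed; return True if inserted."""
    current = conn.execute(
        "SELECT load_dts, name, phone FROM sat_sales_cust "
        "WHERE parent_sqn = ? AND load_end_dts IS NULL",
        (parent_sqn,),
    ).fetchone()
    if current is not None:
        if (current[1], current[2]) == (name, phone):
            return False  # nothing changed, so no new history row
        # Close the previous version before opening the new one.
        conn.execute(
            "UPDATE sat_sales_cust SET load_end_dts = ? "
            "WHERE parent_sqn = ? AND load_dts = ?",
            (load_dts, parent_sqn, current[0]),
        )
    conn.execute(
        "INSERT INTO sat_sales_cust "
        "(parent_sqn, load_dts, load_end_dts, record_src, name, phone) "
        "VALUES (?, ?, NULL, ?, ?, ?)",
        (parent_sqn, load_dts, record_src, name, phone),
    )
    return True


results = [
    load_satellite(conn, 1, "2000-10-14", "SALES", "Dan L", "999-555-1212"),
    load_satellite(conn, 1, "2000-10-15", "SALES", "Dan L", "999-555-1212"),
    load_satellite(conn, 1, "2000-11-01", "SALES", "Dan Linstedt", "999-555-1212"),
]
sat_rows = conn.execute(
    "SELECT load_dts, load_end_dts, name FROM sat_sales_cust ORDER BY load_dts"
).fetchall()
```

Each satellite holds history for one parent and one source system, so a new source adds a new satellite rather than new columns on an existing one.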
  • 103. Why do we build Links this way? 72
  • 104. History Teaches Us… If we model for ONE relationship in the EDW, we BREAK the others! The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model! [Diagram: Hub Portfolio related directly to Hub Customer. Today: Portfolio 1 to M Customer. 5 years from now: Portfolio M to M Customer (X). 10 years ago: Portfolio M to 1 Customer (X).] This situation forces re-engineering of the model, load routines, and queries!
  • 105. History Teaches Us… If we model with a LINK table, we can handle ALL the requirements! [Diagram: Hub Portfolio and Hub Customer joined through LNK Cust-Port. Today: Portfolio 1 to M Customer. 5 years from now: Portfolio M to M Customer. 10 years ago: Portfolio M to 1 Customer.] This design is flexible, handles past, present, and future relationship changes with NO RE-ENGINEERING!
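
To see why a link table absorbs all three situations, consider the small sketch below. It treats the link as a plain set of (customer sequence, portfolio sequence) pairs; the `cardinality` helper and the sample pairs are hypothetical, and the three cases assume the diagram reads as Portfolio-to-Customer cardinalities of 1:M today, M:M in five years, and M:1 ten years ago.

```python
from collections import defaultdict


def cardinality(link_rows):
    """Classify the relationship stored in a link as (portfolio side, customer side)."""
    portfolios_of = defaultdict(set)  # customer_sqn -> portfolio_sqns
    customers_of = defaultdict(set)   # portfolio_sqn -> customer_sqns
    for customer_sqn, portfolio_sqn in link_rows:
        portfolios_of[customer_sqn].add(portfolio_sqn)
        customers_of[portfolio_sqn].add(customer_sqn)
    portfolio_side = "M" if any(len(p) > 1 for p in portfolios_of.values()) else "1"
    customer_side = "M" if any(len(c) > 1 for c in customers_of.values()) else "1"
    return portfolio_side, customer_side


# The same two-column structure holds every era's data without a schema change.
ten_years_ago = {(1, 10), (1, 11)}           # one customer, many portfolios
today = {(1, 10), (2, 10)}                   # one portfolio, many customers
in_five_years = {(1, 10), (1, 11), (2, 10)}  # many to many
```

The cardinality is a property of the rows, not of the table definition, which is why loading history or a future rule change needs no re-engineering of the structure.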
  • 106. Applying the Data Vault to Global DW2.0. [Diagram: Hubs, Links, and Satellites distributed across sites; labels: Manufacturing EDW in China; Planning in Brazil; Financials in USA; Base EDW Created in Corporate.]
  • 107. Extreme Data Vault Partitioning
  • 108. Query Performance Point-in-time and Bridge Tables, overcoming query issues 77
  • 109. Purpose Of PIT & Bridge: To reduce the number of joins, and to reduce the amount of data being queried for a given range of time. Together, these two allow “direct table match”, as well as table elimination, to occur in the queries. These tables are not necessary for the entire model; only when: massive amounts of data are found; large numbers of Satellites surround a Hub or Link; a large query across multiple Hubs & Links is necessary; real-time data is flowing in, uninterrupted. What are they? Snapshot tables, specifically built for query speed.
  • 110. PIT Table Architecture. Satellite: Point In Time. Primary Key: PARENT SEQUENCE, LOAD DATE; then {Satellite 1 Load Date}, {Satellite 2 Load Date}, {Satellite 3 Load Date}, {…}, {Satellite N Load Date}. [Diagram labels: Hub Customer, Hub Order, Hub Product, Link Line Item, Satellite Line Item, Sat 1 to Sat 4, PIT Sat.]
  • 111. PIT Table Example

      SAT_CUST_CONTACT_NAME
      SQN  LOAD_DTS    NAME
      1    10-14-2000  Dan L
      1    11-01-2000  Dan Linedt
      1    12-31-2000  Dan Linstedt

      SAT_CUST_CONTACT_CELL
      SQN  LOAD_DTS    CELL
      1    10-14-2000  999-555-1212
      1    10-15-2000  999-111-1234
      1    10-16-2000  999-252-2834
      1    10-17-2000  999.257-2837
      1    10-18-2000  999-273-5555

      SAT_CUST_CONTACT_ADDR
      SQN  LOAD_DTS    ADDR
      1    08-01-2000  26 Prospect
      1    09-29-2000  26 Prosp St.
      1    12-17-2000  28 November
      1    01-01-2001  26 Prospect St

      PIT table (LOAD_DTS is the Snapshot Date)
      SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
      1    08-01-2000  NULL           NULL           08-01-2000
      1    09-01-2000  NULL           NULL           08-01-2000
      1    10-01-2000  NULL           NULL           09-29-2000
      1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
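
The PIT rows in this example can be derived mechanically: for each snapshot date, take the latest load date in each satellite that is on or before the snapshot. The sketch below does this in plain Python for the single customer (SQN 1) shown on the slide; `latest_on_or_before` and `build_pit` are hypothetical helper names, and a real implementation would more likely be a set-based SQL load.

```python
from bisect import bisect_right
from datetime import date


def latest_on_or_before(load_dates, snapshot):
    """Return the most recent load date <= snapshot, or None if there is none.

    `load_dates` must be sorted in ascending order.
    """
    i = bisect_right(load_dates, snapshot)
    return load_dates[i - 1] if i else None


def build_pit(snapshots, satellites):
    """Build one PIT row per snapshot date for a single parent sequence."""
    return [
        {
            "load_dts": snap,
            **{name: latest_on_or_before(dates, snap)
               for name, dates in satellites.items()},
        }
        for snap in snapshots
    ]


# Load dates taken from the three satellites in the slide (customer SQN 1).
satellites = {
    "sat_name_ldts": [date(2000, 10, 14), date(2000, 11, 1), date(2000, 12, 31)],
    "sat_cell_ldts": [date(2000, 10, 14), date(2000, 10, 15), date(2000, 10, 16),
                      date(2000, 10, 17), date(2000, 10, 18)],
    "sat_addr_ldts": [date(2000, 8, 1), date(2000, 9, 29),
                      date(2000, 12, 17), date(2001, 1, 1)],
}
snapshots = [date(2000, 8, 1), date(2000, 9, 1), date(2000, 10, 1),
             date(2000, 11, 1), date(2000, 12, 1), date(2001, 1, 1)]

pit = build_pit(snapshots, satellites)
```

With the PIT row in hand, a query can join each satellite on an exact (sequence, load date) match instead of searching date ranges, which is the "direct table match" the previous slide refers to.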
  • 112. Bridge Table Architecture. Satellite: Bridge. Primary Key: UNIQUE SEQUENCE, LOAD DATE; then {Hub 1 Sequence #}, {Hub 2 Sequence #}, {Hub 3 Sequence #}, {Link 1 Sequence #}, {Link 2 Sequence #}, {…}, {Link N Sequence #}, {Hub 1 Business Key}, {Hub 2 Business Key}, {…}, {Hub N Business Key}. [Diagram labels: Bridge; Hub Parts, Hub Seller, Hub Product; Link, Link; Sat 1 to Sat 4; Satellite, Satellite.]
  • 113. Bridge Table Data Example. Bridge Table: Seller by Product by Part (LOAD_DTS is the Snapshot Date). Columns: SQN, LOAD_DTS, SELL_SQN, SELL_ID, PROD_SQN, PROD_NUM, PART_SQN, PART_NUM. Rows: 1 08-01-2000 15 NY*1 2756 ABC-123-9K 525 JK*2*4; 2 09-01-2000 16 CO*242654 DEF-847-0L 324 MN*5-2; 3 10-01-2000 16 CO*2482374 PPA-252-2A 9938 DD*2*3; 4 11-01-2000 24 AZ*2525222 UIF-525-88 7 UF*9*0; 5 12-01-2000 99 NM*581 DAN-347-7F 16 KI*9-2; 6 01-01-2001 99 NM*581 DAN-347-7F 24 DL*0-5
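
A bridge row like the first one above could be produced by pre-joining hubs through their links and storing the result. The sketch below assumes SQLite; the hub and link table names, the `build_bridge` helper, and the two-link join path (seller to product, product to part) are assumptions made for illustration, and only the first slide row is used as sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_seller  (sell_sqn INTEGER PRIMARY KEY, sell_id  TEXT UNIQUE);
    CREATE TABLE hub_product (prod_sqn INTEGER PRIMARY KEY, prod_num TEXT UNIQUE);
    CREATE TABLE hub_part    (part_sqn INTEGER PRIMARY KEY, part_num TEXT UNIQUE);
    CREATE TABLE link_seller_product (
        lsp_sqn INTEGER PRIMARY KEY, sell_sqn INTEGER, prod_sqn INTEGER);
    CREATE TABLE link_product_part (
        lpp_sqn INTEGER PRIMARY KEY, prod_sqn INTEGER, part_sqn INTEGER);
    CREATE TABLE bridge_seller_product_part (
        sqn INTEGER PRIMARY KEY, load_dts TEXT,
        sell_sqn INTEGER, sell_id TEXT,
        prod_sqn INTEGER, prod_num TEXT,
        part_sqn INTEGER, part_num TEXT);

    INSERT INTO hub_seller  VALUES (15, 'NY*1');
    INSERT INTO hub_product VALUES (2756, 'ABC-123-9K');
    INSERT INTO hub_part    VALUES (525, 'JK*2*4');
    INSERT INTO link_seller_product VALUES (1, 15, 2756);
    INSERT INTO link_product_part   VALUES (1, 2756, 525);
""")


def build_bridge(conn, snapshot_date):
    """Snapshot the joined hub keys so later queries can skip the hub and link joins."""
    conn.execute("""
        INSERT INTO bridge_seller_product_part
            (load_dts, sell_sqn, sell_id, prod_sqn, prod_num, part_sqn, part_num)
        SELECT ?, s.sell_sqn, s.sell_id, p.prod_sqn, p.prod_num, t.part_sqn, t.part_num
        FROM link_seller_product lsp
        JOIN hub_seller  s ON s.sell_sqn = lsp.sell_sqn
        JOIN hub_product p ON p.prod_sqn = lsp.prod_sqn
        JOIN link_product_part lpp ON lpp.prod_sqn = p.prod_sqn
        JOIN hub_part t ON t.part_sqn = lpp.part_sqn
    """, (snapshot_date,))


build_bridge(conn, "2000-08-01")
bridge_rows = conn.execute(
    "SELECT * FROM bridge_seller_product_part ORDER BY sqn"
).fetchall()
```

Downstream queries can then read sequences and business keys from the bridge directly and join only the satellites they need.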
  • 114. What WASN’T Covered: ETL Automation; ETL Implementation; SQL Query Logic; Balanced MPP design; Data Vault Modeling on Appliances; Deep Dive on Structures (Hubs, Links, Satellites); What happens when you break the rules?; Project management, Risk management & mitigation, methodology & approach; Automation: Automated DV modeling, Automated ETL production; Change Management; Temporal Data Modeling Concerns… And so on…
  • 117. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney 86
  • 118. More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler 87
  • 119. Where To Learn More. The Technical Modeling Book: http://LearnDataVault.com ; The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions ; Contact me: http://DanLinstedt.com - web site, DanLinstedt@gmail.com - email ; World wide User Group (Free): http://dvusergroup.com

Editor's Notes

  1. Before we begin exploring how the Data Vault can help you, or even defining what a Data Vault is, we need to first understand some of the business problems that may be causing you heartburn on a daily basis.
  2. Everything from poor agility to a lack of IT Transparency plagues today’s data warehouses. I can’t begin to tell you how much pain these businesses are suffering as a result of these problems. Inconsistent Answer Sets, lack of accountability, and inadequate auditability all play a part in data warehouses that are currently on the brink of falling apart. But it’s not just business issues; there are technical ones to cope with as well.
  3. There are always technology obstacles that we face in any data warehousing project. So the question is: what kinds of problems have you seen in your journey? Do they haunt you today?
  4. Complexity drives high cost, resulting in unnecessary late delivery schedules and unsustainable business logic in the integration channels. Real-time data is flooding our data warehouses; has your architecture fallen down on the job? Unstructured data and legal requirements for auditability are bringing huge data volumes. Master Data Alignment is missing from our data warehouses, as they are split in disparate systems all over the world. Bad data quality is covered up through the transformation layers on the way IN to your EDW. Data warehouses grow so large and become so difficult to maintain that IT teams are often delivering late, and beyond original costs. The foundations of your data warehouse are probably crumbling under sheer weight and pressure.
  5. Disparate data marts, unmatched answer sets, geographical problems, and worse… Projects are under fire from a number of areas. Let’s take a look at what happens when a data warehouse project reaches the brick wall head-on, at 90 miles an hour.
  6. I think this says it all…. Projects cancelled and restarted, Re-Engineering required to absorb changes, high complexity making it difficult to upgrade, change, and keep up at the speed of business. Disparate silo solutions screaming for consolidation, and of course – a lack of accountability on BOTH sides of the fence… All signs of an ailing BI solution on the brink of being shut down.
  7. We have got to keep focus on the prize. Business still wants a BI system backed by an enterprise EDW. IT still wants a manageable system that will grow and change without major re-engineering. There is a better way, and I can help you with it.
  8. The Data Vault model is really just another name for “Common foundational architecture and design”. It’s based on 10 years of Research and design work, followed by 10 years of implementation best practices. It is architected to help you solve the problems!
  9. Put quite simply: It’s an easy-to-use architecture and plan, a guide-book for building a repeatable, consistent, and scalable data warehouse system. So just what is the value of the Data Vault?
  10. The Data Vault model and methodology provide: Painless Auditability, Understandable Standards, Rapid Adaptability, Simple Build-out, Uncomplicated Design, and Effortless Scalability. Go after your goals, build a wildly successful data warehouse just like I have.
  11. Beginning: 5 advanced ETL developers. By the 1st month, they had 5 advanced and 15 basic/intro. By the 6th month, they had 5 advanced, but 50 basic. By the end of the 8th month they went to production with 10 MF sources, and their team size was 12 people (5 advanced, 7 basic – for support).
  12. You’re not the first, nor will you be the last one to use it. Some of the world’s biggest companies are implementing Data Vaults. From Daimler Motors to Lockheed Martin, to the Department of Defense. JPMorgan and Chase used the Data Vault model to merge 3 companies in 90 days!