DecisionLab.Net
business intelligence is business performance
DecisionLab http://www.decisionlab.net dupton@decisionlab.net direct 760.525.3268
http://blog.decisionlab.net Carlsbad, California, USA
Data Vault: Data Warehouse Design Goes Agile
Whitepaper
by daniel upton
data warehouse modeler and architect
certified scrum master
dupton@decisionlab.net
http://www.linkedin.com/in/DanielUpton
Without my (the writer’s) explicit written permission in advance, the only permissible reproduction or copying of this written material is in the form of a
review or a brief reference to a specific concept herein, either of which must clearly specify this writing’s title, author (me), and this web address:
http://www.slideshare.net/DanielUpton/lean-data-warehouse-via-data-vault . For permission to reproduce or copy any of this material other than what is
specified above, just email me at the above address.
Open Question: When we begin considering a new Data Warehouse initiative, how clear is the
scope, really?
If we intend to design Data Marts, and we have no specified need for a data warehouse either to become a system of record,
or to support Master Data Management (MDM), then we may choose to follow Dr. Ralph Kimball’s Data Warehouse Bus
architecture, designing a library of conformed (standardized, re-usable) dimension and fact tables for deployment into a series
of purpose-built data marts. Under these requirements, we may have no specific need for an Inmon-style third-normal-form
(3nf) Enterprise Data Warehouse (EDW) in general, or for a Data Vault in particular. In other cases, however, because
data warehouse data sometimes outlives its corresponding source data inside a soon-to-retire application database, then, like
it or not, a data warehouse may, as Bill Inmon reminds us, assume a system-of-record role for its data. Because the Kimball
Bus architecture’s tables are often not related via key fields, and in fact may not be populated at all until deployment from the
Bus into a specific-needs Data Mart, Kimball adherents rarely assert a system-of-record role for their solutions.
But suppose we do determine that our required solution either does need to assume a system-of-record role, or perhaps that
it must support Master Data Management. As such, we may elect to design a fully functional EDW, rather than Kimball’s DW
Bus, so that the EDW itself, and not just its dependent data marts, is a working, populated database. Now, knowing that the
creation of a classic EDW, with its requirement for an up-front, enterprise-wide design, is a challenge given today’s
expectations for rapid delivery, some may be curious whether new design methodologies offer ways to accelerate EDW design.
Data Vault, a data warehouse modeling method with a substantial following in Denmark, and a growing base in the U.S., offers
specific and important benefits.
In order to set expectations early about Data Vault, readers must understand that, somewhat unlike a traditional EDW, and
utterly unlike a star-schema, a Data Vault (not to be confused with Business Data Vault, which is not addressed in this
article) cannot serve as an efficient presentation layer appropriate for direct queries. Rather, it is more like a historic
enterprise data staging repository that, with additional downstream ETL, will support not only star-schema, reporting and data
mining, but also master data management, data quality and other enterprise data initiatives.
Data Vault Benefits:
• Benefit #1: Allows for loading of a history-tracking DW with little or none of the typical extraction, transformation and
loading (ETL) transformations that, once they are finally figured out, would otherwise contain subjective interpretations
of the data, and which purportedly enhance the data and prepare it for reporting or analytics.
  o In my view, this is almost enough of a benefit all by itself. As such, in my introduction that follows, I will focus on
  proving this point.
  o Agile Win: Confidently loading a DW without having to already know the fine details of business rules and
  requirements, and the resulting transformation requirements, means that loading of historical and incremental
  data could get accomplished before the first target database design (3nf EDW or Data Mart) is complete.
• Benefit #2: Insofar as Data Vault prescribes a very generic downstream ‘de-constructing’ of OLTP tables, these
de-constructing transformations can be automated, and so can their associated early-stage ETL into Data Vault. Since, as
you’ll soon see, Data Vault causes a substantial increase in the number of tables, this automation potential is a
substantial benefit.
  o Agile Win: Automated initial design and loading, anyone?
• Benefit #3: Due to Data Vault’s generic design logic, its use of surrogate keys (more on this soon), and its prescription
to avoid subjective-interpretive transformations, it’s reasonable to quickly load a Data Vault with just the needed subset
of tables.
  o Agile Win: More frequent releases. Quickly design for, and load, only the data needed for the next release. Use
  the same generic design to load other tables when those User Stories from the Product Backlog get placed into a
  Sprint.
In the remainder of this article, I will provide a high-level introduction to Data Vault, with primary emphasis on how it achieves
Benefit #1.
High-Level Introduction to Data Vault Methodology:
We begin with a simple OLTP database design for clients purchasing products from a company’s stores. For simplicity, I
include only a minimum of fields. In the diagrams, ‘BK’ means business key, ‘FK’ means foreign key. Refer to Diagram A
below.
As is common, this simple OLTP schema does not use surrogate keys. If a client gets a new email address, or a product gets a
new name, or a city’s re-mapping of boundary lines suddenly places an existing store in a new city, new values would
overwrite the old values, which would then be lost. Of course, in order to preserve history, history-tracking surrogate keys are
commonly used by practitioners of both Bill Inmon’s classic third-normal-form (3nf) EDW design and Dr. Ralph Kimball’s
Star Schema method, but both of these methods prescribe surrogate keys within the context of data transformations that also
include subjective interpretation (herein simply ‘subjective transformation’) in order to cleanse or purportedly enhance the
data for the purposes of integration, reporting, or analytics. Data Vault purists claim that any such subjective transformation
of line-of-business data introduces inappropriate distortion to it, thereby disqualifying the Data Warehouse as a system of
record. Data Vault, importantly, provides a unique way to track historical changes in source data while eliminating most, or
all, subjective transformations such as field renaming, selective data-quality filters, establishment of hierarchies, calculated
fields, and target values. Although analytics-driven, subjective transformations can still be applied, they are applied
downstream of the Data Vault EDW, as subsequent transformations for loads into data marts designed to analyze specific
processes. Back upstream, the Data Vault accomplishes historic change-tracking using a generic table-deconstructing
approach that I will now describe. Before beginning, I recommend against too quickly comparing this method with others, like
star-schema design, which serve different needs.
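To make the overwrite problem concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (Client, client_id, email) and the sample values are assumptions chosen for illustration; they are not taken from Diagram A.

```python
import sqlite3

# Hypothetical, pared-down OLTP Client table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (client_id TEXT PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO Client VALUES ('C001', 'old@example.com')")

# The client gets a new email address: the OLTP system simply overwrites it.
conn.execute("UPDATE Client SET email = 'new@example.com' WHERE client_id = 'C001'")

rows = conn.execute("SELECT client_id, email FROM Client").fetchall()
print(rows)  # only the new address remains; the old value is lost
```

Nothing in the source table records that the earlier address ever existed, which is the gap the history-tracking tables described next are meant to close.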
Diagram A: Simple OLTP schema (data source for a Data Vault)
Fundamentally, Data Vault prescribes three types of tables: Hubs, Satellites, and Links. The diagram’s Client table serves as a good
example. Hubs and Satellites work according to the following simplified description:
Hub Tables:
• Define the granularity of an entity (e.g. product), and thus the granularity of non-key attributes (e.g. product description)
within the entity.
• Contain a new surrogate primary key (PK), as well as the source table’s business key, which is demoted from its PK role.
Satellite Tables:
• Contain all non-key fields (attributes), plus a set of date-stamp fields.
• Contain, as a Foreign Key (FK), the Hub’s PK, plus the load date-time stamps.
• Have a defining, dependent entity relationship to one, and only one, parent table.
• Whether that parent table is a Hub or Link, the Satellite holds the non-key fields that belong to the parent table.
• Although on initial loads only one Satellite row will exist for each corresponding Hub row, whenever a non-key
attribute changes upstream in the OLTP schema (e.g. a client’s email address changes, often accomplished up there with
a simple over-write), a new row will be added only to the Satellite, and not the Hub, which is why many Satellite rows
relate to one Hub row. So, in this fashion, historic changes within source tables are gracefully tracked in the EDW.
Notice, in Diagram B, that, among other tables, the Client_h_s Satellite table is dependent on the Client_h Hub table, but that,
at this stage in our design, the Client_h Hub is not yet related to the Order_h Hub. When we add Links, those relationships will
appear. But first, have a look at the tables, the new location of existing fields, and the various added date-time stamps.
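The Hub and Satellite behavior described above can be sketched with Python’s built-in sqlite3 module. This is an illustration under stated assumptions, not the article’s actual schema: the column names client_sk, client_bk and email are hypothetical, load_date and load_date_end follow the date-stamp names used later in this article, and end-dating the previous Satellite row is just one common way to mark which row is current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client_h (
    client_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- new surrogate PK
    client_bk TEXT NOT NULL UNIQUE,               -- source business key
    load_date TEXT NOT NULL
);
CREATE TABLE Client_h_s (
    client_sk     INTEGER NOT NULL REFERENCES Client_h (client_sk),
    load_date     TEXT NOT NULL,
    load_date_end TEXT,                           -- NULL marks the current row
    email         TEXT,                           -- non-key attribute
    PRIMARY KEY (client_sk, load_date)
);
""")

def load_client(client_bk, email, load_date):
    """Load one source row: one Hub row per key, one Satellite row per change."""
    conn.execute(
        "INSERT OR IGNORE INTO Client_h (client_bk, load_date) VALUES (?, ?)",
        (client_bk, load_date),
    )
    (sk,) = conn.execute(
        "SELECT client_sk FROM Client_h WHERE client_bk = ?", (client_bk,)
    ).fetchone()
    current = conn.execute(
        "SELECT email FROM Client_h_s "
        "WHERE client_sk = ? AND load_date_end IS NULL",
        (sk,),
    ).fetchone()
    if current is not None and current[0] == email:
        return  # attribute unchanged: no new Satellite row
    # Close the previous version (if any), then add the new one.
    conn.execute(
        "UPDATE Client_h_s SET load_date_end = ? "
        "WHERE client_sk = ? AND load_date_end IS NULL",
        (load_date, sk),
    )
    conn.execute(
        "INSERT INTO Client_h_s (client_sk, load_date, email) VALUES (?, ?, ?)",
        (sk, load_date, email),
    )

load_client("C001", "old@example.com", "2024-01-01")
load_client("C001", "old@example.com", "2024-01-02")  # unchanged
load_client("C001", "new@example.com", "2024-01-03")  # changed

hub_count = conn.execute("SELECT COUNT(*) FROM Client_h").fetchone()[0]
history = conn.execute(
    "SELECT email, load_date, load_date_end FROM Client_h_s ORDER BY load_date"
).fetchall()
print(hub_count, history)
```

After the three loads, the Hub still holds a single row for the client, while the Satellite holds both the old and the new email address, each with its own load dates.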
Diagram B: Hubs and Satellites in a partially-designed Data Vault schema
Link Tables:
• Refer to Diagram C.
• Relate exactly two Hub tables together.
• Contain, now as non-key values, the primary keys of the two Hubs, plus the Link’s own surrogate PK.
• As with an ordinary association table, a Link is a child to two other tables and, as such, is able to gracefully handle
relative changes in cardinality between the two tables and, where necessary, can directly resolve many-to-many
relationships that might otherwise cause a show-stopper error in the data-loading process.
• Unlike an ordinary association table, the Link table, with its own surrogate PK, is able to track historic changes in the
relationship itself between the two Hubs, and thus between their two directly-related OLTP source tables. Specifically,
all loaded data that conformed with the initial cardinality between tables would share the same Link table surrogate
key, but if an unexpected, future source data change causes a cardinality reversal (so that the one becomes
the many, and vice versa), a new row, with a new surrogate key, is generated to capture the new relationship, while the
original surrogate key preserves the historical relationship. Slick!
• In a more sophisticated Data Vault schema than this one, we might go further by adding load_date and
load_date_end date-stamp fields to Link tables, too. As an (admittedly strange) example, the Order_Store_l Link table
might conceivably get date-time stamp fields so that, in coordination with its surrogate PK, an Order (perhaps for a
long-running service) that, after the Order Date, gets re-credited to a different store can be efficiently tracked over time
in this way.
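The re-credited Order example can be sketched as follows, again with Python’s built-in sqlite3 module. The Hub column names, the sample keys, and the rule that an Order has at most one open Store relationship at a time are assumptions made only for this illustration; Order_Store_l, load_date and load_date_end are the names used in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Order_h (order_sk INTEGER PRIMARY KEY, order_bk TEXT UNIQUE);
CREATE TABLE Store_h (store_sk INTEGER PRIMARY KEY, store_bk TEXT UNIQUE);
CREATE TABLE Order_Store_l (
    order_store_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- the Link's own surrogate PK
    order_sk       INTEGER NOT NULL REFERENCES Order_h (order_sk),
    store_sk       INTEGER NOT NULL REFERENCES Store_h (store_sk),
    load_date      TEXT NOT NULL,
    load_date_end  TEXT                                -- NULL marks the current row
);
INSERT INTO Order_h VALUES (1, 'O-100');
INSERT INTO Store_h VALUES (1, 'S-01'), (2, 'S-02');
""")

def relate(order_sk, store_sk, load_date):
    """Record the Order-to-Store relationship, keeping earlier versions."""
    open_row = conn.execute(
        "SELECT store_sk FROM Order_Store_l "
        "WHERE order_sk = ? AND load_date_end IS NULL",
        (order_sk,),
    ).fetchone()
    if open_row is not None and open_row[0] == store_sk:
        return  # relationship unchanged
    # End-date the old relationship (if any) and add a new Link row.
    conn.execute(
        "UPDATE Order_Store_l SET load_date_end = ? "
        "WHERE order_sk = ? AND load_date_end IS NULL",
        (load_date, order_sk),
    )
    conn.execute(
        "INSERT INTO Order_Store_l (order_sk, store_sk, load_date) VALUES (?, ?, ?)",
        (order_sk, store_sk, load_date),
    )

relate(1, 1, "2024-01-01")  # order placed at store S-01
relate(1, 1, "2024-01-15")  # reloaded, nothing changed
relate(1, 2, "2024-02-01")  # order re-credited to store S-02

links = conn.execute(
    "SELECT order_store_sk, store_sk, load_date, load_date_end "
    "FROM Order_Store_l ORDER BY order_store_sk"
).fetchall()
print(links)
```

The first Link row keeps its original surrogate key and gains an end date, and the re-credit produces a second row with a new surrogate key, so both the historical and the current relationship remain queryable.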
Diagram C: Completed Data Vault Schema (Link tables added)
Now, we’ve added Link tables. After scanning Diagram C, go back and compare it with Diagram A and note the movement of
the various non-key attributes. Undoubtedly, you will also notice, and may be concerned, that the source schema’s five tables
just morphed into the Data Vault’s twelve. Importantly, note that Diagram A’s Details table was transformed not into a
Hub-and-Satellite combination, but rather into a Link table. When you consider that an order detail record (a line item) is
really just the association between an Order and a Product (albeit an association with plenty of vital associated data), then it
makes sense that the Link table Details_l was created. This Link table, whose sole purpose is to relate the Orders_h and
Products_h tables, of course, also needs a Details_l_s Satellite table to hold the show-stopper non-key attributes, Quantity
and Unit Price.
The Data Vault method does allow for some interpretation here. You might now be thinking, “Aha! So, we haven’t eliminated
all subjective interpretation!” Perhaps not, but what I’ll describe here is a pretty small, generic interpretation. Either way, in
this situation, it would not be patently wrong to design a Details_h Hub table (plus, of course, a Details_h_s Satellite), rather
than the Details_l Link. Added to that, if we use very simple Data Vault design automation logic, which simply de-constructs
all tables into Hub and Satellite pairs, this is what we would get. However, keep in mind that if we did that, we would then
have to create not one, but two Link tables, specifically an Order_Order_Details_l Link table and a Product_Order_Details_l Link
table, to connect our tables, and these tables would contain no attributes of apparent value. Therefore, we choose the design
that leaves us with a simpler, more efficient Data Vault design. By the way, this logic can easily be automated, but that’s
beyond the scope of this article.
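Purely as an illustration of the design choice just described, and not as the automation logic the author has in mind, the de-construction rules can be sketched in a few lines of Python. Everything here is an assumption: the SourceTable metadata structure, the source column names, the rule that a table with no business key of its own and exactly two foreign keys is treated as an association, and the Child_Parent_l naming of generated Links (which differs from the Link names quoted above for the naive design).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SourceTable:
    """Hypothetical metadata about one OLTP source table."""
    name: str
    business_key: Optional[str]           # None when the key is made only of FKs
    attributes: List[str]
    foreign_keys: List[str] = field(default_factory=list)  # parent table names

def to_data_vault(tables, associations_as_links=True):
    """Return the Data Vault table names generated from the source tables."""
    vault = []
    for t in tables:
        is_association = t.business_key is None and len(t.foreign_keys) == 2
        if associations_as_links and is_association:
            # The table itself becomes a Link, plus a Satellite for its attributes.
            vault.append(f"{t.name}_l")
            if t.attributes:
                vault.append(f"{t.name}_l_s")
        else:
            # Hub-and-Satellite pair, plus one Link per foreign key.
            vault.append(f"{t.name}_h")
            vault.append(f"{t.name}_h_s")
            vault.extend(f"{t.name}_{parent}_l" for parent in t.foreign_keys)
    return vault

source = [
    SourceTable("Client", "client_id", ["email"]),
    SourceTable("Product", "product_id", ["name", "description"]),
    SourceTable("Store", "store_id", ["city"]),
    SourceTable("Order", "order_id", ["order_date"], ["Client", "Store"]),
    SourceTable("Details", None, ["quantity", "unit_price"], ["Order", "Product"]),
]

chosen = to_data_vault(source)
naive = to_data_vault(source, associations_as_links=False)
print(len(chosen), chosen)
print(len(naive), naive)
```

Under these assumptions, the chosen design yields twelve tables from the five source tables, while the naive Hub-and-Satellite-only design yields fourteen, two of them Links that carry no attributes.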
Conclusion:
Our discussion on Data Vault opened with the idea that an EDW should load and store historical data without applying any
transformations that contain subjective interpretation of data or business rules, because those interpretations, even if
appropriate for specific reporting or analytics, do modify line-of-business data, and therefore introduce distortions into
operational data. Those interpretive transformations should occur downstream during ETL into presentation layer tables.
Although Data Vault does, in fact, apply a specific set of generic ‘de-construction’ transformations, these transformations
contain little or no subjective interpretation of business rules. They do, however, allow it to (1) apply an appropriate level of
referential integrity to source data even where the source system may lack it now or in the future; (2) gracefully capture
historical data changes, within and between tables, without endangering the success of the data load; (3) support loading of
data from a subset of source tables initially, and then load, or not load, other related source data tables much later without
compromising the EDW’s referential integrity.
Lastly, and very importantly, (4) Data Vault design and the associated Data Vault loading ETL, which is largely generic from one
data set to another, can be automated, and thus radically accelerated in development. Although the logic of this automation
flows from the simplicity of Data Vault design, a detailed automation discussion is beyond the scope of this article.
In closing, if we can automatically design and load a Data Warehouse (albeit not its presentation layer), it frees up brain cells
for the higher-order logic of designing the presentation layer and the intensive, custom ETL to load it. As I described here, all
of this can be accomplished simultaneously.
________________________________________________
daniel upton
dupton@decisionlab.net
DecisionLab.Net
business intelligence is business performance
DecisionLab.Net
Range of Services:
_____________________________________________________
Business Intelligence Roadmapping, Feasibility Analysis
BI Project Estimation and Requirement Modelstorming
BI Staff Augmentation: Data Warehouse / Mart / Dashboard Design and Development
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Dernier (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Data Vault: Data Warehouse Design Goes Agile

  • 1. DecisionLab.Net: business intelligence is business performance. DecisionLab, http://www.decisionlab.net, dupton@decisionlab.net, direct 760.525.3268, http://blog.decisionlab.net, Carlsbad, California, USA. Data Vault: Data Warehouse Design Goes Agile
  • 2. Whitepaper. Data Vault: Data Warehouse Design Goes Agile, by daniel upton, data warehouse modeler and architect, certified scrum master. DecisionLab.Net: business intelligence is business performance. dupton@decisionlab.net, http://www.linkedin.com/in/DanielUpton. Without my (the writer’s) explicit written permission in advance, the only permissible reproduction or copying of this written material is in the form of a review or a brief reference to a specific concept herein, either of which must clearly specify this writing’s title, author (me), and this web address: http://www.slideshare.net/DanielUpton/lean-data-warehouse-via-data-vault . For permission to reproduce or copy any of this material other than what is specified above, just email me at the above address.
  • 3. Open Question: When we begin considering a new Data Warehouse initiative, how clear is the scope, really?

    If we intend to design Data Marts, and we have no specified need for a data warehouse either to become a system of record or to support Master Data Management (MDM), then we may choose Dr. Ralph Kimball’s Data Warehouse Bus architecture, designing a library of conformed (standardized, re-usable) dimension and fact tables for deployment into a series of purpose-built data marts. Under these requirements, we may have no specific need for an Inmon-style third-normal-form (3nf) Enterprise Data Warehouse (EDW) in general, or for a Data Vault in particular.

    In other cases, however, because data warehouse data sometimes outlives its corresponding source data inside a soon-to-retire application database, a data warehouse may, like it or not, assume a system-of-record role for its data, as Bill Inmon reminds us. Because the Kimball Bus architecture’s tables are often not related via key fields, and in fact may not be populated at all until deployment from the Bus into a specific-needs Data Mart, Kimball adherents rarely assert a system-of-record role for their solutions.

    But suppose we do determine that our required solution either needs to assume a system-of-record role, or must support Master Data Management. In that case, we may elect to design a fully functional EDW, rather than Kimball’s DW Bus, so that the EDW itself, and not just its dependent data marts, is a working, populated database. Now, knowing that the creation of a classic EDW, with its requirement for an up-front, enterprise-wide design, is a challenge given today’s expectations for rapid delivery, some may be curious whether new design methodologies offer ways to accelerate EDW design.

    Data Vault, a data warehouse modeling method with a substantial following in Denmark and a growing base in the U.S., offers specific and important benefits. To set expectations early, readers must understand that, somewhat unlike a traditional EDW, and utterly unlike a star schema, a Data Vault (not to be confused with Business Data Vault, which is not addressed in this article) cannot serve as an efficient presentation layer appropriate for direct queries. Rather, it is more like a historic enterprise data staging repository that, with additional downstream ETL, will support not only star schemas, reporting and data mining, but also master data management, data quality and other enterprise data initiatives.
  • 4. Data Vault Benefits:

      • Benefit #1: Allows for loading of a history-tracking DW with little or none of the typical extraction, transformation and loading (ETL) transformations that, once they are finally figured out, would otherwise contain subjective interpretations of the data, and which purportedly enhance the data and prepare it for reporting or analytics.
          o In my view, this is almost enough of a benefit all by itself. As such, in my introduction that follows, I will focus on proving this point.
          o Agile Win: Confidently loading a DW without having to already know the fine details of business rules and requirements, and the resulting transformation requirements, means that loading of historical and incremental data could be accomplished before the first target database design (3nf EDW or Data Mart) is complete.
      • Benefit #2: Insofar as Data Vault prescribes a very generic downstream ‘de-constructing’ of OLTP tables, these de-constructing transformations can be automated, and so can the associated early-stage ETL into Data Vault. Since, as you’ll soon see, Data Vault causes a substantial increase in the number of tables, this automation potential is a substantial benefit.
          o Agile Win: Automated initial design and loading, anyone?
      • Benefit #3: Due to Data Vault’s generic design logic, its use of surrogate keys (more on this soon), and its prescription to avoid subjective, interpretive transformations, it is reasonable to quickly load a Data Vault with just the needed subset of tables.
          o Agile Win: More frequent releases. Quickly design for, and load, only the data needed for the next release. Use the same generic design to load other tables when those User Stories from the Product Backlog get placed into a Sprint.

    In the remainder of this article, I will provide a high-level introduction to Data Vault, with primary emphasis on how it achieves Benefit #1.
  • 5. High-Level Introduction to Data Vault Methodology:

    We begin with a simple OLTP database design for clients purchasing products from a company’s stores. For simplicity, I include only a minimum of fields. In the diagrams, ‘BK’ means business key and ‘FK’ means foreign key. Refer to Diagram A below. As is common, this simple OLTP schema does not use surrogate keys. If a client gets a new email address, or a product gets a new name, or a city’s re-mapping of boundary lines suddenly places an existing store in a new city, new values would overwrite the old values, which would then be lost.

    Of course, in order to preserve history, history-tracking surrogate keys are commonly used by practitioners of both Bill Inmon’s classic third-normal-form (3nf) EDW design and Dr. Ralph Kimball’s Star Schema method, but both of these methods prescribe surrogate keys within the context of data transformations that also include subjective interpretation (herein simply ‘subjective transformation’) in order to cleanse or purportedly enhance the data for the purposes of integration, reporting, or analytics. Data Vault purists claim that any such subjective transformation of line-of-business data introduces inappropriate distortion to it, thereby disqualifying the Data Warehouse as a system of record.

    Data Vault, importantly, provides a unique way to track historical changes in source data while eliminating most, or all, subjective transformations such as field renaming, selective data-quality filters, establishment of hierarchies, calculated fields, and target values. Although analytics-driven, subjective transformations can still be applied, they are applied downstream of the Data Vault EDW, as subsequent transformations for loads into data marts designed to analyze specific processes.

    Back upstream, the Data Vault accomplishes historic change-tracking using a generic table-deconstructing approach that I will now describe. Before beginning, I recommend against too quickly comparing this method with others, like star-schema design, which serve different needs.
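The overwrite problem described above can be illustrated with a minimal sketch. It assumes a cut-down, hypothetical version of the OLTP Client table held in an in-memory SQLite database; the column names client_id and email are illustrative and are not taken from Diagram A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the OLTP Client table; column names are assumed.
conn.execute("CREATE TABLE Client (client_id TEXT PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO Client VALUES ('C-100', 'old@example.com')")

# The client gets a new email address; the OLTP application simply overwrites.
conn.execute("UPDATE Client SET email = 'new@example.com' WHERE client_id = 'C-100'")

rows = conn.execute("SELECT client_id, email FROM Client").fetchall()
print(rows)  # only the current value remains; the old address is gone
```

Nothing in the table records that the earlier address ever existed, which is the gap the Hub and Satellite structure is meant to close.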
  • 7. Fundamentally, Data Vault prescribes three types of tables: Hubs, Satellites, and Links. The diagram’s Client table is a good example. Hubs and Satellites work according to the following simplified description:

    Hub Tables:
      • Define the granularity of an entity (e.g. product), and thus the granularity of non-key attributes (e.g. product description) within the entity.
      • Contain a new surrogate primary key (PK), as well as the source table’s business key, which is demoted from its PK role.

    Satellite Tables:
      • Contain all non-key fields (attributes), plus a set of date-stamp fields.
      • Contain, as a Foreign Key (FK), the Hub’s PK, plus the load date-time stamps.
      • Have a defining, dependent entity relationship to one, and only one, parent table.
      • Whether that parent table is a Hub or a Link, the Satellite holds the non-key fields from the parent table.
      • Although on initial loads only one Satellite row will exist for each corresponding Hub row, whenever a non-key attribute changes upstream in the OLTP schema (e.g. a client’s email address changes, often accomplished up there with a simple over-write), a new row will be added only to the Satellite, and not to the Hub, which is why many Satellite rows relate to one Hub row. In this fashion, historic changes within source tables are gracefully tracked in the EDW.

    Notice in Diagram B that, among other tables, the Client_h_s Satellite table is dependent on the Client_h Hub table, but that, at this stage in our design, the Client_h Hub is not yet related to the Order_h Hub. When we add Links, those relationships will appear. But first, have a look at the tables, the new location of existing fields, and the various added date-time stamps.
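The Hub and Satellite behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the loading logic of any particular Data Vault tool: the table names Client_h and Client_h_s come from the article, while the column names (client_sk, client_bk, load_date, load_date_end, email), the load_client helper, and the choice to close the previous Satellite row by stamping load_date_end are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client_h (
    client_sk INTEGER PRIMARY KEY,        -- new surrogate PK
    client_bk TEXT NOT NULL UNIQUE,       -- source business key, demoted from PK
    load_date TEXT NOT NULL
);
CREATE TABLE Client_h_s (
    client_sk INTEGER NOT NULL REFERENCES Client_h (client_sk),
    load_date TEXT NOT NULL,
    load_date_end TEXT,                   -- NULL marks the current row (assumed)
    email TEXT,                           -- non-key attribute
    PRIMARY KEY (client_sk, load_date)
);
""")

def load_client(client_bk, email, load_date):
    """Load one source row: Hub row once, Satellite row on every change."""
    conn.execute(
        "INSERT OR IGNORE INTO Client_h (client_bk, load_date) VALUES (?, ?)",
        (client_bk, load_date),
    )
    (sk,) = conn.execute(
        "SELECT client_sk FROM Client_h WHERE client_bk = ?", (client_bk,)
    ).fetchone()
    current = conn.execute(
        "SELECT email FROM Client_h_s WHERE client_sk = ? AND load_date_end IS NULL",
        (sk,),
    ).fetchone()
    if current is not None and current[0] == email:
        return  # attribute unchanged, nothing to record
    # Close the previous Satellite row (if any), then append the new one.
    conn.execute(
        "UPDATE Client_h_s SET load_date_end = ? "
        "WHERE client_sk = ? AND load_date_end IS NULL",
        (load_date, sk),
    )
    conn.execute(
        "INSERT INTO Client_h_s (client_sk, load_date, email) VALUES (?, ?, ?)",
        (sk, load_date, email),
    )

load_client("C-100", "old@example.com", "2024-01-01")
load_client("C-100", "old@example.com", "2024-01-02")  # unchanged: no new row
load_client("C-100", "new@example.com", "2024-01-03")  # changed: new Satellite row

hub_count = conn.execute("SELECT COUNT(*) FROM Client_h").fetchone()[0]
history = conn.execute(
    "SELECT email, load_date, load_date_end FROM Client_h_s ORDER BY load_date"
).fetchall()
print(hub_count, history)
```

After the three loads the Hub still holds a single row for the client, while the Satellite holds two: the old address with its end date, and the current address. Note that no business rule about what an email address means was needed to load either table.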
  • 9. Link Tables:
      • Refer to Diagram C.
      • Relate exactly two Hub tables together.
      • Contain, now as non-key values, the primary keys of the two Hubs, plus their own surrogate PK.
      • As with an ordinary association table, a Link is a child to two other tables and, as such, is able to gracefully handle relative changes in cardinality between the two tables and, where necessary, can directly resolve many-to-many relationships that might otherwise cause a show-stopper error in the data-loading process.
      • Unlike an ordinary association table, the Link table, with its own surrogate PK, is able to track historic changes in the relationship itself between the two Hubs, and thus between their two directly related OLTP source tables. Specifically, all loaded data that conforms to the initial cardinality between tables would share the same Link table surrogate key, but if an unexpected future source data change causes a cardinality reversal (so that the one becomes the many, and vice versa), a new row, with a new surrogate key, is generated to capture the new relationship, while the original surrogate key preserves the historical relationship. Slick!
      • In a more sophisticated Data Vault schema than this one, we might go further by adding load_date and load_date_end date-stamp fields to Link tables, too. As an (admittedly strange) example, the Order_Store_l Link table might conceivably get date-time stamp fields so that, in coordination with its surrogate PK, an Order (perhaps for a long-running service) that, after the Order Date, gets re-credited to a different store can be efficiently tracked over time.
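The re-credited order example above can be sketched in the same style. The table names Order_h, Store_h, and Order_Store_l and the load_date and load_date_end fields come from the article; the remaining column names, the sample business keys, and the link_order_to_store helper are hypothetical, and the sketch assumes an order is credited to one store at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Order_h (order_sk INTEGER PRIMARY KEY, order_bk TEXT UNIQUE);
CREATE TABLE Store_h (store_sk INTEGER PRIMARY KEY, store_bk TEXT UNIQUE);
CREATE TABLE Order_Store_l (
    order_store_sk INTEGER PRIMARY KEY,   -- the Link's own surrogate PK
    order_sk INTEGER NOT NULL REFERENCES Order_h (order_sk),
    store_sk INTEGER NOT NULL REFERENCES Store_h (store_sk),
    load_date TEXT NOT NULL,
    load_date_end TEXT                    -- NULL marks the current relationship
);
INSERT INTO Order_h (order_bk) VALUES ('O-1');
INSERT INTO Store_h (store_bk) VALUES ('S-A'), ('S-B');
""")

def link_order_to_store(order_bk, store_bk, load_date):
    """Record the Order-to-Store relationship, keeping earlier versions."""
    (order_sk,) = conn.execute(
        "SELECT order_sk FROM Order_h WHERE order_bk = ?", (order_bk,)
    ).fetchone()
    (store_sk,) = conn.execute(
        "SELECT store_sk FROM Store_h WHERE store_bk = ?", (store_bk,)
    ).fetchone()
    current = conn.execute(
        "SELECT store_sk FROM Order_Store_l "
        "WHERE order_sk = ? AND load_date_end IS NULL",
        (order_sk,),
    ).fetchone()
    if current is not None and current[0] == store_sk:
        return  # relationship unchanged
    # End-date the old relationship row, then add a row with a new surrogate key.
    conn.execute(
        "UPDATE Order_Store_l SET load_date_end = ? "
        "WHERE order_sk = ? AND load_date_end IS NULL",
        (load_date, order_sk),
    )
    conn.execute(
        "INSERT INTO Order_Store_l (order_sk, store_sk, load_date) VALUES (?, ?, ?)",
        (order_sk, store_sk, load_date),
    )

link_order_to_store("O-1", "S-A", "2024-01-01")
link_order_to_store("O-1", "S-B", "2024-02-01")  # order re-credited to store S-B

links = conn.execute(
    "SELECT order_store_sk, store_sk, load_date, load_date_end "
    "FROM Order_Store_l ORDER BY order_store_sk"
).fetchall()
print(links)
```

The first Link row keeps its surrogate key and gains an end date, so the historical relationship to the first store survives alongside the current one.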
  • 11. Now we’ve added Link tables. After scanning Diagram C, go back and compare it with Diagram A and note the movement of the various non-key attributes. Undoubtedly, you will also notice, and may be concerned, that the source schema’s five tables just morphed into the Data Vault’s twelve.

    Importantly, note that Diagram A’s Details table was transformed not into a Hub-and-Satellite combination, but rather into a Link table. When you consider that an order detail record (a line item) is really just the association between an Order and a Product (albeit an association with plenty of vital associated data), then it makes sense that the Link table Details_l was created. This Link table, whose sole purpose is to relate the Orders_h and Products_h tables, of course also needs a Details_l_s Satellite table to hold the essential non-key attributes, Quantity and Unit Price.

    The Data Vault method does allow for some interpretation here. You might now be thinking, “Aha! So, we haven’t eliminated all subjective interpretation!” Perhaps not, but what I’ll describe here is a pretty small, generic interpretation. Either way, in this situation, it would not be patently wrong to design a Details_h Hub table (plus, of course, a Details_h_s Satellite), rather than the Details_l Link. Added to that, if we use very simple Data Vault design automation logic, which simply de-constructs all tables into Hub and Satellite pairs, this is what we would get. However, keep in mind that if we did that, we would then have to create not one but two Link tables, specifically an Order_Order_Details_l Link table and a Product_Order_Details_l Link table, to connect our tables, and these tables would contain no attributes of apparent value.

    Therefore, we choose the design that leaves us with a simpler, more efficient Data Vault design. By the way, this logic can easily be automated, but that’s beyond the scope of this article.
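The design choice above, and the table counts it produces, can be sketched with a small rule-based function. This is one possible illustration, not the author's automation logic (which the article leaves out of scope). The SOURCE_SCHEMA description is an assumption about Diagram A: it treats Order as carrying foreign keys to Client and Store, and Details as having no business key of its own, only foreign keys to Order and Product. The generated Link names follow a simple child_parent_l pattern chosen for the sketch and do not match every name used in the article.

```python
# Hypothetical description of the five source tables: whether each has its own
# business key, and which tables its foreign keys point to.
SOURCE_SCHEMA = {
    "Client":  {"has_bk": True,  "fks": []},
    "Store":   {"has_bk": True,  "fks": []},
    "Product": {"has_bk": True,  "fks": []},
    "Order":   {"has_bk": True,  "fks": ["Client", "Store"]},
    "Details": {"has_bk": False, "fks": ["Order", "Product"]},
}

def deconstruct(schema, association_as_link=True):
    """Return the list of Data Vault table names derived from a source schema."""
    tables = []
    for name, spec in schema.items():
        is_association = not spec["has_bk"] and len(spec["fks"]) == 2
        if association_as_link and is_association:
            # A pure association becomes a Link plus a Satellite for its attributes.
            tables += [f"{name}_l", f"{name}_l_s"]
            continue
        # Every other table becomes a Hub and Satellite pair ...
        tables += [f"{name}_h", f"{name}_h_s"]
        # ... plus one Link per foreign-key relationship to another Hub.
        tables += [f"{name}_{parent}_l" for parent in spec["fks"]]
    return tables

chosen = deconstruct(SOURCE_SCHEMA)
naive = deconstruct(SOURCE_SCHEMA, association_as_link=False)
print(len(chosen), len(naive))
```

Under these assumptions the chosen design yields the twelve tables mentioned above, while de-constructing every table into a Hub and Satellite pair yields fourteen, two of them Links that carry no attributes of their own.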
  • 12. Conclusion:

    Our discussion on Data Vault opened with the idea that an EDW should load and store historical data without applying any transformations that contain subjective interpretation of data or business rules, because those interpretations, even if appropriate for specific reporting or analytics, do modify line-of-business data, and therefore introduce distortions into operational data. Those interpretive transformations should occur downstream, during ETL into presentation layer tables.

    Although Data Vault does, in fact, apply a specific set of generic ‘de-construction’ transformations, these transformations contain little or no subjective interpretation of business rules. They do, however, allow it to (1) apply an appropriate level of referential integrity to source data even where the source system may lack it now or in the future; (2) gracefully capture historical data changes, within and between tables, without endangering the success of the data load; and (3) support loading of data from a subset of source tables initially, and then load, or not load, other related source data tables much later without compromising the EDW’s referential integrity. Lastly, and very importantly, (4) Data Vault design and the associated Data Vault loading ETL, which is largely generic from one data set to another, can be automated, and thus radically accelerated in development. Although the logic of this automation flows from the simplicity of Data Vault design, a detailed automation discussion is beyond the scope of this article.

    In closing, if we can automatically design and load a Data Warehouse (albeit not its presentation layer), it frees up brain cells for the higher-order logic of designing the presentation layer and the intensive, custom ETL to load it. As I described here, all of this can be accomplished simultaneously.

    daniel upton, dupton@decisionlab.net, DecisionLab.Net: business intelligence is business performance
  • 13. DecisionLab.Net Range of Services: Business Intelligence Roadmapping, Feasibility Analysis; BI Project Estimation and Requirement Modelstorming; BI Staff Augmentation: Data Warehouse / Mart / Dashboard Design and Development. Daniel Upton, DecisionLab, http://www.decisionlab.net, dupton@decisionlab.net, direct 760.525.3268, http://blog.decisionlab.net, Carlsbad, California, USA