SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Why Data Vault?
          Kent Graziano
Data Vault Master and Oracle ACE
      TrueBridge Resources
           OOW 2011
         Session #28782
My Bio
• Kent Graziano
   – Certified Data Vault Master
   – Oracle ACE (BI/DW)
   – Data Architecture and Data Warehouse Specialist
      • 30 years in IT
      • 20 years of Oracle-related work
      • 15+ years of data warehousing experience
   – Co-Author of
      • The Business of Data Vault Modeling (2008)
      • The Data Model Resource Book (1st Edition)
      • Oracle Designer: A Template for Developing an Enterprise
        Standards Document
   – Past-President of Oracle Development Tools User Group
     (ODTUG) and Rocky Mountain Oracle User Group
   – Co-Chair BIDW SIG for ODTUG
Data Vault Definition
The Data Vault is a detail oriented, historical
tracking and uniquely linked set of normalized
tables that support one or more functional areas
of business.

It is a hybrid approach encompassing the best of
breed between 3rd normal form (3NF) and star
schema. The design is flexible, scalable, consistent,
and adaptable to the needs of the enterprise. It is a
data model that is architected specifically to meet
the needs of today’s enterprise data warehouses.

                       Dan Linstedt: Defining the Data Vault
                       TDAN.com Article



                       (C) TeachDataVault.com
Where does a Data Vault Fit?




          (C) TeachDataVault.com
Where does a Data Vault Fit?
Oracle’s Next Generation Data Warehouse Reference Architecture




                              Data Vault goes here

                             (C) Oracle Corp
Why Bother With Something New?
       Old Chinese proverb:
       'Unless you change direction, you're
       apt to end up where you're headed.'




                (C) TeachDataVault.com
Why do we need it?

• We have seen issues in constructing (and
  managing) an enterprise data warehouse model
  using 3rd normal form, or Star Schema.
   – 3NF – Complex PKs with cascading snapshot
     dates (time-driven PKs)
   – Star – difficult to re-engineer fact tables for
     granularity changes
• These issues lead to break downs in
  flexibility, adaptability, and even scalability

                        (C) Kent Graziano
Data Vault Time Line
E.F. Codd invented           1976 Dr Peter Chen                          1990 – Dan Linstedt
relational modeling          Created E-R                                 Begins R&D on Data Vault
                             Diagramming                                 Modeling
  Chris Date and Hugh
  Darwen Maintained            Mid 70’s AC Nielsen
  and Refined                  Popularized
  Modeling                     Dimension & Fact Terms



1960                  1970                   1980                        1990                2000
                                                              Late 80’s – Barry Devlin and
                        Early 70’s Bill Inmon                 Dr Kimball Release “Business
                        Began Discussing Data                 Data Warehouse”
                        Warehousing

                                                          Mid 80’s Bill Inmon
                                                          Popularizes Data
        Mid 60’s Dimension & Fact Modeling                Warehousing
        presented by General Mills and                                                       2000 – Dan Linstedt
        Dartmouth University                            Mid – Late 80’s Dr Kimball           releases first 5 articles on
                                                        Popularizes Star Schema              Data Vault Modeling
                                                (C) TeachDataVault.com
Data Vault Modeling…




       (C) TeachDataVault.com
What Are the Issues?
This is NOT what you
want happening to
your project!




                (C) TeachDataVault.com
                                         THE GAP!!
What Are the Foundational Keys?

        Flexibility


                                        Scalability

            Productivity



               (C) TeachDataVault.com
Key: Flexibility (Agility)




Enabling rapid change on a massive scale
     without downstream impacts!


               (C) TeachDataVault.com
Key: Scalability




Providing no foreseeable barrier to
     increased size and scope
      People, Process, & Architecture!

           (C) TeachDataVault.com
Key: Productivity




Enabling low complexity systems with high
       value output at a rapid pace

               (C) TeachDataVault.com
Bringing the Data Vault to Your Project

HOW DOES IT WORK?


                             (C) TeachDataVault.com
Key: Flexibility (Agility)
• Goes beyond standard 3NF
 • Hyper normalized
    • Hubs and Links only holds keys and meta data
    • Satellites split by rate of change and/or source
 • Enables Agile data modeling
    • Easy to add to model without having to change existing structures
      and load routines
        • Relationships (links) can be dropped and created on-demand.
    • No more reloading history because of a missed requirement
• Based on natural business keys
 • Not system surrogate keys
 • Allows for integrating data across functions and source
   systems more easily
    • All data relationships are key driven.


                                (C) TeachDataVault.com
Key: Flexibility (Agility)




Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts
                 (C) TeachDataVault.com
Split and Merge ON DEMAND!
        2 weeks from now




                  6 months from now




            (C) TeachDataVault.com
Case In Point:
        Result of flexibility of the Data Vault Model
        allowed them to merge 3 companies in 90
        days – that is ALL systems, ALL DATA!




                   (C) TeachDataVault.com
Key: Scalability in Architecture




Scaling is easy, its based on the following principles
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks
• Can be partitioned vertically and horizontally to meet performance demands

                            (C) TeachDataVault.com
Perhaps You Wish To Split For
    Performance Reasons?
FROM THIS

                                     TO THIS!




            (C) TeachDataVault.com
Case In Point:

         Result of scalability was to produce a Data
         Vault model that scaled to 3 Petabytes in
         size, and is still growing today!




                   (C) TeachDataVault.com
Key: Scalability in Team Size




        You should be able to SCALE your TEAM as well!
          With the Data Vault methodology, you can:
Scale your team when desired, at different points in the project!


                     (C) TeachDataVault.com
Case In Point:                 (Dutch Tax Authority)

         Result of scalability was to increase ETL
         developers for each new source system,
         and reassign them when the system was
         completely loaded to the Data Vault




                   (C) TeachDataVault.com
Key: Productivity




Increasing Productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Ease of Monitoring, managing and optimizing processes

                   (C) TeachDataVault.com
Key: Productivity
• Standardized modeling rules
  • Highly repeatable and learnable modeling
    technique
  • Can standardize load routines
     • Delta Driven process
     • Re-startable, consistent loading patterns.
  • Can standardize extract routines
     • Rapid build of new or revised Data Marts
  • Can be automated
     • RapidACE (www.rapidace.com)

                          (C) Kent Graziano
Key: Productivity
• The Data Vault holds granular historical relationships.
     • Holds all history for all time, allowing any source
       system feeds to be reconstructed on-demand
         •   Easy generation of Audit Trails for data lineage and
             compliance.
         •   Data Mining can discover new relationships between
             elements
         •   Patterns of change emerge from the historical
             pictures and linkages.
• The Data Vault can be accessed by power-users


                           (C) Kent Graziano
Case in Point:
             Result of Productivity was: 2 people in 2
             weeks merged 3 systems, built a full Data
             Vault EDW, 5 star schemas and 3 reports.




     These individuals generated:
     • 90% of the ETL code for moving the data set
     • 100% of the Staging Data Model
     • 75% of the finished EDW data Model
     • 75% of the star schema data model

                        (C) TeachDataVault.com
The Competing Bid?
 The competition bid this with 15 people
 and 3 months to completion, at a cost of
 $250k! (they bid a Very complex system)




Actual total cost? $30k and 2 weeks!

          (C) TeachDataVault.com
Other Benefits of a Data Vault
• Modeling it as a DV forces integration of the Business Keys upfront.
     • Good for organizational alignment.
• An integrated data set with raw data extends it’s value beyond BI:
     •   Source for data quality projects
     •   Source for master data
     •   Source for data mining
     •   Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).
• Upfront Hub integration simplifies the data integration routines
  required to load data marts.
     •   Helps divide the work a bit.
•   It is much easier to implement security on these granular pieces.
•   Granular, re-startable processes enable pin-point failure correction.
•   It is designed and optimized for real-time loading in its core
    architecture (without any tweaks or mods).

                                            (C) Kent Graziano
Conclusion?




 Changing the direction of the river takes
less effort than stopping the flow of water

               (C) TeachDataVault.com
The Experts Say…
  “The Data Vault is the optimal choice for
  modeling the EDW in the DW 2.0
  framework.” Bill Inmon

   “The Data Vault is foundationally
   strong and exceptionally scalable
   architecture.”      Stephen Brobst



        “The Data Vault is a technique which some industry
        experts have predicted may spark a revolution as the
        next big thing in data modeling for enterprise
        warehousing....”                    Doug Laney
More Notables…

   “This enables organizations to take control of their
   data warehousing destiny, supporting better and
   more relevant data warehouses in less time than
   before.”                 Howard Dresner



  “[The Data Vault] captures a practical body of
  knowledge for data warehouse development which
  both agile and traditional practitioners will benefit
  from..”               Scott Ambler
Who’s Using It?
Growing Adoption…
• The number of Data Vault users in the US
  surpassed 500 in 2010 and grows rapidly
  (http://danlinstedt.com/about/dv-
  customers/)




                    (C) Kent Graziano
In Review…
• Data Vault provides you with the tools you need to
  succeed in your DW/BI projects
• Flexibility
   • Enabling rapid change on a massive scale without
     downstream impacts!
• Scalability
   • Providing no foreseeable barrier to increased size and
     scope
• Productivity
   • Enabling low complexity systems with high value output at
     a rapid pace


                         (C) TeachDataVault.com
(C) TeachDataVault.com
Where To Learn More
     The Technical Modeling Book: http://LearnDataVault.com

      On YouTube: http://www.youtube.com/LearnDataVault

          On Facebook: www.facebook.com/learndatavault

                 Dan’s Blog: www.danlinstedt.com

The Discussion Forums: http://LinkedIn.com – Data Vault Discussions

       World wide User Group (Free): http://dvusergroup.com

              The Business of Data Vault Modeling
          by Dan Linstedt, Kent Graziano, Hans Hultgren
                  (available at www.lulu.com )
                                                                      38
10/11/2011   (C) TeachDataVault.com   39
Contact Information


                 Kent Graziano
             Kent.graziano@att.net

            Want more Data Vault?
Session # 05923: Introduction to Data Vault Modeling
     Thursday, 4:00 PM, Moscone South Rm 303

Contenu connexe

Tendances

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 

Tendances (20)

Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012
 
Data Vault Introduction
Data Vault IntroductionData Vault Introduction
Data Vault Introduction
 
Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)
 
Operational Data Vault
Operational Data VaultOperational Data Vault
Operational Data Vault
 
Présentation data vault et bi v20120508
Présentation data vault et bi v20120508Présentation data vault et bi v20120508
Présentation data vault et bi v20120508
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScapeData Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011Data Warehouse Agility Array Conference2011
Data Warehouse Agility Array Conference2011
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
 
ETL Technologies.pptx
ETL Technologies.pptxETL Technologies.pptx
ETL Technologies.pptx
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Reference master data management
Reference master data managementReference master data management
Reference master data management
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 

En vedette

VAULT CONSTRUCTION
VAULT CONSTRUCTIONVAULT CONSTRUCTION
VAULT CONSTRUCTION
Aida Nesa
 

En vedette (20)

Data Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileData Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes Agile
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
 
Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)
 
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODSAgile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
 
Visual Data Vault
Visual Data VaultVisual Data Vault
Visual Data Vault
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Agile Methods and Data Warehousing
Agile Methods and Data WarehousingAgile Methods and Data Warehousing
Agile Methods and Data Warehousing
 
Worst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignWorst Practices in Data Warehouse Design
Worst Practices in Data Warehouse Design
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
VAULT CONSTRUCTION
VAULT CONSTRUCTIONVAULT CONSTRUCTION
VAULT CONSTRUCTION
 
AnalytiX DS - Master Deck
AnalytiX DS - Master DeckAnalytiX DS - Master Deck
AnalytiX DS - Master Deck
 
From Business Intelligence to Big Data - hack/reduce Dec 2014
From Business Intelligence to Big Data - hack/reduce Dec 2014From Business Intelligence to Big Data - hack/reduce Dec 2014
From Business Intelligence to Big Data - hack/reduce Dec 2014
 
Shorter time to insight more adaptable less costly bi with end to end modelst...
Shorter time to insight more adaptable less costly bi with end to end modelst...Shorter time to insight more adaptable less costly bi with end to end modelst...
Shorter time to insight more adaptable less costly bi with end to end modelst...
 
Oracle Database Vault
Oracle Database VaultOracle Database Vault
Oracle Database Vault
 
Construyendo pruebas para un DWH usando un paradigma de modelado Data Vault
Construyendo pruebas para un DWH usando un paradigma de modelado Data VaultConstruyendo pruebas para un DWH usando un paradigma de modelado Data Vault
Construyendo pruebas para un DWH usando un paradigma de modelado Data Vault
 
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel UptonEDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
 
Oracle Database Vault
Oracle Database VaultOracle Database Vault
Oracle Database Vault
 
Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
 

Similaire à Why Data Vault?

The Death of the Star Schema
The Death of the Star SchemaThe Death of the Star Schema
The Death of the Star Schema
DATAVERSITY
 
Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...
Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...
Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...
Matt Stubbs
 

Similaire à Why Data Vault? (20)

The Death of the Star Schema
The Death of the Star SchemaThe Death of the Star Schema
The Death of the Star Schema
 
Modern data warehouse presentation
Modern data warehouse presentationModern data warehouse presentation
Modern data warehouse presentation
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Original: Lean Data Model Storming for the Agile Enterprise
Original: Lean Data Model Storming for the Agile EnterpriseOriginal: Lean Data Model Storming for the Agile Enterprise
Original: Lean Data Model Storming for the Agile Enterprise
 
Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...
Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...
Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...
 
Data Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast TourData Vault 2.0 Demystified: East Coast Tour
Data Vault 2.0 Demystified: East Coast Tour
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Big Data/Cloudera from Excelerate Systems
Big Data/Cloudera from Excelerate SystemsBig Data/Cloudera from Excelerate Systems
Big Data/Cloudera from Excelerate Systems
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
 
Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
Meetup 25/04/19: Big Data
Meetup 25/04/19: Big DataMeetup 25/04/19: Big Data
Meetup 25/04/19: Big Data
 

Plus de Kent Graziano

Plus de Kent Graziano (9)

Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
 
Rise of the Data Cloud
Rise of the Data CloudRise of the Data Cloud
Rise of the Data Cloud
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with Snowflake
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on Read
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
Top Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data ModelerTop Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data Modeler
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Why Data Vault?

  • 1. Why Data Vault? Kent Graziano Data Vault Master and Oracle ACE TrueBridge Resources OOW 2011 Session #28782
  • 2. My Bio • Kent Graziano – Certified Data Vault Master – Oracle ACE (BI/DW) – Data Architecture and Data Warehouse Specialist • 30 years in IT • 20 years of Oracle-related work • 15+ years of data warehousing experience – Co-Author of • The Business of Data Vault Modeling (2008) • The Data Model Resource Book (1st Edition) • Oracle Designer: A Template for Developing an Enterprise Standards Document – Past-President of Oracle Development Tools User Group (ODTUG) and Rocky Mountain Oracle User Group – Co-Chair BIDW SIG for ODTUG
  • 3. Data Vault Definition The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses. Dan Linstedt: Defining the Data Vault TDAN.com Article (C) TeachDataVault.com
  • 4. Where does a Data Vault Fit? (C) TeachDataVault.com
  • 5. Where does a Data Vault Fit? Oracle’s Next Generation Data Warehouse Reference Architecture Data Vault goes here (C) Oracle Corp
  • 6. Why Bother With Something New? Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.' (C) TeachDataVault.com
  • 7. Why do we need it? • We have seen issues in constructing (and managing) an enterprise data warehouse model using 3rd normal form, or Star Schema. – 3NF – Complex PKs with cascading snapshot dates (time-driven PKs) – Star – difficult to re-engineer fact tables for granularity changes • These issues lead to break downs in flexibility, adaptability, and even scalability (C) Kent Graziano
  • 8. Data Vault Time Line E.F. Codd invented 1976 Dr Peter Chen 1990 – Dan Linstedt relational modeling Created E-R Begins R&D on Data Vault Diagramming Modeling Chris Date and Hugh Darwen Maintained Mid 70’s AC Nielsen and Refined Popularized Modeling Dimension & Fact Terms 1960 1970 1980 1990 2000 Late 80’s – Barry Devlin and Early 70’s Bill Inmon Dr Kimball Release “Business Began Discussing Data Data Warehouse” Warehousing Mid 80’s Bill Inmon Popularizes Data Mid 60’s Dimension & Fact Modeling Warehousing presented by General Mills and 2000 – Dan Linstedt Dartmouth University Mid – Late 80’s Dr Kimball releases first 5 articles on Popularizes Star Schema Data Vault Modeling (C) TeachDataVault.com
  • 9. Data Vault Modeling… (C) TeachDataVault.com
  • 10. What Are the Issues? This is NOT what you want happening to your project! (C) TeachDataVault.com THE GAP!!
  • 11. What Are the Foundational Keys? Flexibility Scalability Productivity (C) TeachDataVault.com
  • 12. Key: Flexibility (Agility) Enabling rapid change on a massive scale without downstream impacts! (C) TeachDataVault.com
  • 13. Key: Scalability Providing no foreseeable barrier to increased size and scope People, Process, & Architecture! (C) TeachDataVault.com
  • 14. Key: Productivity Enabling low complexity systems with high value output at a rapid pace (C) TeachDataVault.com
  • 15. Bringing the Data Vault to Your Project HOW DOES IT WORK? (C) TeachDataVault.com
  • 16. Key: Flexibility (Agility) • Goes beyond standard 3NF • Hyper normalized • Hubs and Links only holds keys and meta data • Satellites split by rate of change and/or source • Enables Agile data modeling • Easy to add to model without having to change existing structures and load routines • Relationships (links) can be dropped and created on-demand. • No more reloading history because of a missed requirement • Based on natural business keys • Not system surrogate keys • Allows for integrating data across functions and source systems more easily • All data relationships are key driven. (C) TeachDataVault.com
  • 17. Key: Flexibility (Agility) Adding new components to the EDW has NEAR ZERO impact to: • Existing Loading Processes • Existing Data Model • Existing Reporting & BI Functions • Existing Source Systems • Existing Star Schemas and Data Marts (C) TeachDataVault.com
  • 18. Split and Merge ON DEMAND! 2 weeks from now 6 months from now (C) TeachDataVault.com
  • 19. Case In Point: Result of flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA! (C) TeachDataVault.com
  • 20. Key: Scalability in Architecture Scaling is easy, its based on the following principles • Hub and spoke design • MPP Shared-Nothing Architecture • Scale Free Networks • Can be partitioned vertically and horizontally to meet performance demands (C) TeachDataVault.com
  • 21. Perhaps You Wish To Split For Performance Reasons? FROM THIS TO THIS! (C) TeachDataVault.com
  • 22. Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! (C) TeachDataVault.com
  • 23. Key: Scalability in Team Size You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! (C) TeachDataVault.com
  • 24. Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault (C) TeachDataVault.com
  • 25. Key: Productivity Increasing Productivity requires a reduction in complexity. The Data Vault Model simplifies all of the following: • ETL Loading Routines • Real-Time Ingestion of Data • Data Modeling for the EDW • Enhancing and Adapting for Change to the Model • Ease of Monitoring, managing and optimizing processes (C) TeachDataVault.com
  • 26. Key: Productivity • Standardized modeling rules • Highly repeatable and learnable modeling technique • Can standardize load routines • Delta Driven process • Re-startable, consistent loading patterns. • Can standardize extract routines • Rapid build of new or revised Data Marts • Can be automated • RapidACE (www.rapidace.com) (C) Kent Graziano
  • 27. Key: Productivity • The Data Vault holds granular historical relationships. • Holds all history for all time, allowing any source system feeds to be reconstructed on-demand • Easy generation of Audit Trails for data lineage and compliance. • Data Mining can discover new relationships between elements • Patterns of change emerge from the historical pictures and linkages. • The Data Vault can be accessed by power-users (C) Kent Graziano
  • 28. Case in Point: Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports. These individuals generated: • 90% of the ETL code for moving the data set • 100% of the Staging Data Model • 75% of the finished EDW data Model • 75% of the star schema data model (C) TeachDataVault.com
  • 29. The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system) Actual total cost? $30k and 2 weeks! (C) TeachDataVault.com
  • 30. Other Benefits of a Data Vault • Modeling it as a DV forces integration of the Business Keys upfront. • Good for organizational alignment. • An integrated data set with raw data extends it’s value beyond BI: • Source for data quality projects • Source for master data • Source for data mining • Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture). • Upfront Hub integration simplifies the data integration routines required to load data marts. • Helps divide the work a bit. • It is much easier to implement security on these granular pieces. • Granular, re-startable processes enable pin-point failure correction. • It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods). (C) Kent Graziano
  • 31. Conclusion? Changing the direction of the river takes less effort than stopping the flow of water (C) TeachDataVault.com
  • 32. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
  • 33. More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler
  • 35. Growing Adoption… • The number of Data Vault users in the US surpassed 500 in 2010 and grows rapidly (http://danlinstedt.com/about/dv- customers/) (C) Kent Graziano
  • 36. In Review… • Data Vault provides you with the tools you need to succeed in your DW/BI projects • Flexibility • Enabling rapid change on a massive scale without downstream impacts! • Scalability • Providing no foreseeable barrier to increased size and scope • Productivity • Enabling low complexity systems with high value output at a rapid pace (C) TeachDataVault.com
  • 38. Where To Learn More The Technical Modeling Book: http://LearnDataVault.com On YouTube: http://www.youtube.com/LearnDataVault On Facebook: www.facebook.com/learndatavault Dan’s Blog: www.danlinstedt.com The Discussion Forums: http://LinkedIn.com – Data Vault Discussions World wide User Group (Free): http://dvusergroup.com The Business of Data Vault Modeling by Dan Linstedt, Kent Graziano, Hans Hultgren (available at www.lulu.com ) 38
  • 39. 10/11/2011 (C) TeachDataVault.com 39
  • 40. Contact Information Kent Graziano Kent.graziano@att.net Want more Data Vault? Session # 05923: Introduction to Data Vault Modeling Thursday, 4:00 PM, Moscone South Rm 303