SlideShare a Scribd company logo
Soumettre la recherche
Mettre en ligne
S’identifier
S’inscrire
Kent-Graziano-Intro-to-Datavault_short.pdf
Signaler
abhaybansal43
Suivre
28 May 2023
•
0 j'aime
•
4 vues
1
sur
42
Kent-Graziano-Intro-to-Datavault_short.pdf
28 May 2023
•
0 j'aime
•
4 vues
Télécharger maintenant
Télécharger pour lire hors ligne
Signaler
Données & analyses
Data Vault
abhaybansal43
Suivre
Recommandé
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Kent Graziano
4.2K vues
•
55 diapositives
DataOps - The Foundation for Your Agile Data Architecture
DATAVERSITY
1.2K vues
•
57 diapositives
Intro to Data Vault 2.0 on Snowflake
Kent Graziano
1.8K vues
•
61 diapositives
Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...
Amazon Web Services
19.4K vues
•
66 diapositives
Graphs fun vjug2
Neo4j
1.6K vues
•
61 diapositives
Continuum Analytics and Python
Travis Oliphant
4.7K vues
•
94 diapositives
Contenu connexe
Similaire à Kent-Graziano-Intro-to-Datavault_short.pdf
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
875 vues
•
61 diapositives
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
113 vues
•
39 diapositives
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
Bipin Singh
1.5K vues
•
80 diapositives
SQL in the Hybrid World
Tanel Poder
36K vues
•
28 diapositives
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Kent Graziano
3.6K vues
•
48 diapositives
Building FoundationDB
FoundationDB
3.4K vues
•
38 diapositives
Similaire à Kent-Graziano-Intro-to-Datavault_short.pdf
(20)
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
•
875 vues
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
•
113 vues
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
Bipin Singh
•
1.5K vues
SQL in the Hybrid World
Tanel Poder
•
36K vues
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Kent Graziano
•
3.6K vues
Building FoundationDB
FoundationDB
•
3.4K vues
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
•
901 vues
Data Vault 2.0: Big Data Meets Data Warehousing
All Things Open
•
2.2K vues
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
SnapLogic
•
836 vues
Agile Data Rationalization for Operational Intelligence
Inside Analysis
•
1.9K vues
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Data Con LA
•
682 vues
2019-Nov: Domain Driven Design (DDD) and when not to use it
Mark Windholtz
•
59 vues
Washington DC DataOps Meetup -- Nov 2019
DataKitchen
•
3.3K vues
Azure + DataStax Enterprise Powers Office 365 Per User Store
DataStax Academy
•
1.7K vues
The Future of Data Management: The Enterprise Data Hub
Cloudera, Inc.
•
5.3K vues
Data Insights for Breakfast, Malmö - Snowflake
Solita Oy
•
309 vues
datavault2.pptx
Mounika662749
•
3 vues
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Deepak Chandramouli
•
278 vues
Modern Data Integration Expert Session Webinar
ibi
•
1.5K vues
Modern data integration expert sessions
JessicaMurrell3
•
35 vues
Dernier
Factors Influencing the Choice of Business Education in Bangladesh..pdf
Shamim Rana
7 vues
•
36 diapositives
SUSTAINABLE NETWORKS.pptx
GeorgeDiamandis11
61 vues
•
22 diapositives
apidays London 2023 - 7 pillars of an API Factory, Patrick Brosse, Amadeus
apidays
9 vues
•
16 diapositives
Quantum Karnaugh map in NV-center Quantum Computer
ssuserb645ae
9 vues
•
17 diapositives
Predictive HR Analytics_ Mastering the HR Metric ( PDFDrive ).pdf
Santhosh Prabhu
25 vues
•
473 diapositives
apidays London 2023 - How APIs support the democratization of FAIR data and d...
apidays
9 vues
•
20 diapositives
Dernier
(20)
Factors Influencing the Choice of Business Education in Bangladesh..pdf
Shamim Rana
•
7 vues
SUSTAINABLE NETWORKS.pptx
GeorgeDiamandis11
•
61 vues
apidays London 2023 - 7 pillars of an API Factory, Patrick Brosse, Amadeus
apidays
•
9 vues
Quantum Karnaugh map in NV-center Quantum Computer
ssuserb645ae
•
9 vues
Predictive HR Analytics_ Mastering the HR Metric ( PDFDrive ).pdf
Santhosh Prabhu
•
25 vues
apidays London 2023 - How APIs support the democratization of FAIR data and d...
apidays
•
9 vues
COMPOSABLE COMPANY.pptx
GeorgeDiamandis11
•
125 vues
apidays London 2023 - Revolutionising fitness and well-being, David Turner, V...
apidays
•
5 vues
the effect of phone electromagnetig waves on the body ;docx
HimRong
•
20 vues
All-sql-cheat-sheet-a4.pdf
ssuser8392a0
•
11 vues
FavorIndexReport_R8.pdf
Favor Delivery
•
105 vues
Proposal Presentation
SolarBhai
•
9 vues
Structural Intersectional Theory.pdf
Aashish Juneja
•
35 vues
Essential numpy before you start your Machine Learning journey in python.pdf
Smrati Kumar Katiyar
•
5 vues
apidays London 2023 - When to soar and when to dive, Claire Barrett, APIsFirst
apidays
•
6 vues
Database Design
VICTOR MAESTRE RAMIREZ
•
6 vues
DIGITAL TRANSFORMATION AND STRATEGY_final.pptx
GeorgeDiamandis11
•
153 vues
apidays London 2023 - Uptime, Mean-Time, and Ahead of Your Time, Anna Daugher...
apidays
•
5 vues
PLM 200.pdf
Midhun Mohan
•
7 vues
Unlocking the Power of Integer Programming
Florian Wilhelm
•
16 vues
Kent-Graziano-Intro-to-Datavault_short.pdf
1.
AGILE DATA ENGINEERING Introduction to
Data Vault 2.0 KENT GRAZIANO, CHIEF TECHNICAL EVANGELIST KentGraziano
2.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Agenda • Bio • Agile & DW • What is a Data Vault & Where does it fit? • How to design a Data Vault model • Foundational Keys • Benefits of Data Vault • Who is using Data Vault? • References
3.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved › Chief Technical Evangelist, Snowflake Computing › Blogger: The Data Warrior › Certified Data Vault Master and DV 2.0 Practitioner › Oracle ACE Director (Alumni) › OakTable Member › Member – DAMA Houston & DAMA International › Data Modeling, Data Architecture and Data Warehouse Specialist › 30+ years in IT › 25+ years of Oracle-related work › 20+ years of data warehousing experience › Former-Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/) › Author & Co-Author of a bunch of books › Past-President of ODTUG and Rocky Mountain Oracle User Group My Bio
4.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved 3 years in stealth + 3 years GA 4 800+ employees Over 2000 customers today Over $850M in venture funding from leading investors First customers 2014, general availability 2015 Founded 2012 by industry veterans with over 120 database patents Queries processed in Snowflake per day: 100 million Largest single table: 68 trillion rows Largest number of tables single DB: 200,000 Single customer most data: > 40PB Single customer most users: > 10,000 Fun facts:
5.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved http://agilemanifesto.org "We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: ★Individuals and interactions over processes and tools ★Working software over comprehensive documentation ★Customer collaboration over contract negotiation ★Responding to change over following a plan That is, while there is value in the items on the right, we value the items on the left more.” Manifesto for Agile Software Development Kent Beck Mike Beedle Arie van Bennekum Alistair Cockburn Ward Cunningham Martin Fowler James Grenning Jim Highsmith Andrew Hunt Ron Jeffries Jon Kern Brian Marick Robert C. Martin Steve Mellor Ken Schwaber Jeff Sutherland Dave Thomas
6.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • User Stories instead of requirements documents • Time-based iterations • Iteration has a standard length • Choose one or more user stories to fit in that iteration • Rework is part of the game • There are no “Missed requirements”…only those that haven’t been discovered yet Applying Agile to DW
7.
© 2018 Snowflake
Computing Inc. All Rights Reserved Data Vault Model Definition The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. Architected specifically to meet the needs of today’s enterprise data warehouses Dan Linstedt: Defining the Data Vault
8.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • What are our other Enterprise Data Warehouse Options? • Third-Normal Form (3NF): Complex primary keys (PK’s) with cascading snapshot • Star Schema (Dimensional): Difficult to reengineer fact tables for granularity changes • Difficult to get it right the first time • Not adaptable to rapid business change • NOT AGILE! What is Data Vault Trying to Solve?
9.
Data Vault Timeline 1960
1970 1980 1990 2000 E.F. Codd invented relational modeling Chris Date and Hugh Harwin refined modeling concepts 1976: Dr. Peter Chen created E-R diagramming Mid 70’s: AC Nielsen popularized Dimension & Fact Terms 1990: Dan Linstedt begins R&D on Data Vault Modeling Early 70’s: Bill Inmon began discussing Data Warehousing Mid 60’s: Dimension & Fact modeling presented by General Mills and Dartmouth University Late 80’s: Barry Devlin and Dr. Kimball release “Business Data Warehouse” Mid 80’s: Bill Inmon popularized Data Warehousing Mid-Late 80’s: Dr. Kimball popularizes Star Schema 2000: Dan Linstedt release first 5 articles on Data Vault Modeling © 2018 Snowflake Computing Inc..
10.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • The work on the Data Vault approach began in the early 1990s; completed around 1999. • Throughout 1999, 2000, and 2001, the Data Vault design was tested, refined, and deployed into specific customer sites. • In 2002, the industry thought leaders were asked to review the architecture. • Kent meets Dan at a Lunch & Learn in Denver! • In 2003, Dan began teaching the modeling techniques to the mass public. • Kent & Team take their 1st Data Vault Modeling class • In 2014, Dan introduced DV 2.0! Data Vault Evolution
11.
© 2018 Snowflake
Computing Inc. All Rights Reserved WHAT DOES THE DATA VAULT MODEL LOOK LIKE?
12.
© 2018 Snowflake
Computing Inc. All Rights Reserved Data Vault: 3 Simple Structures HUB LINK SATELLITE EDW Data Vault 1 2 3
13.
© 2018 Snowflake
Computing Inc. All Rights Reserved Data Vault Core Architecture HUBS ------------------- Unique List of Business Keys LINKS ------------------- Unique List of Relationships across Keys SATS ------------------- Descriptive Data ● Satellites have one and only one parent table ● Satellites cannot be parents to other tables ● Hubs cannot be child tables
14.
© 2018 Snowflake
Computing Inc. All Rights Reserved • Hub: List of UNIQUE business keys. • Link: List of UNIQUE relationships across keys • Satellite: Historical descriptive data. Standard Data Vault Model
15.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved 1. Hub = Business Keys Hubs = Unique Lists of Business Keys Business Keys are used to TRACK and IDENTIFY key information DV 2.0 uses MD5 Hash of the BK for the PK Natural Business Keys also acceptable for PK HUB_ORDER H * P MD5_HUB_ORDER VARCHAR2 (32) * U O_ORDERID INTEGER * LDTS TIMESTAMP * RSCR VARCHAR2 (256) O_ORDERKEY INTEGER HUB_ORDER_PK (MD5_HUB_ORDER) HUB_ORDER__UN (O_ORDERID) HUB_CUSTOMER H * P MD5_HUB_CUSTOMER VARCHAR2 (32) * U C_NAME VARCHAR2 (80) LDTS TIMESTAMP RSCR VARCHAR2 (256) C_CUSTKEY INTEGER HUB_CUSTOMER_PK (MD5_HUB_CUSTOMER) HUB_CUSTOMER__UN (C_NAME) HUB_SUPPLIER H * P MD5_HUB_SUPPLIER VARCHAR2 (32) * U S_NAME VARCHAR2 (80) * LDTS TIMESTAMP * RSCR VARCHAR2 (256) * S_SUPPKEY INTEGER HUB_SUPPLIER_PK (MD5_HUB_SUPPLIER) HUB_SUPPLIER__UN (S_NAME) HUB_PART H * P MD5_HUB_PART VARCHAR2 (32) * U P_NAME VARCHAR2 (80) * U P_BRAND VARCHAR2 (80) * U P_TYPE VARCHAR2 (80) * U P_SIZE INTEGER * U P_CONTAINER VARCHAR2 (80) * LDTS TIMESTAMP * RSCR VARCHAR2 (256) P_PARTKEY INTEGER HUB_PART_PK (MD5_HUB_PART) HUB_PART__UN (P_NAME, P_BRAND, P_TYPE, P_SIZE, P_CONTAINER)
16.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • MD5 hash function – Snowflake • MD5 (UPPER(RTRIM(RMC.CAFCUSCHN))) • MD5 hash function – Oracle • rawtohex(sys.utl_raw.cast_to_raw(dbms_obfuscation_toolkit.md5 (input_string => ...) • NEW: dbms_crypto.HASH(utl_raw.cast_to_raw(<input string>), 2); • 2 is for MD5 algorithm option • MD5 hash function - SQL Server • CONVERT([Char](32),HASHBYTES('MD5’, UPPER(RTRIM(RMC.CAFCUSCHN)))) • Need to minimize chance of duplicates • 12||3||45 and 1||2||345 hash to same value • Need a separator between each • Example: Col1||’^’||Col2||’^’||Col3 • Need to account for NULLs too What Does the MD5 Look Like?
17.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • To generate most consistent string: standardize! • Convert data types • If 'NUMBER', 'VARCHAR', ‘TEXT’ • THEN 'TO_CHAR(' || column_name || ‘)‘ • If LIKE 'DATE%‘ • THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD’’)‘ • If LIKE 'TIME%‘ • THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD HH24:MI:SS'')' Other Considerations
18.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Final Input String ( UPPER(TRIM(T1.GENERICNAME)) ||'^'|| UPPER(TRIM(TO_CHAR(T1.MED_STRNG_AMT))) ||'^'|| UPPER(TRIM(T1.UOM_CD)) ||'^'|| UPPER(TRIM(T1.MED_FORM_NM)) ||'^' )
19.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved 2: Links = Associations Links = Transactions and Associations They are used to hook together multiple sets of information In DV 2.0 the BK attributes may migrate to the Links for faster query LNK_CUSTOMER_ORDER L * P MD5_LNK_CUSTOMER_ORDER VARCHAR2 (32) * UF MD5_HUB_CUSTOMER VARCHAR2 (32) * UF MD5_HUB_ORDER VARCHAR2 (32) * C_NAME VARCHAR2 (80) * O_ORDERID INTEGER * LDTS TIMESTAMP * RSCR VARCHAR2 (256) LNK_CUSTOMER_ORDER_PK (MD5_LNK_CUSTOMER_ORDER) LNK_CUSTOMER_ORDER__UN (MD5_HUB_CUSTOMER, MD5_HUB_ORDER) LNK_CUSTOMER_ORDER_HUB_CUSTOMER_FK (MD5_HUB_CUSTOMER) LNK_CUSTOMER_ORDER_HUB_ORDER_FK (MD5_HUB_ORDER) HUB_ORDER H * P MD5_HUB_ORDER VARCHAR2 (32) * U O_ORDERID INTEGER * LDTS TIMESTAMP * RSCR VARCHAR2 (256) O_ORDERKEY INTEGER HUB_ORDER_PK (MD5_HUB_ORDER) HUB_ORDER__UN (O_ORDERID) HUB_CUSTOMER H * P MD5_HUB_CUSTOMER VARCHAR2 (32) * U C_NAME VARCHAR2 (80) LDTS TIMESTAMP RSCR VARCHAR2 (256) C_CUSTKEY INTEGER HUB_CUSTOMER_PK (MD5_HUB_CUSTOMER) HUB_CUSTOMER__UN (C_NAME)
20.
© 2018 Snowflake
Computing Inc. All Rights Reserved Tomorrow With DV No need to change EDW structure The business rule can change to a 1:M Relationship is a 1:1 so why model a Link? Modeling Links - 1:1 or 1:M? Today You discover new data later Existing data is fine New data is added
21.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved 3. Satellites = Descriptors Satellites provide context for the Hubs and the Links Tracks changes over time - Like SCD 2 In DV 2.0 use HASH_DIFF to detect changes SAT_ORDER S * PF MD5_HUB_ORDER VARCHAR2 (32) * P LDTS TIMESTAMP O_ORDERSTATUS VARCHAR2 (1) O_TOTALPRICE NUMBER (12,2) O_ORDERDATE DATE O_ORDERPRIORITY VARCHAR2 (15) O_CLERK VARCHAR2 (15) O_SHIPPRIORITY INTEGER O_COMMENT VARCHAR2 (79) * HASH_DIFF VARCHAR2 (32) * RSRC VARCHAR2 (256) SAT_ORDER_PK (MD5_HUB_ORDER, LDTS) SAT_ORDER_HUB_ORDER_FK (MD5_HUB_ORDER) SAT_LNK_PART_SUPPLIER S * PF MD5_LNK_PART_SUPPLIER VARCHAR2 (32) * P LDTS TIMESTAMP PS_AVAILQTY INTEGER PS_SUPPLYCOST NUMBER (12,2) PS_COMMENT VARCHAR2 (199) * HASH_DIFF VARCHAR2 (32) * RSRC VARCHAR2 (256) PK_SAT_LNK_PART_SUPPLIER (LDTS, MD5_LNK_PART_SUPPLIER) SAT_LNK_PART_SUPPLIER_LNK_PART_SUPPLIER_FK (MD5_LNK_PART_SUPPLIER) HUB_ORDER H * P MD5_HUB_ORDER VARCHAR2 (32) * U O_ORDERID INTEGER * LDTS TIMESTAMP * RSRC VARCHAR2 (256) HUB_ORDER_PK (MD5_HUB_ORDER) HUB_ORDER__UN (O_ORDERID) LNK_PART_SUPPLIER L * P MD5_LNK_PART_SUPPLIER VARCHAR2 (32) * UF MD5_HUB_SUPPLIER VARCHAR2 (32) * UF MD5_HUB_PART VARCHAR2 (32) * S_NAME VARCHAR2 (80) * P_NAME VARCHAR2 (80) * P_BRAND VARCHAR2 (80) * P_TYPE VARCHAR2 (80) * P_SIZE INTEGER * P_CONTAINER VARCHAR2 (80) * LDTS TIMESTAMP * RSRC VARCHAR2 (256) LNK_PART_SUPPLIER_PK (MD5_LNK_PART_SUPPLIER) LNK_PART_SUPPLIER__UN (MD5_HUB_SUPPLIER, MD5_HUB_PART) LNK_PART_SUPPLIER_HUB_PART_FK (MD5_HUB_PART) LNK_PART_SUPPLIER_HUB_SUPPLIER_FK (MD5_HUB_SUPPLIER)
22.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • Think Type 2 SCD (Slowly Changing Dimensions) • Old Way: • Compare column by column • Source value != Current value in DW table • 20 columns, then 20 compares • New Way: • Concatenate all columns to one string • Convert to one char(32) string with hash function • Compare to hashed value (HASH_DIFF) in target table • Does not matter how many columns MD5-Based Change Detection
23.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • Use the Hub/Link PK columns • Filter on LEAD of LOAD_DTS Easily Getting Current Rows
24.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Example – Virtual Expire Date! CASE WHEN LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) IS NULL THEN 'Y' ELSE 'N' END CURR_FLG, LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) EXPR_DTS
25.
© 2018 Snowflake
Computing Inc. All Rights Reserved Foundational Keys
26.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Data Vault Model Flexibility (Agility) Based on natural business keys Highly normalized › Hubs and Links only hold keys and meta data › Satellites split by rate of change and/or source Enables Agile data modeling › Easy to add to model without having to change existing structures and load routines • Relationships (links) can be dropped and created on-demand. › No more reloading history because of a missed requirement Not system surrogate keys Allows for integrating data across functions and source systems more easily › All data relationships are key driven Highly normalized › Hubs and Links only hold keys and meta data › Satellites split by rate of change and/or source Enables Agile data modeling › Easy to add to model without having to change existing structures and load routines • Relationships (links) can be dropped and created on-demand. › No more reloading history because of a missed requirement Not system surrogate keys Allows for integrating data across functions and source systems more easily › All data relationships are key driven Goes beyond standard 3NF
27.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Data Vault Agility Adding new components to the EDW has NEAR ZERO impact to existing: Ø Loading Processes Ø Data Model Ø Reports & BI Functions Ø Downstream Systems Ø Star Schemas or Data Marts
28.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Perhaps You Wish to Split for Security Reasons? From This DV Physical Partitioning Single Snowflake Database DV “Logical” Partitioning In Snowflake – might split for security reasons To THIS! DV “Physical” Partitioning Non-sensitive data PII or sensitive data with a Link to non-sensitive data Snowflake DB 1 Snowflake DB 2
29.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • Standardized modeling rules • Highly repeatable and learnable modeling technique • Can standardize load routines • Delta Driven process • Re-startable, consistent loading patterns. • Load multiple objects in parallel! • Can standardize extract routines • Rapid build of new or revised Data Marts • Can be automated (e.g. WhereScape) • Can use a BI-meta layer to virtualize the reporting structures • Example: Looker using LookML semantic layer • Example: BOBJ Universe Business Layer • Can put views on the DV structures as well • Simulate ODS/3NF or Star Schemas Productivity
30.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Productivity - Loading Deterministic Keys Non-Deterministic Keys Less Scheduling
31.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • Modeling the EDW as a DV forces integration of the Business Keys upfront • Good for organizational alignment • An integrated data set with raw data extends it’s value beyond BI: • Source for data quality projects • Source for master data • Source for data mining • Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture) Other Benefits of a Data Vault
32.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • Upfront Hub integration simplifies the data integration routines required to load data marts • Helps divide the work a bit • Much easier to implement security on these granular pieces • Granular, re-startable processes enable pin-point failure correction • Designed and optimized for real-time loading in its core architecture (without any tweaks or mods) Other Benefits of Data Vault
33.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • University of Texas, MD Anderson Cancer Center • Denver Public Schools • Micron • Independent Purchasing Cooperative (IPC, Miami) • Kaplan • US Defense Department • Colorado Springs Utilities • State Court of Wyoming • Federal Express • US Dept. of Agriculture Organizations Using Data Vault
34.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved Ø Aptus Health Ø ResearchNow Ø F+W Media Ø Sainsbury’s Snowflake Customers using Data Vault
35.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved • Several Snowflake Partners offer Data Vault classes • ScaleFree - in EMEA • PerformanceG2 – in USA • Empowered Holdings (Dan Linstedt) – globally • Talk to you Snowflake account rep for contact information Data Vault Training & Certification
36.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
37.
© 2018 Snowflake
Computing Inc. All Rights Reserved References
38.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon: http://www.amazon.com/Better- Data-Modeling-Introduction- Engineering-ebook /dp/ B018BREV1C/ Intro ebook by Kent Graziano:
39.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon.com › Soft Cover or Kindle Format › Now also available in PDF at LearnDataVault.com › Kent was the Technical Editor Super Charge Your Data Warehouse
40.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon: http://www.amazon.com/Buildin g-Scalable-Data-Warehouse- Vault/dp/0128025107/ New DV 2.0 Book from Dan Linstedt
41.
© 2018 Snowflake
Computing Inc. All Rights Reserved.. Kent Graziano Snowflake Computing Kent.graziano@snowflake.com On Twitter @KentGraziano More info at http://snowflake.com Visit my blog at http://kentgraziano.com Contact Information
42.
© 2018 Snowflake
Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. THANK YOU © 2018 Snowflake Computing Inc. All Rights Reserved