SlideShare une entreprise Scribd logo
1  sur  61
INTRODUCTION TO
DATA VAULT 2.0 ON
SNOWFLAKE
KENT GRAZIANO, CHIEF TECHNICAL EVANGELIST
KentGraziano
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
› Chief Technical Evangelist, Snowflake Inc
› Blogger: The Data Warrior
› Certified Data Vault Master and DV 2.0 Practitioner
› Oracle ACE Director (Alumni)
› OakTable Member
› Member – DAMA Houston & DAMA International
› Data Modeling, Data Architecture and Data Warehouse
Specialist
› 30+ years in IT
› 25+ years of Oracle-related work
› 20+ years of data warehousing experience
› Former-Member: Boulder BI Brain Trust
(http://www.boulderbibraintrust.org/)
› Author & Co-Author of a bunch of books
› Past-President of ODTUG and Rocky Mountain Oracle User
Group
MY BIO
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
AGENDA
• Why talk about Data Vault?
• What is a Data Vault?
• Data Vault 2.0 modeling basics
• Applying Snowflake Features
• Virtual Information Marts
• Reference Architecture
• Critical Success Factors
• Customer Examples
• References
© 2021 Snowflake Inc. All Rights Reserved
WHY DATA VAULT?
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
SOLUTIONS
CUSTOMERS LOOK FOR
Future Proof
supporting complex scenarios
scalable
extendable / agile
Controlled
governed
transparent
Modern
attractive for talent
latest & greatest
Easy to engineer
pattern based
quick time to value
automated
Greenfield
Migration
Immediate
pain
Centralize
and
democratize
New Business
Opportunities
Analytics 2.0
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
CUSTOMERS ASKING
FOR DATA VAULT
We are new to DW.
What is the best
pattern to use for
my cloud data
project?
We did DW before.
Should we try
Data Vault? We did Data Vault
before. What are the
best practices on
Snowflake?
300+ DV customers
© 2021 Snowflake Inc. All Rights Reserved
WHAT IS DATA VAULT?
© 2021 Snowflake Inc. All Rights Reserved
ORIGINAL DATA VAULT DEFINITION
The Data Vault is a detail oriented, historical tracking and uniquely linked
set of normalized tables that support one or more functional areas of
business.
It is a hybrid approach encompassing the best of breed between 3rd normal
form (3NF) and star schema. The design is flexible, scalable, consistent
and adaptable to the needs of the enterprise.
Architected specifically to meet the needs of today’s
enterprise data warehouses
Dan Linstedt: Defining the Data Vault
© 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved
WHAT IS DATA VAULT 2.0?
What is Data Vault?
A System of Business
Intelligence containing the
necessary components needed to
accomplish enterprise vision in
Data Warehousing and Information
Delivery.
Foundational Pillars of Data Vault 2.0
Far more than just a
data modeling technique,
it is a complete
methodology.
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
• What are our other Enterprise Data
Warehouse Options?
• Third-Normal Form (3NF): Complex
primary keys (PK’s) with cascading
snapshot
• Star Schema (Dimensional): Difficult
to reengineer fact tables for
granularity changes
• Difficult to get it right the first time
• Not adaptable to rapid business
change
• NOT AGILE!
WHAT IS DATA VAULT TRYING TO
SOLVE?
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
• The work on the Data Vault approach began in the early 1990s;
completed around 1999.
• Throughout 1999, 2000, and 2001, the Data Vault design was
tested, refined, and deployed into specific customer sites.
• In 2002, the industry thought leaders were asked to review the
architecture.
• Kent meets Dan at a Lunch & Learn in Denver!
• In 2003, Dan began teaching the modeling techniques to the mass
public.
• Kent & Team take their 1st Data Vault Modeling class
• In 2014, Dan introduced DV 2.0!
DATA VAULT EVOLUTION
© 2021 Snowflake Inc. All Rights Reserved
WHAT DOES THE DATA VAULT
MODEL LOOK LIKE?
© 2021 Snowflake Inc. All Rights Reserved
DATA VAULT – 3 SIMPLE
STRUCTURES
HUB
LINK
SATELLITE
EDW
Data Vault
1
2
3
© 2021 Snowflake Inc. All Rights Reserved
DATA VAULT CORE
ARCHITECTURE
HUBS
-------------------
Unique List of
Business Keys
LINKS
-------------------
Unique List of
Relationships
across Keys
SATS
-------------------
Descriptive
Data
● Satellites have one and only one parent table
● Satellites cannot be parents to other tables
● Hubs cannot be child tables
© 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved
• WHAT DOES A DATA VAULT MODEL LOOK LIKE?
© 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved
• WHAT DOES A DATA VAULT MODEL LOOK LIKE?
© 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved
• WHAT DOES A DATA VAULT MODEL LOOK LIKE?
© 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved
• WHAT DOES A DATA VAULT MODEL LOOK LIKE?
© 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved
• WHAT DOES A DATA VAULT MODEL LOOK LIKE?
© 2021 Snowflake Inc. All Rights Reserved
1. HUB = BUSINESS KEYS
Hubs = Unique Lists of Business Keys
Business Keys are used to TRACK and IDENTIFY key information
DV 2.0 uses MD5 Hash of the BK for the PK
Natural Business Keys also acceptable for PK
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
2: LINKS = ASSOCIATONS
Links = Transactions and Associations
They are used to hook together multiple sets of information
In DV 2.0 the BK attributes may migrate to the Links for faster query
© 2021 Snowflake Inc. All Rights Reserved
Tomorrow With DV
No need to change
EDW structure
The business rule can
change to a 1:M
Relationship is a 1:1
so
why model a Link?
MODELING LINKS - 1:1 or 1:M?
Today
You discover new
data later
Existing data is fine
New data is added
© 2021 Snowflake Inc. All Rights Reserved
3. SATELLITES = DESCRIPTORS
Satellites provide context for the Hubs and the Links
Tracks changes over time - Like SCD 2
In DV 2.0 use HASH_DIFF to detect changes
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
• Think Type 2 SCD (Slowly Changing Dimensions)
• Old Way:
• Compare column by column
• Source value != Current value in DW table
• 20 columns, then 20 compares
• New Way:
• Concatenate all columns to one string
• Convert to one char(32) string with hash function
• Compare to hashed value (HASH_DIFF) in target table
• Does not matter how many columns
MD5-BASED CHANGE DETECTION
© 2021 Snowflake Inc. All Rights Reserved
SNOWFLAKE
FEATURES
© 2021 Snowflake Inc. All Rights Reserved
SEPARATION OF WORKLOADS
TO SUPPORT DATA VAULT
LOADING
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
Snowflake solution
• Assign each workload its own Virtual WH
• At a minimum, two WH: ETL/ELT and BI
• ETL can run continuously if it makes sense
for the business
• ETL / Business workload contention is
eliminated
• Further subdivide business or ETL
workloads into own clusters as needed
• Eliminate contention between Sales,
Marketing, Finance, Data Science
workloads
• Easy to add new workloads on demand!
SEPARATION OF
WORKLOADS
Virtual
Warehouse
Databases
Virtual
Warehouse
ETL & Data Loading
Business Workloads
Finance
Virtual
Warehouse
Test/Dev
Virtual
Warehouse
S Marketing
Virtual
Warehouse
Sales
Virtual
Warehouse
S
Research
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
OTHER AGILE WAREHOUSE
SCALING TECHNIQUES
Increase cluster size – on the fly!
• More data being analyzed
• More complex queries
• Get some concurrency boost
• No need to start, stop, or reboot!
Automatic scale out!
• Multi-cluster warehouse
concurrency
• Workload queries have usual weight
• But more of them, e.g. 20 dashboard
users rather than the usual 5
• Automatically scales back after
the load drops
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
DATA VAULT LOADING PATTERN
(BUSINESS / HASHED KEYS)
Processing:
• All processes within sets done in parallel
• Process sets ”wait” for previous set completion
• Sets run as soon as data is ready
• No other “waiting” needed
• Load dependencies maximally reduced
Vault Loads
Stage Loads
- Major Synchronization Points
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
APPLYING VIRTUAL WAREHOUSES TO DATA VAULT
Source to Stage Loads
•1 multi-cluster warehouse
Use as separate MCWH for each Data
Vault entity type
•Maximum parallelism
•Zero contention
•Maximum flexibility
•Each MCWH can be
independently sized
© 2021 Snowflake Inc. All Rights Reserved
SEMI STRUCTURED
DATA & DATA VAULT
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
JSON SUPPORT WITH SQL – VARIANT
Apple 101.12 250 FIH-2316
Pear 56.22 202 IHO-6912
Orange 98.21 600 WHQ-6090
Structured data
(e.g. CSV)
Semi-structured data
(e.g. JSON, Avro, XML)
{ "firstName": "John",
"lastName": "Smith",
"height_cm": 167.64,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"phoneNumbers": [
{ "type": "home", "number": "212 555-1234" },
{ "type": "office", "number": "646 555-4567" }
]
}
Optimized storage
Flexible schema - Native
Relational processing
select v:lastName::string as last_name
from json_demo;
All Your Data!
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
Create Sats using a VARIANT
• Hub Key
• Load DTS
• Rec Source
• VARIANT
Load JSON doc into VARIANT column
Create a View on the Sat for Business Vault
• View presents only the JSON attributes needed for current use case
Benefits
• No change to Sat definitions if “schema” changes
• Business Vault views are unaffected
• Change or add new BV views over time as needed
COMBINE DATA VAULT & SCHEMA-IN-READ
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
BUILDING THE BIZ VAULT VIEW
CREATE OR REPLACE VIEW bizvault.vw_sat_account_type1 (
hub_account_hkey,
load_dts,
name,
account_type_code,
account_type,
account_type_description,
fund_bill_addr_code,
fund_bill_addr_name,
fund_bill_addr_description,
keywords,
kids,
addresses,
emails,
phones,
tags,
user_defined_fields,
attributes COMMENT '{
Accounts: "accountname INT, accountstatusid INT, accounttype INT, affiliations INT, alertids INT, announcement INT, authinfo
INT, bankaccounts INT, creditcards INT, dbid INT, extids INT, fndbilladdrid INT, fndmailaddrid INT, fndprimaryrepid BIGINT, fndrepresentatives INT,
genprimaryrepid BIGINT, genrepresentatives INT, householdincomeid INT, id INT, imageid STRING, keywords INT, kids INT, magstripeids INT,
mailingfl INT, onlinelockfl INT, orgaddresses INT, orgcomments INT, orgemails INT, orgindustryid INT, orgname INT, orgphones INT, orgtype INT,
orgurl INT, pin INT, preferredcontactmethod INT, preferredemail INT, preferredlanguageid INT, preferredphoneid INT, primarycontactid STRING,
profileids INT, salutations INT, sourceid INT, tags INT, tktbilladdrid INT, tktmailaddrid INT, tktprimaryrepid BIGINT, tktrepresentatives INT,
userdefinedfields INT, z_createduserid BIGINT, z_createduserip INT, z_createdusername INT, z_createdusersession INT, z_createdusertimestamp
TIMESTAMP, z_lastupdateduserid BIGINT, z_lastupdateduserip INT, z_lastupdatedusername INT, z_lastupdatedusersession INT,
z_lastupdatedusertimestamp TIMESTAMP, z_originationtimestamp TIMESTAMP, z_originationusername INT, z_status INT, z_tag INT, z_version
BIGINT",
Codes: "name STRING, description STRING"}'
)
COMMENT = 'Contains the latest active record for each key'
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
AS
WITH sat_cte AS (
SELECT
hub_account_hkey,
load_dts,
is_deleted,
attributes, -- This is the VARIANT
LEAD(load_dts) OVER (PARTITION BY hub_account_hkey
ORDER BY load_dts, load_seq) AS lead_dts
FROM
rawvault.sat_account
)
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
SELECT
sat.hub_account_hkey,
sat.load_dts,
sat.attributes:accountname::STRING,
sat.attributes:accounttype::STRING,
ref_account_type.attributes:name:en_US::STRING,
ref_account_type.attributes:description:en_US::STRING,
sat.attributes:fndbilladdrid::STRING,
ref_fund_bill_address.attributes:name:en_US::STRING,
ref_fund_bill_address.attributes:description:en_US::STRING,
sat.attributes:keywords::ARRAY,
sat.attributes:kids::ARRAY,
sat.attributes:orgaddresses::ARRAY,
sat.attributes:orgemails::ARRAY,
sat.attributes:orgphones::ARRAY,
sat.attributes:tags::ARRAY,
sat.attributes:userdefinedfields::ARRAY,
OBJECT_DELETE(sat.attributes,'accountname','accounttype','keywords','kids','orgaddresses’,
'orgemails','orgphones','tags','userdefinedfields’) as Attributes
FROM
sat_cte sat
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
LEFT JOIN
vw_ref_code_type1 ref_account_type ON
sat.attributes:accounttype::STRING = ref_account_type.code AND
ref_account_type.organization_id = '-1' AND -- Global code
ref_account_type.type = 'P' AND
ref_account_type.subtype = 'ACCT_TYPE'
LEFT JOIN
vw_ref_code_type1 ref_fund_bill_address ON
sat.attributes:fndbilladdrid::STRING = ref_fund_bill_address.code AND
sat.attributes:dbid::STRING = ref_fund_bill_address.organization_id AND
ref_fund_bill_address.type = 'P' AND
ref_fund_bill_address.subtype = 'ADDRESS'
WHERE
lead_dts IS NULL AND
is_deleted = FALSE;
© 2021 Snowflake Inc. All Rights Reserved
“We compared before and after numbers and
found that with Snowflake there was a 90
percent reduction in the code used for the
ETL process. That’s a big win for us.”
- Paciolan’s Principal Software Engineer,
Ashkan Khoshcheshmi,
© 2021 Snowflake Inc. All Rights Reserved
OTHER FEATURES
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
Allows loading several tables from one
source in one statement
USING MULTI-TABLE INSERT
INSERT ALL
WHEN (SELECT COUNT(*)
FROM dv_hub_country hc
WHERE hc.hub_country_key= stg.hash_key) = 0
THEN
INTO dv.hub_country
(hub_country_key, country_abbrv, hub_load_dts,
hub_rec_src)
VALUES (stg.hash_key, stg.country_abbrv, stg.load_dts,
stg.rec_src)
WHEN (SELECT COUNT(*)
FROM dv.sat_countries sc
WHERE sc.hub_country_key = stg.hash_key AND
sc.hash_diff = MD5(country_name)) = 0
THEN
INTO dv.sat_countries (hub_country_key, sat_load_dts,
hash_diff, sat_rec_src, country_name)
VALUES (stg.hash_key, stg.load_dts,
stg.hash_diff, stg.rec_src, stg.country_name)
SELECT MD5(country_abbrv) AS hash_key,
country_abbv,
country_name,
MD5(country_name) AS hash_diff,
CURRENT_TIMESTAMP AS load_dts,
stage_rec_src AS rec_src
FROM stage.country stg;
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
PERHAPS YOU WISH TO SPLIT FOR SECURITY REASONS?
From This
Schema layout
Single Snowflake Database
In Snowflake –
you can easily
split for security
reasons
To THIS!
DV Logical Partitioning
Non-sensitive
data
PII or sensitive data with
a Link to non-sensitive
data
Snowflake DB 1 Snowflake DB 2
© 2021 Snowflake Inc. All Rights Reserved
VIRTUALIZING
INFORMATION MARTS
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
• Use the Hub/Link PK columns
• Filter on LEAD of LOAD_DTS - imbedded in a CTE
AS
WITH sat_cte AS (
SELECT
hub_account_hkey,
load_dts,
is_deleted,
attributes,
LEAD(load_dts) OVER
(PARTITION BY hub_account_hkey ORDER BY load_dts) AS expire_dts
FROM
rawvault.sat_account
)
TIP - EASILY GETTING CURRENT ROWS FROM A SAT
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
EXAMPLE – VIRTUAL CURRENT FLAG & EXPIRE DATE!
CASE
WHEN LEAD(stg.LOAD_DTS)
OVER (PARTITION BY stg.CDC_KEY
ORDER BY stg.LOAD_DTS) IS NULL
THEN 'Y'
ELSE 'N'
END CURR_FLG,
LEAD(stg.LOAD_DTS) OVER (PARTITION BY
stg.CDC_KEY ORDER BY stg.LOAD_DTS) EXPR_DTS
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
• Yes!
• Elastic and multi-cluster warehouses provide the power
• Dynamic optimization engine provides the performance
• Best practice
• Start with views!
• Find the right size VWH to support desired SLA
• How fast is fast enough?
• If still too slow THEN use traditional tables
• Allows agile development approach
• Fail fast!
WILL VIEWS PERFORM?
© 2021 Snowflake Inc. All Rights Reserved
“In the past, our data vault had performance
limitations, since complex joins happened
on the same resource where data was
stored. With the separation of storage and
compute in Snowflake, that limitation goes
away.”
- GRAEME FORBES, Principal Data Engineer,
Sainsbury’s
© 2021 Snowflake Inc. All Rights Reserved
SIMPLE REFERENCE
ARCHITECTURE
© 2021 Snowflake Inc. All Rights Reserved
RAW Data Vault
Acquisition Business Data Vault Information Delivery
Snowflake Data Cloud
Data
Consumers
Data Lifecycle
Data Replication
Tools
Data Lake
(complement)
Bulk upload
Stream
Data Marketplace
User workspaces
MULTI-TIER
DATA VAULT ARCHITECTURE
Ad-hoc
Analysis
BI & Self-Service
Reporting
Data
Share
Data Science
Databases
Data
Providers
Bulk
File Upload
Data Lake
Apps & SaaS
Streaming data
Data Share
Customer managed
Snowflake managed
Information Marts
(physical or virtual)
Loading & Staging
Data
Share
Snowpipe, Kafka
External
Tables
Micro-batch
Direct Data Discovery
Governance, Security, DevOps
RAW Data
Structured
Semi-Structured
Unstructured
Data Share
Bridge, PIT tables
(optional)
derived DV objects
soft business rules
(physical or virtual)
RAW DV objects
hard business rules
+
© 2021 Snowflake Inc. All Rights Reserved
Information Delivery
RAW Data Vault Business Data Vault
Data Lifecycle
DATA VAULT
VIRTUALIZATION
Loading & Staging
Data
Share
Snowpipe, Kafka
External
Tables
Micro-batch
...
Staging
Table
HUB
LINK
SAT
RAW Data
Structured
Semi-Structured
Unstructured
Data Share
Snowflake Data Cloud
Governance, Security, DevOps
Direct Data Discovery
New derived
relationship
...
View derivation
SAT
LINK
...
Any other delivery model
Optionally Materialized
XS
6XL
+
+
Bridge, PIT tables
optionally materialized
XS
6XL
business view
raw view
live raw data marts
live information marts
Data
Consumers
Ad-hoc
Analysis
BI & Self-Service
Reporting
Data
Share
Data Science
© 2021 Snowflake Inc. All Rights Reserved
CRITICAL SUCCESS
FACTORS
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
IMPORTANT
SUCCESS COMPONENTS
All data
All workloads
At any scale
Simple
Training
Community
Consulting
Right Data Platform Automation
zz
Time to Value
Skills
200x
Certified!
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
• Several Snowflake Partners offer Data Vault classes
• Empowered Holdings (Dan Linstedt) – globally
• ScaleFree - in EMEA
• PerformanceG2 – in USA
• Certus – IN APAC
• Talk to your Snowflake account manager for contact information
• Join DataVaultAlliance.com to keep up and ask questions
DATA VAULT TRAINING & CERTIFICATION
© 2021 Snowflake Inc. All Rights Reserved
CUSTOMER EXAMPLES
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Picnic (Amsterdam)
• Happy Money (US)
• Alpha Cruncher (EMEA)
• NYU (US)
• Sunoco (US)
• Emerald Performance
Materials (US)
• Finnair (Finland)
• Finnavia (Finland)
• NIB (Australia)
• NPS Medicinewise (Australia)
SNOWFLAKE CUSTOMERS USING DATA VAULT
• Scripps Health (US)
• PepsiCo (US)
• Hampshire Trust Bank (US)
• Paciolan (US)
• Micron (US)
• AXA (Germany)
• ResearchNow (US)
• JC Penny (US)
• ROLAND Rechtsschutz-
Versicherungs-AG (Germany)
• F+W Media (US)
• Sainsbury’s (London)
• Amica Insurance (US)
© 2021 Snowflake Inc. All Rights Reserved
References
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
› Available on Amazon:
http://www.amazon.com/Better-
Data-Modeling-Introduction-
Engineering-ebook /dp/
B018BREV1C/
Intro ebook by
Kent Graziano:
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
› Available on Amazon.com
› Soft Cover or Kindle Format
› Now also available in PDF at
LearnDataVault.com
› Kent was the Technical Editor
Super Charge
Your Data Warehouse
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
› Available on Amazon:
http://www.amazon.com/Buildin
g-Scalable-Data-Warehouse-
Vault/dp/0128025107/
New DV 2.0 Book
from Dan Linstedt
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2021 Snowflake Inc. All Rights Reserved
THANK YOU

Contenu connexe

Tendances

Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault ModelingKent Graziano
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Empowered Holdings, LLC
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 

Tendances (20)

Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012Introduction To Data Vault - DAMA Oregon 2012
Introduction To Data Vault - DAMA Oregon 2012
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Data Mesh 101
Data Mesh 101Data Mesh 101
Data Mesh 101
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 

Similaire à Introduction to Data Vault 2.0 on Snowflake

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWKent Graziano
 
Snowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSnowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSenturus
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingKent Graziano
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeKent Graziano
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesDenodo
 
Scaling db infra_pay_pal
Scaling db infra_pay_palScaling db infra_pay_pal
Scaling db infra_pay_palpramod garre
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceHarald Erb
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudKent Graziano
 
Building the Modern Data Hub
Building the Modern Data HubBuilding the Modern Data Hub
Building the Modern Data HubDatavail
 
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...IDERA Software
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Kent Graziano
 
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...IDERA Software
 
Kent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdfKent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdfabhaybansal43
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaMarketingArrowECS_CZ
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceSnowflake Computing
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseFormant
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?MarketingArrowECS_CZ
 

Similaire à Introduction to Data Vault 2.0 on Snowflake (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
Snowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern AnalyticsSnowflake’s Cloud Data Platform and Modern Analytics
Snowflake’s Cloud Data Platform and Modern Analytics
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with Snowflake
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business Outcomes
 
Scaling db infra_pay_pal
Scaling db infra_pay_palScaling db infra_pay_pal
Scaling db infra_pay_pal
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data Science
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Building the Modern Data Hub
Building the Modern Data HubBuilding the Modern Data Hub
Building the Modern Data Hub
 
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
Idera live 2021: Will Data Vault add Value to Your Data Warehouse? 3 Signs th...
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
 
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...Idera live 2021:   Managing Databases in the Cloud - the First Step, a Succes...
Idera live 2021: Managing Databases in the Cloud - the First Step, a Succes...
 
Kent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdfKent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdf
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
 

Plus de Kent Graziano

HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...Kent Graziano
 
Rise of the Data Cloud
Rise of the Data CloudRise of the Data Cloud
Rise of the Data CloudKent Graziano
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on ReadKent Graziano
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsKent Graziano
 
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODSAgile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODSKent Graziano
 
Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)Kent Graziano
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016Kent Graziano
 
Worst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignWorst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignKent Graziano
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
Agile Methods and Data Warehousing
Agile Methods and Data WarehousingAgile Methods and Data Warehousing
Agile Methods and Data WarehousingKent Graziano
 
Top Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data ModelerTop Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data ModelerKent Graziano
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault ModelingKent Graziano
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachKent Graziano
 

Plus de Kent Graziano (13)

HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
 
Rise of the Data Cloud
Rise of the Data CloudRise of the Data Cloud
Rise of the Data Cloud
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on Read
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
 
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODSAgile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
 
Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
Worst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignWorst Practices in Data Warehouse Design
Worst Practices in Data Warehouse Design
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Agile Methods and Data Warehousing
Agile Methods and Data WarehousingAgile Methods and Data Warehousing
Agile Methods and Data Warehousing
 
Top Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data ModelerTop Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data Modeler
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
 

Dernier

INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 

Dernier (20)

INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 

Introduction to Data Vault 2.0 on Snowflake

  • 1. INTRODUCTION TO DATA VAULT 2.0 ON SNOWFLAKE KENT GRAZIANO, CHIEF TECHNICAL EVANGELIST KentGraziano
  • 2. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved › Chief Technical Evangelist, Snowflake Inc › Blogger: The Data Warrior › Certified Data Vault Master and DV 2.0 Practitioner › Oracle ACE Director (Alumni) › OakTable Member › Member – DAMA Houston & DAMA International › Data Modeling, Data Architecture and Data Warehouse Specialist › 30+ years in IT › 25+ years of Oracle-related work › 20+ years of data warehousing experience › Former-Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/) › Author & Co-Author of a bunch of books › Past-President of ODTUG and Rocky Mountain Oracle User Group MY BIO
  • 3. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved AGENDA • Why talk about Data Vault? • What is a Data Vault? • Data Vault 2.0 modeling basics • Applying Snowflake Features • Virtual Information Marts • Reference Architecture • Critical Success Factors • Customer Examples • References
  • 4. © 2021 Snowflake Inc. All Rights Reserved WHY DATA VAULT?
  • 5. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved SOLUTIONS CUSTOMERS LOOK FOR Future Proof supporting complex scenarios scalable extendable / agile Controlled governed transparent Modern attractive for talent latest & greatest Easy to engineer pattern based quick time to value automated Greenfield Migration Immediate pain Centralize and democratize New Business Opportunities Analytics 2.0
  • 6. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved CUSTOMERS ASKING FOR DATA VAULT We are new to DW. What is the best pattern to use for my cloud data project? We did DW before. Should we try Data Vault? We did Data Vault before. What are the best practices on Snowflake? 300+ DV customers
  • 7. © 2021 Snowflake Inc. All Rights Reserved WHAT IS DATA VAULT?
  • 8. © 2021 Snowflake Inc. All Rights Reserved ORIGINAL DATA VAULT DEFINITION The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. Architected specifically to meet the needs of today’s enterprise data warehouses Dan Linstedt: Defining the Data Vault
  • 9. © 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved WHAT IS DATA VAULT 2.0? What is Data Vault? A System of Business Intelligence containing the necessary components needed to accomplish enterprise vision in Data Warehousing and Information Delivery. Foundational Pillars of Data Vault 2.0 Far more than just a data modeling technique, it is a complete methodology.
  • 10. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved • What are our other Enterprise Data Warehouse Options? • Third-Normal Form (3NF): Complex primary keys (PK’s) with cascading snapshot • Star Schema (Dimensional): Difficult to reengineer fact tables for granularity changes • Difficult to get it right the first time • Not adaptable to rapid business change • NOT AGILE! WHAT IS DATA VAULT TRYING TO SOLVE?
  • 11. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved • The work on the Data Vault approach began in the early 1990s; completed around 1999. • Throughout 1999, 2000, and 2001, the Data Vault design was tested, refined, and deployed into specific customer sites. • In 2002, the industry thought leaders were asked to review the architecture. • Kent meets Dan at a Lunch & Learn in Denver! • In 2003, Dan began teaching the modeling techniques to the mass public. • Kent & Team take their 1st Data Vault Modeling class • In 2014, Dan introduced DV 2.0! DATA VAULT EVOLUTION
  • 12. © 2021 Snowflake Inc. All Rights Reserved WHAT DOES THE DATA VAULT MODEL LOOK LIKE?
  • 13. © 2021 Snowflake Inc. All Rights Reserved DATA VAULT – 3 SIMPLE STRUCTURES HUB LINK SATELLITE EDW Data Vault 1 2 3
  • 14. © 2021 Snowflake Inc. All Rights Reserved DATA VAULT CORE ARCHITECTURE HUBS ------------------- Unique List of Business Keys LINKS ------------------- Unique List of Relationships across Keys SATS ------------------- Descriptive Data ● Satellites have one and only one parent table ● Satellites cannot be parents to other tables ● Hubs cannot be child tables
  • 15. © 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved • WHAT DOES A DATA VAULT MODEL LOOK LIKE?
  • 16. © 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved • WHAT DOES A DATA VAULT MODEL LOOK LIKE?
  • 17. © 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved • WHAT DOES A DATA VAULT MODEL LOOK LIKE?
  • 18. © 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved • WHAT DOES A DATA VAULT MODEL LOOK LIKE?
  • 19. © 2020 Empowered Holdings & Dan Linstedt. All Rights Reserved • WHAT DOES A DATA VAULT MODEL LOOK LIKE?
  • 20. © 2021 Snowflake Inc. All Rights Reserved 1. HUB = BUSINESS KEYS Hubs = Unique Lists of Business Keys Business Keys are used to TRACK and IDENTIFY key information DV 2.0 uses MD5 Hash of the BK for the PK Natural Business Keys also acceptable for PK
  • 21. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved 2: LINKS = ASSOCIATONS Links = Transactions and Associations They are used to hook together multiple sets of information In DV 2.0 the BK attributes may migrate to the Links for faster query
  • 22. © 2021 Snowflake Inc. All Rights Reserved Tomorrow With DV No need to change EDW structure The business rule can change to a 1:M Relationship is a 1:1 so why model a Link? MODELING LINKS - 1:1 or 1:M? Today You discover new data later Existing data is fine New data is added
  • 23. © 2021 Snowflake Inc. All Rights Reserved 3. SATELLITES = DESCRIPTORS Satellites provide context for the Hubs and the Links Tracks changes over time - Like SCD 2 In DV 2.0 use HASH_DIFF to detect changes
  • 24. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved • Think Type 2 SCD (Slowly Changing Dimensions) • Old Way: • Compare column by column • Source value != Current value in DW table • 20 columns, then 20 compares • New Way: • Concatenate all columns to one string • Convert to one char(32) string with hash function • Compare to hashed value (HASH_DIFF) in target table • Does not matter how many columns MD5-BASED CHANGE DETECTION
  • 25. © 2021 Snowflake Inc. All Rights Reserved SNOWFLAKE FEATURES
  • 26. © 2021 Snowflake Inc. All Rights Reserved SEPARATION OF WORKLOADS TO SUPPORT DATA VAULT LOADING
  • 27. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved Snowflake solution • Assign each workload its own Virtual WH • At a minimum, two WH: ETL/ELT and BI • ETL can run continuously if it makes sense for the business • ETL / Business workload contention is eliminated • Further subdivide business or ETL workloads into own clusters as needed • Eliminate contention between Sales, Marketing, Finance, Data Science workloads • Easy to add new workloads on demand! SEPARATION OF WORKLOADS Virtual Warehouse Databases Virtual Warehouse ETL & Data Loading Business Workloads Finance Virtual Warehouse Test/Dev Virtual Warehouse S Marketing Virtual Warehouse Sales Virtual Warehouse S Research
  • 28. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved OTHER AGILE WAREHOUSE SCALING TECHNIQUES Increase cluster size – on the fly! • More data being analyzed • More complex queries • Get some concurrency boost • No need to start, stop, or reboot! Automatic scale out! • Multi-cluster warehouse concurrency • Workload queries have usual weight • But more of them, e.g. 20 dashboard users rather than the usual 5 • Automatically scales back after the load drops
  • 29. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved DATA VAULT LOADING PATTERN (BUSINESS / HASHED KEYS) Processing: • All processes within sets done in parallel • Process sets ”wait” for previous set completion • Sets run as soon as data is ready • No other “waiting” needed • Load dependencies maximally reduced Vault Loads Stage Loads - Major Synchronization Points
  • 30. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved APPLYING VIRTUAL WAREHOUSES TO DATA VAULT Source to Stage Loads •1 multi-cluster warehouse Use as separate MCWH for each Data Vault entity type •Maximum parallelism •Zero contention •Maximum flexibility •Each MCWH can be independently sized
  • 31. © 2021 Snowflake Inc. All Rights Reserved SEMI STRUCTURED DATA & DATA VAULT
  • 32. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved JSON SUPPORT WITH SQL – VARIANT Apple 101.12 250 FIH-2316 Pear 56.22 202 IHO-6912 Orange 98.21 600 WHQ-6090 Structured data (e.g. CSV) Semi-structured data (e.g. JSON, Avro, XML) { "firstName": "John", "lastName": "Smith", "height_cm": 167.64, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021-3100" }, "phoneNumbers": [ { "type": "home", "number": "212 555-1234" }, { "type": "office", "number": "646 555-4567" } ] } Optimized storage Flexible schema - Native Relational processing select v:lastName::string as last_name from json_demo; All Your Data!
  • 33. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved Create Sats using a VARIANT • Hub Key • Load DTS • Rec Source • VARIANT Load JSON doc into VARIANT column Create a View on the Sat for Business Vault • View presents only the JSON attributes needed for current use case Benefits • No change to Sat definitions if “schema” changes • Business Vault views are unaffected • Change or add new BV views over time as needed COMBINE DATA VAULT & SCHEMA-IN-READ
  • 34. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
  • 35. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
  • 36. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. BUILDING THE BIZ VAULT VIEW CREATE OR REPLACE VIEW bizvault.vw_sat_account_type1 ( hub_account_hkey, load_dts, name, account_type_code, account_type, account_type_description, fund_bill_addr_code, fund_bill_addr_name, fund_bill_addr_description, keywords, kids, addresses, emails, phones, tags, user_defined_fields, attributes COMMENT '{ Accounts: "accountname INT, accountstatusid INT, accounttype INT, affiliations INT, alertids INT, announcement INT, authinfo INT, bankaccounts INT, creditcards INT, dbid INT, extids INT, fndbilladdrid INT, fndmailaddrid INT, fndprimaryrepid BIGINT, fndrepresentatives INT, genprimaryrepid BIGINT, genrepresentatives INT, householdincomeid INT, id INT, imageid STRING, keywords INT, kids INT, magstripeids INT, mailingfl INT, onlinelockfl INT, orgaddresses INT, orgcomments INT, orgemails INT, orgindustryid INT, orgname INT, orgphones INT, orgtype INT, orgurl INT, pin INT, preferredcontactmethod INT, preferredemail INT, preferredlanguageid INT, preferredphoneid INT, primarycontactid STRING, profileids INT, salutations INT, sourceid INT, tags INT, tktbilladdrid INT, tktmailaddrid INT, tktprimaryrepid BIGINT, tktrepresentatives INT, userdefinedfields INT, z_createduserid BIGINT, z_createduserip INT, z_createdusername INT, z_createdusersession INT, z_createdusertimestamp TIMESTAMP, z_lastupdateduserid BIGINT, z_lastupdateduserip INT, z_lastupdatedusername INT, z_lastupdatedusersession INT, z_lastupdatedusertimestamp TIMESTAMP, z_originationtimestamp TIMESTAMP, z_originationusername INT, z_status INT, z_tag INT, z_version BIGINT", Codes: "name STRING, description STRING"}' ) COMMENT = 'Contains the latest active record for each key'
  • 37. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved AS WITH sat_cte AS ( SELECT hub_account_hkey, load_dts, is_deleted, attributes, -- This is the VARIANT LEAD(load_dts) OVER (PARTITION BY hub_account_hkey ORDER BY load_dts, load_seq) AS lead_dts FROM rawvault.sat_account )
  • 38. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved SELECT sat.hub_account_hkey, sat.load_dts, sat.attributes:accountname::STRING, sat.attributes:accounttype::STRING, ref_account_type.attributes:name:en_US::STRING, ref_account_type.attributes:description:en_US::STRING, sat.attributes:fndbilladdrid::STRING, ref_fund_bill_address.attributes:name:en_US::STRING, ref_fund_bill_address.attributes:description:en_US::STRING, sat.attributes:keywords::ARRAY, sat.attributes:kids::ARRAY, sat.attributes:orgaddresses::ARRAY, sat.attributes:orgemails::ARRAY, sat.attributes:orgphones::ARRAY, sat.attributes:tags::ARRAY, sat.attributes:userdefinedfields::ARRAY, OBJECT_DELETE(sat.attributes,'accountname','accounttype','keywords','kids','orgaddresses’, 'orgemails','orgphones','tags','userdefinedfields’) as Attributes FROM sat_cte sat
  • 39. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved LEFT JOIN vw_ref_code_type1 ref_account_type ON sat.attributes:accounttype::STRING = ref_account_type.code AND ref_account_type.organization_id = '-1' AND -- Global code ref_account_type.type = 'P' AND ref_account_type.subtype = 'ACCT_TYPE' LEFT JOIN vw_ref_code_type1 ref_fund_bill_address ON sat.attributes:fndbilladdrid::STRING = ref_fund_bill_address.code AND sat.attributes:dbid::STRING = ref_fund_bill_address.organization_id AND ref_fund_bill_address.type = 'P' AND ref_fund_bill_address.subtype = 'ADDRESS' WHERE lead_dts IS NULL AND is_deleted = FALSE;
  • 40. © 2021 Snowflake Inc. All Rights Reserved “We compared before and after numbers and found that with Snowflake there was a 90 percent reduction in the code used for the ETL process. That’s a big win for us.” - Paciolan’s Principal Software Engineer, Ashkan Khoshcheshmi,
  • 41. © 2021 Snowflake Inc. All Rights Reserved OTHER FEATURES
  • 42. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved Allows loading several tables from one source in one statement USING MULTI-TABLE INSERT INSERT ALL WHEN (SELECT COUNT(*) FROM dv_hub_country hc WHERE hc.hub_country_key= stg.hash_key) = 0 THEN INTO dv.hub_country (hub_country_key, country_abbrv, hub_load_dts, hub_rec_src) VALUES (stg.hash_key, stg.country_abbrv, stg.load_dts, stg.rec_src) WHEN (SELECT COUNT(*) FROM dv.sat_countries sc WHERE sc.hub_country_key = stg.hash_key AND sc.hash_diff = MD5(country_name)) = 0 THEN INTO dv.sat_countries (hub_country_key, sat_load_dts, hash_diff, sat_rec_src, country_name) VALUES (stg.hash_key, stg.load_dts, stg.hash_diff, stg.rec_src, stg.country_name) SELECT MD5(country_abbrv) AS hash_key, country_abbv, country_name, MD5(country_name) AS hash_diff, CURRENT_TIMESTAMP AS load_dts, stage_rec_src AS rec_src FROM stage.country stg;
  • 43. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved PERHAPS YOU WISH TO SPLIT FOR SECURITY REASONS? From This Schema layout Single Snowflake Database In Snowflake – you can easily split for security reasons To THIS! DV Logical Partitioning Non-sensitive data PII or sensitive data with a Link to non-sensitive data Snowflake DB 1 Snowflake DB 2
  • 44. © 2021 Snowflake Inc. All Rights Reserved VIRTUALIZING INFORMATION MARTS
  • 45. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved • Use the Hub/Link PK columns • Filter on LEAD of LOAD_DTS - imbedded in a CTE AS WITH sat_cte AS ( SELECT hub_account_hkey, load_dts, is_deleted, attributes, LEAD(load_dts) OVER (PARTITION BY hub_account_hkey ORDER BY load_dts) AS expire_dts FROM rawvault.sat_account ) TIP - EASILY GETTING CURRENT ROWS FROM A SAT
  • 46. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved EXAMPLE – VIRTUAL CURRENT FLAG & EXPIRE DATE! CASE WHEN LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) IS NULL THEN 'Y' ELSE 'N' END CURR_FLG, LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) EXPR_DTS
  • 47. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved • Yes! • Elastic and multi-cluster warehouses provide the power • Dynamic optimization engine provides the performance • Best practice • Start with views! • Find the right size VWH to support desired SLA • How fast is fast enough? • If still too slow THEN use traditional tables • Allows agile development approach • Fail fast! WILL VIEWS PERFORM?
  • 48. © 2021 Snowflake Inc. All Rights Reserved “In the past, our data vault had performance limitations, since complex joins happened on the same resource where data was stored. With the separation of storage and compute in Snowflake, that limitation goes away.” - GRAEME FORBES, Principal Data Engineer, Sainsbury’s
  • 49. © 2021 Snowflake Inc. All Rights Reserved SIMPLE REFERENCE ARCHITECTURE
  • 50. © 2021 Snowflake Inc. All Rights Reserved RAW Data Vault Acquisition Business Data Vault Information Delivery Snowflake Data Cloud Data Consumers Data Lifecycle Data Replication Tools Data Lake (complement) Bulk upload Stream Data Marketplace User workspaces MULTI-TIER DATA VAULT ARCHITECTURE Ad-hoc Analysis BI & Self-Service Reporting Data Share Data Science Databases Data Providers Bulk File Upload Data Lake Apps & SaaS Streaming data Data Share Customer managed Snowflake managed Information Marts (physical or virtual) Loading & Staging Data Share Snowpipe, Kafka External Tables Micro-batch Direct Data Discovery Governance, Security, DevOps RAW Data Structured Semi-Structured Unstructured Data Share Bridge, PIT tables (optional) derived DV objects soft business rules (physical or virtual) RAW DV objects hard business rules +
  • 51. © 2021 Snowflake Inc. All Rights Reserved Information Delivery RAW Data Vault Business Data Vault Data Lifecycle DATA VAULT VIRTUALIZATION Loading & Staging Data Share Snowpipe, Kafka External Tables Micro-batch ... Staging Table HUB LINK SAT RAW Data Structured Semi-Structured Unstructured Data Share Snowflake Data Cloud Governance, Security, DevOps Direct Data Discovery New derived relationship ... View derivation SAT LINK ... Any other delivery model Optionally Materialized XS 6XL + + Bridge, PIT tables optionally materialized XS 6XL business view raw view live raw data marts live information marts Data Consumers Ad-hoc Analysis BI & Self-Service Reporting Data Share Data Science
  • 52. © 2021 Snowflake Inc. All Rights Reserved CRITICAL SUCCESS FACTORS
  • 53. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved IMPORTANT SUCCESS COMPONENTS All data All workloads At any scale Simple Training Community Consulting Right Data Platform Automation zz Time to Value Skills 200x Certified!
  • 54. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved • Several Snowflake Partners offer Data Vault classes • Empowered Holdings (Dan Linstedt) – globally • ScaleFree - in EMEA • PerformanceG2 – in USA • Certus – IN APAC • Talk to your Snowflake account manager for contact information • Join DataVaultAlliance.com to keep up and ask questions DATA VAULT TRAINING & CERTIFICATION
  • 55. © 2021 Snowflake Inc. All Rights Reserved CUSTOMER EXAMPLES
  • 56. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Picnic (Amsterdam) • Happy Money (US) • Alpha Cruncher (EMEA) • NYU (US) • Sunoco (US) • Emerald Performance Materials (US) • Finnair (Finland) • Finnavia (Finland) • NIB (Australia) • NPS Medicinewise (Australia) SNOWFLAKE CUSTOMERS USING DATA VAULT • Scripps Health (US) • PepsiCo (US) • Hampshire Trust Bank (US) • Paciolan (US) • Micron (US) • AXA (Germany) • ResearchNow (US) • JC Penny (US) • ROLAND Rechtsschutz- Versicherungs-AG (Germany) • F+W Media (US) • Sainsbury’s (London) • Amica Insurance (US)
  • 57. © 2021 Snowflake Inc. All Rights Reserved References
  • 58. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon: http://www.amazon.com/Better- Data-Modeling-Introduction- Engineering-ebook /dp/ B018BREV1C/ Intro ebook by Kent Graziano:
  • 59. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon.com › Soft Cover or Kindle Format › Now also available in PDF at LearnDataVault.com › Kent was the Technical Editor Super Charge Your Data Warehouse
  • 60. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon: http://www.amazon.com/Buildin g-Scalable-Data-Warehouse- Vault/dp/0128025107/ New DV 2.0 Book from Dan Linstedt
  • 61. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. © 2021 Snowflake Inc. All Rights Reserved THANK YOU