Data Vault 2.0: Using MD5 Hashes for 
Change Data Capture 
Kent Graziano 
Data Warrior LLC 
Twitter @KentGraziano
Data Vault Definition 
The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.
Architected specifically to meet the needs of today’s enterprise data warehouses
Source: Dan Linstedt, "Defining the Data Vault" (TDAN.com article)
Data Vault Time Line 
▪ Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
▪ E.F. Codd invented relational modeling; Chris Date and Hugh Darwen maintained and refined modeling
▪ Early 70’s: Bill Inmon began discussing Data Warehousing
▪ Mid 70’s: AC Nielsen popularized Dimension & Fact terms
▪ 1976: Dr Peter Chen created E-R Diagramming
▪ Mid 80’s: Bill Inmon popularizes Data Warehousing
▪ Mid to late 80’s: Dr Kimball popularizes Star Schema
▪ Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
▪ 1990: Dan Linstedt begins R&D on Data Vault Modeling
▪ 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
© LearnDataVault.com
2014 - Next Evolution
What’s New in DV2.0? 
▪ Modeling Structure Includes…
● NoSQL and non-relational DB systems, hybrid systems
● Minor structure changes to support NoSQL
▪ New ETL Implementation Standards
● For true real-time support
● For NoSQL support
▪ New Architecture Standards
● To include support for NoSQL data management systems
▪ New Methodology Components
● Including CMMI, Six Sigma, and TQM
● Including project planning, tracking, and oversight
● Agile delivery mechanisms
● Standards and templates for projects
Use of MD5 Hash in DV2.0
▪ MD5 PK: replaces surrogate keys
▪ MD5DIFF: used for change detection
▪ The hash keys can be used to join to Hadoop data sets.
▪ This model is fully compliant with Hadoop and needs NO changes to work properly.
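To make the MD5 PK idea concrete, here is a minimal Python sketch of deriving a hash key from a business key. The function name `hash_key`, the sample key `CUST-001`, and the trim/upper-case/`^` normalization are assumptions for illustration (borrowed from the input-string conventions later in this deck), not a statement of the DV2.0 standard.

```python
import hashlib

def hash_key(*business_key_parts: str) -> str:
    """Return a 32-character hex MD5 digest for a (possibly composite) business key.

    Assumption: parts are trimmed, upper-cased and joined with '^' before hashing.
    """
    normalized = "^".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Two platforms that see the same business key derive the same hash key
# independently, with no shared sequence generator. That is what makes the join possible.
key_in_warehouse = hash_key("cust-001 ")
key_in_hadoop = hash_key("CUST-001")
```

Unlike a sequence-based surrogate key, the value depends only on the business key, so it can be computed wherever the data lands.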
MD5-based Change Detection 
▪ Think Type 2 SCD
▪ Old Way:
● Compare column by column
● Source value != current value in DW table
● 20 columns means 20 compares
▪ New Way:
● Concatenate all columns into one string
● Convert to one char(32) string with a hash function
● Compare to the hashed value (MD5DIFF) stored in the target table
● Number of columns does not matter: always one compare
© Data Warrior LLC
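The old way and the new way can be put side by side in a short Python sketch. The function names, the `^` separator, and the sample rows are assumptions for illustration; in a real load the stored hash would come from the MD5DIFF column of the target table.

```python
import hashlib

SEPARATOR = "^"  # assumed delimiter between column values

def md5diff(values) -> str:
    """Hash all descriptive columns of a row into one 32-character hex string."""
    joined = SEPARATOR.join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def changed_old_way(source_row, current_row) -> bool:
    """Old way: one comparison per column."""
    return any(s != c for s, c in zip(source_row, current_row))

def changed_new_way(source_row, stored_md5diff: str) -> bool:
    """New way: one comparison against the hash stored in the target table."""
    return md5diff(source_row) != stored_md5diff

# Hypothetical sample rows
current = ("ASPIRIN", 81, "MG", "TABLET")
stored = md5diff(current)  # what the target table would hold as MD5DIFF
unchanged = ("ASPIRIN", 81, "MG", "TABLET")
updated = ("ASPIRIN", 325, "MG", "TABLET")
```

Both functions reach the same decision for these rows; the difference is that `changed_new_way` needs only the stored hash, not the current column values.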
What does it look like? 
▪ Encode using standard MD5 hash function
● rawtohex(sys.utl_raw.cast_to_raw(dbms_obfuscation_toolkit.md5(input_string => ...)))
▪ Need to minimize chance of duplicates
● Without a separator, 12||3||45 and 1||2||345 both concatenate to 12345 and hash to the same value
● Need a separator between each column
● The separator also handles null values: an empty column still holds its position
● Example: Col1||'^'||Col2||'^'||Col3
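The collision described above is easy to reproduce. The following Python sketch uses the slide's own values; the helper names `md5_hex` and `concat` are hypothetical, and `None` is mapped to an empty string to mirror how Oracle's `||` treats NULL.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the 32-character hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def concat(values, separator=""):
    """Join column values; None is treated as an empty string, as Oracle's || does."""
    return separator.join("" if v is None else str(v) for v in values)

row_a = ("12", "3", "45")
row_b = ("1", "2", "345")

# No separator: both rows collapse to "12345", so the hashes collide.
collide = md5_hex(concat(row_a)) == md5_hex(concat(row_b))

# With '^': "12^3^45" versus "1^2^345", so the hashes differ.
distinct = md5_hex(concat(row_a, "^")) != md5_hex(concat(row_b, "^"))

# Nulls: ("A", None, "B") gives "A^^B" while ("A", "B", None) gives "A^B^".
null_safe = concat(("A", None, "B"), "^") != concat(("A", "B", None), "^")
```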
Other considerations 
▪ To generate the most consistent string: standardize!
▪ Convert data types
▪ If 'NUMBER', 'NVARCHAR2', 'NVARCHAR', 'NCHAR'
● THEN 'TO_CHAR(' || column_name || ')'
▪ If 'RAW'
● THEN 'ENC_BASE64(' || column_name || ')'
▪ If 'DATE'
● THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD'')'
▪ If LIKE 'TIME%'
● THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD HH24:MI:SS'')'
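These rules read like templates for a code generator. As a hedged sketch, the Python below turns a list of (column name, data type) pairs into the SQL text of an input-string expression. The function names are hypothetical, the pass-through fallback for other types is an assumption, the `UPPER(TRIM(...))` wrapper follows the deck's final input string, and `ENC_BASE64` is simply the name used on the slide (treated here as a user-defined helper, not a confirmed built-in).

```python
def standardize_expr(column_name: str, data_type: str) -> str:
    """Return the SQL fragment that casts one column to a consistent string."""
    data_type = data_type.upper()
    if data_type in ("NUMBER", "NVARCHAR2", "NVARCHAR", "NCHAR"):
        return f"TO_CHAR({column_name})"
    if data_type == "RAW":
        # ENC_BASE64 is the name used on the slide; assumed to be a user-defined helper.
        return f"ENC_BASE64({column_name})"
    if data_type == "DATE":
        return f"TO_CHAR({column_name}, 'YYYY-MM-DD')"
    if data_type.startswith("TIME"):
        return f"TO_CHAR({column_name}, 'YYYY-MM-DD HH24:MI:SS')"
    return column_name  # assumed fallback: other character types pass through unchanged

def input_string_expr(columns, separator="^") -> str:
    """Join the per-column fragments with the separator into one SQL expression."""
    parts = [f"UPPER(TRIM({standardize_expr(name, dtype)}))" for name, dtype in columns]
    return f"||'{separator}'||".join(parts)

expr = input_string_expr([
    ("T1.GENERICNAME", "VARCHAR2"),
    ("T1.MED_STRNG_AMT", "NUMBER"),
])
```

In practice the column list would likely come from the database's data dictionary rather than a hand-written list.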
Final Input String 
(UPPER(TRIM(T1.GENERICNAME)) ||'^'||
 UPPER(TRIM(TO_CHAR(T1.MED_STRNG_AMT))) ||'^'||
 UPPER(TRIM(T1.UOM_CD)) ||'^'||
 UPPER(TRIM(T1.MED_FORM_NM)) ||'^')
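For readers who want to try the expression outside the database, here is a rough Python equivalent. The column names come from the slide; the sample row values and the helper names `clean` and `final_input_string` are made up for illustration, and `None` is mapped to an empty string to mirror Oracle's NULL handling in concatenation.

```python
import hashlib

def clean(value) -> str:
    """Mimic UPPER(TRIM(...)); None becomes '' as NULL does in Oracle concatenation."""
    return "" if value is None else str(value).strip().upper()

def final_input_string(row: dict, columns) -> str:
    """Build the delimited string, including the trailing '^' shown on the slide."""
    return "".join(clean(row.get(col)) + "^" for col in columns)

COLUMNS = ("GENERICNAME", "MED_STRNG_AMT", "UOM_CD", "MED_FORM_NM")

# Hypothetical sample row
row = {"GENERICNAME": " aspirin ", "MED_STRNG_AMT": 81,
       "UOM_CD": "mg", "MED_FORM_NM": "Tablet"}

input_string = final_input_string(row, COLUMNS)  # "ASPIRIN^81^MG^TABLET^"
md5diff = hashlib.md5(input_string.encode("utf-8")).hexdigest()
```

Because of the trim and upper-case steps, rows that differ only in letter case or surrounding whitespace produce the same MD5DIFF and are not flagged as changes.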
So what? 
▪ MD5 hash is consistent cross-platform
▪ Changes multi-column compares to a single-column compare
▪ All compares take the same time during the load process, regardless of column count
▪ Can use with any DW architecture that requires change detection
▪ Virtually no limit
● Think Big Data/Hadoop/NoSQL
▪ Can generate the input string automatically
● But that is another talk!
Learn more about Data Vault 
www.LearnDataVault.com 
www.danlinstedt.com 
On YouTube: 
www.youtube.com/LearnDataVault 
On Facebook: 
www.facebook.com/learndatavault
Super Charge Your Data Warehouse 
Available on Amazon.com 
Soft Cover or Kindle Format 
Now also available in PDF at 
LearnDataVault.com
Contact Information 
Kent Graziano 
The Oracle Data Warrior 
Data Warrior LLC 
Kent.graziano@att.net 
On Twitter @KentGraziano 
Visit my blog at 
http://kentgraziano.com
