5. THE RIGHT TO CHANGE YOUR MIND
THIS IS NOT WHAT I ASKED FOR
I CHANGED MY MIND
6. THE WORLD IS GOING DIGITAL
[Figure: BI, billing, CRM, artificial intelligence, reporting, mobile apps and HR systems]
7. Why architect for an ideal world and not the real one?
Earlier approaches (predefined data models, Kimball, Bill Inmon's 3NF) build the data integration layer on an ideal world: a business-based model in the integration layer that represents the single version of the truth.
This model becomes obsolete the moment it is ready
These models are not built to cope with multiple versions of the truth
These models are not resilient to change; they lack agility
These models cannot cope with an overload of data sources
REALITY CHECK
8. NEVER-ENDING STORY OF DATA INTEGRATION?
LET'S JUST DUMP ALL OUR DATA INTO AN UNSTRUCTURED DATA LAKE AND DO SCHEMA ON READ WHEN WE ACTUALLY NEED IT???
9. UNSTRUCTURED DATA IS HARD TO USE
Unstructured data is hard to find, read, integrate and use
Data scientists or AI engineers spend most of their time on data preparation
Valuable time that should go to actually analysing data is lost
Different people do the same job multiple times
14. DATA VAULT 2.0
Data Vault 2.0 is a standardised approach for implementing integration systems. It comprises:
• a data modelling technique
• an architecture
• an agile implementation methodology
Invented by Dan Linstedt
The standard is maintained by the Data Vault Alliance
15. THE HUB
The Hub represents a Core Business Element such as Customer, Vendor, Sale, Product, …
There should therefore be only ONE Hub for every Core Business Element
The Hub contains no descriptive data
The Hub contains only the Business Key(s)
Only a list of unique Business Keys is kept in the Hub, not the history
A Hash Key is generated for each and every Business Key
HASHKEY   | PRODUCT SERIAL NO | LOAD DATE | RECORD SOURCE
5kj5-kj45 | G110              | 8/1/19    | ERP
zry7-yy5u | G112              | 12/6/19   | ERP
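The hub rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: it assumes MD5 over the trimmed, upper-cased business key (a common Data Vault 2.0 convention, although the hash function and normalisation rules vary per project), uses an in-memory dict as a stand-in for the hub table, and the names `hash_key` and `load_hub` are hypothetical. The hash values in the slide table (5kj5-kj45, ...) are illustrative placeholders, not real MD5 output.

```python
import hashlib
from datetime import date


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic hash key from one or more business key parts.

    Assumed convention: trim and upper-case each part, join with ';', then MD5.
    """
    normalised = ";".join(part.strip().upper() for part in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_keys: list, load_date: date, record_source: str) -> int:
    """Insert-only hub load: each business key is stored once and never updated.

    `hub` maps hash key -> row and stands in for the hub table.
    Returns the number of rows inserted.
    """
    inserted = 0
    for bk in business_keys:
        hk = hash_key(bk)
        if hk not in hub:  # unique keys only; a re-delivered key adds no history
            hub[hk] = {
                "hash_key": hk,
                "product_serial_no": bk,
                "load_date": load_date,
                "record_source": record_source,
            }
            inserted += 1
    return inserted


product_hub: dict = {}
load_hub(product_hub, ["G110"], date(2019, 1, 8), "ERP")       # -> 1
load_hub(product_hub, [" g110 "], date(2019, 1, 9), "FILE X")  # -> 0, same key after normalisation
load_hub(product_hub, ["G112"], date(2019, 6, 12), "ERP")      # -> 1
```

Because the key is derived only from the business key, the same product delivered by a second source (FILE X) maps to the existing hub row, and the first load date and record source are kept.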
16. THE LINK
The Link is used to represent relationships between business elements.
Only one link should exist for a relationship between business elements.
Each Link is based on a unique, specific, natural business relationship.
Only a list of unique combinations of Business keys representing the relationship is kept, not the history of change.
The Link contains no descriptive data
The Link does not have its own Business Key
In Data Vault modelling all relationships are many-to-many
The focus is on identifying business relationships rather than on their specific cardinality
PRODUCT HASHKEY | CUSTOMER HASHKEY | LOAD DATE | RECORD SOURCE
5kj5-kj45       | 56gf-uwn8        | 8/1/19    | ERP
zry7-yy5u       | osn8-sdnx        | 12/6/19   | ERP
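A link load follows the same insert-only pattern as a hub, but keyed on the combination of business keys. The sketch below is an illustration only: it reuses the assumed MD5 convention, additionally assumes a link hash key computed over the combined business keys (common in Data Vault 2.0 practice, but not shown in the slide table), and uses hypothetical customer IDs (C001, C002) and function names.

```python
import hashlib
from datetime import date


def hash_key(*business_keys: str) -> str:
    """Deterministic hash key (assumed convention: trim, upper-case, ';'-join, MD5)."""
    normalised = ";".join(part.strip().upper() for part in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def load_link(link: dict, pairs: list, load_date: date, record_source: str) -> int:
    """Insert-only load of (product serial no, customer id) relationships.

    Only unique combinations are stored: a relationship delivered again is
    ignored, so the link records that it exists, not how it changed.
    """
    inserted = 0
    for product_bk, customer_bk in pairs:
        link_hk = hash_key(product_bk, customer_bk)  # one row per unique combination
        if link_hk not in link:
            link[link_hk] = {
                "product_hash_key": hash_key(product_bk),
                "customer_hash_key": hash_key(customer_bk),
                "load_date": load_date,
                "record_source": record_source,
            }
            inserted += 1
    return inserted


product_customer_link: dict = {}
load_link(product_customer_link, [("G110", "C001"), ("G112", "C002")], date(2019, 1, 8), "ERP")
# Many-to-many: the same product may relate to another customer.
load_link(product_customer_link, [("G110", "C001"), ("G110", "C002")], date(2019, 6, 12), "ERP")
```

After both loads the link holds three rows: the repeated (G110, C001) pair is skipped, while (G110, C002) is added without any change to the model, which is why cardinality does not need to be fixed up front.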
17. THE SATELLITE
The Satellite contains all descriptive information for Hubs and Links
• satellite on hub
• satellite on link
The Satellite is the only construct in Data Vault modelling capable of tracking history
The Satellite doesn’t have a Business Key.
PRODUCT HASHKEY | LOAD DATE | RECORD SOURCE | HASHDIFF  | NAME         | MIN ORDER QTY | ACTIVE PRODUCT | PRODUCT DESCRIPTION
5kj5-kj45       | 8/1/19    | ERP           | erdf-vg76 | null         | null          | Y              | null
5kj5-kj45       | 9/1/19    | FILE X        | hd02-9djd | GPS110series | 2             | Y              | A GPS th..
5kj5-kj45       | 13/5/19   | ERP           | 8js2-48ds | GPS110series | 2             | N              | A GPS th..
zry7-yy5u       | 12/6/19   | ERP           | jfkd-df43 | GPS112series | 1             | Y              | Our second g..
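The HASHDIFF column is what lets a satellite track history cheaply: a new row is inserted only when the hash of the descriptive attributes differs from the most recent row for that key. A minimal sketch, assuming MD5 over the ';'-joined attribute values with null rendered as an empty string; the attribute names mirror the example table, and `hash_diff` and `load_satellite` are hypothetical names.

```python
import hashlib
from datetime import date

# Descriptive attributes of the product satellite, mirroring the example table.
ATTRIBUTES = ["name", "min_order_qty", "active_product", "product_description"]


def hash_diff(record: dict) -> str:
    """MD5 over the ';'-joined attribute values; None becomes '' (assumed convention)."""
    values = ["" if record.get(a) is None else str(record[a]).strip() for a in ATTRIBUTES]
    return hashlib.md5(";".join(values).encode("utf-8")).hexdigest()


def load_satellite(sat: list, hub_hash_key: str, record: dict,
                   load_date: date, record_source: str) -> bool:
    """Append a satellite row only when the attributes changed; return True if inserted."""
    new_diff = hash_diff(record)
    history = [row for row in sat if row["hash_key"] == hub_hash_key]
    latest = max(history, key=lambda row: row["load_date"], default=None)
    if latest is not None and latest["hash_diff"] == new_diff:
        return False  # unchanged: no new history row
    sat.append({"hash_key": hub_hash_key, "load_date": load_date,
                "record_source": record_source, "hash_diff": new_diff, **record})
    return True


product_sat: list = []
hk = "5kj5-kj45"  # placeholder hash key from the example table
v1 = {"name": "GPS110series", "min_order_qty": 2, "active_product": "Y",
      "product_description": "A GPS th.."}  # description truncated as on the slide
load_satellite(product_sat, hk, v1, date(2019, 1, 9), "FILE X")                           # -> True
load_satellite(product_sat, hk, v1, date(2019, 3, 1), "FILE X")                           # -> False
load_satellite(product_sat, hk, {**v1, "active_product": "N"}, date(2019, 5, 13), "ERP")  # -> True
```

Comparing one hash per row instead of every attribute keeps the change check uniform for any satellite, which is part of what makes the loading pattern repeatable. The sketch assumes batches arrive in load-date order.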
19. OTHER OBJECTS
Point in time (PIT) tables: combine data from multiple satellites on a hub or link into one single snapshot table
Bridge tables: combinations of hash keys that bridge over multiple hubs and links
BOTH STRUCTURES ARE USED TO OPTIMISE QUERY PERFORMANCE ON THE RAW DATA VAULT
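A PIT table can be sketched as follows: for each hub key and snapshot date, store the load date of the row in each satellite that was current at that moment, so that queries can join to every satellite on equality instead of searching date ranges. This is an illustration only; the two satellites (one per source), their reduced columns and the names `latest_load_date` and `build_pit` are hypothetical.

```python
from datetime import date


def latest_load_date(sat: list, hash_key: str, as_of: date):
    """Load date of the newest satellite row for `hash_key` on or before `as_of`, else None."""
    dates = [row["load_date"] for row in sat
             if row["hash_key"] == hash_key and row["load_date"] <= as_of]
    return max(dates, default=None)


def build_pit(hash_keys: list, satellites: dict, snapshot_dates: list) -> list:
    """One PIT row per (hash key, snapshot date), pointing at each satellite's current row."""
    pit = []
    for hk in hash_keys:
        for snap in snapshot_dates:
            row = {"hash_key": hk, "snapshot_date": snap}
            for sat_name, sat in satellites.items():
                row[f"{sat_name}_load_date"] = latest_load_date(sat, hk, snap)
            pit.append(row)
    return pit


# Hypothetical satellites on the product hub; descriptive columns omitted for brevity.
sat_product_erp = [
    {"hash_key": "5kj5-kj45", "load_date": date(2019, 1, 8)},
    {"hash_key": "5kj5-kj45", "load_date": date(2019, 5, 13)},
]
sat_product_file = [
    {"hash_key": "5kj5-kj45", "load_date": date(2019, 1, 9)},
]
pit = build_pit(["5kj5-kj45"],
                {"sat_erp": sat_product_erp, "sat_file": sat_product_file},
                [date(2019, 1, 31), date(2019, 6, 30)])
```

For the June snapshot the PIT row points at the ERP row of 13/5/19 and the file row of 9/1/19, so a report reads one consistent snapshot without recomputing which version was valid.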
20. ARCHITECTURE: MODERN DATA WAREHOUSE OR STRUCTURED DATA LAKE
[Figure: architecture diagram labelled "Multiple versions of the truth" and "Single version of the facts"]
21. ARCHITECTURE: ENTERPRISE DATA HUB
[Figure: a 360° Data Hub built on Data Vault, connected to BI, billing, CRM, artificial intelligence, reporting, mobile apps and HR]
23. UNLIMITED SCALABILITY
Through the use of hash keys:
no interdependencies and no complex data flows
no loading sequence
all objects can be loaded in parallel, delivering unlimited scalability
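Why hash keys remove the loading sequence: with sequence-generated surrogate keys, a link or satellite load must wait for the hub load and look up the key it assigned. A hash key depends only on the business key, so every loader can compute it independently. The sketch below assumes the same MD5 convention as above, uses hypothetical staged rows and loader names, and uses Python's `concurrent.futures.ThreadPoolExecutor` only to show that the four loaders share no state; in a real warehouse the parallelism would come from the database or ELT engine.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*business_keys: str) -> str:
    """Deterministic hash key (assumed convention: trim, upper-case, ';'-join, MD5)."""
    normalised = ";".join(part.strip().upper() for part in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


# One staged batch of (product serial no, customer id, product name); values are hypothetical.
staged = [("G110", "C001", "GPS110series"), ("G112", "C002", "GPS112series")]


def load_product_hub(rows):
    return {hash_key(p) for p, _, _ in rows}


def load_customer_hub(rows):
    return {hash_key(c) for _, c, _ in rows}


def load_link(rows):
    # Needs the hub keys, but derives them itself: no lookup, no waiting for hub loads.
    return {(hash_key(p, c), hash_key(p), hash_key(c)) for p, c, _ in rows}


def load_product_sat(rows):
    return {(hash_key(p), name) for p, _, name in rows}


loaders = [load_product_hub, load_customer_hub, load_link, load_product_sat]
with ThreadPoolExecutor() as pool:
    # All four loads run concurrently; none depends on another's output.
    results = list(pool.map(lambda load: load(staged), loaders))

product_hub, customer_hub, link, product_sat = results
```

Even though no loader waited for another, the keys in the link and satellite match the hub keys exactly, because each was computed from the same business key.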
24. REPEATABLE PATTERNS
Faster Build of EDW through Data Warehouse Automation
A limited number of low-level object types
Hub, Satellite, Link, PIT and Bridge objects each follow a single, repeatable loading pattern
This supports automated generation of ETL mappings instead of manual development
Clear separation of Integration (Warehousing) and Delivery of Information
First create the Single Version of the Facts in the Raw Data Vault (Warehousing)
Support Multiple Versions of the Truth in the Business Data Vault (Delivery)
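Because every hub is loaded the same way, the load statement can be generated from metadata instead of being written by hand. The sketch below renders an insert-only hub load as SQL text from a small metadata record. It illustrates template-based generation in general, not VaultSpeed's actual templates: `HubSpec`, the `hub_`/`hk_`/`stg_` naming conventions and the `MD5(UPPER(TRIM(...)))` hashing expression are assumptions, and SQL dialects differ (for example in the name of the MD5 function).

```python
from dataclasses import dataclass


@dataclass
class HubSpec:
    """Hypothetical metadata describing one hub and the staging table that feeds it."""
    hub_name: str
    business_key: str
    staging_table: str
    record_source: str


# Insert-only pattern: new, normalised business keys only; existing keys are skipped.
HUB_TEMPLATE = """INSERT INTO hub_{hub} (hk_{hub}, {bk}, load_date, record_source)
SELECT DISTINCT MD5(UPPER(TRIM(s.{bk}))), UPPER(TRIM(s.{bk})), CURRENT_TIMESTAMP, '{src}'
FROM {stg} s
WHERE NOT EXISTS (
    SELECT 1 FROM hub_{hub} h WHERE h.hk_{hub} = MD5(UPPER(TRIM(s.{bk})))
);"""


def generate_hub_load(spec: HubSpec) -> str:
    """Render the hub load pattern for one hub from its metadata."""
    return HUB_TEMPLATE.format(hub=spec.hub_name, bk=spec.business_key,
                               stg=spec.staging_table, src=spec.record_source)


specs = [
    HubSpec("product", "product_serial_no", "stg_erp_product", "ERP"),
    HubSpec("customer", "customer_id", "stg_erp_customer", "ERP"),
]
for spec in specs:
    print(generate_hub_load(spec))
    print()
```

Adding a new hub then means adding one metadata record rather than writing a new mapping; the same idea extends to link, satellite, PIT and bridge templates.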
25. WHY DATA VAULT?
The advantages of using Data Vault 2.0 as a Data Modelling approach
Supports the automated build and maintenance of an Enterprise Data Warehouse or Enterprise Data Hub
Repeatable patterns
Scalability
Completeness (atomic, all historical data): a data recorder
Resilient to change
Supports SQL and NoSQL environments: can bridge the gap between classic relational databases and Hadoop/NoSQL
Flexibility and multi-speed implementation
Supports multiple versions of the truth
Opens the door to adaptive or dynamic data warehousing (without human intervention)
27. A DECADE OF EXPERIENCE: FROM FRAMEWORK TO PRODUCT
REBRANDING: VAULTSPEED (2019)
28. SAAS TOOL
HARVEST SOURCE METADATA
ANY SOURCE WITH JDBC CONNECTOR
DEPLOY CODE
- ELT FLOWS
- DATA DEFINITION LANGUAGE
GUIDED USER INTERFACE
GENERATE INITIAL SETUP OR DELTAS
30. VAULTSPEED PRINCIPLE: NOT AN ELT TOOL
VaultSpeed
An accelerator to speed up the development of an integration layer
Works in symbiosis with the existing ELT tool; not a replacement
Pros
Keep investments in the existing ELT tool
The license can be terminated, but your ELT will still run
[Figure: VaultSpeed templates]
34. ARGENTA BANK AND INSURANCE
1,300 employees at headquarters
500 branches, 2,000 employees
Net profit approx. EUR 200 million
1.72 million customers
44.1 billion in funds under management
8.1% market share in Belgium
35. BUSINESS NEEDS
Internal knowledge
Management dashboards
Customer insights
Digital transformation
New mobile platform for customers
Knowing the customer
Support for all regulatory requirements
GDPR
BCBS 239
MiFID II
36. BEFORE
[Figure: the situation before, a diagram from input to output with a separate model per need: accounting model, laws and regulations model, commercial model, finance model, client view model, risk monitoring model, customer service model and risk measurement model. Other labels: the entities agreement, person, transaction, profile, client, capacity, product, collateral, object, valuation, event, interaction and customer; Solvency II, Basel II, management and statutory reporting; insights; regulators; FMP, WERA, GDPR, MiFID II, METRO, DIM, BCBS 239 and KYC.]
37. SOLUTION
Create a corporate data store that distributes near-online integrated data to create value from data
in a controlled and managed process
with embedded quality assurance
on a need-to-know principle
with respect for privacy and legal constraints
with bimodal agility
waterfall: follow major releases of the core banking systems
agile: independent releases for fast delivery of new content
between operational applications and supporting applications
towards reporting or analytical environments
40. NEAR-REAL-TIME IMPLEMENTATION
14+ heterogeneous data sources integrated in the solution
10,000+ objects in the raw and business data vault
At multiple speeds
Average load time for real-time CDC, BDV (business data vault): 6 seconds
Various outflows to other systems
41. AGILE WAY OF WORKING
SAFe (Scaled Agile Framework)
2-week sprints
program increment planning
1 program increment = 6 sprints
Multiple teams
2 development teams (2x6 people)
1 system team (support team)
1 analyst team (make features sprintable)
1 data quality team
Source system releases are embedded in sprint planning as
maintenance features
42. CIVL FUNCTIONAL ROADMAP
MiFID (investment products and client data)
IRB modelling (credit products)
click and social media data
client-based scoring (profiling and modelling) in a data lab
analytics-based selling and servicing
self-learning next best action (AI)