Data Vault 2.0 and Datawarehouse Automation with Vaultspeed

Presentation by Jonas De Keuster (CEO, Vaultspeed) at the #BIDASUMMIT on June 13th, 2019 in Diegem.

  1. DATA VAULT 2.0 & DWH AUTOMATION WITH VAULTSPEED (www.vaultspeed.com)
  2. AGENDA: (1) INTRO - ARCHITECT THE REAL WORLD; (2) DATA VAULT 2.0; (3) DATA WAREHOUSE AUTOMATION WITH VAULTSPEED; (4) CUSTOMER CASE: ARGENTA
  3. ARCHITECT THE REAL WORLD: ARCHITECT FOR CHANGE, ARCHITECT FOR SPEED, EMBRACING REALITY
  4. SINGLE VERSION OF THE TRUTH = MYTH
  5. THE RIGHT TO CHANGE YOUR MIND: "THIS IS NOT WHAT I ASKED FOR", "I CHANGED MY MIND"
  6. WORLD IS GOING DIGITAL: BI, BILLING, CRM, ARTIFICIAL INTELLIGENCE, REPORTING, MOBILE APP (x2), HR
  7. REALITY CHECK: Why architect an ideal and not the real world? Earlier approaches (predefined data models, Kimball, Bill Inmon 3NF) build the data integration layer on an ideal world: a business-based model that represents the single version of the truth.
      • This model becomes obsolete the moment it is ready
      • These models are not built to cope with multiple versions of the truth
      • These models are not resilient to change; they lack agility
      • These models cannot cope with an overload of data sources
  8. NEVER-ENDING STORY OF DATA INTEGRATION? LET'S JUST DUMP ALL OUR DATA INTO AN UNSTRUCTURED DATA LAKE AND DO SCHEMA ON READ WHEN WE ACTUALLY NEED IT???
  9. UNSTRUCTURED DATA IS HARD TO USE
      • Unstructured data is hard to find, read, integrate and use
      • Data scientists and AI engineers will spend most of their time on data preparation
      • Valuable time that could go into actually analyzing data gets lost
      • Different people do the same job multiple times
  10. DATA LAKE OR SWAMP
  11. DATA INTEGRITY VS AGILITY
  12. WOULDN’T IT BE NICE …
      • REPEATABLE PATTERNS
      • ASSEMBLY LINE
  13. DATA VAULT 2.0
  14. DATA VAULT 2.0: Data Vault 2.0 is a standardised approach for implementing integration systems. It covers:
      • a data modelling technique
      • an architecture
      • an agile implementation methodology
      Invented by Dan Linstedt. The standard is maintained by the Data Vault Alliance.
  15. THE HUB
      • The Hub represents a Core Business Element such as Customer, Vendor, Sale, Product, …
      • There should be only ONE Hub for every Core Business Element
      • The Hub contains no descriptive data
      • The Hub contains only the Business Key(s)
      • Only a list of unique Business Key(s) is kept in the Hub, not the history
      • Hash keys are generated for each and every Business Key

      HASHKEY    PRODUCT SERIAL NO  LOAD DATE  RECORD SOURCE
      5kj5-kj45  G110               8/1/19     ERP
      zry7-yy5u  G112               12/6/19    ERP
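
The hub rules above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's or Vaultspeed's implementation: the `hash_key` and `load_hub` helpers, the choice of MD5, and the trim/upper-case normalisation are all assumptions made for the example. The hash values on the slide are placeholders and will not match real MD5 output.

```python
import hashlib


def hash_key(*business_keys: str) -> str:
    """Hypothetical hashing rule: MD5 over trimmed, upper-cased keys joined by '|'."""
    normalized = "|".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_keys, load_date: str, record_source: str) -> int:
    """Insert business keys that are not in the hub yet; return how many were added."""
    added = 0
    for business_key in business_keys:
        hk = hash_key(business_key)
        if hk not in hub:  # existing rows are never updated: the hub keeps no history
            hub[hk] = {
                "product_serial_no": business_key,
                "load_date": load_date,
                "record_source": record_source,
            }
            added += 1
    return added


hub_product = {}
first_load = load_hub(hub_product, ["G110"], "8/1/19", "ERP")
second_load = load_hub(hub_product, ["G110", "G112"], "12/6/19", "ERP")
```

In the second load only G112 is new, so G110 keeps its original load date and record source.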
  16. THE LINK
      • The Link is used to represent relationships between business elements
      • Only one Link should exist for a relationship between business elements
      • Each Link is based on a unique, specific, natural business relationship
      • Only a list of unique combinations of Business Keys representing the relationship is kept, not the history of change
      • The Link contains no descriptive data
      • The Link does not have its own Business Key
      • In Data Vault modelling there are only many-to-many relationships, so the focus is on identifying business relationships and less on the specific relationship cardinality

      PRODUCT HASHKEY  CUSTOMER HASHKEY  LOAD DATE  RECORD SOURCE
      5kj5-kj45        56gf-uwn8         8/1/19     ERP
      zry7-yy5u        osn8-sdnx         12/6/19    ERP
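
As a rough sketch of the link rules above, the snippet below stores each unique combination of hub hash keys once and treats every relationship as many-to-many. The helper names, the MD5 hashing rule, and the customer numbers `C001` and `C002` are hypothetical; only the product serial numbers come from the slides.

```python
import hashlib


def hash_key(business_key: str) -> str:
    """Hypothetical hashing rule: MD5 over the trimmed, upper-cased business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


def load_link(link: dict, pairs, load_date: str, record_source: str) -> int:
    """Add unseen (product, customer) combinations; return how many were added."""
    added = 0
    for product_no, customer_no in pairs:
        combination = (hash_key(product_no), hash_key(customer_no))
        if combination not in link:  # no history: a known combination is left untouched
            link[combination] = {"load_date": load_date, "record_source": record_source}
            added += 1
    return added


link_product_customer = {}
load_link(link_product_customer, [("G110", "C001"), ("G110", "C002")], "8/1/19", "ERP")
load_link(link_product_customer, [("G110", "C001"), ("G112", "C001")], "12/6/19", "ERP")
```

After both loads the link holds three combinations. G110 relates to two customers and C001 to two products, and no cardinality had to be declared up front.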
  17. THE SATELLITE
      • The Satellite contains all descriptive information for Hubs and Links (satellite on hub, satellite on link)
      • The Satellite is the only construct in Data Vault modelling capable of tracking history
      • The Satellite doesn’t have a Business Key

      PRODUCT HASHKEY  LOAD DATE  RECORD SOURCE  HASHDIFF   NAME          MIN ORDER QTY  ACTIVE PRODUCT  PRODUCT DESCRIPTION
      5kj5-kj45        8/1/19     ERP            erdf-vg76  null          null           Y               null
      5kj5-kj45        9/1/19     FILE X         hd02-9djd  GPS110series  2              Y               A GPS th..
      5kj5-kj45        13/5/19    ERP            8js2-48ds  GPS110series  2              N               A GPS th..
      zry7-yy5u        12/6/19    ERP            jfkd-df43  GPS112series  1              Y               Our second g..
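
The HASHDIFF column suggests how a satellite tracks history: a new row is added only when the hash over the descriptive attributes changes. The sketch below illustrates that idea under stated assumptions. The `hash_diff` and `load_satellite` helpers, the MD5 rule, the reduced attribute list, and the `1/3/19` load date are hypothetical, and rows are assumed to arrive in load-date order.

```python
import hashlib

# Descriptive attributes tracked by this hypothetical satellite.
ATTRIBUTES = ("name", "min_order_qty", "active_product")


def hash_diff(record: dict) -> str:
    """MD5 over the descriptive attributes; None is hashed as an empty string."""
    payload = "|".join("" if record.get(a) is None else str(record[a]) for a in ATTRIBUTES)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(sat: list, product_hk: str, record: dict, load_date: str, source: str) -> bool:
    """Append a row only if the attributes differ from the latest row for this key."""
    new_diff = hash_diff(record)
    latest = next((row for row in reversed(sat) if row["product_hk"] == product_hk), None)
    if latest is not None and latest["hashdiff"] == new_diff:
        return False  # nothing changed, so no new history row
    sat.append({"product_hk": product_hk, "load_date": load_date,
                "record_source": source, "hashdiff": new_diff, **record})
    return True


sat_product = []
active = {"name": "GPS110series", "min_order_qty": 2, "active_product": "Y"}
inactive = {"name": "GPS110series", "min_order_qty": 2, "active_product": "N"}
results = [
    load_satellite(sat_product, "5kj5-kj45", active, "9/1/19", "FILE X"),
    load_satellite(sat_product, "5kj5-kj45", active, "1/3/19", "ERP"),   # unchanged
    load_satellite(sat_product, "5kj5-kj45", inactive, "13/5/19", "ERP"),
]
```

The second load is skipped because its hashdiff matches the latest row, so the satellite ends up with two rows for the product.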
  18. FROM SOURCE RELATIONAL TO DV 2.0
  19. OTHER OBJECTS
      • Point in time (PIT) tables: combine data from multiple satellites on a hub or link into one single snapshot table
      • Bridge tables: hold hash key combinations that span multiple hubs and links
      BOTH STRUCTURES ARE USED TO OPTIMISE QUERY PERFORMANCE ON THE RAW DATA VAULT
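
One common way to build a PIT table is to record, for every hub key and snapshot date, the latest load date in each satellite on or before that snapshot. The sketch below follows that approach. It is an assumption about how such a table could be built, not something the slides specify. The `build_pit` helper, the two satellite names, and the `hk_g110` key are hypothetical, and the slide dates are read as day/month/year.

```python
from datetime import date


def build_pit(hub_keys, satellites: dict, snapshot_dates):
    """satellites maps a satellite name to a list of (hash_key, load_date) rows."""
    pit = []
    for snapshot in snapshot_dates:
        for hk in hub_keys:
            row = {"hash_key": hk, "snapshot_date": snapshot}
            for sat_name, sat_rows in satellites.items():
                # Latest satellite row that was already loaded at snapshot time.
                candidates = [ld for key, ld in sat_rows if key == hk and ld <= snapshot]
                row[f"{sat_name}_load_date"] = max(candidates) if candidates else None
            pit.append(row)
    return pit


satellites = {
    "sat_erp": [("hk_g110", date(2019, 1, 8)), ("hk_g110", date(2019, 5, 13))],
    "sat_file": [("hk_g110", date(2019, 1, 9))],
}
pit = build_pit(["hk_g110"], satellites, [date(2019, 1, 8), date(2019, 6, 1)])
```

A query can then join each satellite on hash key and the recorded load date with plain equality, instead of searching for the latest row at query time.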
  20. ARCHITECTURE: MODERN DATA WAREHOUSE OR STRUCTURED DATA LAKE. MULTIPLE VERSIONS OF THE TRUTH, SINGLE VERSION OF THE FACTS
  21. ARCHITECTURE: ENTERPRISE DATA HUB. Diagram labels: BI, BILLING, CRM, ARTIFICIAL INTELLIGENCE, REPORTING, MOBILE APP (x3), HR, 360° Data Hub, DATA VAULT
  22. SUPPORT INCREMENTAL APPROACH. [Diagram: hubs (H), links (L) and satellites (S) added to the model in Phase 1 and Phase 2]
  23. UNLIMITED SCALABILITY. Through the use of Hash Keys:
      • no interdependencies
      • no complex data flows
      • no loading sequence
      • all objects can be loaded in parallel
      • delivering unlimited scalability
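
The reasoning behind parallel loading is that a hash key is computed from the business key alone, so a link or satellite loader never has to wait for a hub load to hand out a surrogate key. The sketch below runs four hypothetical loaders concurrently on the same staged rows. The loader functions, the MD5 rule, the customer numbers, and the thread pool are illustrative assumptions, not the presenter's implementation.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(business_key: str) -> str:
    """Hypothetical hashing rule: MD5 over the trimmed, upper-cased business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


STAGED_ROWS = [
    {"product_no": "G110", "customer_no": "C001", "name": "GPS110series"},
    {"product_no": "G112", "customer_no": "C002", "name": "GPS112series"},
]


def load_hub_product(rows):
    return {hash_key(r["product_no"]) for r in rows}


def load_hub_customer(rows):
    return {hash_key(r["customer_no"]) for r in rows}


def load_link_product_customer(rows):
    # The link derives both hub hash keys itself; it does not look them up in the hubs.
    return {(hash_key(r["product_no"]), hash_key(r["customer_no"])) for r in rows}


def load_sat_product(rows):
    return {hash_key(r["product_no"]): r["name"] for r in rows}


loaders = [load_hub_product, load_hub_customer, load_link_product_customer, load_sat_product]
with ThreadPoolExecutor() as pool:
    futures = {fn.__name__: pool.submit(fn, STAGED_ROWS) for fn in loaders}
    results = {name: future.result() for name, future in futures.items()}
```

Whatever order the loaders finish in, the keys in the link and the satellite match those in the hubs, because each loader derives them from the same business keys.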
  24. REPEATABLE PATTERNS
      • Faster build of the EDW through Data Warehouse Automation: a limited number of low-level object types (Hub, Satellite, Link, PIT and Bridge), each with the same loading pattern, supports automated generation of ETL mappings instead of manual development
      • Clear separation of integration (warehousing) and delivery of information: first create the Single Version of the Facts in the Raw Data Vault (warehousing), then support Multiple Versions of the Truth in the Business Data Vault (delivery)
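
Because every hub has the same shape, its definition can be generated from metadata rather than written by hand. The sketch below fills one template per entity. The template text, column names and types, and the metadata entries are hypothetical and are not Vaultspeed's actual templates or output.

```python
# Hypothetical DDL template shared by every hub; only the names vary.
HUB_TEMPLATE = """CREATE TABLE hub_{entity} (
    {entity}_hkey CHAR(32) NOT NULL PRIMARY KEY,
    {business_key} VARCHAR(100) NOT NULL,
    load_date TIMESTAMP NOT NULL,
    record_source VARCHAR(50) NOT NULL
);"""


def generate_hub_ddl(metadata):
    """Render one CREATE TABLE statement per entity described in the metadata."""
    return [HUB_TEMPLATE.format(**entry) for entry in metadata]


# In practice this metadata would be harvested from the source systems.
source_metadata = [
    {"entity": "product", "business_key": "product_serial_no"},
    {"entity": "customer", "business_key": "customer_no"},
]

ddl_statements = generate_hub_ddl(source_metadata)
print("\n\n".join(ddl_statements))
```

Adding a new hub then means adding one metadata entry. The same approach extends to links, satellites and their loading code.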
  25. WHY DATA VAULT? The advantages of using Data Vault 2.0 as a data modelling approach:
      • Supports the automated build and maintenance of an Enterprise Data Warehouse or Enterprise Data Hub
      • Repeatable patterns
      • Scalability
      • Completeness (atomic, all historic data) = Data Recorder
      • Resilient to change
      • Supports SQL and NoSQL environments: can bridge the gap between classic relational and Hadoop & NoSQL
      • Flexibility & multiple-speed implementation
      • Supports multiple versions of the truth
      • Opens the door for adaptive or dynamic data warehousing (without human intervention)
  26. OUR DWA TOOL: VAULTSPEED
  27. A DECADE OF EXPERIENCE: FROM FRAMEWORK TO PRODUCT. REBRANDING: VAULTSPEED 2019
  28. SAAS TOOL
      • HARVEST SOURCE METADATA: ANY SOURCE WITH JDBC CONNECTOR
      • DEPLOY CODE: ELT FLOWS, DATA DEFINITION LANGUAGE
      • GUIDED USER INTERFACE
      • GENERATE INITIAL SETUP OR DELTAS
  29. GUIDED AUTOMATION PROCESS: ENFORCING MENUS
  30. VAULTSPEED PRINCIPLE - NOT AN ELT-TOOL
      • Vaultspeed is an accelerator that speeds up the development of an integration layer
      • It works in symbiosis with an existing ELT tool and is not a replacement
      • Pro: investments in the existing ELT tool are kept
      • The license can be terminated, but your ELT will still run
      • [Diagram label: VAULTSPEED Templates]
  31. SUPPORTED TARGET DB PLATFORMS: MORE COMING SOON ...
  32. IMPLEMENTATION MODES
  33. CUSTOMER CASE: ARGENTA
  34. ARGENTA BANK AND INSURANCE
      • 1300 employees in HQ
      • 500 branches, 2000 employees
      • Net profit +/- 200 million EUR
      • 1.72 million customers
      • 44.1 billion funds under management
      • 8.1% market share in BE
  35. BUSINESS NEEDS
      • Internal Knowledge: management dashboards, customer insights
      • Digital Transformation: new mobile platform for customers, knowing the customer
      • Support all Regulatory Requirements: GDPR, BCBS239, MIFID II
  36. BEFORE. [Diagram labels, translated from Dutch: Output, Accounting Model, Legislation and Regulation Model, Commercial Model, Solvency II Reporting, Basel II Reporting, Agreement, Person, Transaction, Profile, Insights, Management Reporting, Finance Model, Statutory Reporting, Client View Model, Input, Risk Monitoring Model, Customer Service Model, Client, Capacity, Product, Collateral, Object, Valuation, Event, Risk Measurement Model, Interaction, Customer, Regulators, FMP, WERA, GDPR, MIFID II, METRO, DIM, BCBS 239, KYC]
  37. SOLUTION: Create a corporate data store that distributes nearly-online integrated data
      • to create value in data
      • in a controlled and managed process: quality assurance is embedded
      • on a need-to-know principle, with respect for privacy and legal constraints
      • with bimodal agility: waterfall (follow major releases in core banking systems) and agile (independent releases for fast delivery of new content)
      • between operational applications and supporting applications
      • towards reporting or analytical environments
  38. INFRASTRUCTURE
  39. ARCHITECTURE DATA VAULT
  40. NEAR REAL TIME IMPLEMENTATION
      • 14 (+) heterogeneous data sources integrated in the solution
      • 10000+ objects in raw + business data vault
      • At multiple speeds
      • Average load time for real time CDC BDV: 6 seconds
      • Various outflows to other systems
  41. AGILE WAY OF WORKING
      • SAFe (Scaled Agile Framework): 2-week sprints, program increment planning, 1 program increment = 6 sprints
      • Multiple teams: 2 development teams (2x6 people), 1 system team (support team), 1 analyst team (make features sprintable), 1 data quality team
      • Source system releases are embedded in sprint planning as maintenance features
  42. CIVL FUNCTIONAL ROADMAP
      • MiFID (investment products and client data)
      • IRB modelling (credit products)
      • click and social media data
      • client-based scoring (profiling and modelling) on a datalab
      • analytics-based selling and servicing
      • next best action
      • self-learning (AI)
  43. Thank you
      • Jonas De Keuster, jonas.dekeuster@vaultspeed.com
      • Piet Dewindt, piet.dewindt@vaultspeed.com
      • www.vaultspeed.com
