Datavault. Hennie de Nooijer
Dan Linstedt. Data modeling: all data, all the time. Method of design: Data Vault
Agenda: Position, Definition, Architecture, Modeling, Methodology, Questions? (8-12-2010)
Information provisioning
Controlled information provisioning: information provisioning, DWH
Business Intelligence: data warehouse, ETL, hardware, RDBMS
Definition: The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.
Detail oriented
Historical tracking
Uniquely linked set of normalized tables
Functional areas of business
But there are more aspects...
Auditable
Scalable
Adaptable
Active
Metadata
MDM aware
Agenda: Position, Definition, Architecture, Modeling, Methodology, Questions?
Conventional architecture (diagram). Layers: Integration, Storage, Presentation. Components: STAGE, DWH. Labels: TRANSFORM, Business Information Model, Current Business Demands/Wishes.
Modern architecture (diagram). Layers: Integration, Storage, Storage, Presentation. Components: STAGE, source DWH, business DWH. Labels: TRANSFORM, ALL DATA, ALL THE TIME, Current Business Information Model, Current Business Demands/Wishes.
Business Information Model (diagram, translated from Dutch). Entities: Order (Bestelling), Supplier (Leverancier), Delivery conditions (Leveringscondities), Delivery (Levering), Material type (Materiaalsoort), Material requirement (Materiaalbehoefte), Warehouse (Magazijn). Attributes: working day (werkdag), size (omvang). Relationship labels: is placed under / concerns; receives / is placed with; has size; obliges to / is the realization of; consists of / is contained in; is willing to deliver / can be delivered by; consists of / occurs in; provides for / is provided for by; occurs in with; must be provided for; is received by / receives; concerns the willingness to deliver to a / can be delivered accordingly to.
Architecture (detail) (diagram). From top to bottom: Front end; Datamarts (Patient); Business Datavault (Patient); Raw Datavault 1, Raw Datavault 2, Raw Datavault n (KNA1, Patient, Customer); Replication layer (KNA1, Customer, Patient); Source 1, Source 2, Source n.
Architecture (advanced) (diagram). Enterprise Service Bus (BizTalk/Cloverleaf/SOA); Front-end tools; Datamarts; Datavault; Source 1, Source 2, Source n.
Benefits
  • Manage and enforce compliance (SOX, HIPAA and Basel II).
  • Reduces business cycle time.
  • Enables Master Data Management.
  • CMM Level 5 compliant.
  • Repeatable, consistent and redundant.
  • Trace all data back to source systems.
  • Flexibility.
  • Scalability.
  • Consistent.
  • Adaptable.
  • Possible automatic generation of the DDL and ETL.
  • Supports VLDB.
  • Designed for EDW.
Agenda: Position, Definition, Architecture, Modeling, Methodology, Questions?
Example model (diagram labels): Hub (Patient), Hub (Treat), Link (Treatment), and several Satellites.
Hub
  • Represents the business key.
  • A surrogate key as the primary key.
  • Load date timestamp (when did it get there?)
  • Record source (where did it come from?)
Example:
  Patient (source): Patient_ID, Patient_Code, Patient_Name, Patient_Desc, Patient_Category, Patient_SubCategory, Patient_Address, Patient_Gender
  Hub_Patient: Patient_Key, Patient_Code, Load_Date, Record_Source
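The hub load can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes an in-memory SQLite database, treats Patient_Code as the business key, and uses a hypothetical helper `load_hub_patient` and made-up patient codes. Only the table and column names come from the slide.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Hub_Patient (
        Patient_Key   INTEGER PRIMARY KEY,   -- surrogate key
        Patient_Code  TEXT NOT NULL UNIQUE,  -- business key (assumed)
        Load_Date     TEXT NOT NULL,         -- when did it get there?
        Record_Source TEXT NOT NULL          -- where did it come from?
    )
""")

def load_hub_patient(conn, patient_code, record_source):
    """Insert the business key if it is new; return its surrogate key."""
    conn.execute(
        "INSERT OR IGNORE INTO Hub_Patient (Patient_Code, Load_Date, Record_Source) "
        "VALUES (?, ?, ?)",
        (patient_code, datetime.now(timezone.utc).isoformat(), record_source),
    )
    row = conn.execute(
        "SELECT Patient_Key FROM Hub_Patient WHERE Patient_Code = ?",
        (patient_code,),
    ).fetchone()
    return row[0]

# Hypothetical sample data: the same business key arrives from two sources.
key_a = load_hub_patient(conn, "P-001", "Bron 1")
key_b = load_hub_patient(conn, "P-001", "Bron 2")
key_c = load_hub_patient(conn, "P-002", "Bron 1")
```

In this sketch the hub keeps the load date and record source of the first arrival only, so reloading an already known business key is a no-op.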
Satellite
  • Descriptive items of a hub or a link.
  • The parent's surrogate key plus the load date as the primary key.
  • Load date timestamp (when did it get there?)
  • Record source (where did it come from?)
Example:
  Patient (source): Patient_ID, Patient_Code, Patient_Name, Patient_Desc, Patient_Category, Patient_SubCategory, Patient_Address, Patient_Gender
  Option 1, a single satellite:
    SAT_Patient: Patient_Key, Load_Date, Patient_Name, Patient_Desc, Patient_Category, Patient_SubCategory, Patient_Address, Patient_Gender
  Option 2, split satellites:
    SAT_Patient: Patient_Key, Load_Date, Patient_Name, Patient_Desc, Patient_Address, Patient_Gender
    SAT_PatientCategory: Patient_Key, Load_Date, Patient_Category, Patient_SubCategory
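How a satellite tracks history can be sketched like this. It is a minimal illustration under stated assumptions: an in-memory SQLite database, a primary key of (Patient_Key, Load_Date), ISO-formatted date strings, and a hypothetical helper `load_sat_patient_category` that appends a row only when the descriptive attributes differ from the latest stored row. The sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SAT_PatientCategory (
        Patient_Key         INTEGER NOT NULL,  -- key of the parent hub row
        Load_Date           TEXT NOT NULL,     -- ISO date, sorts as text
        Patient_Category    TEXT,
        Patient_SubCategory TEXT,
        PRIMARY KEY (Patient_Key, Load_Date)   -- assumed key layout
    )
""")

def load_sat_patient_category(conn, patient_key, load_date, category, subcategory):
    """Append a satellite row only if the attributes changed; return True if inserted."""
    latest = conn.execute(
        "SELECT Patient_Category, Patient_SubCategory FROM SAT_PatientCategory "
        "WHERE Patient_Key = ? ORDER BY Load_Date DESC LIMIT 1",
        (patient_key,),
    ).fetchone()
    if latest == (category, subcategory):
        return False  # nothing changed, history stays as it is
    conn.execute(
        "INSERT INTO SAT_PatientCategory VALUES (?, ?, ?, ?)",
        (patient_key, load_date, category, subcategory),
    )
    return True

# Hypothetical loads: the second one repeats the first and adds no row.
inserted = [
    load_sat_patient_category(conn, 1, "2010-01-01", "A", "A1"),
    load_sat_patient_category(conn, 1, "2010-02-01", "A", "A1"),
    load_sat_patient_category(conn, 1, "2010-03-01", "B", "B2"),
]
```

Existing rows are never updated in this sketch; every change becomes a new row, which is what makes the historical tracking possible.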
Link
  • Links two or more hubs.
  • Own surrogate key.
  • Keys from the hubs.
  • Load date timestamp.
  • Record source.
Example:
  Hub_Patient: Patient_Key, Patient_Code, Load_Date, Record_Source
  Hub_Treat: Treat_Key, Treat_Code, Load_Date, Record_Source
  Link_Treatment: Treatment_Key, Patient_Key, Treat_Key, Load_Date, Record_Source
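The three tables on this slide can be sketched together as follows. This is an illustration only, assuming an in-memory SQLite database. The UNIQUE constraint on the pair of hub keys, the helper `load_link_treatment` and the sample rows are assumptions, not something the slide states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE Hub_Patient (
        Patient_Key   INTEGER PRIMARY KEY,
        Patient_Code  TEXT NOT NULL UNIQUE,
        Load_Date     TEXT NOT NULL,
        Record_Source TEXT NOT NULL
    );
    CREATE TABLE Hub_Treat (
        Treat_Key     INTEGER PRIMARY KEY,
        Treat_Code    TEXT NOT NULL UNIQUE,
        Load_Date     TEXT NOT NULL,
        Record_Source TEXT NOT NULL
    );
    CREATE TABLE Link_Treatment (
        Treatment_Key INTEGER PRIMARY KEY,  -- the link's own surrogate key
        Patient_Key   INTEGER NOT NULL REFERENCES Hub_Patient (Patient_Key),
        Treat_Key     INTEGER NOT NULL REFERENCES Hub_Treat (Treat_Key),
        Load_Date     TEXT NOT NULL,
        Record_Source TEXT NOT NULL,
        UNIQUE (Patient_Key, Treat_Key)     -- assumption: one row per relationship
    );
    -- Hypothetical sample hub rows
    INSERT INTO Hub_Patient VALUES (1, 'P-001', '2010-12-08', 'Bron 1');
    INSERT INTO Hub_Treat   VALUES (1, 'T-100', '2010-12-08', 'Bron 1');
""")

def load_link_treatment(conn, patient_key, treat_key, load_date, record_source):
    """Record the relationship once; return the link's surrogate key."""
    conn.execute(
        "INSERT OR IGNORE INTO Link_Treatment "
        "(Patient_Key, Treat_Key, Load_Date, Record_Source) VALUES (?, ?, ?, ?)",
        (patient_key, treat_key, load_date, record_source),
    )
    return conn.execute(
        "SELECT Treatment_Key FROM Link_Treatment WHERE Patient_Key = ? AND Treat_Key = ?",
        (patient_key, treat_key),
    ).fetchone()[0]

first = load_link_treatment(conn, 1, 1, "2010-12-08", "Bron 1")
again = load_link_treatment(conn, 1, 1, "2010-12-09", "Bron 2")  # same relationship, no new row
```

The link holds only keys and load metadata; descriptive details of a treatment would go into a satellite attached to the link.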
Source data model
Analysis data model
Data Vault data model
Datavault: Point in Time views (PIT): the 'truth' at a certain moment. Helper table? Bridge: the same as Point in Time, but for a range.
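The idea behind a point-in-time lookup can be sketched as a query that returns the satellite row that was current at a given moment. This is a simplified illustration, not a full PIT table or view: it assumes an in-memory SQLite database, ISO-formatted date strings, a reduced SAT_Patient satellite, a hypothetical helper `patient_as_of` and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SAT_Patient (
        Patient_Key     INTEGER NOT NULL,
        Load_Date       TEXT NOT NULL,   -- ISO date, sorts as text
        Patient_Name    TEXT,
        Patient_Address TEXT,
        PRIMARY KEY (Patient_Key, Load_Date)
    );
    -- Hypothetical history: the address changes on 2010-06-01
    INSERT INTO SAT_Patient VALUES (1, '2010-01-01', 'J. Jansen', 'Old Street 1');
    INSERT INTO SAT_Patient VALUES (1, '2010-06-01', 'J. Jansen', 'New Street 2');
""")

def patient_as_of(conn, patient_key, as_of):
    """Return (Load_Date, Patient_Name, Patient_Address) valid at as_of, or None."""
    return conn.execute(
        "SELECT Load_Date, Patient_Name, Patient_Address FROM SAT_Patient "
        "WHERE Patient_Key = ? AND Load_Date <= ? "
        "ORDER BY Load_Date DESC LIMIT 1",
        (patient_key, as_of),
    ).fetchone()

spring = patient_as_of(conn, 1, "2010-03-15")   # before the address change
winter = patient_as_of(conn, 1, "2010-12-08")   # after the address change
before = patient_as_of(conn, 1, "2009-12-31")   # before the first load
```

A bridge, as the slide describes it, would apply the same kind of selection over a range of dates rather than a single moment.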
Questions?

Editor's notes

  1. Key points: The Data Vault schema is comparable to a neural network: neurons, dendrites and synapses. These are created and destroyed as needed (because relationships arise or cease to exist). Neurons are the Hubs and Hub Satellites. Links are the dendrites. Other links are the synapses (vectors in the opposite direction).
  2. Compliance. Auditability. Flexibility. Traceability. DDL and ETL generated.
  3. Key points: (none)
  4. The DWH is the toolbox for BI. A financial director is not interested in ETL.
  5. Key points: Self-explanatory.
  6. Key points: Lowest granularity. Atomic level. No aggregation. Keep the details, because business rules can then be regenerated when insights within the organization change. If we do not, and data is loaded in aggregated form, detail information is lost.
  7. Key points: Lineage.
  8. Key points: Self-explanatory.
  9. Key points: Self-explanatory.
  10. Key points: Self-explanatory.
  11. Key points: All data must be traceable.
  12. Near-real-time data. Operational data warehouse.
  13. Key points: (none)
  14. Keep the information model close to the business. When the information model is close to the source systems, you need to modify or rewrite the complete ETL, DDL, etc.
  15. Key points: Naming in the business vault is recognizable to the business. Demand-driven: only elements that are used according to the business. Business key integration (unique business keys, corresponding business keys). No direct reports on the Raw Datavault and Business Datavault.
  16. Key points: (none)
  17. Key points: (none)
  18. Key points: An elegant modeling technique with a minimal number of components: Hub, Link and Satellite. The Hub represents the primary key. The Link entities provide transaction integration between the Hubs. The Satellite entities provide the context of the Hub primary key.
  19. Key points: Self-explanatory.
  20. Key points: Historical perspective. Changing over time. From this we can build all kinds of dimensions of Type 1, 2 or 3. It is possible to add a load date timestamp, a load end date timestamp and a record source. A satellite record for every row in the hub. Why? Because of inner joining.
  21. Key points: A patient is treated at a certain moment. If more information belongs to a treatment, an extra satellite must be added to the link table. It is possible to load every hub and satellite in parallel. A high degree of parallelism is possible.
  22. Key points: Self-explanatory.
  23. Key points: Self-explanatory.
  24. Key points: Self-explanatory.