The factory and the workshop: Open source metadata driven data warehousing Johannes van den Bosch @johannesvdb
Agenda Background Organization Project Basic architecture The factory …and the workshop Topics from yesterday Real time Business key integration Staging out Hierarchies and supertypes Self service BI and write back Lessons learnt
Waterschap De Dommel Dutch water board in the south of the Netherlands Managing water quality and quantity for 900,000 citizens and 150,000 hectares 375 employees, of which 2 in BI Projects managed on time, money and goals Demand for integrated management information
Project Current BI architecture reached limits New greenfield architecture Open source advocate Passionate believer in cost-effective solutions for government It’s our money! Convinced management No software cost Internal hours (2x 0.3 FTE) 1 year
Open source software ETL: Pentaho Data Integration (Kettle); Data warehouse management: Quipu; Documentation: MediaWiki; Modeling: Power*Architect
EDW architecture Reporting Analysis Dashboards Data mart 1 Data mart n Business Data Vault Source Data Vault 1 Source Data Vault 2 Source Data Vault n Supply driven Demand driven Generated and automated Staging 1 Staging 2 Staging n Source 1 Source 2 Source n
Development approach supply driven effort demand driven time
Plant, production lines
The factory
The factory Source Staging Source Data vault ETL ETL
How Quipu works Source model Target model Template Load code (ETL)
1. Load source model
2. Generate staging DDL
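As a rough illustration of steps 1 and 2, the sketch below renders staging DDL from a small in-memory source model. It is not Quipu's implementation: the `staging_ddl` function, the `stg_` prefix, the `load_dts` audit column and the `person` table are all assumptions made for the example.

```python
def staging_ddl(table, columns, prefix="stg_"):
    """Render a CREATE TABLE statement for the staging copy of a source table."""
    column_lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    # load_dts is an assumed audit column, not something the slides specify.
    column_lines.append("    load_dts TIMESTAMP")
    body = ",\n".join(column_lines)
    return f"CREATE TABLE {prefix}{table} (\n{body}\n);"


# Hypothetical source model: table name -> list of (column, type) pairs.
source_model = {"person": [("person_id", "INTEGER"), ("name", "VARCHAR(100)")]}

ddl_statements = [staging_ddl(table, cols) for table, cols in source_model.items()]
print(ddl_statements[0])
```

Because the statement is derived entirely from metadata, adding a source table only requires a new entry in the model.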
3. Generate staging ETL Default ETL: INSERT INTO staging_table SELECT fields FROM source_table Not cross-platform! …I want my ETL tool..!
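The default staging ETL can be sketched as a template filled in from metadata. The example below uses Python's `string.Template` and an in-memory SQLite database purely for illustration; the `staging_load_sql` helper, the `stg_` prefix and the `person` table are assumptions, not part of Quipu.

```python
import sqlite3
from string import Template

# Template mirroring the default ETL statement on the slide.
LOAD_TEMPLATE = Template(
    "INSERT INTO $staging_table ($fields) SELECT $fields FROM $source_table"
)


def staging_load_sql(source_table, fields, staging_prefix="stg_"):
    """Fill the template from table metadata (names are hypothetical)."""
    return LOAD_TEMPLATE.substitute(
        staging_table=staging_prefix + source_table,
        fields=", ".join(fields),
        source_table=source_table,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE stg_person (person_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

load_sql = staging_load_sql("person", ["person_id", "name"])
conn.execute(load_sql)
staged_rows = conn.execute(
    "SELECT person_id, name FROM stg_person ORDER BY person_id"
).fetchall()
print(load_sql)
print(staged_rows)
```

A plain INSERT ... SELECT like this only works when source and staging tables are reachable through the same database connection, which is presumably the limitation behind the "not cross-platform" remark and the wish for an ETL tool.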
4. Generate source data vault model
4. Generate source data vault DDL
5. Generate source data vault ETL
Starting up the factory
The workshop
PoC Decided to try and build the bDV and Data Marts 100% virtual bDV = views on top of sDV Data marts = views on top of bDV Conclusion: it is possible
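A minimal sketch of the 100% virtual layering, assuming a single hub and satellite in the sDV. All table and view names, and the `UPPER` rule standing in for a business rule, are hypothetical; SQLite is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source data vault (physical): one hub and one satellite.
CREATE TABLE sdv_person_h   (person_hk INTEGER PRIMARY KEY, person_bk TEXT);
CREATE TABLE sdv_person_h_s (person_hk INTEGER, load_dts TEXT, name TEXT);

-- Business data vault (virtual): views on top of the sDV.
CREATE VIEW bdv_person_h AS
SELECT person_hk, UPPER(person_bk) AS person_bk FROM sdv_person_h;
CREATE VIEW bdv_person_h_s AS
SELECT person_hk, load_dts, name FROM sdv_person_h_s;

-- Data mart (virtual): view on top of the bDV.
CREATE VIEW dm_person AS
SELECT h.person_bk, s.name, s.load_dts
FROM bdv_person_h h
JOIN bdv_person_h_s s ON s.person_hk = h.person_hk;

INSERT INTO sdv_person_h VALUES (1, 'p-001');
INSERT INTO sdv_person_h_s VALUES (1, '2011-05-01', 'Ann');
""")

mart_rows = conn.execute("SELECT person_bk, name FROM dm_person").fetchall()
print(mart_rows)
```

Only the sDV tables hold data; a row loaded there is visible in the data mart view immediately, with no further load step.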
Functional components History Integration Transformation applying semantics, filtering, etc.
bDV design decisions: full bDV Source data vault Business data vault H T H I H
Integration strategies 1) same-as link H L H S S 2) integrated hub S H S 3) integrated hub + integrated satellite H S
Integration: hubs Source Source data vault Business data vault person employee_h employee_h_s person_h person_h_s System x Users Users_h Users_h_s System y
Hub integration – virtual System x BK1 System y BK2 New BK Integration business rule
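One way to picture virtual hub integration is a view that applies the business rule to each source hub and unions the results into a new business key. In the sketch below the rule (strip an `EMP-` prefix from system y's key) and all table names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sdv_employee_h (employee_bk TEXT);  -- system x, BK1
CREATE TABLE sdv_users_h    (user_bk TEXT);      -- system y, BK2
INSERT INTO sdv_employee_h VALUES ('42'), ('43');
INSERT INTO sdv_users_h    VALUES ('EMP-43'), ('EMP-44');

-- Integrated hub (virtual). Hypothetical business rule: system y keys
-- carry an 'EMP-' prefix that is removed to obtain the new BK.
-- UNION (not UNION ALL) collapses keys known to both systems.
CREATE VIEW bdv_person_h AS
SELECT employee_bk AS person_bk FROM sdv_employee_h
UNION
SELECT SUBSTR(user_bk, 5) AS person_bk FROM sdv_users_h;
""")

person_keys = [
    row[0]
    for row in conn.execute("SELECT person_bk FROM bdv_person_h ORDER BY person_bk")
]
print(person_keys)
```

Key 43 exists in both systems and appears once in the integrated hub, which is the point of the integration rule.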
Transformation: supertype example sDV Business rule bDV 5___ P______ 4____
Transformation: hierarchy example sDV bDV
Data marts: virtual (simple example) Dimension from hub + sat Fact from link + sat SCD type 1
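A type 1 dimension built as a view over a hub and its satellite might look like the sketch below: only the most recent satellite row per hub key is exposed. Table names, columns and the `load_dts` timestamp are assumptions; a fact view over a link and its satellite would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_h   (person_hk INTEGER PRIMARY KEY, person_bk TEXT);
CREATE TABLE person_h_s (person_hk INTEGER, load_dts TEXT, city TEXT);

INSERT INTO person_h VALUES (1, 'P1'), (2, 'P2');
INSERT INTO person_h_s VALUES
  (1, '2011-01-01', 'Boxtel'),
  (1, '2011-04-01', 'Eindhoven'),
  (2, '2011-02-01', 'Tilburg');

-- SCD type 1: keep only the latest satellite row for each hub key.
CREATE VIEW dim_person AS
SELECT h.person_hk AS person_key, h.person_bk, s.city
FROM person_h h
JOIN person_h_s s ON s.person_hk = h.person_hk
WHERE s.load_dts = (
    SELECT MAX(s2.load_dts) FROM person_h_s s2 WHERE s2.person_hk = h.person_hk
);
""")

dim_rows = conn.execute(
    "SELECT person_bk, city FROM dim_person ORDER BY person_bk"
).fetchall()
print(dim_rows)
```

The history stays in the satellite; the view simply hides it, so switching to a history-preserving dimension later would be a change to the view, not to the stored data.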
Virtual: lineage
bDV design decisions: partial bDV Source data vault Business data vault T H I H T H T
Full bDV vs partial bDV Full Lots of elements to define Easy data marts Partial Less work More T between data vault and data marts Multiple versions of the truth
Virtual vs Physical Virtual (views) No physical maintenance Easy to adapt Performance limitations Platform defines performance Lineage (depending on platform) Real time Auditability? Physical Scalability / performance Manual tweaking (indexes, etc.) Surrogate keys easy More intuitive to develop (ETL instead of SQL) More complex transformations (e.g. aggregations)
Self service BI and write back Palo for Excel Open source MOLAP Every cell points to location in the cube Write back to cube possible EDW cube Excel
Lessons learnt It is possible to quickly build an EDW with open source software Some really cool developments (e.g. data mart generation) Automation only goes so far Some challenges still need to be addressed …it is business intelligence after all. Automate, if it saves you money It can save you time to focus on the important stuff The end product counts: does it deliver added value? What’s the best EDW architecture? It depends!™


Data vault seminar May 5-6 Dommel - The factory and the workshop