Data Lake vs Vault Summary
The difference between a Data Lake and a Data Vault is the difference between a stethoscope and a radar.
• A Data Lake reinforces what you already know
• A Data Lake provides weak support for strategic decisions
• Data Lakes encourage a silo mentality
• Data Lakes can show the ‘what’
• Data Vaults help with the ‘why’
• Data Lakes enable drill down
• Data Vaults encourage drill across
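"Drill down" and "drill across" are dimensional-modelling terms. Drilling down aggregates one source at a finer grain. Drilling across aggregates each source separately and then combines the results on a shared (conformed) key. The Python sketch below illustrates the difference. The `sales` and `marketing` records and the `drill_down`/`drill_across` helpers are hypothetical and do not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical sample data: two separate signal sources that share a
# conformed business key (a product code).
sales = [
    {"product": "P1", "region": "North", "units": 120},
    {"product": "P1", "region": "South", "units": 80},
    {"product": "P2", "region": "North", "units": 40},
]
marketing = [
    {"product": "P1", "spend": 5000},
    {"product": "P2", "spend": 0},
]


def drill_down(rows, key, value):
    """Aggregate a single source by one attribute (e.g. units per region)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def drill_across(left, right, key, left_value, right_value):
    """Aggregate each source on the shared key first, then combine the
    results, so neither measure is double-counted by a row-level join."""
    left_totals = drill_down(left, key, left_value)
    right_totals = drill_down(right, key, right_value)
    return {
        k: (left_totals.get(k, 0), right_totals.get(k, 0))
        for k in sorted(set(left_totals) | set(right_totals))
    }
```

Here `drill_down(sales, "region", "units")` stays inside the sales source, whereas `drill_across(sales, marketing, "product", "units", "spend")` puts units sold and marketing spend side by side for each product.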
The hunt for business signals
What do we do? Signal Processing or Data Processing?
• Signals start conversations
• Signals move boardrooms
• Signals release IT expenditure
• Signal variety, reliability and context are key business drivers
• Data Processing ends conversations!
Signal Processing is the customer of Data Integration & Warehousing
[Diagram: Signal Processing (Business Intelligence, Artificial Intelligence, Reporting, Analytics, Spreadsheets, Dashboards) consuming Data Integration & Warehousing]
Sales are down, but why?
There are many interpretations of reality:
• Website broken
• Marketing budget cut
• Poor campaign
• Product price uncompetitive
• New product release
• Company trashed by Trump
• Fashion victim
• Delivery delays and/or cost
• Recession
Signal Processing at Scale
• The Cloud is one massive signal processor, with elastic, effectively limitless compute power and storage
• The role of Data Integration in the cloud is to organise data sets for both efficient and effective signal processing
• Data Lakes & Vaults have emerged as key cloud integration patterns
Data Lakes vs Data Vaults
Data Lake Evolution
• 2006: Amazon Web Services (AWS) launches
• 2008: Yahoo open sources Hadoop
• 2009: Cloudera forms
• 2009: AWS Elastic MapReduce launches
• 2010 (October): Apache Hive release
• 2010 (October): James Dixon, CTO of Pentaho, coins the term Data Lake
• 2011: Hortonworks forms
• 2012: AWS announces Amazon Redshift
• 2014: European on-premise Data Lake projects take off
• 2015: Snowflake released on AWS
• 2015: Hive and Presto released on AWS
• 2016: Amazon Athena released
Data Lake Signals are Isolated
• Data Lakes encourage detailed analysis of a very narrow field
• Thinking across separate data sources is difficult and inconsistent
• A silo mentality can emerge
• Data Scientists spend their time hunting for the data lake ontology
• Weak support for strategic decisions
• Too easy to make bad decisions on limited data
Data Lake Warning
The danger with Data Lakes is that they encourage decisions based upon what can be easily measured.
Data Lakes are Good for
• Starting EDW projects
• Persistent staging areas
• Feedstock for Data Vaults
• Tactical Analysis
• DWH flexibility
• API Calls/Gateway
• Unstructured log analysis
• Operational Monitoring
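As a persistent staging area, a Data Lake can serve as feedstock for a Data Vault. Below is a minimal sketch of an insert-only hub load from staged records. It assumes a Data Vault 2.0-style MD5 hash over the normalised business key. The `hash_key` and `load_hub` helpers, the `||` delimiter, the `record_source` value and the in-memory `dict` standing in for a hub table are all illustrative assumptions.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_key_parts: str) -> str:
    """Hypothetical hash key: MD5 of the trimmed, upper-cased business key
    parts joined with an assumed '||' delimiter."""
    normalised = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def load_hub(hub: dict, staged_rows: list, key_column: str, record_source: str) -> int:
    """Insert-only hub load: add each business key not already in the hub.
    Returns the number of new hub rows."""
    load_date = datetime.now(timezone.utc)
    added = 0
    for row in staged_rows:
        hk = hash_key(row[key_column])
        if hk not in hub:  # existing keys are never updated or deleted
            hub[hk] = {
                "business_key": row[key_column].strip().upper(),
                "load_date": load_date,
                "record_source": record_source,
            }
            added += 1
    return added
```

Because keys are trimmed and upper-cased before hashing, `"c-001"` and `" C-001 "` map to the same hub row. Re-running the load on the same staged rows adds nothing, so in this sketch the load is idempotent.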
Data Vault Evolution
• 1990s: Conceived by Dan Linstedt
• 2000: DV 1.0 Released into public domain
• 2014: DV 2.0 Announced
Data Vault Trends
• Strong tools are emerging for source-centric modelling and model population
• The need for business-centric modelling
• Patterns are emerging for automating documentation, validation and reconciliation
• New Data Warehouse databases complement Data Vaults
• GDPR and PII are driving the need for ontologies
• S3/Athena as a Data Vault?
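Reconciliation is one of the trends above that lends itself to automation. As a hedged sketch, one automated check compares the distinct business keys in a source extract with those in a target hub. The `reconcile` function and its output fields are hypothetical. A fuller check might also compare row counts or attribute hashes.

```python
def reconcile(source_keys, target_keys):
    """Compare distinct, normalised business keys between a source extract
    (e.g. a staging area) and a target (e.g. a hub)."""
    source = {k.strip().upper() for k in source_keys}
    target = {k.strip().upper() for k in target_keys}
    return {
        # Keys the load should have delivered but did not.
        "missing_in_target": sorted(source - target),
        # Keys in the target that this source did not supply. This is not
        # necessarily an error, because a hub accumulates keys from many
        # sources and loads.
        "only_in_target": sorted(target - source),
        "matched": len(source & target),
    }
```

A non-empty `missing_in_target` list is the signal worth alerting on. `only_in_target` is reported for context.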
Data Vaults are Good for
• EDW projects
• Strategic Analysis
• Feedstock for Cubes and Models
Data Vault Signals are related through business context
Sales are down, and here is the business context
• Broadens the field of vision and the scope of questions
• Increases the variety, quality and strength of signal channels
• Supports different business perspectives in a consistent analysis framework
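The following sketch shows how a Data Vault relates signals through business context. It models a product hub, a campaign hub, a link between them and two satellites. It then answers ‘sales are down, what else was happening?’ for one product. The classes, sample data and `sales_context` function are hypothetical and in-memory. Real hubs, links and satellites are database tables that also carry hash keys, load dates and record sources.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Unique business keys for one business concept."""
    name: str
    keys: set = field(default_factory=set)


@dataclass
class Link:
    """Relationships between business keys of two hubs."""
    name: str
    pairs: set = field(default_factory=set)  # (left_key, right_key)


@dataclass
class Satellite:
    """Descriptive attributes of a hub key over time."""
    name: str
    rows: list = field(default_factory=list)  # (hub_key, period, attributes)


# Hypothetical sample data.
product = Hub("product", {"P1", "P2"})
campaign = Hub("campaign", {"SPRING", "SUMMER"})
campaign_product = Link("campaign_product", {("SPRING", "P1"), ("SUMMER", "P2")})
sales_sat = Satellite("sales", [
    ("P1", "2018-Q1", {"units": 200}),
    ("P1", "2018-Q2", {"units": 150}),
    ("P2", "2018-Q1", {"units": 40}),
    ("P2", "2018-Q2", {"units": 60}),
])
campaign_sat = Satellite("campaign_status", [
    ("SPRING", "2018-Q2", {"budget": 0}),
    ("SUMMER", "2018-Q2", {"budget": 8000}),
])


def sales_context(product_key, period):
    """Return a product's sales for a period together with the attributes of
    every campaign linked to that product in the same period."""
    if product_key not in product.keys:
        raise KeyError(product_key)
    units = next((a["units"] for k, p, a in sales_sat.rows
                  if k == product_key and p == period), None)
    linked = {c for c, prod in campaign_product.pairs
              if prod == product_key and c in campaign.keys}
    context = {c: a for c, p, a in campaign_sat.rows
               if c in linked and p == period}
    return {"units": units, "campaigns": context}
```

`sales_context("P1", "2018-Q2")` returns the drop to 150 units alongside the linked SPRING campaign, whose budget was 0 in that period. This pairs the ‘what’ with a candidate ‘why’ (a marketing budget cut).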
Leaders need situational awareness
Data Vaults expose relationships between different business signals.

Data Vault Vs Data Lake


Editor's notes

  1. In the pub, signals open conversations. Signals, not data, move boardrooms. How the board consumes our data integration projects determines their success or failure. We should sell signals, not technology. Flying blind. Yield curves.
  2. Human task
  3. The board can’t take action if it is blind to obvious signals
  4. 10 years since Yahoo open sourced Hadoop. Which came first, James Dixon or Hive? Until Hive, Hadoop was hard: it separated compute from storage, without analysis. 4 years since the first data lake iteration… poor
  5. Conformed satellites made from rules
  6. Links perspectives: sales, finance, marketing, operational