Hadoop Data Lake &
classical Data Warehouse:
How to utilize best of both worlds
2018 Hadoop Data Lake & classical DWH: Best of both worlds • © Copyright Woodmark Consulting AG • Kolja Rödel 1
Speaker & Agenda
1. Introduction to Hybrid Architectures
• Classical Data Warehouses
• Hadoop Data Lakes
• Bringing it all together
2. Use Cases on Hybrid Architectures
Kolja Rödel
Manager
E kolja.roedel@woodmark.de
Woodmark Consulting AG
Am Hochacker 4
85630 Grasbrunn / München
1. Introduction to Hybrid Architectures
Really two different things? Or just an implementation detail?
[Diagram: two architectures side by side.
Classical BI/DWH Architecture (< 10 Terabyte): Structured Data, Staging Area, Data Warehouse, Data Mart 1, Data Mart 2 … Data Mart n, stored in database tables.
Data Lake Architecture (> 10 Terabyte): Structured, Semi-structured and Unstructured Data, Landing Zone, Data Lake with Streaming, Reservoir and Archive on HDFS, (Stream-)Export.
Consumers shown: Use Case 1, Use Case 2 … Use Case n.]
Data Warehouses at the heart of the traditional BI landscape
• Def. Bill Inmon: “a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process”
• Def. Ralph Kimball: “a copy of transaction data specifically structured for querying and reporting”
Architectural concept:
• Global view on heterogeneous & distributed data
• Layered architecture for distinct purposes
• Integration of source data for consistency:
Single Point of Truth
• Elaborate data model (3NF, Dimensional, Data
• Support of deeper analyses (like time-series)
• Aggregation of KPIs for efficient usage
• Preparation of application-specific data extracts
[Diagram: Data Sources → Staging Area → Core Data Warehouse → Data Marts → Reporting & Analysis. Steps labelled: ETL: decouple; ETL: versionize, clean & integrate; ETL: transform; Query: filter.]
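The layered flow named on this slide (decouple, clean & integrate, transform, filter) can be sketched in a few lines of Python. This is only a minimal illustration: the record fields (`customer`, `amount`, `load_date`) and all function names are assumptions made for the example, and the versioning step is left out for brevity.

```python
from collections import defaultdict

# Hypothetical source records; the field names are assumptions for illustration.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "10.50", "load_date": "2018-01-01"},
    {"customer": "bob", "amount": "4.00", "load_date": "2018-01-01"},
    {"customer": "Alice", "amount": "7.25", "load_date": "2018-01-02"},
]


def to_staging(rows):
    """ETL: decouple. Copy source rows unchanged so later steps never touch the source."""
    return [dict(row) for row in rows]


def to_core(staged):
    """ETL: clean & integrate. Normalize names and cast amounts to numbers."""
    return [
        {
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
            "load_date": row["load_date"],
        }
        for row in staged
    ]


def to_mart(core):
    """ETL: transform. Aggregate a KPI (revenue per customer) for efficient usage."""
    revenue = defaultdict(float)
    for row in core:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


def report(mart, customer):
    """Query: filter. Read a single figure from the data mart."""
    return mart.get(customer, 0.0)


mart = to_mart(to_core(to_staging(SOURCE_ROWS)))
print(report(mart, "alice"))  # prints 17.75
```

Each function stands for one layer, so a change in the source format only affects `to_core`, while reports keep reading from the mart.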
Hadoop Data Lakes to face new challenges
Challenges:
• Data growth → explosion (3V)
• All data has potential value for future Use Cases:
no selective archiving & “no” deleting!
• Data cleansing to extract business value
• Retain transparency through Data Governance
Architectural concept:
• Central platform for collecting, processing and
large volumes of multi-structured data
• Layered architecture
• Arrival of raw data (1:1 copy)
• Persistent storage of cleansed, normalized data
• Use Case oriented, filtered data contexts
• No strict data modelling, no transactions (ACID)
• Advanced processing like Streaming & Machine Learning
[Diagram: Data Lake Architecture (> 10 Terabyte): Structured, Semi-structured and Unstructured Data; Landing Zone; Data Lake with Streaming, Reservoir and Archive (highly compressed, lower replication factor); (Stream-)Export.]
Apart from implementation, we see a paradigm change:
Hadoop Data Lake
• Collect all (types of) data
• Data-Driven-Business
• Schema-On-Read
• Data as an enterprise asset
• Data is loaded in raw format to the Data Lake … and is selected and organized with respect to the Use Case

Classical BI/DWH
• Minimal storage allocation
• Hypothesis / Application-Driven-Business
• Schema-On-Write
• Data as a side product of processes
• Data is cleansed and integrated into a consistent schema before loading to the DWH … and analyses are executed directly on the DWH
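The difference between Schema-On-Write and Schema-On-Read can be made concrete with a small Python sketch. It is an illustration under assumptions, not a description of any particular product: the sample events, the `SCHEMA` dictionary and the function names are all hypothetical.

```python
import json

# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    '{"sensor": "s1", "temp": "21.5"}',
    '{"sensor": "s2", "temp": "n/a", "note": "calibrating"}',
]

SCHEMA = {"sensor": str, "temp": float}  # assumed target schema


def load_schema_on_write(lines):
    """Validate and cast every record before storing it; reject what does not fit."""
    table, rejected = [], []
    for line in lines:
        record = json.loads(line)
        try:
            table.append({name: cast(record[name]) for name, cast in SCHEMA.items()})
        except (KeyError, ValueError):
            rejected.append(line)
    return table, rejected


def load_schema_on_read(lines):
    """Store the raw lines untouched (1:1 copy); no structure is enforced yet."""
    return list(lines)


def read_with_schema(raw_store, field, cast):
    """Apply a schema only at query time, for the field one use case needs."""
    values = []
    for line in raw_store:
        record = json.loads(line)
        try:
            values.append(cast(record[field]))
        except (KeyError, ValueError):
            continue  # this use case skips records it cannot interpret
    return values


table, rejected = load_schema_on_write(RAW_EVENTS)
lake = load_schema_on_read(RAW_EVENTS)

print(len(table), len(rejected))             # only conforming rows were stored
print(read_with_schema(lake, "temp", float))
print(read_with_schema(lake, "note", str))   # a later use case can still reach "note"
```

In the schema-on-write path, the `note` field and the non-conforming record never reach the table. In the schema-on-read path both stay available, at the price of every reader having to deal with dirty values.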
A Hybrid Architecture unites the beneficial features:
Hybrid Architecture
• Processing of structured &
unstructured data
• Parallel processing of large
amounts of data in real-time
• Hypothesis- and Data-driven
analyses
• Highly integrated core data
[Diagram: Hadoop Data Lake + Data Warehouse]
Hybrid BI & Big Data Reference Architecture
[Diagram: reference architecture spanning Data Lake and DWH. Components shown: Source Systems, Landing Zone, Raw Data, Archive, Speed Layer, Standardized Data, Reservoir Data, Business Data, Use Case Data, Interface Data, Portal. Processing steps shown: Ingestion, Provisioning, Standardizing, Integration, Customizing. Access levels shown: Raw Access, Early Access, Business Ready.]
Why still keep the Data Warehouse?
Protection of investments
• Proven technology for reliable results (e.g. reporting)
• Employees’ skills
• Existing analyses, reports & applications
Query performance for relational Use Cases
• Indexes & Hints
• Mature optimizer
Quality data → stability
• Schema-on-write: deliberate data modelling
• Defined data types
• Transaction concept (all or nothing)
• Constraints: uniqueness (PK), reference (FK), required attributes (NOT NULL), …
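The quality features listed here (primary key, foreign key, required attributes, all-or-nothing transactions) can be tried out with Python's built-in `sqlite3` module. SQLite merely stands in for a warehouse database in this sketch, and the `customer` and `sale` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE sale ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.commit()


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_pk = try_insert("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Bob"))
missing_name = try_insert("INSERT INTO customer (id, name) VALUES (?, ?)", (2, None))
unknown_fk = try_insert(
    "INSERT INTO sale (customer_id, amount) VALUES (?, ?)", (99, 5.0)
)

# All or nothing: the second statement fails, so the first is rolled back too.
try:
    with conn:
        conn.execute("INSERT INTO sale (customer_id, amount) VALUES (1, 10.0)")
        conn.execute("INSERT INTO sale (customer_id, amount) VALUES (99, 20.0)")
except sqlite3.IntegrityError:
    pass

sale_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(duplicate_pk, missing_name, unknown_fk, sale_count)  # False False False 0
```

All three bad rows are rejected at write time, and the half-finished transaction leaves no trace in `sale`.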
2. Use Cases on Hybrid Architectures
Typical IT Use Cases around a manufacturing plant:
• Long-term archiving of factory log data
• Reducing production errors via log analysis
• Product customization using a recommender
• Monthly report on sales numbers
• Production optimization through self-service analysis
• Customer satisfaction measurement through Sentiment Analysis
• Immediate alerting based on sensor streaming
• Master Data Management (MDM)
• Effective cross-selling driven by campaign management (360°)
• Support of customer service by predictive maintenance
Criteria for Use Case placement
• Data Variety
• Data Volume
• Data Velocity
• Response time
• Information Consistency
• Algorithmic Complexity
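One way to read these criteria is as inputs to a simple placement rule. The Python sketch below is a hypothetical rule of thumb, not the method used in the talk: the `UseCase` fields, the order of the checks and the returned labels are all assumptions, and only the 10 Terabyte boundary echoes the size labels on the architecture slides.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    volume_tb: float = 1.0           # Data Volume
    structured_only: bool = True     # Data Variety
    streaming: bool = False          # Data Velocity
    rarely_read: bool = False        # Response time (no interactive access needed)
    needs_consistency: bool = False  # Information Consistency
    machine_learning: bool = False   # Algorithmic Complexity


def place(uc: UseCase) -> str:
    """Suggest a component; the rules and their order are illustrative assumptions."""
    if uc.streaming:
        return "Data Lake: Speed Layer"
    if uc.rarely_read:
        return "Data Lake: Archive"
    if uc.needs_consistency and uc.structured_only:
        return "Data Warehouse"
    if uc.machine_learning or not uc.structured_only or uc.volume_tb > 10:
        return "Data Lake: Reservoir"
    return "Data Warehouse"


examples = [
    UseCase("Sensor streaming", volume_tb=50, structured_only=False, streaming=True),
    UseCase("Archiving log data", volume_tb=200, structured_only=False, rarely_read=True),
    UseCase("Recommender", machine_learning=True),
    UseCase("MDM", needs_consistency=True),
    UseCase("Monthly report", needs_consistency=True),
]
for uc in examples:
    print(f"{uc.name}: {place(uc)}")
```

Real placement decisions weigh the criteria against each other rather than checking them in a fixed order, so a sketch like this is at most a starting point for discussion.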
Bringing the Use Cases to the Reference Architecture
[Diagram: the reference architecture (Source Systems, Landing Zone, Raw Data, Archive, Speed Layer, Standardized Data, Reservoir Data, Business Data, Use Case Data, Interface Data, Portal) with the use cases placed on it: Predictive maintenance, Archiving log data, Log analysis, Recommender, Monthly report, Self-service analysis, Sentiment Analysis, Sensor streaming, MDM, Campaign management.]
Conclusion: How to utilize best of a Hybrid Architecture
• Data Warehouses and Data Lakes follow different paradigms and have different strengths.
• They complement rather than replace each other.
• Hybrid Architectures make it possible to address various Use Cases:
Use Case | Component | Layer
Efficient Archiving | Data Lake | Raw data archive
Streaming of sensor data | Data Lake | Speed layer
Machine Learning | Data Lake | Reservoir
Master Data Management | Data Warehouse | Core layer
Self-Service Analysis and Standard Reporting | Data Warehouse | Portal applications on Datamarts
Questions, answers & discussion
Thanks for joining!



Editor's notes

  1. A warm welcome from my side. Today we look at an architectural topic, not a fancy library.
  2. First part: the basics, and how to bring them together. Second part: Use Cases. These are only samples, of course, and you will find more, but all of them are mentioned frequently and are therefore regarded as relevant. The different Use Cases are joined in our architecture: a project example.
  3. Inmon:
     • subject-oriented: the data to be loaded is selected with regard to the KPIs needed, not with regard to the operational processes
     • integrated (unification): data that is structured differently in the (operational) source systems is stored in the DWH in a uniform format
     • time-variant: enables analyses with a time reference (developments), which is why long-term storage of the data in the DWH is necessary
     • non-volatile (persistence): data is stored permanently
     OLTP: transaction-oriented, for the operational “daily business”
     OLAP: analysis-oriented, often aggregated, for evaluation and strategy
     DWH = focus on OLAP rather than OLTP (unlike an ODS)
     Applications:
     • Ad-hoc reporting: analyzing the data at detail level, usually for specialists
     • Standard reporting (including balanced scorecards): usually only partly interactive, distributed to business and clients, well formatted
     • Management dashboards: usually highly aggregated figures, typically equipped with traffic lights and trends
     • Regulatory reporting: reports and interfaces to official authorities
     • Corporate planning: based on historic data and planning models
     Recent development:
     • More data
     • Faster data (e.g. sensors) → also requirements (e.g. KPIs football game)
  4. • Business transformation: modelling, data types, aggregation vs. cheap storage (raw format) and data asset
     • Check your assumption vs. interpret correlations
     • Static schema requires data transformation and manual adaptation vs. working with a changing schema (-> unknown use cases)
     • Data is the New Oil
  5. Hybrid architectures
     • integrate the processing and storage of structured and unstructured data, parallel and redundant processing of large data volumes, and data-driven analyses
     • enable a fast reaction to new or changed requirements: access to data at different processing stages (raw, standard, integrated, specialized), explicit support for DevOps (Continuous Integration, Continuous Deployment), schemaless storage, cloud-based infrastructure
     • scale: intelligent data replication is a core functionality of the Hadoop file system (HDFS). As a file system, HDFS can readily scale beyond 1000 machines and several petabytes.
     • integrate new data types (social media, log files…)
     • protect existing investment: integration of existing structures (DWH)
  6. Integration of new technologies and the existing BI architecture
     • Proven technology for reliable reporting
     • Machine Learning
     • Streaming
     • Efficient storage of large data volumes
     • Inclusion of social media, log files and other new data types
     • Faster access to the data for the business departments
     • Exploratory data analysis to make unused potential usable
     • Use Case-specific data preparation
  7. Further use cases:
     • Machine Learning: Pricing, Optimization (Travelling Salesman)
     • Streaming: computing real-time KPIs
     • Predictive Maintenance