Building a Data Lake
An App Dev’s Perspective
GeekNight Hyderabad - March 8th 2017
Geetha Balasundaram
geethab@thoughtworks.com
© 2017 ThoughtWorks Technologies Pvt. Limited
ABOUT ME
Developer @ ThoughtWorks
Building a data lake in the enterprise ecosystem
Helping a retail business make sense of its data (a data-guided organisation)
Previously part of the web development space (enterprise rewrite)
As startled as everyone else by the data engineering space
Sharing know-how and do-how from our team’s experience
snithish@thoughtworks.com
AGENDA
What is data in the true sense…
Data Warehouse in an enterprise ecosystem...
What is a data lake...
Data lake implementation in an enterprise ecosystem…
How to make effective use of a data lake: technology+process+people
Cluster Administration tool - Cloudera Manager
Pitfalls to avoid
Question ???
How did R. Ashwin perform in the last Test match?
HIGH LEVEL
PROBLEM STATEMENT
COMPLEX HISTORICAL DATA
Why?
Exploit the data and derive as many new insights as possible
Match Made
Enterprise systems produce this kind of complexity
DATA WAREHOUSE
https://martinfowler.com/articles/microservices.html
ETL
DID MICROSERVICES CAUSE THIS PROBLEM ?
Decentralised Data
https://martinfowler.com/articles/microservices.html
MICROSERVICES HELPED
Break down business unit
Break down complexity
Understand the nature of data
Question ???
R.Ashwin performed well ( 6/41 ) in yesterday’s match!
Complex historical data can quantify how well he has performed
Can we say why he did well in this particular match?
What factors contributed to his improved performance?
FACT is a FACT
… even when we don’t know how it can be used
KEY DIFFERENCE
https://martinfowler.com/bliki/DataLake.html
What is a data lake?
LAKE is...
.. a large body of water in a more natural state.
The contents of the lake, stream in from a source to fill the lake,
and various users of the lake can come to examine, dive in, or
take samples
https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/
DATA LAKE is...
.. a large body of data facts in a more natural state.
The contents of the lake stream in from a source to fill the lake,
and various users of the lake can come to analyse, build models,
or use a subset for specific use cases
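The idea of keeping facts in their natural state and deciding how to use them only at read time can be sketched roughly as below. This is a minimal illustration, not the implementation from the talk: the record layout, the `RAW_FACTS` sample and the function names are all hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical raw facts, kept exactly as the source systems emitted them.
RAW_FACTS = [
    '{"type": "sale", "item": "A", "qty": 3, "day": "2017-03-01"}',
    '{"type": "sale", "item": "B", "qty": 1, "day": "2017-03-01"}',
    '{"type": "promotion", "item": "A", "day": "2017-03-02"}',
    '{"type": "sale", "item": "A", "qty": 5, "day": "2017-03-02"}',
]


def read_facts(lines, fact_type):
    """Parse raw lines at read time and yield only the requested fact type."""
    for line in lines:
        fact = json.loads(line)
        if fact["type"] == fact_type:
            yield fact


def quantity_per_item(lines):
    """One possible question asked of the lake later: units sold per item."""
    totals = defaultdict(int)
    for sale in read_facts(lines, "sale"):
        totals[sale["item"]] += sale["qty"]
    return dict(totals)


print(quantity_per_item(RAW_FACTS))
```

Nothing about the stored lines has to change when a new question comes up; only a new reader function is added.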
KEY DIFFERENCE
https://martinfowler.com/bliki/DataLake.html
Implementation
OUR IMPLEMENTATION - TECH STACK
DATA SOURCE
DATA INGESTION
DATA LAKE
DATA MARTS
DATA ANALYSIS
Staging / Queue
How to make effective use of a data lake:
technology+process+people
Functionality Vs Reality
From: I need a feature so that I can do this action...
To: I need this insight so that I can take this action...
e.g. From: I need functionality to order items any time before or during a promotion...
To: I need to know in time whether I have to order items before or during a promotion,
so that I can improve promotion sales
People
Start Simple
There is no data lake yet…
Carve out portions of data that are easy wins yet critical to
arriving at the insight stated earlier
Set up the infrastructure and pipeline
Get your hands dirty
e.g. sales data is an important factor when analysing or predicting anything in the retail space
Technology
How much should I know about the data ?
As a consumer of data (read ‘not a consumer of service’)
How much should I know about it?
Schema ⇔ Contracts
Nature of the data:
versioned vs latest
transactional vs reference
facts vs aggregates
frequency of change
…..
Technology
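One way a data consumer might write down what it knows about a dataset, treating the schema as a contract and recording the nature of the data alongside it, is sketched below. The `DatasetContract` class, its fields and the `sales` example are hypothetical and only illustrate the properties listed on this slide.

```python
from dataclasses import dataclass
from enum import Enum


class Versioning(Enum):
    VERSIONED = "versioned"
    LATEST = "latest"


class Kind(Enum):
    TRANSACTIONAL = "transactional"
    REFERENCE = "reference"


class Grain(Enum):
    FACT = "fact"
    AGGREGATE = "aggregate"


@dataclass(frozen=True)
class DatasetContract:
    """What a consumer assumes about a dataset (hypothetical structure)."""
    name: str
    schema: dict            # field name -> expected Python type
    versioning: Versioning
    kind: Kind
    grain: Grain
    change_frequency: str   # free text, e.g. "daily"

    def violations(self, record):
        """List the ways a record breaks the schema part of the contract."""
        problems = []
        for field, expected in self.schema.items():
            if field not in record:
                problems.append(f"missing field: {field}")
            elif not isinstance(record[field], expected):
                problems.append(f"wrong type for {field}")
        return problems


sales = DatasetContract(
    name="sales",
    schema={"item": str, "qty": int},
    versioning=Versioning.LATEST,
    kind=Kind.TRANSACTIONAL,
    grain=Grain.FACT,
    change_frequency="daily",
)

print(sales.violations({"item": "A", "qty": "3"}))
```

Checking incoming records against such a contract at ingestion time surfaces schema drift early, before it reaches analysis.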
DATA INSIGHT - Part 1
Incrementally add new data to the lake
Serve data for analysis
e.g. What promotion-related data do I need to bring into the data lake?
Sales → improve promotion sales
Technology
DATA INSIGHT - Part 2
Sales + Promotions → improve promotion sales
How does adding more data to the lake help in arriving at new insights?
history of past promotion sales = how much to order for this promotion
history of past promotion sales + ‘X’ = how much to order for this promotion
history of past promotion sales + ‘X’ + ‘Y’ …… = how much to order for this promotion
eg: seasonality has a strong correlation with sales
history of past promotion sales + ‘X’ + ‘Y’ …… + ‘A’ = how much to order for this promotion after the start
People
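The progression above, where each extra factor refines the answer to "how much to order", can be illustrated with a toy calculation. This is only a sketch under strong assumptions: the slides do not say how the factors are combined, so the multiplicative adjustment, the function name and all numbers here are hypothetical.

```python
from statistics import mean


def order_quantity(past_promotion_sales, factors=()):
    """Baseline is the average of past promotion sales.

    Each extra factor (the 'X', 'Y', ... above) is modelled, purely as an
    assumption, as a multiplier applied to that baseline.
    """
    estimate = mean(past_promotion_sales)
    for multiplier in factors:
        estimate *= multiplier
    return round(estimate)


history = [100, 120, 110]  # hypothetical units sold in past promotions

print(order_quantity(history))                      # history only
print(order_quantity(history, factors=[1.2]))       # history + 'X'
print(order_quantity(history, factors=[1.2, 0.9]))  # history + 'X' + 'Y'
```

The point is the shape of the function rather than the arithmetic: new data in the lake arrives as one more input, and the existing estimate does not have to be rebuilt from scratch.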
Think Agile
Sales + Promotions + X factor → improve promotion sales
Near-perfect list of parameters
Progressive set of parameters
Sales + Promotions → is the quantity arrived at from these factors (known to the business) ordered on time?
Process
DataMarts
... as a store of bottled water – cleansed and packaged and
structured for easy consumption
DataMarts
... as a store of data subsets - curated from meaningful facts and
bundled into logical groups for arriving at useful insights
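As a rough illustration of curating a subset of facts into one logical group, the sketch below builds a small "promotion sales" mart from mixed facts. The fact layout, the sample data and `build_promotion_sales_mart` are hypothetical names chosen for this example.

```python
# Hypothetical facts as they might sit in the lake, already parsed.
LAKE_FACTS = [
    {"type": "sale", "item": "A", "qty": 3, "day": "2017-03-01"},
    {"type": "promotion", "item": "A", "day": "2017-03-02"},
    {"type": "sale", "item": "A", "qty": 5, "day": "2017-03-02"},
    {"type": "sale", "item": "B", "qty": 2, "day": "2017-03-02"},
]


def build_promotion_sales_mart(facts):
    """Keep only sales of an item made on a day that item was on promotion."""
    promoted = {
        (fact["item"], fact["day"])
        for fact in facts
        if fact["type"] == "promotion"
    }
    return [
        {"item": fact["item"], "day": fact["day"], "qty": fact["qty"]}
        for fact in facts
        if fact["type"] == "sale" and (fact["item"], fact["day"]) in promoted
    ]


print(build_promotion_sales_mart(LAKE_FACTS))
```

The lake still holds every fact; the mart is a derived, narrower view that can be rebuilt whenever the grouping changes.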
Easy Insight
Sales + Promotions →
is the quantity arrived at from these factors (known to the business) ordered on time?
System: tells me what quantity is supposed to be ordered for this promotion
System: tells me in real time what quantity has been ordered
Technology
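The comparison between the quantity that is supposed to be ordered and the quantity actually ordered can be sketched as a simple gap check. The function name, the dictionary shapes and the numbers are hypothetical, and a real-time system would run such a check on streaming data rather than on in-memory dictionaries.

```python
def ordering_gaps(recommended, ordered):
    """Return, per item, how many units are still missing from the order."""
    gaps = {}
    for item, target in recommended.items():
        shortfall = target - ordered.get(item, 0)
        if shortfall > 0:
            gaps[item] = shortfall
    return gaps


recommended = {"A": 132, "B": 40, "C": 10}  # hypothetical targets
ordered = {"A": 100, "B": 40}               # hypothetical orders so far

print(ordering_gaps(recommended, ordered))
```

Items that have met or exceeded their target are left out, so the result reads directly as a to-do list for the person placing orders.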
Cluster Administration Tool
Cloudera Manager
Think DevOps
Scale | Performance | Memory | Resource Contention | Optimization | Stability
Need for an ecosystem to monitor how well the different tools play together without chaos
Tools
QUICK RECAP
What is data in the true sense…
Data Warehouse in an enterprise ecosystem...
What is a data lake...
Data lake implementation in an enterprise ecosystem...
How to make effective use of a data lake…
Cluster Administration tool - Cloudera Manager
PITFALLS TO AVOID
Data envy - Ref: https://martinfowler.com/bliki/Datensparsamkeit.html
Tool envy
Reliable data is a luxury
Understanding the nature of data is a must
Dialogue with the data scientist
Treating the data lake like an RDBMS
Keeping the business involved
Data flow state visibility
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
 
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
 
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
 

Building a Data Lake - An App Dev's Perspective

  • 1. Building a Data Lake: An App Dev’s Perspective. GeekNight Hyderabad, March 8th 2017. Geetha Balasundaram, geethab@thoughtworks.com. © 2017 ThoughtWorks Technologies Pvt. Limited
  • 2. ABOUT ME: Developer @ ThoughtWorks. Building a data lake in the enterprise ecosystem. Helping a retail business make sense of it (data-guided org). Previously part of the web development space (enterprise rewrite). As startled as everyone else by the data engineering space. Sharing know-how and do-how from our team’s experience. snithish@thoughtworks.com
  • 3. AGENDA: What is data in the true sense? Data warehouse in an enterprise ecosystem. What is a data lake? Data lake implementation in an enterprise ecosystem. How to make effective use of a data lake: technology + process + people. Cluster administration tool: Cloudera Manager. Pitfalls to avoid.
  • 4. HIGH LEVEL PROBLEM STATEMENT. Question: How did R. Ashwin perform in the last Test match?
  • 5. COMPLEX HISTORICAL DATA. Why? To exploit the data and derive as many new insights as possible. Match Made. Enterprise systems produce this kind of complexity.
  • 7. DID MICROSERVICES CAUSE THIS PROBLEM? Decentralised data. https://martinfowler.com/articles/microservices.html
  • 8. MICROSERVICES HELPED: Break down the business unit. Break down complexity. Understand the nature of the data.
  • 9. Question: R. Ashwin performed well (6/41) in yesterday’s match! Complex historical data can quantify how well he has performed. Can we say why he did well in this particular match? What factors contributed to his enhanced performance?
  • 10. FACT is a FACT … even when we don’t know how it can be used
  • 12. What is a data lake?
  • 13. LAKE is... a large body of water in a more natural state. The contents of the lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples. https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/
  • 14. DATA LAKE is... a large body of data facts in a more natural state. The contents of the lake stream in from a source to fill the lake, and various users of the lake can come to analyse, build models, or use a subset for specific use cases.
  • 16. Implementation
  • 17. OUR IMPLEMENTATION - TECH STACK: Data source, data ingestion, data lake, data marts, data analysis; staging / queue.
  • 19. How to make effective use of a data lake: technology + process + people
  • 20. Functionality vs Reality (People): Move from “I need a feature so that I can do this action” to “I need this insight so that I can take this action”. E.g. from “I need a functionality to order items anytime before or during a promotion” to “I need to know on time whether I have to order items before or during a promotion, so that I can improve promotion sales”.
  • 21. Start Simple (Technology): There is no data lake yet. Carve out portions of data that are easy wins yet critical to arriving at the insight stated earlier. Set up the infrastructure and pipeline. Get your hands dirty. E.g. sales is an important factor for analysing or predicting anything in the retail space.
  • 22. How much should I know about the data? (Technology): As a consumer of data (read: not a consumer of a service), how much should I know about it? Schema ⇔ contracts. Nature of the data: versioned vs latest, transactional vs reference, facts vs aggregates, frequency of change, …
  • 23. DATA INSIGHT - Part 1 (Technology): Incrementally add new data to the lake. Serve data for analysis. E.g. what promotions-related data do I need to bring into the data lake? Sales → improve promotion sales.
  • 24. DATA INSIGHT - Part 2 (People): Sales + Promotions → improve promotion sales. How does adding more data to the lake help in arriving at new insights? History of past promotion sales = how much to order for this promotion. History of past promotion sales + ‘X’ = how much to order for this promotion. History of past promotion sales + ‘X’ + ‘Y’ … = how much to order for this promotion (e.g. seasonality has a strong correlation with sales). History of past promotion sales + ‘X’ + ‘Y’ … + ‘A’ = how much to order for this promotion after the start.
  • 25. Think Agile (Process): Sales + Promotions + X factor → improve promotion sales. Near-perfect list of parameters. Progressive set of parameters. Sales + Promotions → is the quantity derived from these factors (known to the business) ordered on time?
  • 26. DataMarts ... as a store of bottled water: cleansed, packaged and structured for easy consumption.
  • 27. DataMarts ... as a store of data subsets: curated from meaningful facts and bundled into logical groups for arriving at useful insights.
  • 28. Easy Insight (Technology): Sales + Promotions → is the quantity derived from these factors (known to the business) ordered on time? System: tells me the quantity that is supposed to be ordered for this promotion. System: tells me in real time the quantity that has been ordered.
  • 29. Cluster Administration Tool: Cloudera Manager
  • 30. Think DevOps (Tools): Scale | Performance | Memory | Resource Contention | Optimization | Stability. Need for an ecosystem to monitor how well the different tools play together without chaos.
  • 31. QUICK RECAP: What is data in the true sense? Data warehouse in an enterprise ecosystem. What is a data lake? Data lake implementation in an enterprise ecosystem. How to make effective use of a data lake. Cluster administration tool: Cloudera Manager.
  • 32. PITFALLS TO AVOID: Data envy (ref: https://martinfowler.com/bliki/Datensparsamkeit.html). Tool envy. Reliable data is a luxury. Understanding the nature of data is a must. Dialogue with the data scientist. Treating the data lake like an RDBMS. Keeping the business involved. Data flow state visibility.
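
The promotion example that runs through slides 24 and 28 (history of past promotion sales → how much to order, and whether that quantity has been ordered on time) can be sketched roughly as below. This is only an illustration under stated assumptions: the record type, the function names, the item and promotion identifiers and the sample figures are all hypothetical, and the plain average stands in for whatever model the business and data scientists would actually agree on. The talk does not describe the team's real implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PromotionSale:
    """One historical fact: units of an item sold during a past promotion (hypothetical shape)."""
    item: str
    promotion: str
    units_sold: int


def expected_order_quantity(history, item):
    """Estimate how much to order from the history of past promotion sales.

    A plain average is an assumption made for this sketch, not the talk's model.
    """
    past = [sale.units_sold for sale in history if sale.item == item]
    if not past:
        raise ValueError(f"no promotion history for item {item!r}")
    return round(mean(past))


def order_shortfall(history, item, ordered_so_far):
    """Units still to be ordered for the upcoming promotion (never negative)."""
    return max(0, expected_order_quantity(history, item) - ordered_so_far)


# Made-up sample facts, as they might be served from a data mart.
history = [
    PromotionSale("rice-5kg", "promo-2015", 120),
    PromotionSale("rice-5kg", "promo-2016", 180),
    PromotionSale("oil-1l", "promo-2016", 90),
]

print(expected_order_quantity(history, "rice-5kg"))
print(order_shortfall(history, "rice-5kg", ordered_so_far=100))
```

Adding the ‘X’ and ‘Y’ factors from slide 24 (seasonality, for example) would mean replacing the average with a richer estimate, while the shortfall check keeps the same shape.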