© 2017 MapR TechnologiesMapR Confidential 1
MapR and Machine Learning Primer
Mathieu Dumoulin
Data Engineer PS APAC – Tokyo, Japan
Wednesday, May 10, 2017
Today’s goals
• Machine Learning
• Enterprise Machine Learning
• Challenges of Enterprise ML
• MapR’s Unique Features for ML
• H2O and MapR
Machine Learning
Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed.
ML lets a computer make predictions on new data, usually by learning from historical data.
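To make "learning from historical data" concrete, here is a minimal sketch in plain Python, assuming a single numeric input and a straight-line relationship. Instead of hand-coding a rule, the program estimates a slope and an intercept from past examples (ordinary least squares) and then applies them to a new input. The function names and the toy data are illustrative only; they are not part of any MapR or H2O API.

```python
def fit_line(xs, ys):
    """Learn y ≈ slope * x + intercept from historical (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the least-squares slope
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical "historical data": y was generated as 2 * x + 1
history_x = [0.0, 1.0, 2.0, 3.0]
history_y = [1.0, 3.0, 5.0, 7.0]
model = fit_line(history_x, history_y)
print(predict(model, 10.0))  # 21.0
```

Nothing in the code states the rule "multiply by 2 and add 1"; it is recovered from the examples, which is the sense in which the program was not explicitly programmed.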
Questions Data Science Can Answer
1. Classification: Is this A or B?
2. Anomaly Detection: Is this weird?
3. Regression: How much? or How many?
4. Clustering: How is this organized?
5. Reinforcement Learning: What should I do next?
6. Recommendation: What’s similar?
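As an illustration of the "Is this weird?" question, a very simple anomaly detector flags values that lie far from the mean, measured in standard deviations (a z-score). This is a minimal sketch assuming roughly normally distributed values; the default threshold of 3.0 is a common but arbitrary choice, and the sensor readings are made up.

```python
import statistics


def find_anomalies(values, threshold=3.0):
    """Return the indices of values whose z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings with one obvious spike at the end
readings = [10, 11, 10, 9, 10, 11, 10, 9, 10, 10, 11, 9, 10, 50]
print(find_anomalies(readings))  # [13]
```

Because an outlier inflates the very mean and standard deviation it is compared against, the largest possible z-score in a sample of n points is sqrt(n - 1). On very small samples this method therefore cannot flag anything, and more robust statistics (for example, median-based ones) are often preferred.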
ML In the Enterprise…
… isn’t so easy after all.
Enterprise ML: Business Value Outputs
A growing number of ML use cases at successful companies:
• Anomaly Detection
• Customer 360
• Fraud Detection
• Log Security Analysis
• Recommender Engines
• Sensor Data Analysis (IoT)
• Personalized Offers
• Ad Tech
Machine Learning Tools
What Most ML Tools Give You
A common rule of thumb is that the modeling task is about 10% of the total effort of an ML project.
The choice of tool matters to the data scientist, but any top-level ML tool or library can eventually get good results (if the data allows it at all).
Enterprise ML Projects: More than Just Modeling
Business Value is in Production
All the business value comes from a sufficiently accurate model running in production.
What this means:
Deploying a weaker model to production sooner is MUCH better than working endlessly on an excellent model.
(But a world-class model in production can make you Google money.)
Data Cleaning and Feature Engineering
80% of the work!
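A small sketch of what that 80% looks like in practice, assuming a hypothetical CSV export with missing and malformed values: parse and validate each record, drop the ones that cannot be repaired, then derive model-ready features (a log-scaled spend and a tenure in days). The column names, the drop-instead-of-impute policy and the reference date are all illustrative assumptions; real projects often impute missing values rather than discard rows.

```python
import csv
import io
import math
from datetime import date

# Hypothetical raw export: row 2 is missing an age, row 3 has a non-numeric spend
RAW_CSV = """customer_id,age,monthly_spend,signup_date
1,34,120.5,2016-03-01
2,,80.0,2016-07-15
3,41,n/a,2017-01-20
4,29,45.25,2016-11-30
"""


def clean_rows(reader):
    """Yield typed records, dropping rows whose fields fail to parse."""
    for row in reader:
        try:
            yield {
                "customer_id": int(row["customer_id"]),
                "age": int(row["age"]),
                "monthly_spend": float(row["monthly_spend"]),
                "signup_date": date.fromisoformat(row["signup_date"]),
            }
        except (KeyError, ValueError):
            continue  # missing or malformed field: drop the record


def engineer_features(record, as_of):
    """Turn a clean record into numeric model features."""
    return {
        "customer_id": record["customer_id"],
        "age": record["age"],
        "log_spend": math.log1p(record["monthly_spend"]),  # compress skewed amounts
        "tenure_days": (as_of - record["signup_date"]).days,
    }


reference_date = date(2017, 5, 10)
features = [
    engineer_features(r, reference_date)
    for r in clean_rows(csv.DictReader(io.StringIO(RAW_CSV)))
]
print([f["customer_id"] for f in features])  # [1, 4]
```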
Workflow View of Machine Learning
[Figure: workflow view of machine learning, steps 1 to 8]
Enterprise ML Challenges
• Data comes from many sources and may be very large
• Data isn’t always labeled!
• Needs ETL and cleaning
• Finding the best algorithm and parameters can use a lot of CPU
• Real-time data?
• Production data from many sources?
• Needs to run on a server somewhere
• The predictions are used by another system...
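The CPU cost of finding the best algorithm and parameters comes from multiplication: an exhaustive grid search trains one model per parameter combination per cross-validation fold. The sketch below is a generic grid search in plain Python. `toy_score` is a hypothetical stand-in for "train on all folds but one, score on the held-out fold", and the parameter names are illustrative only.

```python
import itertools


def grid_search(train_and_score, param_grid, n_folds):
    """Exhaustive grid search with k-fold evaluation.

    Returns (best_params, best_mean_score, number_of_model_fits).
    """
    names = sorted(param_grid)
    best_params, best_score, fits = None, float("-inf"), 0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = [train_and_score(params, fold) for fold in range(n_folds)]
        fits += n_folds  # every fold is a full training run
        mean_score = sum(fold_scores) / n_folds
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score, fits


def toy_score(params, fold):
    """Hypothetical scorer that peaks at max_depth=6, learning_rate=0.1."""
    return (-(params["max_depth"] - 6) ** 2
            - 100 * (params["learning_rate"] - 0.1) ** 2
            + params["n_trees"] / 1000)


grid = {
    "max_depth": [2, 4, 6, 8, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "n_trees": [50, 100, 200],
}
best, score, fits = grid_search(toy_score, grid, n_folds=5)
print(best, fits)  # 5 * 4 * 3 combinations x 5 folds = 300 fits
```

Even this small grid means 300 training runs; adding one more parameter with four values would make it 1,200, which is why hyperparameter search is a natural job to distribute across a cluster.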
The Open Source Solution (I’m not joking!)
Ref: http://advancedspark.com/ , https://github.com/fluxcapacitor/pipeline
Separate clusters!
What Data Scientists and ML Engineers Want
• I know where the data is and how to access it.
• My work is made easier in ALL PHASES of the ML project, not just modeling.
• Let me use my favorite tools at all scales (MB, GB, TB, PB).
An Ideal Platform For Enterprise ML
An ideal platform for ML:
• Scales with you and your data
• Gives you the freedom to use any tool
– Open source DS tools: Jupyter, Zeppelin, Spark, H2O, TensorFlow, …
– Legacy/local tools: NLP tools, scikit-learn, R
• Keeps data versioned and stored reliably
• Combines storage, database, compute and streams
• Supports both model building and model deployment
• Supports security when needed
Our Humble Proposal
MapR is the best platform for enterprise ML on the market today
MapR Converged Data Platform
[Figure: MapR Converged Data Platform components: open source engines & tools; commercial engines & applications; search & others; cloud & managed services; custom apps; data processing; enterprise storage (MapR-FS), database (MapR-DB) and event streaming (MapR Streams); utility-grade platform services (global namespace, high availability, data protection, self-healing, unified security, real-time, multi-tenancy); unified management and monitoring]
The MapR Stack: Converged + Open
MapR Supports All Tools Out of the Box
• NFS mount and POSIX file system
– Small-scale Python or R data exploration on the real data
– Keep the raw data; ETL work is easily reused
• Supports the standard big data ecosystem (Spark)
• The NFS mount can ingest data from any enterprise system that can output files
– Even if that system doesn’t support Hadoop!
• Much faster than HDFS
– Serve production models directly from MapR
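Because the cluster is exposed as a POSIX file system over NFS, ordinary file APIs can read cluster data without a Hadoop client. The sketch below uses only Python's standard library; the mount point `/mapr/my.cluster.com/...` and the column name are hypothetical placeholders (check where your own cluster is actually mounted), and nothing in the code is MapR-specific, which is the point.

```python
import csv
from pathlib import Path

# Hypothetical location of raw data on the cluster's NFS mount
DATA_DIR = Path("/mapr/my.cluster.com/projects/churn/raw")


def summarize_csv_dir(data_dir, column):
    """Count rows and sum one numeric column across every CSV file in a directory."""
    rows, total = 0, 0.0
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as f:
            for record in csv.DictReader(f):
                rows += 1
                total += float(record[column])
    return rows, total


# Usage on the cluster; the same code runs unchanged on a laptop directory:
# rows, total = summarize_csv_dir(DATA_DIR, "monthly_spend")
```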
MapR Supports Deep Learning
• Volumes and Topologies
– GPU-enabled nodes for distributed deep learning on the same cluster
• Don’t waste resources
• Keep data locality
• Avoid unnecessary data movement
• Avoid multiple copies of data (which is the real one?)
• POSIX file system
– Use any DL framework on the cluster data
MapR has production experience with CaffeOnSpark (Samsung Micro-Electronics) and has a new TensorFlow QSS.
Clever Uses for Volumes and Snapshots
• Volumes and Snapshots
– Experimental reproducibility
– Create models on real production data
– Easily compare models on the same data
– Easily evaluate a model over time on different snapshots
– Snapshots of models: a built-in time machine
• Volume Quotas
– Support multiple projects and teams on the same cluster
– Share storage resources efficiently
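Evaluating a model over time on different snapshots can be as simple as running one evaluation function over each snapshot's copy of the same file. This sketch assumes snapshots are visible as read-only directories under a `.snapshot` directory at the volume's mount point; treat that layout, the file name and the `evaluate` callback (here the hypothetical `count_lines`) as assumptions to verify against your own cluster's documentation.

```python
from pathlib import Path


def evaluate_across_snapshots(volume_root, relative_path, evaluate):
    """Run `evaluate` on the same file as it existed in each snapshot.

    Assumes (hypothetically) that snapshots appear as directories under
    <volume_root>/.snapshot/<snapshot_name>/ mirroring the live volume.
    Returns {snapshot_name: result}, skipping snapshots without the file.
    """
    results = {}
    snapshot_root = Path(volume_root) / ".snapshot"
    for snapshot in sorted(p for p in snapshot_root.iterdir() if p.is_dir()):
        data_file = snapshot / relative_path
        if data_file.is_file():
            results[snapshot.name] = evaluate(data_file)
    return results


def count_lines(path):
    """Hypothetical stand-in for 'score the model on this data set'."""
    with open(path) as f:
        return sum(1 for _ in f)
```

Because every snapshot is frozen, rerunning the same evaluation later gives the same numbers, which is what makes model comparisons reproducible.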
Support the ML Workflow, Not Just Modeling
Remember that about 90% of the work in enterprise ML goes into building the workflow around the model. This is where MapR shines!
• Operational capabilities (MapR-DB, MapR Client)
– Serve production models directly from MapR
• Snapshots and Mirrors
– Do A/B testing with almost no coding
– Promote the mirror to go back to the previous state
• Just update the path in the production system: no redeployment!
• MapR Streams for real-time predictions
– Zero-configuration Kafka: it just works!
– Kafka REST Proxy for maximum interoperability
– Supports microservices and stateful containers
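One way to picture A/B testing with almost no coding and "just update the path": the serving code stays the same and only chooses which model directory to load. A minimal sketch, assuming two hypothetical model paths (for example, the current model and a candidate kept in a snapshot or mirror) and a deterministic hash-based split so each user always sees the same variant. The paths and the 10% split are illustrative.

```python
import hashlib

# Hypothetical model locations on the cluster
MODEL_PATHS = {
    "A": "/mapr/my.cluster.com/models/churn/current",
    "B": "/mapr/my.cluster.com/models/churn/candidate",
}


def assign_variant(user_id, fraction_b=0.1):
    """Deterministically assign a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return "B" if bucket < fraction_b else "A"


def model_path_for(user_id, fraction_b=0.1):
    """Return the model directory the serving code should load for this user."""
    return MODEL_PATHS[assign_variant(user_id, fraction_b)]
```

Hashing the user ID, rather than drawing a random number per request, keeps each user's experience consistent across requests; rolling back is a matter of setting `fraction_b` to 0 or pointing path "A" elsewhere.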
MapR 💖 Enterprise Machine Learning
• Features that work together to support all phases of ML
• Supports your existing tools and code as well as state-of-the-art large-scale frameworks
• Easier to manage, more robust and more secure
• MapR is made for the enterprise and great for ML
MapR Converged Application Blueprint
• Microservices connected by real-time streams
– Ideal to serve predictions from ML models
• Next-generation large-scale architecture
• Working example: https://www.mapr.com/appblueprint/overview
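The pattern of microservices connected by streams boils down to a loop: read an event from an input topic, score it, and write the prediction to an output topic. In this sketch Python's in-process `queue.Queue` stands in for a MapR Streams (Kafka API) topic so the example runs anywhere; a real service would use a Kafka client library instead, and the JSON event format and the `model` callable are hypothetical.

```python
import json
import queue
import threading


def scoring_service(in_topic, out_topic, model, stop):
    """Consume JSON events, score them with `model`, publish JSON predictions."""
    while not stop.is_set():
        try:
            message = in_topic.get(timeout=0.1)
        except queue.Empty:
            continue  # no new events yet; check the stop flag again
        try:
            event = json.loads(message)
            prediction = {"id": event["id"], "score": model(event["features"])}
            out_topic.put(json.dumps(prediction))
        except (ValueError, KeyError):
            pass  # skip malformed events instead of crashing the service
        finally:
            in_topic.task_done()


def run_demo(events, model):
    """Start one scoring worker, feed it events, and collect its predictions."""
    in_topic, out_topic = queue.Queue(), queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(
        target=scoring_service, args=(in_topic, out_topic, model, stop), daemon=True
    )
    worker.start()
    for event in events:
        in_topic.put(json.dumps(event))
    in_topic.join()  # block until every event has been processed
    stop.set()
    worker.join()
    predictions = []
    while not out_topic.empty():
        predictions.append(json.loads(out_topic.get()))
    return predictions
```

Because the scorer only talks to topics, a new model version can be deployed as a second consumer of the same input topic and compared side by side before it replaces the old one.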
Q&A
ENGAGE WITH US
@mapr
mdumoulin@mapr.com

Salesforce Certified Field Service Consultant
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

MapR and Machine Learning Primer

  • 1. © 2017 MapR Technologies, MapR Confidential. Mathieu Dumoulin, Data Engineer, PS APAC, Tokyo, Japan. Wednesday, May 10, 2017
  • 2. Today’s Goals • Machine Learning • Enterprise Machine Learning • Challenges of Enterprise ML • MapR’s Unique Features for ML • H2O and MapR
  • 3. Machine Learning. Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. ML lets a computer make predictions on new data, usually by learning from historical data.
  • 4. Questions Data Science Can Answer: 1. Is this A or B? (Classification) 2. Is this weird? (Anomaly detection) 3. How much, or how many? (Regression) 4. How is this organized? (Clustering) 5. What should I do next? (Reinforcement learning) 6. What’s similar? (Recommendation)
  • 5. ML in the Enterprise… isn’t so easy after all.
  • 6. Enterprise ML: Business Value Outputs. A growing number of ML use cases at successful companies: Anomaly Detection, Customer 360, Fraud Detection, Log Security Analysis, Recommender Engines, Sensor Data Analysis (IoT), Personalized Offers, Ad Tech
  • 7. Machine Learning Tools
  • 8. What Most ML Tools Give You. A common rule of thumb is that modeling is only about 10% of the total effort in an ML project. The choice of tool matters to the data scientist, but any top-tier ML tool or library can eventually get good results (if the data allows it at all).
  • 9. Enterprise ML Projects: More than Just Modeling
  • 10. Business Value Is in Production. All of the business value comes from a sufficiently accurate model running in production. What it means: deploying a weaker model to production sooner is MUCH better than working endlessly on an excellent model. (But you can make Google money if you get a world-class model into production.)
  • 11. Data Cleaning and Feature Engineering: 80% of the work!
  • 12. Workflow View of Machine Learning (diagram with eight numbered steps)
  • 13. Enterprise ML Challenges. Data comes from many sources and may be very large. Data isn’t always labeled! It needs ETL and cleaning. Finding the best algorithm and parameters can use a lot of CPU. Real-time data? Production data from many sources? The model needs to run on a server somewhere. The predictions are used by another system...
  • 14. The Open Source Solution (I’m not joking!) Separate clusters! Ref: http://advancedspark.com/ , https://github.com/fluxcapacitor/pipeline
  • 15. What Data Scientists and ML Engineers Want: “I know where the data is and how to access it.” “My work is made easier in ALL PHASES of the ML project, not just modeling.” “Let me use my favorite tools at all scales (MB, GB, TB, PB).”
  • 16. An Ideal Platform for Enterprise ML. An ideal platform for ML: • scales with you and your data • gives you the freedom to use any tool – open source DS tools: Jupyter, Zeppelin, Spark, H2O, TensorFlow, … – legacy/local tools: NLP tools, scikit-learn, R • keeps data versioned and reliably stored • combines storage, database, compute and streams • supports both model building and model deployment • supports security when needed
  • 17. Our Humble Proposal: MapR is the best platform for enterprise ML on the market today.
  • 18. MapR Converged Data Platform (diagram): Open Source Engines & Tools; Commercial Engines & Applications; Data Processing; Utility-Grade Platform Services; Enterprise Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams); Global Namespace; High Availability; Data Protection; Self-healing; Unified Security; Real-time; Multi-tenancy; Search & Others; Cloud & Managed Services; Custom Apps; Unified Management and Monitoring
  • 19. The MapR Stack: Converged + Open
  • 20. MapR Supports All Tools Out of the Box • NFS mount and POSIX file system – Small-scale Python or R data exploration on the real data – Keep the raw data; ETL work is easily reused • Supports the standard big data ecosystem (Spark) • The NFS mount can ingest data from any enterprise system that can output files – Even if they don’t support Hadoop! • Much faster than HDFS – Serve production models directly from MapR
  • 21. MapR Supports Deep Learning • Volumes and Topologies – GPU-enabled nodes for distributed deep learning on the same cluster • Don’t waste resources • Keep data locality • Avoid unnecessary data movement • Avoid multiple copies of data (which is the real one?) • POSIX file system – Use any DL framework on the cluster data • MapR has production experience with CaffeOnSpark (Samsung Micro-Electronics) and has a new TensorFlow QSS
  • 22. Clever Uses for Volumes and Snapshots • Volumes and Snapshots – Experimental reproducibility – Create models on real production data – Easy to compare models on the same data – Easy to evaluate a model across time on different snapshots – Snapshots of models: a built-in time machine • Volume Quotas – Support multiple projects and teams on the same cluster – Share storage resources efficiently
  • 23. Support the ML Workflow, Not Just Modeling. Remember that more than 90% of the work in enterprise ML is in realizing the workflow. This is where MapR shines! • Operational capabilities (MapR-DB, MapR Client) – Serve production models directly from MapR • Snapshots and Mirrors – Do A/B testing with almost no coding – Promote the mirror to go back to the previous state • Just update the path in the production system; no redeployment! • MapR Streams for real-time predictions – Zero-configuration Kafka: it just works! – Kafka REST Proxy for maximum interoperability – Supports microservices and stateful containers
  • 24. MapR 💖 Enterprise Machine Learning • Features that work together to support all phases of ML • Supports your existing tools and code as well as state-of-the-art large-scale frameworks • Easier to manage, more robust and more secure • MapR is made for the enterprise and great for ML
  • 25. MapR Converged Application Blueprint • Microservices connected by real-time streams – Ideal for serving predictions from ML models • Next-generation large-scale architecture • Working example: https://www.mapr.com/appblueprint/overview
  • 26. Q&A. Engage with us: @mapr, mdumoulin@mapr.com
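Slide 20 says the NFS mount exposes cluster data as a POSIX file system, so small-scale Python or R exploration can run directly on the real data. A minimal sketch of what that looks like, assuming the cluster is mounted under the hypothetical path /mapr/my.cluster.com and holds a hypothetical CSV with a "pressure" column. Nothing in it is MapR-specific: it uses only the Python standard library, which is the point being made on the slide.

```python
import csv
import statistics
from pathlib import Path

# Hypothetical location: assumes the cluster is NFS-mounted at /mapr/<cluster-name>.
# Any POSIX path works, because only standard-library file I/O is used.
DATA_PATH = Path("/mapr/my.cluster.com/projects/sensors/raw/readings.csv")

def summarize_column(path: Path, column: str) -> dict:
    """Read a CSV with the standard csv module and summarize one numeric column."""
    values = []
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            raw = (row.get(column) or "").strip()
            if raw:  # skip missing values instead of failing on them
                values.append(float(raw))
    return {
        "count": len(values),
        "mean": statistics.fmean(values) if values else None,
        "max": max(values) if values else None,
    }

if __name__ == "__main__":
    if DATA_PATH.exists():
        print(summarize_column(DATA_PATH, "pressure"))
```

The same function runs unchanged on a laptop copy of the file, which is what makes moving from local exploration to cluster data straightforward.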
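Slide 22 suggests evaluating a model across time by running it against different snapshots of the same data. The sketch below assumes that read-only snapshots of a volume appear as subdirectories of a `.snapshot` directory at the volume root, one per snapshot name; treat that layout as an assumption to check against your cluster. The model (`threshold_accuracy`) is a hypothetical stand-in that predicts label "1" when a score column reaches a threshold.

```python
import csv
from pathlib import Path
from typing import Callable, Dict

def threshold_accuracy(path: Path, threshold: float = 0.5) -> float:
    """Hypothetical model: predict label "1" when the score column is >= threshold."""
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    correct = sum((float(r["score"]) >= threshold) == (r["label"] == "1") for r in rows)
    return correct / len(rows) if rows else 0.0

def evaluate_across_snapshots(volume_root: Path, relative_data_path: str,
                              evaluate: Callable[[Path], float]) -> Dict[str, float]:
    """Run one evaluation against each snapshot's copy of the same test file.

    Assumes snapshots appear as subdirectories of <volume_root>/.snapshot/,
    one per snapshot name. Snapshots taken before the file existed are skipped.
    """
    snapshot_dir = volume_root / ".snapshot"
    results: Dict[str, float] = {}
    if not snapshot_dir.is_dir():
        return results
    for snapshot in sorted(p for p in snapshot_dir.iterdir() if p.is_dir()):
        data_file = snapshot / relative_data_path
        if data_file.exists():
            results[snapshot.name] = evaluate(data_file)
    return results
```

A call such as `evaluate_across_snapshots(Path("/mapr/my.cluster.com/projects/churn"), "data/test.csv", threshold_accuracy)` (hypothetical paths) returns one score per snapshot, so a drop in accuracy can be tied to the point in time when the data changed.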
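Slide 23 describes A/B testing with mirrors and rolling back by "just updating the path in the production system". A minimal sketch of the serving side, assuming a hypothetical JSON config file that maps each variant to a model directory. Users are split with a stable hash rather than a random draw; that choice is ours for illustration, not something the slide specifies.

```python
import hashlib
import json
from pathlib import Path

def ab_bucket(user_id: str, share_b: float) -> str:
    """Deterministically assign a user to variant "A" or "B".

    Hashing the ID (instead of calling random.random()) keeps each user in the
    same variant across requests and service restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the result stays in [0, 1).
    fraction = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "B" if fraction < share_b else "A"

def model_path_for(user_id: str, config_file: Path) -> str:
    """Return the model directory to serve for this user.

    Hypothetical config format: {"A": "<path>", "B": "<path>", "share_b": 0.1}.
    The file is re-read on every call, so editing a path (for example, pointing
    "A" back at a mirror of the previous model) changes what is served without
    redeploying the service.
    """
    config = json.loads(config_file.read_text())
    return config[ab_bucket(user_id, config["share_b"])]
```

With `"share_b": 0.1`, roughly 10% of users are routed to the "B" model directory; setting it to 0 sends everyone back to "A".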
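Slide 25 describes microservices connected by real-time streams as a way to serve predictions. The sketch below shows the shape of one such scoring service. To stay self-contained it uses `queue.Queue` as a stand-in for the input and output topics; in a real deployment those `get`/`put` calls would be replaced by a Kafka-API consumer and producer, which this sketch does not show. The model and the event fields are hypothetical.

```python
import json
import queue
import threading
from typing import Callable

STOP = None  # sentinel message that tells the service to shut down

def scoring_service(inbound: queue.Queue, outbound: queue.Queue,
                    model: Callable[[dict], float]) -> None:
    """Consume JSON events, score each one, and publish JSON predictions."""
    while True:
        message = inbound.get()
        if message is STOP:
            break
        event = json.loads(message)
        prediction = {"id": event["id"], "score": model(event)}
        outbound.put(json.dumps(prediction))

def toy_model(event: dict) -> float:
    """Hypothetical model: flags pressure readings above 100."""
    return 1.0 if event["pressure"] > 100 else 0.0

if __name__ == "__main__":
    events, predictions = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=scoring_service, args=(events, predictions, toy_model))
    worker.start()
    for i, pressure in enumerate([80, 120]):
        events.put(json.dumps({"id": i, "pressure": pressure}))
    events.put(STOP)
    worker.join()
    while not predictions.empty():
        print(predictions.get())
```

Because the service only reads from one stream and writes to another, the system that consumes predictions never calls the model directly, and a new model version can be started alongside the old one on the same input.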

Speaker Notes

  1. 1. (Classification) Will this tire fail in the next 1,000 miles: yes or no? Which brings in more customers: a $5 coupon or a 25% discount? 2. (Anomaly detection) If you have a car with pressure gauges, you might want to know: is this pressure gauge reading normal? If you’re monitoring the internet, you’d want to know: is this message from the internet typical? 3. (Regression) What will the temperature be next Tuesday? What will my fourth-quarter sales be? 4. (Clustering) Which viewers like the same types of movies? Which printer models fail the same way? 5. (Reinforcement learning) Reinforcement learning was inspired by how the brains of rats and humans respond to punishment and reward. These algorithms learn from outcomes and decide on the next action. Reinforcement learning is typically a good fit for automated systems that have to make many small decisions without human guidance. 6. (Recommendation) What did similar people or customers buy, watch or listen to? What other movies will you like if you like product A?
  2. Data ingestion is a non-trivial task for the enterprise: the best systems combine data from multiple sources, and adding more data is a highly specialized task. Data governance for ML means tracking dataset versions, test data versions, model versions and test result versions. Model deployment is a non-trivial integration task with another external enterprise system, which may need to be scalable, highly available (HA) and fault-tolerant. And what about after deployment? A/B testing, understanding performance, and dealing with data drift.
  6. Get training data (example: images). Get labels for the training data (example: what each image is about). Transform the data into numbers: machine learning algorithms can’t work on raw data, only on vectors of numbers. Finding the best set of features is heavily iterative work. Try many different algorithms and tune their parameters for best performance; finding the best algorithm and parameter values is also heavily iterative. The best algorithm, trained on your data with its parameters tuned for best performance, is your predictive model. Get new data and transform it into the same format as your training feature vectors; the model then outputs a predicted label for the new data. This is a lot of work, yet it glosses over the HUGE amount of work required to get business value in an enterprise setting.
  7. Here is a small sample of the issues faced when putting ML to work in an enterprise.
  8. This is for real. Chris Fregly, the person behind Pipeline.io, is building enterprise ML systems with this set of tools. It is the set of tools required to provide a true, 100% open source, end-to-end story for enterprise ML. How does MapR simplify this picture? Which of these tools remain useful if we run on MapR?
  9. Can support OSS tools such as R, scikit-learn, Theano and TensorFlow first, to avoid more expensive licenses. Also NoSQL, Kafka, Spark, etc.
  10. Indeed, ML tools are only really good at modeling. They typically provide limited support for feature engineering. In addition, tools typically only support testing models on a single dataset, with no support for comparing production models to experimental models, comparing models across different versions of a dataset, and so on. Such capabilities have to be custom-built and are typically done with low-quality ad hoc code written by data scientists. Most of the work lies in the ETL needed to get the data in the first place, in the data cleaning and feature engineering that the tools don’t support, and in the work required to deploy a model to production. MapR really shines on that 90% of the work, and supports all ML tools just as well as (indeed, better than) any competing platform.
  11. Legacy tools: R, Python, Bash, SPSS, Hive/Pig. State of the art: Apache projects like Drill, Impala, Spark, Zeppelin, Mesos, Flink, …
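Speaker note 1 gives "Is this pressure gauge reading normal?" as an anomaly-detection question. One simple way to answer it is a z-score check against historical readings. This is a swapped-in illustrative technique, not a method from the talk; the 3-standard-deviation threshold is a common convention, not a value the slides give.

```python
import statistics
from typing import List, Sequence

def zscore_anomalies(history: Sequence[float], new_readings: Sequence[float],
                     threshold: float = 3.0) -> List[bool]:
    """Flag readings more than `threshold` standard deviations from the historical mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs >= 2 points
    if stdev == 0:
        # A perfectly constant history: any different value is unusual.
        return [x != mean for x in new_readings]
    return [abs(x - mean) / stdev > threshold for x in new_readings]
```

For a gauge that has historically read around 30, a new reading of 30.5 is not flagged, while a reading of 45 is. Real sensor data usually needs more than this (seasonality, drift), which is part of the "80% of the work" on slide 11.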
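Speaker note 2 lists data governance for ML as tracking dataset versions, test data versions, model versions and test result versions. A minimal sketch of one way to do that without special tooling, assuming a hypothetical append-only JSON Lines manifest: each run records SHA-256 content hashes of the training data, test data and model file together with its metrics, so any result can be traced back to the exact bytes that produced it.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Content hash of a file, read in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def record_run(manifest: Path, train: Path, test: Path, model: Path, metrics: dict) -> dict:
    """Append one line tying a model version to the exact data it was trained and tested on."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "train_sha256": sha256_of(train),
        "test_sha256": sha256_of(test),
        "model_sha256": sha256_of(model),
        "metrics": metrics,
    }
    with manifest.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Hashing the content rather than recording file names means that two runs on "the same" path but different data show up as different versions.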
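Speaker note 6 walks through the modeling workflow: labeled training data, transformation into numeric feature vectors, tuning parameters, and applying the same transformation to new data before predicting. The sketch below compresses those steps into a few functions. It uses k-nearest neighbors with k chosen on a validation set purely because it is short. It is not an algorithm named in the talk, and the record fields ("pressure", "temperature") are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[float], str]  # (feature vector, label)

def featurize(record: Dict[str, str]) -> List[float]:
    """Turn a raw record into a vector of numbers (hypothetical fields).

    The same function must be applied to training data and to new data;
    otherwise the model sees vectors in a different format.
    """
    return [float(record["pressure"]), float(record["temperature"])]

def knn_predict(train: Sequence[Example], x: List[float], k: int) -> str:
    """Majority label among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def tune_k(train: Sequence[Example], validation: Sequence[Example],
           candidates: Sequence[int] = (1, 3, 5)) -> int:
    """Try each candidate k and keep the first one with the best validation accuracy."""
    def accuracy(k: int) -> float:
        hits = sum(knn_predict(train, x, k) == y for x, y in validation)
        return hits / len(validation)
    return max(candidates, key=accuracy)
```

In use, training records are passed through `featurize`, `tune_k` picks k on held-out labeled data, and each new record goes through the same `featurize` before `knn_predict` returns its label. Everything around these steps (ingestion, labeling, deployment, monitoring) is the larger share of the work the note warns about.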