仕事ではじめる機械学習 (Machine Learning at Work)
Aki Ariga | Field Data Scientist
2018.05.17
2 © Cloudera, Inc. All rights reserved.
● Field Data Scientist at Cloudera
● Previously a research engineer at Toshiba and a Rails developer at Cookpad
● Co-author of “ ”
● Founder of kawasaki.rb & MLCT
● Twitter: @chezou
● GitHub: https://github.com/chezou/
Hidden Technical Debt in Machine Learning Systems [2]
Project
procedure
Culture
+
+
Building a Data-driven product ≠ Research
A journey toward a data-driven product
1.
2.
3. A/B
4. A/B
5.
6.
7.
http://tjo.hatenablog.com/entry/2016/01/18/080000
Culture
BI
Statistics
ML
1.
2.
3.
4.
5.
6.
7.
8.
Procedure in a Machine Learning project
Step.4 7
•
•
•
• / Web
•
Recommended members for a typical ML project
What’s the difference between academia and industry for ML?
Production by Nick Youngson CC BY-SA 3.0 Alpha Stock Images
Sample data science/machine learning workflow: from data to exploration to action
Phases: Data Engineering → Data Science (Exploratory) → Production (Operational)
Steps: Acquisition, Curation, Data Wrangling, Data Exploration, Model Training & Testing, Production Data Pipelines, Batch Scoring, Online Scoring, Serving, Reports and Dashboards, plus Data Governance
Data → Models → Predictions → Business value
1.
1.
2.
3.
Production
MLOps
1.
2.
3.
Production
MLOps
1. Train by batch, predict on the fly, serve via REST API
2. Train by batch, predict by batch, serve through the shared DB
3. Train, predict, serve by streaming
4. Train by batch, predict on a mobile app
1.
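Pattern 1: Train by batch, predict on the fly, serve via REST API
Components: Web Application; DB holding activity logs/contents data; Batch System (extract features, execute training, store the trained model and training result); API Server (the ML system, holding the trained model).
Flow: the batch system trains the model on features from the DB; the web application sends a user ID/item ID to the API server's REST API, which extracts features, predicts on the fly, and returns the prediction result.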
1.
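A minimal sketch of the API server in Pattern 1, assuming the batch job produced a logistic regression model. `MODEL`, `USER_FEATURES`, `ITEM_FEATURES` and the JSON field names are hypothetical, and in-memory dicts stand in for the DB. It uses only the standard library `http.server`, which is fine for illustration but not for production traffic.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model produced by the batch training job: logistic regression weights.
# A real API server would load this from storage at startup.
MODEL = {"bias": -1.0, "weights": {"user_activity": 0.8, "item_popularity": 1.2}}

# Hypothetical in-memory lookups standing in for the DB of activity logs/contents data.
USER_FEATURES = {"u1": {"user_activity": 2.0}}
ITEM_FEATURES = {"i1": {"item_popularity": 0.5}}


def extract_features(user_id, item_id):
    """Merge user and item features; unknown IDs contribute no features."""
    features = dict(USER_FEATURES.get(user_id, {}))
    features.update(ITEM_FEATURES.get(item_id, {}))
    return features


def predict(features):
    """Logistic regression score; missing features are treated as 0."""
    z = MODEL["bias"] + sum(
        weight * features.get(name, 0.0) for name, weight in MODEL["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def handle_predict_request(body):
    """JSON {"user_id": ..., "item_id": ...} in, JSON {"score": ...} out."""
    request = json.loads(body)
    features = extract_features(request["user_id"], request["item_id"])
    return json.dumps({"score": predict(features)})


class PredictHandler(BaseHTTPRequestHandler):
    """Minimal REST endpoint: POST a JSON body, get a JSON prediction back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = handle_predict_request(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

Keeping the request handling in a plain function (`handle_predict_request`) makes it testable without a socket; the sketch has no input validation, so a malformed body simply raises.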
Example architecture: PMML + OpenScoring
Model building layer (CDSW): read the activity log from HDFS, extract features, train/update the model, and export the trained model as PMML.
Predicting & serving layer (OpenScoring): load the updated model, receive requests to predict, extract features and predict, and return the prediction results.
1.
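On the serving side, the application only needs to send feature values to OpenScoring over HTTP. This sketch assumes the REST layout described in the Openscoring README for older releases (a `POST` to `/openscoring/model/<id>` on port 8080 with an `{"id", "arguments"}` JSON body); newer releases may differ, so treat the URL and payload shape as assumptions. The model ID `DecisionTreeIris` and the field names in the usage check are hypothetical, and exporting the PMML file itself is not shown.

```python
import json
import urllib.request

# Assumption: older Openscoring releases expose POST /openscoring/model/<model id>
# on port 8080. Check the docs of the version you deploy.
OPENSCORING_BASE_URL = "http://localhost:8080/openscoring"


def build_evaluation_request(model_id, record_id, arguments):
    """Build (without sending) a request that scores one record with a deployed PMML model."""
    body = json.dumps({"id": record_id, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPENSCORING_BASE_URL}/model/{model_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(model_id, record_id, arguments, timeout=5.0):
    """Send the request and decode the JSON reply; requires a running Openscoring server."""
    request = build_evaluation_request(model_id, record_id, arguments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the model is exchanged as PMML, the client code stays the same whichever training library produced the model.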
Example architecture: Docker-based API server
Model building layer (CDSW): read the activity log from HDFS, extract features, train/update the model, and save the trained model on object storage.
Predicting & serving layer: an API server whose runtime environment is packed with Docker loads the updated model from object storage, receives requests to predict, extracts features and predicts, and returns the prediction results.
1.
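In the Docker-based variant the container image carries only the runtime environment, while the model lives in object storage. One way to version those artifacts is sketched below, assuming a local directory stands in for the object store and a hypothetical `models/<name>/v<N>.json` key layout; a real deployment would call its storage client (for example an S3 SDK) instead of `open`.

```python
import json
import os
import tempfile


def save_model(storage_root, name, version, model):
    """Training side: write one immutable, versioned model artifact."""
    model_dir = os.path.join(storage_root, "models", name)
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, f"v{version}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)
    return path


def load_latest_model(storage_root, name):
    """Serving side (e.g. at container startup): load the highest version number."""
    model_dir = os.path.join(storage_root, "models", name)
    versions = [
        int(fname[1:-len(".json")])
        for fname in os.listdir(model_dir)
        if fname.startswith("v") and fname.endswith(".json")
    ]
    latest = max(versions)  # numeric max, so v10 beats v9
    with open(os.path.join(model_dir, f"v{latest}.json"), encoding="utf-8") as f:
        return latest, json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        save_model(root, "ranker", 1, {"bias": 0.0})
        save_model(root, "ranker", 2, {"bias": 0.5})
        print(load_latest_model(root, "ranker"))  # (2, {'bias': 0.5})
```

Keeping model versions out of the image means a model update does not require rebuilding the container, only restarting it (or reloading the model).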
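Pattern 2: Train by batch, predict by batch, serve through the shared DB
Components: Web Application; DB holding activity logs/contents data and prediction results; Batch System split into a training batch and a prediction batch.
Flow: the training batch extracts features and executes training to produce the trained model; the prediction batch extracts features, predicts with the trained model, and writes the prediction results to the DB, from which the web application serves them.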
1.
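A minimal sketch of Pattern 2: the prediction batch scores every key in advance and writes the results to the shared DB, and the application does a plain key lookup with no model code. Here `sqlite3` stands in for the shared DB (HBase or Kudu in the next example), and the table, column and model names are hypothetical.

```python
import sqlite3


def run_batch_prediction(conn, model, users):
    """Prediction batch: score every user and upsert the results in one transaction."""
    rows = [(user_id, model(features)) for user_id, features in users.items()]
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS predictions (user_id TEXT PRIMARY KEY, score REAL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO predictions (user_id, score) VALUES (?, ?)", rows
        )


def serve_prediction(conn, user_id):
    """Web application side: read a precomputed score, or None if it was never scored."""
    row = conn.execute(
        "SELECT score FROM predictions WHERE user_id = ?", (user_id,)
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")

    def toy_model(features):  # hypothetical model: sum of feature values
        return sum(features.values())

    run_batch_prediction(conn, toy_model, {"u1": {"a": 1.0, "b": 2.0}})
    print(serve_prediction(conn, "u1"))  # 3.0
```

The trade-off matches the comparison table: serving is a cheap lookup, but new users or items get no prediction until the next batch run.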
Example architecture: Serving by HBase/Kudu
Model building & predicting layer (CDSW): read the activity log and historical data from HDFS, extract features, train/update the model, then load the trained model, extract features, and predict.
Serving layer (Kudu/HBase): stores the prediction results, which the application reads directly.
1.
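Pattern 3: Train, predict, serve by streaming
Components: Web Application; message queue (e.g. Kafka); stream-based ML system (e.g. Spark Streaming) holding the trained model.
Flow: log data flows from the web application into the message queue; the stream-based ML system extracts features from recent log data, trains and predicts, and applies model updates continuously; prediction results flow back through the message queue.
The application side handles querying for prediction and showing or sending alerts; this component may work with a message queue such as Kafka.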
1.
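A toy sketch of the train-and-predict loop in Pattern 3: each event is scored first and only then used to update the model, so every prediction uses past data only. `queue.Queue` stands in for Kafka and plain SGD logistic regression stands in for whatever streaming learner a real system (e.g. on Spark Streaming) would use; the event schema is hypothetical.

```python
import math
import queue


class OnlineLogisticRegression:
    """Logistic regression updated one event at a time with plain SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # Gradient of the log loss with respect to z is (p - y).
        error = self.predict(x) - y
        self.weights = [w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


def consume(events, model, results):
    """Drain the queue: predict first, publish the score, then train if a label is present."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        results.put(model.predict(event["features"]))
        if event.get("label") is not None:
            model.update(event["features"], event["label"])
```

Predicting before updating (sometimes called prequential evaluation) also gives a running, honest estimate of model quality, which matters because a streaming system has no separate offline test step.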
Pattern 4: Train by batch, predict on a mobile app
Server side: a DB of activity logs/contents data and a batch system that extracts features, executes training, and converts the trained model for the app.
Mobile Application: holds its own DB and the converted trained model; on a request for prediction it extracts features and returns the prediction result locally, while activity logs/contents data are collected in the server DB.
1.
Example architecture: Serving on a mobile app
Model building layer (CDSW): read the activity log from HDFS, extract features, train/update the model, and convert the model to TFLite/Core ML.
Predicting & serving layer: the updated model is kept in the smartphone's storage; the app loads the model, extracts features and predicts when a prediction is requested, and returns the prediction results.
1.
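The TFLite and Core ML converters are not shown here. The sketch below only illustrates the underlying idea of converting once on the server and evaluating in-process on the device, using a hypothetical little-endian binary layout for a logistic regression (a `uint32` feature count, then `float32` bias and weights).

```python
import math
import struct


def convert_model(bias, weights):
    """Server side: serialize a trained logistic regression into compact bytes."""
    return struct.pack(f"<If{len(weights)}f", len(weights), bias, *weights)


def load_converted_model(blob):
    """App side: parse the bytes bundled with (or downloaded by) the app."""
    (n,) = struct.unpack_from("<I", blob, 0)
    bias, *weights = struct.unpack_from(f"<{n + 1}f", blob, 4)
    return bias, weights


def predict_on_device(model, x):
    """In-process prediction: no network call, so latency stays low."""
    bias, weights = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Storing weights as `float32` halves the size compared with `float64` at some cost in precision, the same kind of trade-off real mobile model formats make. It also shows why this pattern is tightly coupled to the app: the app must understand the exact format the server produces.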
Pattern 4’: Federated learning
https://research.googleblog.com/2017/04/federated-learning-
collaborative.html
1.
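Federated learning, as described in the linked Google post, trains on each device and sends only model updates to the server, which averages them. A minimal sketch of that averaging step (Federated Averaging, weighting each client by its number of local examples), assuming a linear regression model trained by one SGD pass per round; secure aggregation, client sampling and update compression are not shown.

```python
def local_update(global_weights, examples, learning_rate=0.1):
    """Runs on a device: one SGD pass of linear regression over its local examples only."""
    weights = list(global_weights)
    for x, y in examples:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights, len(examples)


def federated_average(client_updates):
    """Server side: average client weights, weighting each client by its example count."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("clients reported no training examples")
    averaged = [0.0] * len(client_updates[0][0])
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


def federated_round(global_weights, clients_examples):
    """One round: every client trains locally; only weights, never raw data, are returned."""
    updates = [local_update(global_weights, examples) for examples in clients_examples]
    return federated_average(updates)
```

The raw activity data never leaves the device; the server only ever sees weight vectors and example counts.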
Comparison of the 4 patterns
1.
| | Pattern 1 (REST API) | Pattern 2 (Shared DB) | Pattern 3 (Streaming) | Pattern 4 (Mobile app) |
|---|---|---|---|---|
| Training | By batch | By batch | NRT (by streaming) | By batch |
| Prediction | NRT (on the fly) | By batch | NRT (by streaming) | NRT (on the fly) |
| Prediction result delivery | NRT (via REST API) | NRT (through the shared DB) | NRT (by streaming via MQ) | NRT (via in-process API on mobile) |
| Latency for prediction after new data arrives | Moderate | Moderate to long | Very low | Low |
| Time required to predict | Short | Long | Short | Short |
| Coupling with the app | Loose | Loose | Loose | Tight |
| Language dependency | Independent | Independent | Independent | Depends on frameworks |
| System management difficulty | Moderate | Easy | Very hard | Moderate |

NRT: near real time; MQ: message queue
CI, CD, and blue-green deployment
https://www.slideshare.net/hiroakikudo77/ss-84593653/14
1.
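Blue-green deployment keeps two environments: the new model goes to the idle one, is checked, and traffic switches over only if the checks pass, so a rollback is just switching back. A minimal in-process sketch of that idea; `BlueGreenRouter`, the smoke inputs and the health predicate are hypothetical, and in practice the switch usually happens at a load balancer or service router rather than inside the application.

```python
class BlueGreenRouter:
    """Two model slots; all traffic goes to the live one."""

    def __init__(self, blue_model):
        self.slots = {"blue": blue_model, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, candidate, smoke_inputs, is_healthy):
        """Install the candidate in the idle slot; switch traffic only if all smoke checks pass."""
        idle = self.idle
        previous = self.slots[idle]
        self.slots[idle] = candidate
        if all(is_healthy(candidate(x)) for x in smoke_inputs):
            self.live = idle
            return True
        self.slots[idle] = previous  # keep the old standby so rollback still works
        return False

    def rollback(self):
        """The previous version is still loaded in the idle slot, so rollback is a flip."""
        if self.slots[self.idle] is None:
            raise RuntimeError("no previous version to roll back to")
        self.live = self.idle

    def predict(self, x):
        return self.slots[self.live](x)
```

A CI pipeline would call `deploy` after training and tests; a failed smoke check leaves live traffic untouched.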
1.
2.
3.
Production
MLOps
• Feedback loop
•
•
2.
•
• e.g. MeCab
•
• )
•
•
•
Feedback loop
https://twitter.com/hagino3000/status/986257856730034177
2.
•
• “safe to serve” & “desired prediction quality” [4]
• (offline) (online)
• “Silent failures” [3]
• e.g. Join
• )
•
•
•
• serving
2.
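The checks on this slide (“safe to serve” and “desired prediction quality” [4], plus guarding against “silent failures” [3] such as a broken join) can be sketched as a gate that runs before a new model is pushed. This is an illustrative sketch, not TFX's implementation: the function names, the 5% null-rate threshold and the accuracy metric are assumptions, each feature row is assumed to contain every feature key (with `None` for missing values), and the evaluation set must be non-empty.

```python
import math


def null_rates(rows):
    """Fraction of rows in which each feature is None (e.g. after a broken join)."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts.setdefault(name, 0)
            if value is None:
                counts[name] += 1
    return {name: c / len(rows) for name, c in counts.items()}


def accuracy(model, eval_set):
    """Share of (x, y) pairs where thresholding the score at 0.5 matches the label."""
    hits = sum((model(x) >= 0.5) == bool(y) for x, y in eval_set)
    return hits / len(eval_set)


def validate_candidate(candidate, baseline, eval_set, feature_rows,
                       max_null_rate=0.05, min_gain=0.0):
    """Return the reasons to block the push; an empty list means the candidate may ship."""
    problems = []
    # Silent-failure guard: upstream data problems rarely raise errors, so check the data.
    for name, rate in null_rates(feature_rows).items():
        if rate > max_null_rate:
            problems.append(f"feature {name!r} is missing in {rate:.0%} of rows")
    # "Safe to serve": every output must be a finite probability.
    scores = [candidate(x) for x, _ in eval_set]
    if not all(math.isfinite(s) and 0.0 <= s <= 1.0 for s in scores):
        problems.append("candidate produced non-finite or out-of-range scores")
    # "Desired prediction quality": offline, it must not be worse than the serving model.
    elif accuracy(candidate, eval_set) < accuracy(baseline, eval_set) + min_gain:
        problems.append("candidate accuracy does not beat the baseline")
    return problems
```

Passing this offline gate does not replace online checks (such as an A/B test), since offline and online quality can diverge.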
• •
• [1]
• e.g. DVC, bitemporal modeling
• [4]
• )
•
• [2,4]
• [4]
2.
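Bitemporal modeling records two timestamps per fact: when it became true in the world (valid time) and when the system learned it (recorded or transaction time). Querying by both lets you rebuild the exact training data a past model saw, even after late corrections. This is a simplified sketch: it assumes a value stays valid until superseded (no explicit end dates) and uses hypothetical names (`Fact`, `as_of`). DVC, the other example on the slide, versions data files instead and is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    entity: str
    value: float
    valid_from: date   # when the value became true in the real world
    recorded_at: date  # when our system learned about it


def as_of(facts, entity, valid_at, known_at):
    """Value of `entity` at `valid_at`, using only facts recorded on or before `known_at`."""
    candidates = [
        f for f in facts
        if f.entity == entity and f.valid_from <= valid_at and f.recorded_at <= known_at
    ]
    if not candidates:
        return None
    # Latest valid period wins; among corrections for the same period, the latest record wins.
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_at)).value
```

To reproduce a model trained on some date, query with `known_at` set to that training date; to train on the best available data, set `known_at` to today.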
1.
2.
3.
Production
MLOps
•
• [7]
• Google, Facebook [4, 9]
• /
• /
•
•
Researcher, Dev, Ops:
https://www.slideshare.net/syou6162/ss-88255142
3.
• IoT
[8]
•
•
(GDPR)
3.
• Data-driven product
•
•
•
• ML systems Production
•
•
•
•
• [1] “My model has higher BLEU, can I ship it? The Joel Test for machine learning systems”, L. Park,
2017, ACML-AIMLP Workshop
• [2] “Hidden Technical Debt in Machine Learning Systems”, D. Sculley et al., NIPS 2015
• [3] “Rules of Machine Learning: Best Practices for ML Engineering”, M. Zinkevich
• [4] “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform”, D. Baylor et al., KDD 2017
• [5] “What’s your ML test score? A rubric for ML production systems”, E. Breck et al., Reliable Machine Learning in the Wild, NIPS 2016 Workshop
• [6] , 2017, ML Ops Study #1
• [7] , , 2018, HACKER TACKLE 2018
• [8] “DevOps for models: How to manage millions of models in production—and at the edge”, T. Tung
et al., Strata Data Singapore, 2017
• [9] “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”, K. Hazelwood
et al., IEEE HPCA, 2018
THANK YOU

Contenu connexe

Similaire à 仕事ではじめる機械学習

How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into productionDataWorks Summit
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Pythonwesley chun
 
Deployment Automation for Hybrid Cloud and Multi-Platform Environments
Deployment Automation for Hybrid Cloud and Multi-Platform EnvironmentsDeployment Automation for Hybrid Cloud and Multi-Platform Environments
Deployment Automation for Hybrid Cloud and Multi-Platform EnvironmentsIBM UrbanCode Products
 
Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015
Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015
Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015Christophe Lucas
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowDaniel Zivkovic
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Sotrender
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapNeo4j
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCodemotion
 
Enabling .NET Apps with Monitoring and Management Using Steeltoe
Enabling .NET Apps with Monitoring and Management Using SteeltoeEnabling .NET Apps with Monitoring and Management Using Steeltoe
Enabling .NET Apps with Monitoring and Management Using SteeltoeVMware Tanzu
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the MonolithVMware Tanzu
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Cloudera, Inc.
 
Custom Runtimes for the Cloud
Custom Runtimes for the CloudCustom Runtimes for the Cloud
Custom Runtimes for the CloudCloudBees
 
CSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps sessionCSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps sessionTom Laszewski
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
DevOps on Oracle Cloud
DevOps on Oracle CloudDevOps on Oracle Cloud
DevOps on Oracle CloudMee Nam Lee
 
Accessing Google Cloud APIs
Accessing Google Cloud APIsAccessing Google Cloud APIs
Accessing Google Cloud APIswesley chun
 

Similaire à 仕事ではじめる機械学習 (20)

How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into production
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Python
 
Deployment Automation for Hybrid Cloud and Multi-Platform Environments
Deployment Automation for Hybrid Cloud and Multi-Platform EnvironmentsDeployment Automation for Hybrid Cloud and Multi-Platform Environments
Deployment Automation for Hybrid Cloud and Multi-Platform Environments
 
Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015
Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015
Perth DevOps Meetup - Introducing the IBM Innovation Lab - 12112015
 
SamSegalResume
SamSegalResumeSamSegalResume
SamSegalResume
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Cloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platformCloud Roundtable | Pivoltal: Agile platform
Cloud Roundtable | Pivoltal: Agile platform
 
Enabling .NET Apps with Monitoring and Management Using Steeltoe
Enabling .NET Apps with Monitoring and Management Using SteeltoeEnabling .NET Apps with Monitoring and Management Using Steeltoe
Enabling .NET Apps with Monitoring and Management Using Steeltoe
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
 
Custom Runtimes for the Cloud
Custom Runtimes for the CloudCustom Runtimes for the Cloud
Custom Runtimes for the Cloud
 
CSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps sessionCSC AWS re:Invent Enterprise DevOps session
CSC AWS re:Invent Enterprise DevOps session
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
DevOps on Oracle Cloud
DevOps on Oracle CloudDevOps on Oracle Cloud
DevOps on Oracle Cloud
 
Accessing Google Cloud APIs
Accessing Google Cloud APIsAccessing Google Cloud APIs
Accessing Google Cloud APIs
 
Sam segal resume
Sam segal resumeSam segal resume
Sam segal resume
 

Plus de Aki Ariga

Challenges for machine learning systems toward continuous improvement
Challenges for machine learning systems toward continuous improvementChallenges for machine learning systems toward continuous improvement
Challenges for machine learning systems toward continuous improvementAki Ariga
 
Managing Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure DataManaging Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure DataAki Ariga
 
主人が外資系IT企業に転職して4ヶ月が過ぎました
主人が外資系IT企業に転職して4ヶ月が過ぎました主人が外資系IT企業に転職して4ヶ月が過ぎました
主人が外資系IT企業に転職して4ヶ月が過ぎましたAki Ariga
 
R&D at Foodtech company - #CookpadTechConf 2016
R&D at Foodtech company - #CookpadTechConf 2016R&D at Foodtech company - #CookpadTechConf 2016
R&D at Foodtech company - #CookpadTechConf 2016Aki Ariga
 
Why I started Machine Learning Casual Talks? #MLCT
Why I started Machine Learning Casual Talks? #MLCTWhy I started Machine Learning Casual Talks? #MLCT
Why I started Machine Learning Casual Talks? #MLCTAki Ariga
 
クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題
クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題
クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題Aki Ariga
 
Rubyistがgemの前にPypiデビューするのは間違っているだろうか
Rubyistがgemの前にPypiデビューするのは間違っているだろうかRubyistがgemの前にPypiデビューするのは間違っているだろうか
Rubyistがgemの前にPypiデビューするのは間違っているだろうかAki Ariga
 
Machine Learning Casual Talks Intro #MLCT
Machine Learning Casual Talks Intro #MLCTMachine Learning Casual Talks Intro #MLCT
Machine Learning Casual Talks Intro #MLCTAki Ariga
 
Make Julia more popular in Japan!!1 #JuliaTokyo
Make Julia more popular in Japan!!1 #JuliaTokyoMake Julia more popular in Japan!!1 #JuliaTokyo
Make Julia more popular in Japan!!1 #JuliaTokyoAki Ariga
 
Refrection of kawasaki.rb
Refrection of kawasaki.rbRefrection of kawasaki.rb
Refrection of kawasaki.rbAki Ariga
 
Introduction and benchmarking of MeCab.jl #JapanR
Introduction and benchmarking of MeCab.jl  #JapanRIntroduction and benchmarking of MeCab.jl  #JapanR
Introduction and benchmarking of MeCab.jl #JapanRAki Ariga
 
Recommendation for iruby #tqrk08
Recommendation for iruby #tqrk08Recommendation for iruby #tqrk08
Recommendation for iruby #tqrk08Aki Ariga
 
The book that changed me
The book that changed meThe book that changed me
The book that changed meAki Ariga
 
Introduction of Mecab.jl #JuliaTokyo
Introduction of Mecab.jl #JuliaTokyoIntroduction of Mecab.jl #JuliaTokyo
Introduction of Mecab.jl #JuliaTokyoAki Ariga
 
Introduction to Kanagawa Ruby Kaigi01 #kana01
Introduction to Kanagawa Ruby Kaigi01 #kana01Introduction to Kanagawa Ruby Kaigi01 #kana01
Introduction to Kanagawa Ruby Kaigi01 #kana01Aki Ariga
 
Julia 100 exercises #JuliaTokyo
Julia 100 exercises #JuliaTokyoJulia 100 exercises #JuliaTokyo
Julia 100 exercises #JuliaTokyoAki Ariga
 
Machine Learning Casual Talks opening talk
Machine Learning Casual Talks opening talkMachine Learning Casual Talks opening talk
Machine Learning Casual Talks opening talkAki Ariga
 
Gong anyware
Gong anywareGong anyware
Gong anywareAki Ariga
 
gsub with ActiveSupport::SafeBuffer
gsub with ActiveSupport::SafeBuffergsub with ActiveSupport::SafeBuffer
gsub with ActiveSupport::SafeBufferAki Ariga
 
はじめて翻訳記事を書いたら300ブクマ超えた話
はじめて翻訳記事を書いたら300ブクマ超えた話はじめて翻訳記事を書いたら300ブクマ超えた話
はじめて翻訳記事を書いたら300ブクマ超えた話Aki Ariga
 

Plus de Aki Ariga (20)

Challenges for machine learning systems toward continuous improvement
Challenges for machine learning systems toward continuous improvementChallenges for machine learning systems toward continuous improvement
Challenges for machine learning systems toward continuous improvement
 
Managing Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure DataManaging Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure Data
 
主人が外資系IT企業に転職して4ヶ月が過ぎました
主人が外資系IT企業に転職して4ヶ月が過ぎました主人が外資系IT企業に転職して4ヶ月が過ぎました
主人が外資系IT企業に転職して4ヶ月が過ぎました
 
R&D at Foodtech company - #CookpadTechConf 2016
R&D at Foodtech company - #CookpadTechConf 2016R&D at Foodtech company - #CookpadTechConf 2016
R&D at Foodtech company - #CookpadTechConf 2016
 
Why I started Machine Learning Casual Talks? #MLCT
Why I started Machine Learning Casual Talks? #MLCTWhy I started Machine Learning Casual Talks? #MLCT
Why I started Machine Learning Casual Talks? #MLCT
 
クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題
クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題
クックパッドサマーインターン2015 機械学習・自然言語処理 実習課題
 
Rubyistがgemの前にPypiデビューするのは間違っているだろうか
Rubyistがgemの前にPypiデビューするのは間違っているだろうかRubyistがgemの前にPypiデビューするのは間違っているだろうか
Rubyistがgemの前にPypiデビューするのは間違っているだろうか
 
Machine Learning Casual Talks Intro #MLCT
Machine Learning Casual Talks Intro #MLCTMachine Learning Casual Talks Intro #MLCT
Machine Learning Casual Talks Intro #MLCT
 
Make Julia more popular in Japan!!1 #JuliaTokyo
Make Julia more popular in Japan!!1 #JuliaTokyoMake Julia more popular in Japan!!1 #JuliaTokyo
Make Julia more popular in Japan!!1 #JuliaTokyo
 
Refrection of kawasaki.rb
Refrection of kawasaki.rbRefrection of kawasaki.rb
Refrection of kawasaki.rb
 
Introduction and benchmarking of MeCab.jl #JapanR
Introduction and benchmarking of MeCab.jl  #JapanRIntroduction and benchmarking of MeCab.jl  #JapanR
Introduction and benchmarking of MeCab.jl #JapanR
 
Recommendation for iruby #tqrk08
Recommendation for iruby #tqrk08Recommendation for iruby #tqrk08
Recommendation for iruby #tqrk08
 
The book that changed me
The book that changed meThe book that changed me
The book that changed me
 
Introduction of Mecab.jl #JuliaTokyo
Introduction of Mecab.jl #JuliaTokyoIntroduction of Mecab.jl #JuliaTokyo
Introduction of Mecab.jl #JuliaTokyo
 
Introduction to Kanagawa Ruby Kaigi01 #kana01
Introduction to Kanagawa Ruby Kaigi01 #kana01Introduction to Kanagawa Ruby Kaigi01 #kana01
Introduction to Kanagawa Ruby Kaigi01 #kana01
 
Julia 100 exercises #JuliaTokyo
Julia 100 exercises #JuliaTokyoJulia 100 exercises #JuliaTokyo
Julia 100 exercises #JuliaTokyo
 
Machine Learning Casual Talks opening talk
Machine Learning Casual Talks opening talkMachine Learning Casual Talks opening talk
Machine Learning Casual Talks opening talk
 
Gong anyware
Gong anywareGong anyware
Gong anyware
 
gsub with ActiveSupport::SafeBuffer
gsub with ActiveSupport::SafeBuffergsub with ActiveSupport::SafeBuffer
gsub with ActiveSupport::SafeBuffer
 
はじめて翻訳記事を書いたら300ブクマ超えた話
はじめて翻訳記事を書いたら300ブクマ超えた話はじめて翻訳記事を書いたら300ブクマ超えた話
はじめて翻訳記事を書いたら300ブクマ超えた話
 

Dernier

Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfShreyas Pandit
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptNoman khan
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewsandhya757531
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsResearcher Researcher
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.elesangwon
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...gerogepatton
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labsamber724300
 

Dernier (20)

Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).ppt
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overview
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
March 2024 - Top 10 Read Articles in Artificial Intelligence and Applications...
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labs
 
ASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductosASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductos
 

仕事ではじめる機械学習

  • 1. Aki Ariga | Field Data Scientist 2018.05.17
  • 2. 2 © Cloudera, Inc. All rights reserved. ● Field Data Scientist at Cloudera ● Previously research engineer at Toshiba, Rails developer at Cookpad ● Co-author of “ ” ● Founder of kawasaki.rb & MLCT ● Twitter: @chezou ● GitHub: https://github.com/chezou/ :
  • 3. 3 © Cloudera, Inc. All rights reserved. Hidden technical debt in Machine learning systems [2] Project procedure Culture + +
  • 4. © Cloudera, Inc. All rights reserved. Building a Data-driven product ≠ Research
  • 5. 5 © Cloudera, Inc. All rights reserved. A journey for Data-driven product 1. 2. 3. A/B 4. A/B 5. 6. 7. http://tjo.hatenablog.com/entry/2016/01/18/080000 ( ) Culture BI Statistics ML
  • 6. 6 © Cloudera, Inc. All rights reserved. 1. 2. 3. 4. 5. 6. 7. 8. Procedure in a Machine Learning project Step.4 7
  • 7. 7 © Cloudera, Inc. All rights reserved. • • • • / Web • Typical project member recommendation for ML project
  • 8. © Cloudera, Inc. All rights reserved. What’s the difference between academia and industry for ML?
  • 9. 9 © Cloudera, Inc. All rights reserved. Production by Nick Youngson CC BY-SA 3.0 Alpha Stock Images
  • 10. 10 © Cloudera, Inc. All rights reserved. Sample data science/machine learning workflow From data to exploration to action Data Engineering Data Science (Exploratory) Production (Operational) Data Wrangling Data Exploration Model Training & Testing Production Data Pipelines Batch Scoring Online Scoring Serving Data GovernanceCuration Data Engineering Acquisition Reports, Dashboards Data Models Predictions Business value 1.
  • 11. Outline (three parts; keywords: Production, MLOps)
  • 12. Outline (three parts; keywords: Production, MLOps)
  • 13. Four patterns for ML systems: 1. Train by batch, predict on the fly, serve via REST API; 2. Train by batch, predict by batch, serve through the shared DB; 3. Train, predict, serve by streaming; 4. Train by batch, predict on a mobile app. (Part 1)
  • 14. Pattern 1: Train by batch, predict on the fly, serve via REST API. The batch system extracts features from the activity log/contents data in the DB, executes training, and stores the trained model. The API server loads the trained model; when the web application sends a user ID/item ID via the REST API, the server looks up the features and returns the prediction result. (Part 1)
  • 15. Example architecture: PMML + OpenScoring. Model building layer (CDSW): extract features from the activity log on HDFS, train/update the model, and export it as PMML. Predicting & serving layer: OpenScoring loads the updated model and, for each request to predict, extracts features and returns prediction results. (Part 1)
  • 16. Example architecture: Docker-based API server. Model building layer (CDSW): extract features from the activity log on HDFS, train/update the model, and save it on object storage. Predicting & serving layer: an API server whose runtime environment is packed with Docker loads the updated model from object storage and, for each request to predict, extracts features and returns prediction results. (Part 1)
  • 17. Pattern 2: Train by batch, predict by batch, serve through the shared DB. In the batch system, a training batch extracts features from the activity log/contents data in the DB and executes training; a prediction batch then applies the trained model and writes the prediction results back to the DB, from which the web application serves them. (Part 1)
  • 18. Example architecture: serving by HBase/Kudu. Model building & predicting layer (CDSW): extract features from the activity log and historical data on HDFS, train/update the model, then load the trained model to extract features and predict. Serving layer: prediction results are written to Kudu/HBase, from which the application reads them. (Part 1)
  • 19. Pattern 3: Train, predict, serve by streaming. Log data flows through a message queue (e.g. Kafka) into a stream-based ML system (e.g. Spark Streaming), which extracts features from recent log data, trains and predicts, and continuously updates the model. Prediction results go back through the message queue to the web application, which uses them for querying predictions or for showing or sending alerts. (Part 1)
  • 20. Pattern 4: Train by batch, predict on a mobile app. The batch system extracts features from the activity logs/contents data in the server-side DB and executes training; the trained model is converted and shipped to the mobile application. On each request for prediction, the app extracts features from the activity logs/contents data in its on-device DB and computes the prediction result with its local copy of the trained model. (Part 1)
  • 21. Example architecture: serving on a mobile app. Model building layer (CDSW): extract features from the activity log on HDFS, train/update the model, and convert it to TFLite/Core ML. Predicting & serving layer: the updated model is stored on the smartphone, and the app loads it, extracts features, and returns prediction results for each request to predict. (Part 1)
  • 22. Pattern 4’: Federated learning (https://research.googleblog.com/2017/04/federated-learning-collaborative.html) (Part 1)
  • 23. Comparison of the 4 patterns (Part 1)
  Item                                | Pattern 1 (REST API) | Pattern 2 (Shared DB)       | Pattern 3 (Streaming)     | Pattern 4 (Mobile app)
  Training                            | by batch             | by batch                    | NRT (by streaming)        | by batch
  Prediction                          | NRT (on the fly)     | by batch                    | NRT (by streaming)        | NRT (on the fly)
  Result delivery                     | NRT (via REST API)   | NRT (through the shared DB) | NRT (by streaming via MQ) | NRT (via in-process API on mobile)
  Latency from new data to prediction | Moderate             | Moderate to long            | Very low                  | Low
  Time required to predict            | Short                | Long                        | Short                     | Short
  Coupling with the app               | Loose                | Loose                       | Loose                     | Tight
  Language dependency                 | Independent          | Independent                 | Independent               | Depends on frameworks
  System management difficulty        | Moderate             | Easy                        | Very hard                 | Moderate
  NRT: near real time
  • 24. CI, CD, and blue-green deployment (https://www.slideshare.net/hiroakikudo77/ss-84593653/14) (Part 1)
  • 25. Outline (three parts; keywords: Production, MLOps)
  • 26. Overview of four points, including the feedback loop. (Part 2)
  • 27. Examples include MeCab and the feedback loop (https://twitter.com/hagino3000/status/986257856730034177). (Part 2)
  • 28. Check that a model is “safe to serve” and has the “desired prediction quality” [4]; compare offline and online evaluation; watch for “silent failures” [3], e.g. involving a join; serving. (Part 2)
  • 29. Further points citing [1], [2], and [4], with examples such as DVC and bitemporal modeling. (Part 2)
  • 30. Outline (three parts; keywords: Production, MLOps)
  • 31. Points citing [7]; examples from Google and Facebook [4, 9]; the roles of researchers, developers, and operations (https://www.slideshare.net/syou6162/ss-88255142). (Part 3)
  • 32. IoT [8]; GDPR. (Part 3)
  • 33. Summary: data-driven products; ML systems in production.
  • 34. References
  [1] “My model has higher BLEU, can I ship it? The Joel Test for machine learning systems”, L. Park, ACML-AIMLP Workshop, 2017
  [2] “Hidden Technical Debt in Machine Learning Systems”, D. Sculley et al., NIPS 2015
  [3] “Rules of Machine Learning: Best Practices for ML Engineering”, M. Zinkevich
  [4] “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform”, D. Baylor et al., KDD 2017
  [5] “What’s your ML test score? A rubric for ML production systems”, E. Breck et al., Reliable Machine Learning in the Wild, NIPS 2016 Workshop
  [6] , 2017, ML Ops Study #1
  [7] , , 2018, HACKER TACKLE 2018
  [8] “DevOps for models: How to manage millions of models in production—and at the edge”, T. Tung et al., Strata Data Singapore, 2017
  [9] “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”, K. Hazelwood et al., IEEE HPCA 2018
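Pattern 1 (train by batch, predict on the fly, serve via REST API) can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the architecture from the slides: the linear weights, the in-memory `USER_FEATURES` table, and the `/predict?user_id=...` route are hypothetical stand-ins for the batch-trained model, the extracted features, and the API server's endpoint.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical model produced by the batch system: a linear scorer.
# A real API server would load the trained model once at startup.
TRAINED_WEIGHTS = {"clicks_7d": 0.8, "purchases_30d": 1.5}
BIAS = -1.0

# Hypothetical per-user features, as extracted by the batch system.
USER_FEATURES = {
    "u1": {"clicks_7d": 3.0, "purchases_30d": 1.0},
    "u2": {"clicks_7d": 0.0, "purchases_30d": 0.0},
}


def predict(features):
    """Score one feature dict on the fly with the loaded linear model."""
    return BIAS + sum(TRAINED_WEIGHTS[name] * value for name, value in features.items())


def app(environ, start_response):
    """WSGI app: GET /predict?user_id=<id> returns a JSON prediction."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    user_id = query.get("user_id", [None])[0]
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/predict" or user_id not in USER_FEATURES:
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "unknown path or user_id"}).encode("utf-8")]
    body = {"user_id": user_id, "score": predict(USER_FEATURES[user_id])}
    start_response("200 OK", headers)
    return [json.dumps(body).encode("utf-8")]


def call(path, query):
    """Call the app in-process (no network) and return (status, decoded JSON)."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    payload = b"".join(app(environ, start_response))
    return captured["status"], json.loads(payload)
```

For a quick local HTTP test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough; a production deployment would sit behind a real WSGI server. Keeping the model behind an HTTP boundary is what gives this pattern its loose coupling and language independence in the comparison table.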
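Pattern 2 (train by batch, predict by batch, serve through the shared DB) moves all model code out of the request path. The sketch below assumes SQLite as the shared DB and a toy scoring function; the `predictions` table, `predict` formula, and helper names are hypothetical.

```python
import sqlite3


def predict(user_activity, item_popularity):
    """Hypothetical batch-trained scorer: a fixed blend of two features."""
    return 0.1 * user_activity + item_popularity


def run_prediction_batch(conn, users, items):
    """Prediction batch: score every user x item pair and publish to the shared DB."""
    rows = [
        (user_id, item_id, predict(activity, popularity))
        for user_id, activity in users.items()
        for item_id, popularity in items.items()
    ]
    # `with conn` commits the delete and the inserts together (or rolls back on
    # error), so other connections see either the previous batch or the new one.
    with conn:
        conn.execute("DELETE FROM predictions")
        conn.executemany(
            "INSERT INTO predictions (user_id, item_id, score) VALUES (?, ?, ?)", rows
        )
    return len(rows)


def top_items(conn, user_id, k=2):
    """Serving side (web application): a plain DB lookup, no model code needed."""
    cursor = conn.execute(
        "SELECT item_id FROM predictions WHERE user_id = ? ORDER BY score DESC LIMIT ?",
        (user_id, k),
    )
    return [item_id for (item_id,) in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (user_id TEXT, item_id TEXT, score REAL)")
```

Because the web application only runs a `SELECT`, it needs no ML runtime, which is why this pattern is the easiest to manage; the trade-off is that scores are only as fresh as the last prediction batch.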
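Pattern 3 (train, predict, serve by streaming) is rated the hardest to operate, but its core loop is simple: for each event, publish a prediction, then update the model once the label is known. This sketch swaps in an online logistic regression trained by SGD and uses `queue.Queue` as a stand-in for a message queue such as Kafka; both are assumptions for illustration, not how Spark Streaming works internally.

```python
import math
import queue


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineLogisticRegression:
    """Logistic regression updated one labelled event at a time with plain SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, y):
        # For log loss, the gradient w.r.t. the linear score is (p - y).
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


def consume(events, model, results):
    """Drain the input queue: predict on each event, publish it, then learn from the label."""
    while True:
        try:
            x, y = events.get_nowait()
        except queue.Empty:
            return
        results.put(model.predict_proba(x))  # prediction goes out before the update
        model.update(x, y)
```

Predicting before updating means every published score comes from a model that has not yet seen that event, which mirrors what the application would observe in production.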
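Pattern 4’ (federated learning) keeps training data on the devices and sends only model updates to the server. A minimal sketch of federated averaging, one common aggregation scheme, assuming a one-parameter linear model `y ≈ w * x` and toy client datasets (both hypothetical), could look like this:

```python
def local_update(w, data, lr=0.1, epochs=5):
    """On-device SGD for y ≈ w * x with squared loss; raw (x, y) pairs never leave the client."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x
    return w


def aggregate(client_models):
    """Federated averaging: mean of client weights, weighted by client example counts."""
    total = sum(n for _, n in client_models)
    return sum(w * n for w, n in client_models) / total


def federated_training(w, client_datasets, rounds=10):
    """Each round: broadcast w, train locally on every client, average the results."""
    for _ in range(rounds):
        w = aggregate([(local_update(w, data), len(data)) for data in client_datasets])
    return w
```

Weighting by example count keeps a client with very little data from pulling the global model as hard as a client with a lot of data.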
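Blue-green deployment, listed alongside CI and CD, can be applied to models by keeping the current and the candidate model loaded side by side and switching traffic only after the candidate passes a check. `BlueGreenRouter` and its probability-range smoke test are hypothetical; the linked deck may implement the switch elsewhere, for example at the load balancer.

```python
class BlueGreenRouter:
    """Two model slots ("blue" and "green"); all traffic goes to the live slot."""

    def __init__(self, model):
        self.slots = {"blue": model, "green": None}
        self.live = "blue"

    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, model, smoke_inputs):
        """Load a candidate into the idle slot and switch only if it passes the smoke test."""
        target = self.idle()
        previous = self.slots[target]
        self.slots[target] = model
        # Hypothetical smoke test: every output must be a probability in [0, 1].
        if not all(0.0 <= model(x) <= 1.0 for x in smoke_inputs):
            self.slots[target] = previous  # keep the idle slot's old model for rollback
            return False
        self.live = target  # switching traffic is a single assignment
        return True

    def rollback(self):
        """Send traffic back to the model still loaded in the idle slot, if any."""
        if self.slots[self.idle()] is not None:
            self.live = self.idle()

    def predict(self, x):
        return self.slots[self.live](x)
```

Because the previous model stays loaded in the idle slot, rollback is as cheap as the switch itself.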
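The Part 2 points on model validation, “safe to serve” and “desired prediction quality” [4], and on “silent failures” [3], can be turned into automated checks before and after a release. The functions below are a hedged sketch: the finite-score check, the `max_regression` threshold on a higher-is-better offline metric (e.g. AUC), and the feature-coverage tolerance are illustrative assumptions, not TFX's actual validation logic.

```python
import math


def safe_to_serve(model, sample_inputs):
    """The candidate runs without raising and returns finite scores on sample inputs."""
    try:
        return all(math.isfinite(model(x)) for x in sample_inputs)
    except Exception:
        return False


def desired_quality(candidate_metric, baseline_metric, max_regression=0.01):
    """Offline check on a higher-is-better metric: allow at most a small regression."""
    return candidate_metric >= baseline_metric - max_regression


def coverage(rows, feature):
    """Fraction of rows in which the feature is present (not None)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(feature) is not None) / len(rows)


def silent_failure_alerts(rows, expected_coverage, tolerance=0.1):
    """Features whose coverage fell well below normal, e.g. a joined table stopped updating."""
    return sorted(
        feature
        for feature, expected in expected_coverage.items()
        if coverage(rows, feature) < expected - tolerance
    )
```

A silent failure does not crash anything: the join still succeeds but returns empty values, so the model keeps serving degraded predictions. Monitoring coverage is one way to make that visible.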