SlideShare une entreprise Scribd logo
1  sur  64
Télécharger pour lire hors ligne
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Containerized Machine Learning
on AWS
Asi f K h a n , Tech n i ca l Bu si n ess Devel opmen t Ma n a ger, AWS
H ok u to H osh i , H ea d of I n fra stru ctu re , C ook pa d I n c.
Yu i ch i ro Someya , Ma ch i n e Lea rn i n g En gi n eer, C ook pa d I n c.
November 28, 2017
CON309
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What are we going to talk about
• What is deep learning
• The lifecycle of a deep learning
application
• Deploying deep learning functions on
Amazon EC2 Container Service (Amazon
ECS)
• Cookpad Inc. use case
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Significantly improve many applications on multiple domains
“deep learning” trend in the past 10 years
image understanding speech recognition natural language
processing
…
Machine learning
autonomy
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine learning at Amazon
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The circle of machine learning (ML)
Front-end team
Data engineering team
Analysts/DS team
DevOps team
Business problem
Data
ML model
ML application
ML is great
How do we scale it?
How do we apply it?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Solution for building AI apps with CICD
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS AI services
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon ECS
EC2 INSTANCES
LOAD
BALANCER
Internet
ECS
AGENT
TASK
Container
TASK
Container
ECS
AGENT
TASK
Container
TASK
Container
AGENT COMMUNICATION
SERVICE
Amazon
ECS
API
CLUSTER MANAGEMENT
ENGINE
KEY/VALUE STORE
ECS
AGENT
TASK
Container
TASK
Container
LOAD
BALANCER
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS developer tools
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Let’s do this…
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
Amazon
S3
1. Upload
model
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
Mobile client
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
Amazon ECR
Instance
Spot Instance
Application Load
Balancer
Amazon Route 53
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
Amazon Route 53
AWS
CodePipeline
Ops
AWS
CodeBuild
Amazon ECR
Instance
Spot Instance
Application Load
Balancer
Amazon Route 53
Ops
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
Amazon ECR
Instance
Spot Instance
Application Load
Balancer
Amazon Route 53
AWS
CodePipeline
Ops
AWS
CodeBuild
3.UploadMXNet+applicationimage
Ops
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
Amazon Route 53
AWS
CodePipeline
Ops
AWS
CodeBuild
3.UploadMXNet+applicationimage
4. Predict
Amazon ECR
Instance
Spot Instance
Application Load
Balancer
Amazon Route 53
Ops
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
Amazon ECR
Instance
Spot Instance
Application Load
Balancer
Amazon Route 53
AWS
CodePipeline
Ops
AWS
CodeBuild
4. Predict
[ "probability=0.115212, class=n04366367 suspension bridge” ]
3.UploadMXNet+applicationimage
Ops
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model Amazon ECR
Instance
Spot Instance
Application Load
Balancer
Amazon Route 53
AWS
CodePipeline
Ops
AWS
CodeBuild
3.UploadMXNet+applicationimage
4. Predict
[ "probability=0.115212, class=n04366367 suspension bridge” ]
5.Uploaddata
Ops
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data scientist
Jupyter
Notebook
AWS
CodeCommit
2. Commit code
Application developer
Amazon
S3
1. Upload
model
Amazon ECR
Instance
Spot Instance
Application Load
Balancer
Amazon Route 53
AWS
CodePipeline
Ops
AWS
CodeBuild
3.UploadMXNet+applicationimage
4. Predict
[ "probability=0.115212, class=n04366367 suspension bridge” ]
5.Uploaddata
6.Updateservice
Ops
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Give it a spin
http://amzn.to/2zWpQij
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
‣ Hokuto Hoshi (@kani_b)
‣ Head of Infrastructure, Cookpad Inc.
‣ hokuto@cookpad.com
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What is Cookpad?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
About Cookpad
• “Make everyday cooking fun!” - Since 1998
• https://cookpad.com/
• Largest online recipe sharing and search service in
Japan
Over 2.7M user-authored recipes
About 60M Monthly users in Japan
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• https://cookpad.com/#{your_country_code}
• us, uk, id, es, fr, br, ae, etc.…
Cookpad is global
67 countries
21 languages
Offices in Japan, UK, Spain, Indonesia etc.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Our infrastructure
All-in on AWS since 2011
~1,400 EC2 instances
200+ ECS services
15,000+ requests/sec
150+ developers
9 SREs
9 Machine learning engineers
Over 2 regions
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cooking Log (料理きろく)
Our very first Deep Learning powered feature
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cooking log
(料理きろく)
Collect "Food Photos" from
Camera Roll automatically
Powered by
Convolutional Neural Network
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DEMO
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
12,000,000+
"food" photos
140,000+
Users
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• There were several new challenges
• Semi real-time image classification in production
• Different workloads from the rest of web applications
• Especially we needed:
• Scalable infrastructure for new workloads
• Environment isolation for new challenge
Our first deep learning feature
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• What we needed: Scalable infrastructure for massive photo uploading and semi real-time
classification
• Clients send tiny thumbnails after taking photos (difficult to predict traffic)
• Traffic spikes are coming sometimes (e.g. The TV show introduces our app)
• What we chose: Asynchronous architecture
• Uploading and classification take time (~ several hundred ms)
• Synchronous processing with API servers
gives users a bad experience
• Upload directly from clients to Amazon S3 using presigned URL
• Enable Amazon S3 notification and Amazon SQS for queue of classification
Scalable and asynchronous classification
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Environment isolation
• What we needed: Isolated environment in production
• Different languages (Python for Machine Learning, Ruby for web application)
• Different workloads (It’s our first product that uses deep learning)
• Different hardware (GPU)
• What we chose: Container environment
• The container ensures runtimes are isolated
• Language environment, GPU drivers, many configurations
• Amazon ECS provides managed and scalable Docker environment
• And we had already used containers on Amazon ECS!
• We run all classification in Amazon ECS cluster on g2.xlarge
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
API server
1. Request API server to
issue Amazon S3 presigned URL
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
API server
1. Request API server to
issue Amazon S3 presigned URL
2. Upload thumbnail
to Amazon S3 bucket
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
API server
1. Request API server to
issue Amazon S3 presigned URL
2. Upload thumbnail
to Amazon S3 bucket
{ bucket: thumbnails
key: 123.jpg }
3. Queue upload event
From Amazon S3 to Amazon SQS
(s3://thumbnails/123.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
API server
1. Request API server to
issue Amazon S3 presigned URL
2. Upload thumbnail
to Amazon S3 bucket
{ bucket: thumbnails
key: 123.jpg }
3. Queue upload event
From Amazon S3 to Amazon SQS
ECS cluster
4. Dequeue message
5. Download thumbnail
From Amazon S3
6. Classify thumbnail and
send result to API server
(s3://thumbnails/123.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
API server
1. Request API server to
issue S3 pre-signed url
{ bucket: thumbnails
key: 123.jpg }
ECS cluster
4. Dequeue message
5. Download thumbnail
From S3
6. Classify thumbnail and
send result to API server
Scalable without any code
2. Upload thumbnail
to Amazon S3 bucket
3. Queue upload event
From Amazon S3 to Amazon SQS
(s3://thumbnails/123.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
(s3://thumbnails/123.jpg)
API server
1. Request API server to
issue S3 pre-signed url
2. Upload thumbnail
to S3 bucket
{ bucket: thumbnails
key: 123.jpg }
3. Queue upload event
From S3 to SQS
5. Download thumbnail
From S3
6. Classify thumbnail and
send result to API server ECS cluster
4. Dequeue message
Scaling ECS tasks
by SQS queue length
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
(s3://thumbnails/123.jpg)
1. Request API server to
issue Amazon S3 presigned URL
2. Upload thumbnail
to Amazon S3 bucket
{ bucket: thumbnails
key: 123.jpg }
3. Queue upload event
From Amazon S3 to Amazon SQS
ECS cluster
4. Dequeue message
5. Download thumbnail
From Amazon S3
6. Classify thumbnail and
send result to API serverAPI server
Only need to wait for results from ECS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
API server
1. Request API server to
issue Amazon S3 presigned URL
2. Upload thumbnail
to Amazon S3 bucket
{ bucket: thumbnails
key: 123.jpg }
3. Queue upload event
from Amazon S3 to Amazon SQS
4. Dequeue message
5. Download thumbnail
From S3
6. Classify thumbnail and
send result to API server ECS cluster
What is happening in containers?
(s3://thumbnails/123.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine learning and infrastructure
Infrastructure to accelerate our machine learning projects.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
‣ Yuichiro Someya (ayemos)
‣ Machine Learning Engineer @ Cookpad Inc.
# 2016(new grads) ~ Current
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
API server
1. Request API server to
issue Amazon S3 presigned URL
2. Upload thumbnail
to S3 bucket
{ bucket: thumbnails
key: 123.jpg }
3. Queue upload event
From Amazon S3 to Amazon SQS
4. Dequeue message
5. Download thumbnail
From S3
6. Classify thumbnail and
send result to API server ECS cluster
What is happening in containers?
(s3://thumbnails/123.jpg)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS cluster
The task in detail
(Report results to externals)
Image classification!
Classifier
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
(Serialized)
classifier
• Fetch the classifier
• Dequeue from Amazon SQS
• Load the image from
Amazon S3
• Report the results back
ECS cluster(Serialized)
classifier
Deploy the classifier on Amazon ECS
Task definition?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Democratize task definitions
• hako: https://github.com/eagletmt/hako/
• Container deploy tool (Amazon ECS compatible)
• Use yaml as definition file format
• Each developer writes app.yml and sends Pull request
• 200+ applications are at work
scheduler:
type: ecs
region: ap-northeast-1
cluster: hako-production-g2
desired_count: 1
app:
image: food-photo-classifier
cpu: 128
memory: 3072
memory_reservation: 2048
env:
AWS_REGION: ap-northeast-1
COOKPADNET_ENV: production
...
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why we’re using hako
• `hako` behaves as abstract operation layer over Amazon ECS (or other docker manager)
• Higher-level operations: Deploy/Rollback/Stop/Remove
• Manages *secret* environment variables
• Pluggable pre/post development operations as `scripts`
• Operations like DNS settings, Consul registrations, and so on
• We want each developer can deploy tasks on Amazon ECS individually.
• `hako` handles Infra/SRE work around Amazon ECS.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS cluster(Serialized)
classifier
Deploy the model on Amazon ECS
$ hako deploy classifier.yml
• Fetch the classifier
• Dequeue from Amazon
SQS
• Load the image from
Amazon S3
• Report the results back
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS cluster (g2)
Task(Container)
GPU
NVIDIA Driver
(kernel modules)
NVIDIA Driver
(user libraries)
App, Worker
ECS and GPU
• Install drivers to clusters
(ref: https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver )
• CUDA to the container
(CUDA  Driver version compatibility is relatively loose)
• GPU device files have to be visible and writable
• Privileged flag
(migrating to linux_parameters.devices option)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ECS cluster(Serialized)
classifier
Deploy the model on Amazon ECS
$ hako deploy classifier.yml
• Fetch the classifier
• Dequeue from Amazon
SQS
• Load the image from
Amazon S3
• Report the results back
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Food/Nonfood
image classifier
Food photos
from Cookpad
Random photos
from other datasets
(Nonfood)
Labeled image dataset
train
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Whole set
Food
(100,000 photos~)
Nonfood
(100,000 photos~)
?
It’s hard to obtain “Complement set”
{food} ≠ {non-food}
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Whole set
Food
Nonfood
{Food: 50%, Nonfood: 50%}
?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Whole set
Food
Nonfood
Plushies
Rebuild dataset
making use of posterior insights
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Food/Nonhood
image classifier
Food Photos
from Cookpad
Random Photos
from other datasets
(Nonfood)
Labeled image dataset
97.9% Accurate
(Precision: 97.6%, Recall: 95.8%)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
CUDA 8.0/cuDNN7
CUDA 9.0/cuDNN7
`ssh workbench-001`
New!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cooking log
• Scalable food/non-food image classification
• Asynchronous and isolated architecture
• Containerized GPU workloads
• `hako` makes it easy to define and deploy applications
Machine learning infrastructure
• Great environment makes our research fast and creative!
• Managing multiple AMIs using Packer
• Dedicated Instances
• Operate instances via chat bot
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
https://info.cookpad.com/us
https://github.com/cookpad
We’re hiring!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!

Contenu connexe

Tendances

How to Build Scalable Serverless Applications
How to Build Scalable Serverless ApplicationsHow to Build Scalable Serverless Applications
How to Build Scalable Serverless Applications
Amazon Web Services
 

Tendances (20)

IOT207_Panasonic—Building the Road of the Future on AWS
IOT207_Panasonic—Building the Road of the Future on AWSIOT207_Panasonic—Building the Road of the Future on AWS
IOT207_Panasonic—Building the Road of the Future on AWS
 
規劃大規模遷移到 AWS 的最佳實踐
規劃大規模遷移到 AWS 的最佳實踐規劃大規模遷移到 AWS 的最佳實踐
規劃大規模遷移到 AWS 的最佳實踐
 
CON318_Interstella 8888 Monolith to Microservices with Amazon ECS
CON318_Interstella 8888 Monolith to Microservices with Amazon ECSCON318_Interstella 8888 Monolith to Microservices with Amazon ECS
CON318_Interstella 8888 Monolith to Microservices with Amazon ECS
 
ALX328_Smart Devices Everywhere
ALX328_Smart Devices EverywhereALX328_Smart Devices Everywhere
ALX328_Smart Devices Everywhere
 
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
 
How to Build Scalable Serverless Applications
How to Build Scalable Serverless ApplicationsHow to Build Scalable Serverless Applications
How to Build Scalable Serverless Applications
 
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
Massively Parallel Data Processing with PyWren and AWS Lambda - SRV424 - re:I...
 
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
 
CMP314_Bringing Deep Learning to the Cloud with Amazon EC2
CMP314_Bringing Deep Learning to the Cloud with Amazon EC2CMP314_Bringing Deep Learning to the Cloud with Amazon EC2
CMP314_Bringing Deep Learning to the Cloud with Amazon EC2
 
SID302_Force Multiply Your Security Team with Automation and Alexa
SID302_Force Multiply Your Security Team with Automation and AlexaSID302_Force Multiply Your Security Team with Automation and Alexa
SID302_Force Multiply Your Security Team with Automation and Alexa
 
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
 
Introduction to AWS Fargate & Amazon Elastic Container Service for Kubernetes
Introduction to AWS Fargate & Amazon Elastic Container Service for KubernetesIntroduction to AWS Fargate & Amazon Elastic Container Service for Kubernetes
Introduction to AWS Fargate & Amazon Elastic Container Service for Kubernetes
 
ALX306_Build a Game Skill for Echo Buttons!
ALX306_Build a Game Skill for Echo Buttons!ALX306_Build a Game Skill for Echo Buttons!
ALX306_Build a Game Skill for Echo Buttons!
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
Optimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWSOptimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWS
 
SID344-Soup to Nuts Identity Federation for AWS
SID344-Soup to Nuts Identity Federation for AWSSID344-Soup to Nuts Identity Federation for AWS
SID344-Soup to Nuts Identity Federation for AWS
 
NEW LAUNCH! Introducing Amazon EKS - CON215 - re:Invent 2017
NEW LAUNCH! Introducing Amazon EKS - CON215 - re:Invent 2017NEW LAUNCH! Introducing Amazon EKS - CON215 - re:Invent 2017
NEW LAUNCH! Introducing Amazon EKS - CON215 - re:Invent 2017
 
WIN204-Simplifying Microsoft Architectures with AWS Services
WIN204-Simplifying Microsoft Architectures with AWS ServicesWIN204-Simplifying Microsoft Architectures with AWS Services
WIN204-Simplifying Microsoft Architectures with AWS Services
 
NEW LAUNCH! Deep dive on Amazon Neptune - DAT318 - re:Invent 2017
NEW LAUNCH! Deep dive on Amazon Neptune - DAT318 - re:Invent 2017NEW LAUNCH! Deep dive on Amazon Neptune - DAT318 - re:Invent 2017
NEW LAUNCH! Deep dive on Amazon Neptune - DAT318 - re:Invent 2017
 
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
 

Similaire à CON309_Containerized Machine Learning on AWS

Similaire à CON309_Containerized Machine Learning on AWS (20)

Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
 
Create a Serverless Image Processing Platform
Create a Serverless Image Processing PlatformCreate a Serverless Image Processing Platform
Create a Serverless Image Processing Platform
 
Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017
 
Introducing Amazon Fargate
Introducing Amazon FargateIntroducing Amazon Fargate
Introducing Amazon Fargate
 
Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018
 
Managing Container Images with Amazon ECR - AWS Online Tech Talks
Managing Container Images with Amazon ECR - AWS Online Tech TalksManaging Container Images with Amazon ECR - AWS Online Tech Talks
Managing Container Images with Amazon ECR - AWS Online Tech Talks
 
Driving Innovation with Containers - CON203 - re:Invent 2017
Driving Innovation with Containers - CON203 - re:Invent 2017Driving Innovation with Containers - CON203 - re:Invent 2017
Driving Innovation with Containers - CON203 - re:Invent 2017
 
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
 
Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017
Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017
Interstella 8888: CICD for Containers on AWS - CON319 - re:Invent 2017
 
CON319_Interstella GTC CICD for Containers on AWS
CON319_Interstella GTC CICD for Containers on AWSCON319_Interstella GTC CICD for Containers on AWS
CON319_Interstella GTC CICD for Containers on AWS
 
Devoxx: Building AI-powered applications on AWS
Devoxx: Building AI-powered applications on AWSDevoxx: Building AI-powered applications on AWS
Devoxx: Building AI-powered applications on AWS
 
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
 
LFS309-High-Throughput Genomics on AWS.pdf
LFS309-High-Throughput Genomics on AWS.pdfLFS309-High-Throughput Genomics on AWS.pdf
LFS309-High-Throughput Genomics on AWS.pdf
 
AWS 容器服務入門實務
AWS 容器服務入門實務AWS 容器服務入門實務
AWS 容器服務入門實務
 
Amazon ECS Deep Dive
Amazon ECS Deep DiveAmazon ECS Deep Dive
Amazon ECS Deep Dive
 
Amazon Amazon Elastic Container Service (Amazon ECS)
Amazon Amazon Elastic Container Service (Amazon ECS)Amazon Amazon Elastic Container Service (Amazon ECS)
Amazon Amazon Elastic Container Service (Amazon ECS)
 
Batch Processing with Containers on AWS - CON304 - re:Invent 2017
Batch Processing with Containers on AWS - CON304 - re:Invent 2017Batch Processing with Containers on AWS - CON304 - re:Invent 2017
Batch Processing with Containers on AWS - CON304 - re:Invent 2017
 
Building Web Apps on AWS
Building Web Apps on AWSBuilding Web Apps on AWS
Building Web Apps on AWS
 
Serverless in Action on AWS
Serverless in Action on AWSServerless in Action on AWS
Serverless in Action on AWS
 
Design, Build, and Modernize Your Web Applications with AWS
 Design, Build, and Modernize Your Web Applications with AWS Design, Build, and Modernize Your Web Applications with AWS
Design, Build, and Modernize Your Web Applications with AWS
 

Plus de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

CON309_Containerized Machine Learning on AWS

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Containerized Machine Learning on AWS Asi f K h a n , Tech n i ca l Bu si n ess Devel opmen t Ma n a ger, AWS H ok u to H osh i , H ea d of I n fra stru ctu re , C ook pa d I n c. Yu i ch i ro Someya , Ma ch i n e Lea rn i n g En gi n eer, C ook pa d I n c. November 28, 2017 CON309
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What are we going to talk about • What is deep learning • The lifecycle of a deep learning application • Deploying deep learning functions on Amazon EC2 Container Service (Amazon ECS) • Cookpad Inc. use case
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Significantly improve many applications on multiple domains “deep learning” trend in the past 10 years image understanding speech recognition natural language processing … Machine learning autonomy
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Machine learning at Amazon
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The circle of machine learning (ML) Front-end team Data engineering team Analysts/DS team DevOps team Business problem Data ML model ML application ML is great How do we scale it? How do we apply it?
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Solution for building AI apps with CICD
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS AI services
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon ECS EC2 INSTANCES LOAD BALANCER Internet ECS AGENT TASK Container TASK Container ECS AGENT TASK Container TASK Container AGENT COMMUNICATION SERVICE Amazon ECS API CLUSTER MANAGEMENT ENGINE KEY/VALUE STORE ECS AGENT TASK Container TASK Container LOAD BALANCER
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS developer tools
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Let’s do this…
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook Amazon S3 1. Upload model
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Mobile client
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Amazon ECR Instance Spot Instance Application Load Balancer Amazon Route 53
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Amazon Route 53 AWS CodePipeline Ops AWS CodeBuild Amazon ECR Instance Spot Instance Application Load Balancer Amazon Route 53 Ops
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Amazon ECR Instance Spot Instance Application Load Balancer Amazon Route 53 AWS CodePipeline Ops AWS CodeBuild 3.UploadMXNet+applicationimage Ops
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Amazon Route 53 AWS CodePipeline Ops AWS CodeBuild 3.UploadMXNet+applicationimage 4. Predict Amazon ECR Instance Spot Instance Application Load Balancer Amazon Route 53 Ops
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Amazon ECR Instance Spot Instance Application Load Balancer Amazon Route 53 AWS CodePipeline Ops AWS CodeBuild 4. Predict [ "probability=0.115212, class=n04366367 suspension bridge” ] 3.UploadMXNet+applicationimage Ops
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Amazon ECR Instance Spot Instance Application Load Balancer Amazon Route 53 AWS CodePipeline Ops AWS CodeBuild 3.UploadMXNet+applicationimage 4. Predict [ "probability=0.115212, class=n04366367 suspension bridge” ] 5.Uploaddata Ops
  • 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data scientist Jupyter Notebook AWS CodeCommit 2. Commit code Application developer Amazon S3 1. Upload model Amazon ECR Instance Spot Instance Application Load Balancer Amazon Route 53 AWS CodePipeline Ops AWS CodeBuild 3.UploadMXNet+applicationimage 4. Predict [ "probability=0.115212, class=n04366367 suspension bridge” ] 5.Uploaddata 6.Updateservice Ops
  • 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Give it a spin http://amzn.to/2zWpQij
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ‣ Hokuto Hoshi (@kani_b) ‣ Head of Infrastructure, Cookpad Inc. ‣ hokuto@cookpad.com
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What is Cookpad?
  • 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. About Cookpad • “Make everyday cooking fun!” - Since 1998 • https://cookpad.com/ • Largest online recipe sharing and search service in Japan Over 2.7M user-authored recipes About 60M Monthly users in Japan
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • https://cookpad.com/#{your_country_code} • us, uk, id, es, fr, br, ae, etc.… Cookpad is global 67 countries 21 languages Offices in Japan, UK, Spain, Indonesia etc.
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Our infrastructure All-in on AWS since 2011 ~1,400 EC2 instances 200+ ECS services 15,000+ requests/sec 150+ developers 9 SREs 9 Machine learning engineers Over 2 regions
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cooking Log (料理きろく) Our very first Deep Learning powered feature
  • 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cooking log (料理きろく) Collect "Food Photos" from Camera Roll automatically Powered by Convolutional Neural Network
  • 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. DEMO
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 12,000,000+ "food" photos 140,000+ Users
  • 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • There were several new challenges • Semi real-time image classification in production • Different workloads from the rest of web applications • Especially we needed: • Scalable infrastructure for new workloads • Environment isolation for new challenge Our first deep learning feature
  • 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • What we needed: Scalable infrastructure for massive photo uploading and semi real-time classification • Clients send tiny thumbnails after taking photos (difficult to predict traffic) • Traffic spikes are coming sometimes (e.g. The TV show introduces our app) • What we chose: Asynchronous architecture • Uploading and classification take time (~ several hundred ms) • Synchronous processing with API servers gives users a bad experience • Upload directly from clients to Amazon S3 using presigned URL • Enable Amazon S3 notification and Amazon SQS for queue of classification Scalable and asynchronous classification
  • 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Environment isolation • What we needed: Isolated environment in production • Different languages (Python for Machine Learning, Ruby for web application) • Different workloads (It’s our first product that uses deep learning) • Different hardware (GPU) • What we chose: Container environment • The container ensures runtimes are isolated • Language environment, GPU drivers, many configurations • Amazon ECS provides managed and scalable Docker environment • And we had already used containers on Amazon ECS! • We run all classification in Amazon ECS cluster on g2.xlarge
  • 37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. API server 1. Request API server to issue Amazon S3 presigned URL
  • 38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. API server 1. Request API server to issue Amazon S3 presigned URL 2. Upload thumbnail to Amazon S3 bucket
  • 39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. API server 1. Request API server to issue Amazon S3 presigned URL 2. Upload thumbnail to Amazon S3 bucket { bucket: thumbnails key: 123.jpg } 3. Queue upload event From Amazon S3 to Amazon SQS (s3://thumbnails/123.jpg)
  • 40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. API server 1. Request API server to issue Amazon S3 presigned URL 2. Upload thumbnail to Amazon S3 bucket { bucket: thumbnails key: 123.jpg } 3. Queue upload event From Amazon S3 to Amazon SQS ECS cluster 4. Dequeue message 5. Download thumbnail From Amazon S3 6. Classify thumbnail and send result to API server (s3://thumbnails/123.jpg)
  • 41. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. API server 1. Request API server to issue S3 pre-signed url { bucket: thumbnails key: 123.jpg } ECS cluster 4. Dequeue message 5. Download thumbnail From S3 6. Classify thumbnail and send result to API server Scalable without any code 2. Upload thumbnail to Amazon S3 bucket 3. Queue upload event From Amazon S3 to Amazon SQS (s3://thumbnails/123.jpg)
  • 42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. (s3://thumbnails/123.jpg) API server 1. Request API server to issue S3 pre-signed url 2. Upload thumbnail to S3 bucket { bucket: thumbnails key: 123.jpg } 3. Queue upload event From S3 to SQS 5. Download thumbnail From S3 6. Classify thumbnail and send result to API server ECS cluster 4. Dequeue message Scaling ECS tasks by SQS queue length
  • 43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. (s3://thumbnails/123.jpg) 1. Request API server to issue Amazon S3 presigned URL 2. Upload thumbnail to Amazon S3 bucket { bucket: thumbnails key: 123.jpg } 3. Queue upload event From Amazon S3 to Amazon SQS ECS cluster 4. Dequeue message 5. Download thumbnail From Amazon S3 6. Classify thumbnail and send result to API serverAPI server Only need to wait for results from ECS
  • 44. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. API server 1. Request API server to issue Amazon S3 presigned URL 2. Upload thumbnail to Amazon S3 bucket { bucket: thumbnails key: 123.jpg } 3. Queue upload event from Amazon S3 to Amazon SQS 4. Dequeue message 5. Download thumbnail From S3 6. Classify thumbnail and send result to API server ECS cluster What is happening in containers? (s3://thumbnails/123.jpg)
  • 45. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Machine learning and infrastructure Infrastructure to accelerate our machine learning projects.
  • 46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ‣ Yuichiro Someya (ayemos) ‣ Machine Learning Engineer @ Cookpad Inc. # 2016(new grads) ~ Current
  • 47. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. API server 1. Request API server to issue Amazon S3 presigned URL 2. Upload thumbnail to S3 bucket { bucket: thumbnails key: 123.jpg } 3. Queue upload event From Amazon S3 to Amazon SQS 4. Dequeue message 5. Download thumbnail From S3 6. Classify thumbnail and send result to API server ECS cluster What is happening in containers? (s3://thumbnails/123.jpg)
  • 48. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ECS cluster The task in detail (Report results to externals) Image classification! Classifier
  • 49. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. (Serialized) classifier • Fetch the classifier • Dequeue from Amazon SQS • Load the image from Amazon S3 • Report the results back ECS cluster(Serialized) classifier Deploy the classifier on Amazon ECS Task definition?
  • 50. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Democratize task definitions • hako: https://github.com/eagletmt/hako/ • Container deploy tool (Amazon ECS compatible) • Use yaml as definition file format • Each developer writes app.yml and sends Pull request • 200+ applications are at work scheduler: type: ecs region: ap-northeast-1 cluster: hako-production-g2 desired_count: 1 app: image: food-photo-classifier cpu: 128 memory: 3072 memory_reservation: 2048 env: AWS_REGION: ap-northeast-1 COOKPADNET_ENV: production ...
  • 51. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why we’re using hako • `hako` behaves as abstract operation layer over Amazon ECS (or other docker manager) • Higher-level operations: Deploy/Rollback/Stop/Remove • Manages *secret* environment variables • Pluggable pre/post development operations as `scripts` • Operations like DNS settings, Consul registrations, and so on • We want each developer can deploy tasks on Amazon ECS individually. • `hako` handles Infra/SRE work around Amazon ECS.
  • 52. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ECS cluster(Serialized) classifier Deploy the model on Amazon ECS $ hako deploy classifier.yml • Fetch the classifier • Dequeue from Amazon SQS • Load the image from Amazon S3 • Report the results back
  • 53. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ECS cluster (g2) Task(Container) GPU NVIDIA Driver (kernel modules) NVIDIA Driver (user libraries) App, Worker ECS and GPU • Install drivers to clusters (ref: https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver ) • CUDA to the container (CUDA  Driver version compatibility is relatively loose) • GPU device files have to be visible and writable • Privileged flag (migrating to linux_parameters.devices option)
  • 54. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ECS cluster(Serialized) classifier Deploy the model on Amazon ECS $ hako deploy classifier.yml • Fetch the classifier • Dequeue from Amazon SQS • Load the image from Amazon S3 • Report the results back
  • 55. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Food/Nonfood image classifier Food photos from Cookpad Random photos from other datasets (Nonfood) Labeled image dataset train
  • 56. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Whole set Food (100,000 photos~) Nonfood (100,000 photos~) ? It’s hard to obtain “Complement set” {food} ≠ {non-food}
  • 57. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Whole set Food Nonfood {Food: 50%, Nonfood: 50%} ?
  • 58. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Whole set Food Nonfood Plushies Rebuild dataset making use of posterior insights
  • 59. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Food/Nonhood image classifier Food Photos from Cookpad Random Photos from other datasets (Nonfood) Labeled image dataset 97.9% Accurate (Precision: 97.6%, Recall: 95.8%)
  • 60. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 61. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. CUDA 8.0/cuDNN7 CUDA 9.0/cuDNN7 `ssh workbench-001` New!
  • 62. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cooking log • Scalable food/non-food image classification • Asynchronous and isolated architecture • Containerized GPU workloads • `hako` makes it easy to define and deploy applications Machine learning infrastructure • Great environment makes our research fast and creative! • Managing multiple AMIs using Packer • Dedicated Instances • Operate instances via chat bot
  • 63. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. https://info.cookpad.com/us https://github.com/cookpad We’re hiring!
  • 64. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!