[AWS Tech Talk] Using containers for deep learning workflows
Slides supporting the following webinar: https://www.youtube.com/watch?v=wbDVGAbd_dM
1.
Using Containers for Deep Learning Workflows
Shashank Prasanna, Sr. Technical Evangelist, AI/ML
30th September 2019
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
2.
Agenda
• Common deep learning setups and challenges
• Using containers for deep learning workflows
  • Demo 1: Containers for deep learning training workflows
• Scaling deep learning training
  • Demo 2: Submitting training jobs using containers to Amazon Elastic Kubernetes Service (Amazon EKS)
  • Demo 3: Running large-scale experiments using containers on Amazon SageMaker
• Summary and Q&A
3.
Common machine learning setups
  1. Code & frameworks
  2. Compute (CPUs, GPUs)
  3. Storage
Diagram labels: CLI, EC2 instance, DL AMI, Amazon S3; CLI, On-premises
4.
Deep learning workflow
• Data acquisition, curation, and labeling
• Data preparation for training
• Large-scale experimentation
• Distributed training
• Model optimization and validation
• Deployment
Need for scale
5.
Deep learning is computationally expensive, but can be scaled out
Diagram: from this… (CLI, EC2 instance) …to this (CLI, Cluster)
6.
Scaling out deep learning training
• Distributed training: distributing the training of a single model to train faster
• Parallel experiments: different models running in parallel to find the best model
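To make the "distributed training" idea concrete, here is a minimal sketch of one common form of it, data parallelism: each worker computes a gradient on its own shard of the data and the gradients are averaged before the update. This is a toy, single-process illustration with a one-parameter model; the function names are made up for this example and no real framework (such as horovod) is used.

```python
# Toy data-parallel training: every "worker" computes a gradient on its own
# shard, and the gradients are averaged (the role an all-reduce plays).

def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split(data, num_workers):
    """Split data into equal-sized shards, one per worker."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w, data, num_workers, lr=0.01):
    """One SGD step where the gradient is the average of per-shard gradients."""
    shards = split(data, num_workers)
    grads = [gradient(w, s) for s in shards]  # on a cluster: one per GPU/node
    avg_grad = sum(grads) / num_workers       # stand-in for all-reduce
    return w - lr * avg_grad


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # samples of y = 3x
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=4)
print(f"learned w = {w:.4f}")
```

With equal-sized shards the averaged gradient equals the full-batch gradient, which is why adding workers can shorten each step without changing the result.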
7.
…but there are challenges to scaling
  1. Code and dependencies
  2. Infrastructure management
  3. Cluster management
Diagram labels: CLI, Cluster
8.
Machine learning stack is complex
Challenge 1: Code and dependencies
• “My code requires building several dependencies from source”
• “My code isn’t taking advantage of the GPU/GPUs”
• “Is cudnn, nccl installed, is it the right version?”
• “My code is running slow on CPUs”
• “Oh wait, is it taking advantage of the AVX instruction set?!?”
• “I updated my drivers and training is now slower/errors out”
• “My cluster runs a different version of framework/Linux distro”
All of this makes portability, collaboration, and scaling training really hard!
9.
Multiple points of failure
Challenge 1: Code and dependencies
Development system:
• My code
• Python
• Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, others…
• TensorFlow 1.13
• CPU: MKL 2019 v3
• GPU: cudnn 7.1, cublas 10, nccl 2, CUDA toolkit 10
• Ubuntu 16.04
• NVIDIA drivers 436.15
Training cluster:
• My code
• Python
• Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, others…
• TensorFlow 1.14
• CPU: MKL 2019 v2
• GPU: cudnn 7.5, cublas 10, nccl 2.4, CUDA toolkit 10
• CentOS 7
• NVIDIA drivers 410.68
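The mismatch between the two stacks can be spotted mechanically. The sketch below records the versions listed on this slide in two dictionaries and reports every component that differs; the dictionary keys and the `diff_environments` helper are names chosen for this illustration, not part of any tool used in the talk.

```python
# Versions as listed on the slide for each system.
development = {
    "nvidia_driver": "436.15", "os": "Ubuntu 16.04", "tensorflow": "1.13",
    "mkl": "2019 v3", "cudnn": "7.1", "cublas": "10", "nccl": "2",
    "cuda_toolkit": "10",
}
training_cluster = {
    "nvidia_driver": "410.68", "os": "CentOS 7", "tensorflow": "1.14",
    "mkl": "2019 v2", "cudnn": "7.5", "cublas": "10", "nccl": "2.4",
    "cuda_toolkit": "10",
}


def diff_environments(a, b):
    """Return {component: (version_in_a, version_in_b)} for every mismatch."""
    components = sorted(set(a) | set(b))
    return {c: (a.get(c), b.get(c)) for c in components if a.get(c) != b.get(c)}


mismatches = diff_environments(development, training_cluster)
for component, (dev, cluster) in mismatches.items():
    print(f"{component}: development={dev}, cluster={cluster}")
print(f"{len(mismatches)} of {len(development)} components differ")
```

Only cublas and the CUDA toolkit match here; every other layer is a potential point of failure when code moves from the development system to the cluster.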
10.
Containers for Machine Learning
Challenge 1: Code and dependencies
TensorFlow container image + your training scripts:
• Python
• Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, others…
• Packages: TensorFlow
• CPU: mkl
• GPU: cudnn, cublas, nccl, CUDA toolkit
Infrastructure:
• Container runtime
• NVIDIA drivers
• Host OS
ML environments that are:
11.
• Development system: TensorFlow container image + your training scripts, running on the container runtime, NVIDIA drivers, and host OS
• push → Container registry → pull
• Training cluster: the same TensorFlow container image + your training scripts, running on the container runtime, NVIDIA drivers, and host OS
• Container image contents: Python; Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, others…; TensorFlow; CPU: mkl; GPU: cudnn, cublas, nccl, CUDA toolkit
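The push and pull arrows correspond to a handful of Docker CLI calls. The sketch below only assembles those commands as argument lists and prints them, so it runs without Docker installed. The account ID, repository name, and tag are hypothetical placeholders, and registry authentication is left out.

```python
import shlex


def image_uri(account_id, region, repository, tag):
    """Image name in the Amazon ECR registry format."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def development_commands(uri, context_dir="."):
    """Build the image from a Dockerfile in context_dir, then push it."""
    return [
        ["docker", "build", "-t", uri, context_dir],
        ["docker", "push", uri],
    ]


def cluster_commands(uri):
    """Pull the exact same image on the training cluster."""
    return [["docker", "pull", uri]]


# Hypothetical values, for illustration only.
uri = image_uri("123456789012", "us-west-2", "my-training", "v1")
for argv in development_commands(uri) + cluster_commands(uri):
    print(shlex.join(argv))
```

Because both sides refer to the same image name and tag, the training cluster gets the framework and library versions that were tested on the development system; only the host OS and NVIDIA drivers remain outside the image.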
12.
AWS Deep Learning Containers
Challenge 1: Code and dependencies
https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-images.html
13.
DEMO 1: Containers for deep learning workflows
Diagram (AWS Cloud):
• Amazon ECR: deep learning container images (AWS DL containers)
• EC2 instance with GPUs, driven from the CLI
• Amazon EBS: datasets and checkpoints
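One plausible shape for the demo's training launch is a single `docker run` that exposes the instance's GPUs and mounts the EBS-backed directory holding datasets and checkpoints. This is a sketch under assumptions, not the demo's actual command: the image name, paths, and `train.py` are hypothetical, and `--gpus all` assumes Docker 19.03 or later with the NVIDIA container toolkit installed. The code only builds and prints the command.

```python
import shlex


def training_run_command(image, host_data_dir, container_data_dir, command):
    """Assemble a `docker run` that uses all GPUs and mounts a data volume."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                                  # expose host GPUs
        "-v", f"{host_data_dir}:{container_data_dir}",    # datasets, checkpoints
        image,
        *command,
    ]


# Hypothetical image and paths, for illustration only.
cmd = training_run_command(
    image="my-training:v1",
    host_data_dir="/mnt/ebs/data",
    container_data_dir="/data",
    command=["python", "train.py"],
)
print(shlex.join(cmd))
```

Mounting the volume keeps datasets and checkpoints outside the container, so they survive after the container exits.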
14.
Challenges with scaling deep learning
• Code and dependencies
• Infrastructure management
• Cluster management
Diagram labels: CLI, Cluster
15.
ML infrastructure and cluster management
• Image registry: container image repository. Amazon Elastic Container Registry (Amazon ECR)
• Compute: where the containers run. Amazon EC2
• Management: deployment, scheduling, scaling, and management of containerized applications. Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS)
• ML services: fully-managed service that covers the entire machine learning workflow. Amazon SageMaker (Jupyter notebook instances, high-performance algorithms, large-scale training, optimization, one-click deployment, fully managed with auto-scaling)
16.
DEMO 2: Submitting training jobs to Amazon Elastic Kubernetes Service (Amazon EKS)
Approach:
  1. Provision a Kubernetes cluster
Diagram labels: Code files, Custom container, Container registry, Amazon EKS cluster
17.
Create a Kubernetes cluster
Steps from the CLI: create cluster, submit a training job
    eksctl create cluster \
      --name eks-gpu \
      --version 1.13 \
      --region us-west-2 \
      --nodegroup-name gpu-nodes \
      --node-type p3.8xlarge \
      --nodes 4 \
      --timeout=40m \
      --ssh-access \
      --ssh-public-key=<public-key> \
      --auto-kubeconfig
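The slide shows cluster creation but not what a submitted training job looks like. As a rough sketch, a single-node job could be described with a standard Kubernetes `batch/v1` Job that requests GPUs through the `nvidia.com/gpu` resource (which assumes the NVIDIA device plugin is running on the cluster). The job name, image, and command below are hypothetical, and the demo itself may have used a different mechanism such as Kubeflow. The code emits JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json


def training_job(name, image, command, gpus):
    """Build a Kubernetes batch/v1 Job manifest for one training container."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "training",
                        "image": image,
                        "command": command,
                        # GPUs are requested as an extended resource limit.
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                },
            },
        },
    }


# Hypothetical image and script; a p3.8xlarge node has 4 GPUs.
manifest = training_job(
    name="tf-training",
    image="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-training:v1",
    command=["python", "train.py"],
    gpus=4,
)
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it would hand scheduling to the cluster, which places the container on a node with enough free GPUs.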
18.
Learn more: Amazon EKS, Kubeflow and Katib
• Amazon EKS: Amazon Elastic Kubernetes Service (aws.amazon.com/eks/)
• Kubeflow: machine learning workflows on Kubernetes (kubeflow.org/docs/aws/)
• Katib: hyperparameter tuning and neural architecture search
19.
DEMO 3: Hyperparameter search experiment using Amazon SageMaker
Approach (diagram labels): Code files, Docker build, Custom container, Container registry, Amazon S3, SageMaker SDK, Fully-managed SageMaker cluster
Webinar: Machine Learning with Containers and Amazon SageMaker
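The idea behind a hyperparameter search experiment can be sketched without any cloud service: sample several configurations, run the trials in parallel, and keep the best one. The example below is a local random search with a toy objective standing in for a real training job. It does not use the SageMaker SDK, and the search space, objective, and function names are all assumptions made for illustration.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def sample_config(rng):
    """Draw one configuration: log-uniform learning rate, discrete batch size."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "batch_size": rng.choice([32, 64, 128, 256]),
    }


def run_trial(config):
    """Toy stand-in for a training job; returns a loss (lower is better)."""
    lr_term = (math.log10(config["learning_rate"]) + 2.5) ** 2
    batch_term = abs(config["batch_size"] - 128) / 128
    return lr_term + batch_term


def random_search(num_trials, max_parallel, seed=0):
    """Run num_trials trials, at most max_parallel at a time."""
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(num_trials)]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        scores = list(pool.map(run_trial, configs))
    trials = list(zip(configs, scores))
    best_config, best_score = min(trials, key=lambda t: t[1])
    return best_config, best_score, trials


best_config, best_score, trials = random_search(num_trials=20, max_parallel=4)
print(f"best loss {best_score:.4f} with {best_config}")
```

In a managed setup each trial would be a containerized training job on its own instances, with the service handling the scheduling that the thread pool does here.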
20.
Takeaways
• Containers let you build l
• Leverage services such as Amazon SageMaker and Kubernetes + Kubeflow to manage large-scale ML workloads.
• Choose fully-managed or self-managed based on needs
Challenges addressed: Code and dependencies, Infrastructure management, Cluster management
21.
Resources
• Documentation: docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
• Examples on GitHub: github.com/awslabs/amazon-sagemaker-examples
• AWS ML Blog: aws.amazon.com/blogs/machine-learning/category/artificial-intelligence/
• docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-images.html
• Webinar: Machine Learning with Containers and Amazon SageMaker
22.
Thank you!
Shashank Prasanna, Sr. Technical Evangelist, AI/ML
Questions? Happy to help:
• Twitter: @shshnkp
• LinkedIn: linkedin.com/in/shashankprasanna
Demo code and configuration scripts: https://github.com/shashankprasanna/using-containers-for-dl