Build, train, and deploy Machine Learning models at scale (May 2018)

© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Julien Simon
Principal Technical Evangelist, AI and Machine Learning
@julsimon

ML is still too complicated for everyday developers
• Collect and prepare training data
• Choose and optimize your ML algorithm
• Set up and manage environments for training
• Train and tune model (trial and error)
• Deploy model in production
• Scale and manage the production environment

Amazon SageMaker
Easily build, train, and deploy Machine Learning models
• Collect and prepare training data
• Choose and optimize your ML algorithm
• Set up and manage environments for training
• Train and tune model (trial and error)
• Deploy model in production
• Scale and manage the production environment

Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
ALGORITHMS
• K-Means Clustering
• Principal Component Analysis
• Neural Topic Modelling
• Factorization Machines
• Linear Learner
• XGBoost
• Latent Dirichlet Allocation
• Image Classification
• Seq2Seq
• And more!
FRAMEWORKS
• Apache MXNet
• TensorFlow
• Caffe2, CNTK, PyTorch, Torch
Remaining steps
• Set up and manage environments for training
• Train and tune model (trial and error)
• Deploy model in production
• Scale and manage the production environment

Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Remaining steps
• Deploy model in production
• Scale and manage the production environment

Amazon SageMaker
Build
• Pre-built notebooks for common problems
• Built-in, high-performance algorithms
Train
• One-click training
• Hyperparameter optimization
Deploy
• One-click deployment
• Fully managed hosting with auto-scaling

Amazon SageMaker (architecture diagram; labels only)
• Amazon ECR: training code, inference code
• Model Training (on EC2): training code, helper code
• Model Hosting (on EC2): inference code, helper code, inference endpoint
• Data: training data, model artifacts, ground truth
• Client application: inference request, inference response

Open Source Containers for TF and MXNet
https://github.com/aws/sagemaker-tensorflow-containers
https://github.com/aws/sagemaker-mxnet-containers
• Customize them
• Run them locally for development and testing
• Run them on SageMaker for training and prediction at scale

Bring your own container
https://github.com/aws/sagemaker-container-support
• Integration with SageMaker Python SDK Estimators, including:
• Downloading user-provided Python code
• Deserializing hyperparameters (preserving their Python types)
• bin/entry.py, the Docker entrypoint required by SageMaker
• Reading in the metadata files provided to the container during training
• nginx + Gunicorn HTTP server for serving inference requests
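The "deserializing hyperparameters" bullet above can be illustrated with a minimal sketch. It assumes that hyperparameters reach the training container as a JSON file whose values are all strings, each holding a JSON-encoded value; the file path and the function names `decode_hyperparameters` and `load_hyperparameters` are illustrative assumptions, not the container library's actual API.

```python
import json

# Assumed location of the hyperparameter file inside a training container.
HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"


def decode_hyperparameters(raw):
    """Decode a {name: string} mapping, restoring Python types where possible."""
    decoded = {}
    for name, value in raw.items():
        try:
            # "10" -> 10, "0.01" -> 0.01, "[128, 64]" -> [128, 64], '"sgd"' -> "sgd"
            decoded[name] = json.loads(value)
        except (TypeError, ValueError):
            # Not valid JSON (for example a bare, unquoted string): keep as-is.
            decoded[name] = value
    return decoded


def load_hyperparameters(path=HYPERPARAMETERS_PATH):
    """Read the hyperparameter file and decode its values."""
    with open(path) as f:
        return decode_hyperparameters(json.load(f))
```

Falling back to the raw string keeps the function usable when a value was passed without JSON encoding, at the cost of silently accepting malformed input.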
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/r_bring_your_own
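To make the inference side of a custom container more concrete, here is a minimal sketch of a WSGI application that Gunicorn could serve behind nginx. It assumes the hosting contract of a health check on GET /ping and predictions on POST /invocations; the `predict` function is a hypothetical placeholder model, and `call` is a helper added only for local testing.

```python
import io
import json


def predict(rows):
    """Placeholder model (hypothetical): returns the sum of each input row."""
    return [sum(row) for row in rows]


def app(environ, start_response):
    """WSGI application answering health checks and inference requests."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")

    if method == "GET" and path == "/ping":
        # An empty 200 response signals that the container is healthy.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b""]

    if method == "POST" and path == "/invocations":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        # Parse CSV input: one sample per line, comma-separated numbers.
        rows = [
            [float(x) for x in line.split(",")]
            for line in body.splitlines()
            if line.strip()
        ]
        payload = json.dumps({"predictions": predict(rows)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [payload]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call(method, path, body=b""):
    """Invoke the app in-process, without a server, for local testing."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

If this code lived in a module named `serve.py` (a hypothetical name), it could be started with `gunicorn -b 0.0.0.0:8080 serve:app`; the CSV-in, JSON-out format is a choice made for this sketch only.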

Amazon EC2 C5 instances
C5 (“Skylake”)
• 72 vCPUs
• 144 GiB memory
• 12 Gbps to EBS
• AVX 512
C4 (“Haswell”)
• 36 vCPUs
• 60 GiB memory
• 4 Gbps to EBS
C5 compared to C4
• 2X vCPUs
• 2X performance
• 3X throughput
• 2.4X memory
• 25% improvement in price/performance over C4
C5: Next Generation Compute-Optimized Instances with Intel® Xeon® Scalable Processor
AWS Compute optimized instances support the new Intel® AVX-512 advanced instruction set, enabling you to more efficiently run vector processing workloads with single and double floating point precision, such as AI/machine learning or video processing.
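Three of the multipliers on this slide can be rechecked from the raw figures. The short sketch below assumes that "3X throughput" refers to EBS bandwidth, which matches the 12 Gbps and 4 Gbps values; the "2X performance" and 25% price/performance claims cannot be derived from the numbers on the slide and are left out.

```python
# Figures copied from the slide; the dictionary keys are illustrative names.
c4 = {"vcpus": 36, "memory_gib": 60, "ebs_gbps": 4}
c5 = {"vcpus": 72, "memory_gib": 144, "ebs_gbps": 12}

# Ratio of each C5 figure to the corresponding C4 figure.
ratios = {key: c5[key] / c4[key] for key in c4}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1f}X")
```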

Faster TensorFlow training on C5
https://aws.amazon.com/blogs/machine-learning/faster-training-with-optimized-tensorflow-1-6-on-amazon-ec2-c5-and-p3-instances/

Amazon EC2 P3 Instances
• P3.2xlarge, P3.8xlarge, P3.16xlarge
• Up to eight NVIDIA Tesla V100 GPUs in a single instance
• 40,960 CUDA cores, 5120 Tensor cores
• 128GB of GPU memory
• 1 PetaFLOPs of computational performance – 14x better than P2
• 300 GB/s GPU-to-GPU communication (NVLink) – 9x better than P2
The fastest, most powerful GPU instances in the cloud
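The instance-level totals above follow from multiplying per-GPU figures by eight. The sketch below uses per-GPU values for the Tesla V100 that are not on the slide (5,120 CUDA cores, 640 Tensor cores, 16 GB of memory, and a peak of about 125 mixed-precision TFLOPS); treat them as assumptions taken from the GPU's published specifications.

```python
GPUS_PER_INSTANCE = 8  # largest P3 size, per the slide

# Assumed per-GPU figures for a Tesla V100 (not stated on the slide).
v100 = {
    "cuda_cores": 5120,
    "tensor_cores": 640,
    "memory_gb": 16,
    "mixed_precision_tflops": 125,
}

# Scale each per-GPU figure to a full eight-GPU instance.
totals = {key: value * GPUS_PER_INSTANCE for key, value in v100.items()}

for key, value in totals.items():
    print(f"{key}: {value}")
```

The 1,000 TFLOPS total corresponds to the "1 PetaFLOPs" bullet and is a theoretical peak for mixed-precision Tensor core operations, not a sustained training throughput.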

DigitalGlobe
https://aws.amazon.com/solutions/case-studies/digitalglobe-machine-learning/
• Operating Earth imaging satellites and providing image analysis services.
• Over 100 PB of imagery.
• Extensive use of Machine Learning on SageMaker to extract information from images.
• Working with the AWS ML Lab, built a predictive model reducing cloud storage costs by 50%.

DEMOS
Linear Learner (built-in) – binary classification of MNIST (0 vs 1-9)
Image Classification (built-in) – classifying Caltech-256
TensorFlow – classifying MNIST with a CNN
Spark on EMR + XGBoost (built-in) – classifying spam
Bonus: invoking a SageMaker endpoint with AWS Chalice
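As a rough idea of what the bonus demo involves, the sketch below calls an endpoint through the `invoke_endpoint` operation of the boto3 `sagemaker-runtime` client. The client is passed in as a parameter so the code can be exercised offline; `FakeRuntimeClient`, the endpoint name, and the JSON response shape are hypothetical stand-ins, and the demo's actual code may differ.

```python
import io
import json


def invoke(client, endpoint_name, rows):
    """Send rows as CSV to an endpoint and return the decoded JSON response."""
    body = "\n".join(",".join(str(v) for v in row) for row in rows)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=body,
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read().decode("utf-8"))


class FakeRuntimeClient:
    """Offline stand-in for boto3.client("sagemaker-runtime") (hypothetical)."""

    def __init__(self):
        self.calls = []

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        self.calls.append((EndpointName, ContentType, Body))
        # Return one made-up score per input line, in an assumed JSON shape.
        count = len(Body.splitlines())
        payload = json.dumps({"predictions": [{"score": 0.5}] * count})
        return {"Body": io.BytesIO(payload.encode("utf-8"))}
```

Against a real endpoint, the client would be created with `boto3.client("sagemaker-runtime")`, and `invoke` could then be called from a web route, for example in an AWS Chalice application.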

Thank you!
Julien Simon
Principal Technical Evangelist, AI and Machine Learning
@julsimon
https://aws.amazon.com/sagemaker
https://github.com/awslabs/amazon-sagemaker-examples
https://github.com/aws/sagemaker-python-sdk
https://github.com/aws/sagemaker-spark
https://medium.com/@julsimon
https://youtube.com/juliensimonfr
