SlideShare une entreprise Scribd logo
1  sur  42
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cyrus Vahid <cyrusmv@amazon.com>
Principal Evangelist, AI Labs – MXNet
Aug 2018
Apache MXNet and gluon
Building Deep Learning Applications with
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Background
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deductive Reasoning
P Q P ∧ Q P ∨ Q P ∴ Q
T T T T T
T F F T F
F T F T T
F F F F T
• 𝑃 = 𝑇 ∧ 𝑄 = 𝑇 ∴ 𝑃 ∧ 𝑄 = 𝑇
• 𝑃 ∧ 𝑄 ∴ 𝑃 → 𝑄; ∼ 𝑃 ∴ 𝑃 → 𝑄
• P → Q
P
_________
∴ Q
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rule Based Programming
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Plausible Reasoning
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Programming with Data
Understand
your data
Algorithmically
Discover
Hidden Patents
Generalize
Solution
Algorithm
Apply solution
to unseen
patterns
Make
Predictions
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Fundamentals
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Biological & Artificial Neuron
Source: http://cs231n.github.io/neural-networks-1/
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Perceptron
I1 I2 B
O
w1 w2 w3
𝑓 𝑥𝑖, 𝑤𝑖 = Φ(𝑏 + Σ𝑖(𝑤𝑖. 𝑥𝑖))
Φ 𝑥 =
1, 𝑖𝑓 𝑥 ≥ 0.5
0, 𝑖𝑓 𝑥 < 0.5
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Perceptron
I1 I2 B
O
1 1 -1
𝑂1 = 1𝑥1 + 1𝑥1 + −1.5 = 0.5 ∴ Φ(𝑂1) = 1
𝐼1 = 𝐼2 = 𝐵1 = 1
𝑂1 = 1𝑥1 + 0𝑥1 + −1.5 = −0.5 ∴ Φ(𝑂1) = 0
𝐼2 = 0 ; 𝐼1 = 𝐵1 = 1
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Non-Linearity
P Q P ∧ Q P ⨁ Q
T T T T
T F F F
F T F F
F F F T
P
Q
x0
0 0
P
Q
x0
x 0
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Learning
hidden layersInput layer
output
Add Non Linearity to output of hidden layer
To transform output into continuous range
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The “Learning” in Deep Learning
0.4 0.3
0.2 0.9
...
backpropagation (gradient descent)
X1 != X
0.4 ± 𝛿 0.3 ± 𝛿
new
weights
new
weights
0
1
0
1
1
.
.
-
X
input
label
...
X1
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activation Function (Φ)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inputs: Preprocessing, Batches, Epochs
Preprocessing
 Random separation of data into
training, validation, and test sets
 Necessary to measuring the
accuracy of the model
Batch
 Amount of data propagated
through network at every iteration
 Enables faster optimization
through shorter iteration cycles
Epoch
 Complete pass through all the
training data
 Optimization will have multiple
epochs to reduce error rate
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inputs: Encoding MNIST data
https://www.tensorflow.org/get_started/mnist/beginners
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inputs: Encoding Pictures into Data
7 x 7 x 3 Matrix
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Classification with the Softmax Function
Softmax converts the output layer into probabilities – necessary for classification
Softmax Function
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Loss Function
• It is an objective function that quantifies how successful
the model was in its predictions
• It is a measure of the difference between a neural net’s
prediction and the actual value – that is, the error
• Typically, we use Cross Entropy Loss, which adjusts
the plain loss calculation to mitigate learning slowdown
• Backpropagation is performed to calculate the error
contribution of each neuron after processing one batch
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Gradient Descent
Iteratively update parameters to get the most optimal value for the objective function
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Weight Initialization
https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Stochastic Gradient Descent
Gradient Descent
A single iteration for the
parameter update runs through
ALL of the training data
Stochastic Gradient Descent,
A single iteration for the
parameter update runs through
a BATCH of the training data
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Optimizers
http://imgur.com/a/Hqolp
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Learning Rates
• Learning Rate: It is a real number
that decides how far to move down in
the direction of steepest gradient
• Online Learning: Weights are
updated at each step (slow to learn)
• Batch Learning: Weights are
updated after all training data is
processed (hard to optimize)
• Mini-Batch: Combination of both
when we break up the training set
into smaller batches and update the
weights after each mini-batch
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Training and Validation Data
Best model
When only evaluating accuracy using the training set, we face the Overfitting issue
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dropout
Srivastava, Nitish, et al. ”Dropout: a simple way to prevent neural networks from
overfitting”, JMLR 2014
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
MXNet
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Computational Dependency/Graph
• 𝑧 = 𝑥 ⋅ 𝑦
• 𝑘 = 𝑎 ⋅ 𝑏
• 𝑡 = 𝜆𝑧 + 𝑘
x y
𝑧
x
𝜆
𝑢
x
a
x
b
k
𝑡
+
1 1
2
3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Computational Dependency/Graph
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Computational Dependency/Graph
• 𝑧 = 𝑥 ⋅ 𝑦
• 𝑘 = 𝑎 ⋅ 𝑏
• 𝑡 = 𝜆𝑧 + 𝑘
x y
𝑧
x
𝜆
𝑢
x
a
x
b
k
𝑡
+
1 1
2
3
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Computational Dependency/Graph
• 𝑧 = 𝑥 ⋅ 𝑦
• 𝑘 = 𝑎 ⋅ 𝑏
• 𝑡 = 𝜆𝑧 + 𝑘
x y
𝑧
x
𝜆
𝑢
x
a
x
b
k
𝑡
+
1 1
2
3
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ideal
Inception v3
Resnet
Alexnet
88%
Efficiency
1 2 4 8 16 32 64 128 256
Scaling with MXNet
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Imperative vs Symbolic Programming
Imperative Symbolic
Execution Flow is the same as flow of the
code:
Abstract functions are defined and compiled
first, data binding happens next.
Flexible but inefficient: Efficient
• Memory: 4 * 10 * 8 = 320 bytes
• Interim values are available
• No Operation Folding.
• Familiar coding paradigm.
• Memory: 2 * 10 * 8 = 160 bytes
• Interim values are not available
• Operation Folding: Folding
multiple operations into one.
We run one op. instead of
many on GPU. This is possible
because we have access to
whole comp. graph
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Gluon
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Evolution of DL Frameworks
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Advantages of the Gluon API
Simple, Easy-to-
Understand Code
Flexible, Imperative
Structure
Dynamic Graphs
High Performance
 Neural networks can be defined using simple, clear, concise code
 Plug-and-play neural network building blocks – including predefined layers,
optimizers, and initializers
 Eliminates rigidity of neural network model definition and brings together
the model with the training algorithm
 Intuitive, easy-to-debug, familiar code
 Neural networks can change in shape or size during the training process to
address advanced use cases where the size of data fed is variable
 Important area of innovation in Natural Language Processing (NLP)
 There is no sacrifice with respect to training speed
 When it is time to move from prototyping to production, easily cache neural
networks for high performance and a reduced memory footprint
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Code
https://github.com/cyrusmvahid/GluonBootcamp/tree/master/labs/fancy_mnist
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What’s New
• GluonCV, a Deep Learning Toolkit for Computer Vision
• Features:
• training scripts that reproduces SOTA results reported in latest
papers,
• a large set of pre-trained models,
• carefully designed APIs and easy to understand implementations,
• community support.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What’s New
• GluonNLP, a Deep Learning Toolkit for Natural
Language Processing
• Features:
• Training scripts to reproduce SOTA results reported in research
papers.
• Pre-trained models for common NLP tasks.
• Carefully designed APIs that greatly reduce the implementation
complexity.
• Community support.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What’s New
• MXNet backend for Keras: Keras is a high-level neural networks
API, written in Python and capable of running on top of Apache MXNet,
Tensorflow, CNTK, and Theano.
• Performance: MXNet backend provides scalable and fast backend for
new projects and existing code, hence with least effort it can improve
performance of existing models. For more on benchmarking please check:
https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Refrences
• Mxnet: http://mxnet.incubator.apache.org/
• Gluon 60-min crash course: https://gluon-crash-course.mxnet.io/
• Deep Learning book based on gluon: https://gluon.mxnet.io/
• GluonCV: https://gluon-cv.mxnet.io/
• GluonNLP: https://gluon-nlp.mxnet.io/
• Keras-mxnet: https://github.com/awslabs/keras-apache-mxnet
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!
c y r u s m v @ a m a z o n . c o m

Contenu connexe

Tendances

Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
MLconf
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
MLconf
 
AWS Cost Optimization
AWS Cost OptimizationAWS Cost Optimization
AWS Cost Optimization
Miles Ward
 
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Fred Madrid
 

Tendances (20)

Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AI
 
Training Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in SparkTraining Large-scale Ad Ranking Models in Spark
Training Large-scale Ad Ranking Models in Spark
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
 
Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningDueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
 
Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015
Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015
Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015
 
Barga Data Science lecture 3
Barga Data Science lecture 3Barga Data Science lecture 3
Barga Data Science lecture 3
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)
 
Cloud cost optimization (AWS, GCP)
Cloud cost optimization (AWS, GCP)Cloud cost optimization (AWS, GCP)
Cloud cost optimization (AWS, GCP)
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniques
 
SparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingSparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time Bidding
 
AWS Cost Optimization
AWS Cost OptimizationAWS Cost Optimization
AWS Cost Optimization
 
Josh Patterson MLconf slides
Josh Patterson MLconf slidesJosh Patterson MLconf slides
Josh Patterson MLconf slides
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti...
 

Similaire à Deep Learning with MXNet

Similaire à Deep Learning with MXNet (20)

Building Applications with Apache MXNet
Building Applications with Apache MXNetBuilding Applications with Apache MXNet
Building Applications with Apache MXNet
 
Build Deep Learning Applications Using MXNet and Amazon SageMaker (AIM418) - ...
Build Deep Learning Applications Using MXNet and Amazon SageMaker (AIM418) - ...Build Deep Learning Applications Using MXNet and Amazon SageMaker (AIM418) - ...
Build Deep Learning Applications Using MXNet and Amazon SageMaker (AIM418) - ...
 
MCL310_Building Deep Learning Applications with Apache MXNet and Gluon
MCL310_Building Deep Learning Applications with Apache MXNet and GluonMCL310_Building Deep Learning Applications with Apache MXNet and Gluon
MCL310_Building Deep Learning Applications with Apache MXNet and Gluon
 
Automatic Model Tuning Using Amazon SageMaker (AIM412) - AWS re:Invent 2018
Automatic Model Tuning Using Amazon SageMaker (AIM412) - AWS re:Invent 2018Automatic Model Tuning Using Amazon SageMaker (AIM412) - AWS re:Invent 2018
Automatic Model Tuning Using Amazon SageMaker (AIM412) - AWS re:Invent 2018
 
Building a Recommender System on AWS
Building a Recommender System on AWSBuilding a Recommender System on AWS
Building a Recommender System on AWS
 
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
 
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
 
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
 
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalisere:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
 
Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos Engineering
 
Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018
Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018
Debugging Gluon and Apache MXNet (AIM423) - AWS re:Invent 2018
 
Amazon, awsreinvent2018, Artificial Intelligence & Machine Learning, AIM422, ...
Amazon, awsreinvent2018, Artificial Intelligence & Machine Learning, AIM422, ...Amazon, awsreinvent2018, Artificial Intelligence & Machine Learning, AIM422, ...
Amazon, awsreinvent2018, Artificial Intelligence & Machine Learning, AIM422, ...
 
Predictive Scaling for More Responsive Applications (API330) - AWS re:Invent ...
Predictive Scaling for More Responsive Applications (API330) - AWS re:Invent ...Predictive Scaling for More Responsive Applications (API330) - AWS re:Invent ...
Predictive Scaling for More Responsive Applications (API330) - AWS re:Invent ...
 
RoboMaker로 DeepRacer 자율 주행차 만들기 :: 유정열 - AWS Community Day 2019
RoboMaker로 DeepRacer 자율 주행차 만들기 :: 유정열 - AWS Community Day 2019 RoboMaker로 DeepRacer 자율 주행차 만들기 :: 유정열 - AWS Community Day 2019
RoboMaker로 DeepRacer 자율 주행차 만들기 :: 유정열 - AWS Community Day 2019
 
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
Supercharge Your ML Model with SageMaker - AWS Summit Sydney 2018
 
Accelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMakerAccelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMaker
 
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
 
Keynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedKeynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practiced
 
From Data To Insights
From Data To Insights From Data To Insights
From Data To Insights
 
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Deep Learning with MXNet

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cyrus Vahid <cyrusmv@amazon.com> Principal Evangelist, AI Labs – MXNet Aug 2018 Apache MXNet and gluon Building Deep Learning Applications with
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Background
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deductive Reasoning P Q P ∧ Q P ∨ Q P ∴ Q T T T T T T F F T F F T F T T F F F F T • 𝑃 = 𝑇 ∧ 𝑄 = 𝑇 ∴ 𝑃 ∧ 𝑄 = 𝑇 • 𝑃 ∧ 𝑄 ∴ 𝑃 → 𝑄; ∼ 𝑃 ∴ 𝑃 → 𝑄 • P → Q P _________ ∴ Q
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rule Based Programming
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Plausible Reasoning
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Programming with Data Understand your data Algorithmically Discover Hidden Patents Generalize Solution Algorithm Apply solution to unseen patterns Make Predictions
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Fundamentals
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Biological & Artificial Neuron Source: http://cs231n.github.io/neural-networks-1/
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Perceptron I1 I2 B O w1 w2 w3 𝑓 𝑥𝑖, 𝑤𝑖 = Φ(𝑏 + Σ𝑖(𝑤𝑖. 𝑥𝑖)) Φ 𝑥 = 1, 𝑖𝑓 𝑥 ≥ 0.5 0, 𝑖𝑓 𝑥 < 0.5
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Perceptron I1 I2 B O 1 1 -1 𝑂1 = 1𝑥1 + 1𝑥1 + −1.5 = 0.5 ∴ Φ(𝑂1) = 1 𝐼1 = 𝐼2 = 𝐵1 = 1 𝑂1 = 1𝑥1 + 0𝑥1 + −1.5 = −0.5 ∴ Φ(𝑂1) = 0 𝐼2 = 0 ; 𝐼1 = 𝐵1 = 1
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Non-Linearity P Q P ∧ Q P ⨁ Q T T T T T F F F F T F F F F F T P Q x0 0 0 P Q x0 x 0
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Learning hidden layersInput layer output Add Non Linearity to output of hidden layer To transform output into continuous range
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The “Learning” in Deep Learning 0.4 0.3 0.2 0.9 ... backpropagation (gradient descent) X1 != X 0.4 ± 𝛿 0.3 ± 𝛿 new weights new weights 0 1 0 1 1 . . - X input label ... X1
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activation Function (Φ)
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inputs: Preprocessing, Batches, Epochs Preprocessing  Random separation of data into training, validation, and test sets  Necessary to measuring the accuracy of the model Batch  Amount of data propagated through network at every iteration  Enables faster optimization through shorter iteration cycles Epoch  Complete pass through all the training data  Optimization will have multiple epochs to reduce error rate
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inputs: Encoding MNIST data https://www.tensorflow.org/get_started/mnist/beginners
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inputs: Encoding Pictures into Data 7 x 7 x 3 Matrix
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Classification with the Softmax Function Softmax converts the output layer into probabilities – necessary for classification Softmax Function
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Loss Function • It is an objective function that quantifies how successful the model was in its predictions • It is a measure of the difference between a neural net’s prediction and the actual value – that is, the error • Typically, we use Cross Entropy Loss, which adjusts the plain loss calculation to mitigate learning slowdown • Backpropagation is performed to calculate the error contribution of each neuron after processing one batch
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gradient Descent Iteratively update parameters to get the most optimal value for the objective function
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Weight Initialization https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Stochastic Gradient Descent Gradient Descent A single iteration for the parameter update runs through ALL of the training data Stochastic Gradient Descent, A single iteration for the parameter update runs through a BATCH of the training data
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Optimizers http://imgur.com/a/Hqolp
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Learning Rates • Learning Rate: It is a real number that decides how far to move down in the direction of steepest gradient • Online Learning: Weights are updated at each step (slow to learn) • Batch Learning: Weights are updated after all training data is processed (hard to optimize) • Mini-Batch: Combination of both when we break up the training set into smaller batches and update the weights after each mini-batch
  • 25. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Training and Validation Data Best model When only evaluating accuracy using the training set, we face the Overfitting issue
  • 26. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Dropout Srivastava, Nitish, et al. ”Dropout: a simple way to prevent neural networks from overfitting”, JMLR 2014
  • 27. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. MXNet
  • 28. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Computational Dependency/Graph • 𝑧 = 𝑥 ⋅ 𝑦 • 𝑘 = 𝑎 ⋅ 𝑏 • 𝑡 = 𝜆𝑧 + 𝑘 x y 𝑧 x 𝜆 𝑢 x a x b k 𝑡 + 1 1 2 3
  • 29. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Computational Dependency/Graph net = mx.sym.Variable('data') net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64) net = mx.sym.Activation(net, name='relu1', act_type="relu") net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26) net = mx.sym.SoftmaxOutput(net, name='softmax') mx.viz.plot_network(net)
  • 30. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Computational Dependency/Graph • 𝑧 = 𝑥 ⋅ 𝑦 • 𝑘 = 𝑎 ⋅ 𝑏 • 𝑡 = 𝜆𝑧 + 𝑘 x y 𝑧 x 𝜆 𝑢 x a x b k 𝑡 + 1 1 2 3 net = mx.sym.Variable('data') net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64) net = mx.sym.Activation(net, name='relu1', act_type="relu") net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26) net = mx.sym.SoftmaxOutput(net, name='softmax') mx.viz.plot_network(net)
  • 31. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Computational Dependency/Graph • 𝑧 = 𝑥 ⋅ 𝑦 • 𝑘 = 𝑎 ⋅ 𝑏 • 𝑡 = 𝜆𝑧 + 𝑘 x y 𝑧 x 𝜆 𝑢 x a x b k 𝑡 + 1 1 2 3 net = mx.sym.Variable('data') net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64) net = mx.sym.Activation(net, name='relu1', act_type="relu") net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26) net = mx.sym.SoftmaxOutput(net, name='softmax') mx.viz.plot_network(net)
  • 32. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ideal Inception v3 Resnet Alexnet 88% Efficiency 1 2 4 8 16 32 64 128 256 Scaling with MXNet
  • 33. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Imperative vs Symbolic Programming Imperative Symbolic Execution Flow is the same as flow of the code: Abstract functions are defined and compiled first, data binding happens next. Flexible but inefficient: Efficient • Memory: 4 * 10 * 8 = 320 bytes • Interim values are available • No Operation Folding. • Familiar coding paradigm. • Memory: 2 * 10 * 8 = 160 bytes • Interim values are not available • Operation Folding: Folding multiple operations into one. We run one op. instead of many on GPU. This is possible because we have access to whole comp. graph
  • 34. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gluon
  • 35. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Evolution of DL Frameworks
  • 36. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Advantages of the Gluon API Simple, Easy-to- Understand Code Flexible, Imperative Structure Dynamic Graphs High Performance  Neural networks can be defined using simple, clear, concise code  Plug-and-play neural network building blocks – including predefined layers, optimizers, and initializers  Eliminates rigidity of neural network model definition and brings together the model with the training algorithm  Intuitive, easy-to-debug, familiar code  Neural networks can change in shape or size during the training process to address advanced use cases where the size of data fed is variable  Important area of innovation in Natural Language Processing (NLP)  There is no sacrifice with respect to training speed  When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint
  • 37. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Code https://github.com/cyrusmvahid/GluonBootcamp/tree/master/labs/fancy_mnist
  • 38. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s New • GluonCV, a Deep Learning Toolkit for Computer Vision • Features: • training scripts that reproduces SOTA results reported in latest papers, • a large set of pre-trained models, • carefully designed APIs and easy to understand implementations, • community support.
  • 39. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s New • GluonNLP, a Deep Learning Toolkit for Natural Language Processing • Features: • Training scripts to reproduce SOTA results reported in research papers. • Pre-trained models for common NLP tasks. • Carefully designed APIs that greatly reduce the implementation complexity. • Community support.
  • 40. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s New • MXNet backend for Keras: Keras is a high-level neural networks API, written in Python and capable of running on top of Apache MXNet, Tensorflow, CNTK, and Theano. • Performance: MXNet backend provides scalable and fast backend for new projects and existing code, hence with least effort it can improve performance of existing models. For more on benchmarking please check: https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
  • 41. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Refrences • Mxnet: http://mxnet.incubator.apache.org/ • Gluon 60-min crash course: https://gluon-crash-course.mxnet.io/ • Deep Learning book based on gluon: https://gluon.mxnet.io/ • GluonCV: https://gluon-cv.mxnet.io/ • GluonNLP: https://gluon-nlp.mxnet.io/ • Keras-mxnet: https://github.com/awslabs/keras-apache-mxnet
  • 42. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you! c y r u s m v @ a m a z o n . c o m