Deep Learning with MXNet
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cyrus Vahid <cyrusmv@amazon.com>
Principal Evangelist, AI Labs – MXNet
Aug 2018
Building Deep Learning Applications with Apache MXNet and Gluon
Background
Deductive Reasoning
P | Q | P ∧ Q | P ∨ Q | P → Q
T | T |   T   |   T   |   T
T | F |   F   |   T   |   F
F | T |   F   |   T   |   T
F | F |   F   |   F   |   T
• 𝑃 = 𝑇 ∧ 𝑄 = 𝑇 ∴ 𝑃 ∧ 𝑄 = 𝑇
• 𝑃 ∧ 𝑄 ∴ 𝑃 → 𝑄; ∼ 𝑃 ∴ 𝑃 → 𝑄
• P → Q, P ∴ Q (modus ponens)
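The truth table and the modus ponens rule above can be checked mechanically. The following is a minimal sketch in plain Python; the helper name `implies` is an illustrative assumption and does not come from the slides.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: P → Q is false only when P is true and Q is false."""
    return (not p) or q


# Rebuild the truth table row by row: (P, Q, P ∧ Q, P ∨ Q, P → Q).
truth_table = [
    (p, q, p and q, p or q, implies(p, q))
    for p, q in product([True, False], repeat=2)
]

# Modus ponens: in every row where both P → Q and P hold, Q holds as well.
modus_ponens_valid = all(
    q for p, q in product([True, False], repeat=2) if implies(p, q) and p
)

for row in truth_table:
    print(" ".join("T" if value else "F" for value in row))
print("modus ponens valid:", modus_ponens_valid)
```

Enumerating all assignments is only practical for a handful of variables, which is one reason the deck moves on from purely deductive rules to learning from data.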
Rule Based Programming
Plausible Reasoning
Programming with Data
• Understand your data
• Algorithmically discover hidden patterns
• Generalize the solution algorithm
• Apply the solution to unseen patterns
• Make predictions
Fundamentals
Biological & Artificial Neuron
Source: http://cs231n.github.io/neural-networks-1/
Perceptron
Inputs: I1, I2 and bias B; weights: w1, w2, w3; output: O
f(x_i, w_i) = Φ(b + Σ_i (w_i · x_i))
Φ(x) = 1 if x ≥ 0.5
Φ(x) = 0 if x < 0.5
Perceptron
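Inputs: I1, I2 and bias B; output: O; weights: w1 = 1, w2 = 1, w3 = −1.5 (bias weight, as used in the sums below)
I1 = I2 = B = 1: O = 1×1 + 1×1 + (−1.5)×1 = 0.5 ∴ Φ(O) = 1
I1 = B = 1, I2 = 0: O = 1×1 + 1×0 + (−1.5)×1 = −0.5 ∴ Φ(O) = 0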
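The worked example above can be reproduced in a few lines. This is a minimal sketch, assuming the threshold activation Φ with cut-off 0.5 and the weights 1, 1 and −1.5 used in the sums; the function names are illustrative.

```python
def step(x: float, threshold: float = 0.5) -> int:
    """Threshold activation Φ: 1 if x >= threshold, otherwise 0."""
    return 1 if x >= threshold else 0


def perceptron(inputs, weights, bias_weight, bias_input=1.0) -> int:
    """Weighted sum of the inputs plus the weighted bias, passed through Φ."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias_weight * bias_input
    return step(total)


# Weights from the worked example: w1 = w2 = 1, bias weight = -1.5.
WEIGHTS = (1.0, 1.0)
BIAS_WEIGHT = -1.5

# Evaluate the perceptron on all four binary input pairs.
and_outputs = {
    (i1, i2): perceptron((i1, i2), WEIGHTS, BIAS_WEIGHT)
    for i1 in (0, 1)
    for i2 in (0, 1)
}
print(and_outputs)
```

With these weights the unit fires only when both inputs are 1, so it behaves as a logical AND.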
Non-Linearity
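P | Q | P ∧ Q | P ⨁ Q
T | T |   T   |   F
T | F |   F   |   T
F | T |   F   |   T
F | F |   F   |   F
[Figure: the four (P, Q) points plotted for P ∧ Q and for P ⨁ Q, marked x or 0 by output. A single straight line separates the two classes for ∧ but not for ⨁.]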
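A single threshold unit cannot compute exclusive-or (P ⨁ Q), but a network with one hidden layer can. The sketch below is one possible hand-picked construction, not taken from the slides: two hidden units (an OR and a NAND) feed an AND unit, all using the same 0.5 threshold as the perceptron slides.

```python
def step(x: float) -> int:
    """Threshold activation: 1 if x >= 0.5, otherwise 0."""
    return 1 if x >= 0.5 else 0


def unit(inputs, weights, bias: float) -> int:
    """One threshold unit: weighted sum plus bias, passed through step."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(p: int, q: int) -> int:
    """Two-layer network with hand-picked (assumed) weights that computes XOR."""
    h_or = unit((p, q), (1.0, 1.0), 0.0)        # fires when at least one input is 1
    h_nand = unit((p, q), (-1.0, -1.0), 2.0)    # fires unless both inputs are 1
    return unit((h_or, h_nand), (1.0, 1.0), -1.5)  # AND of the two hidden units


xor_outputs = {(p, q): xor_network(p, q) for p in (0, 1) for q in (0, 1)}
print(xor_outputs)
```

The hidden layer remaps the inputs into a space where the classes become linearly separable, which is the motivation for stacking layers.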
Deep Learning
Input layer, hidden layers, output
• Add non-linearity to the output of the hidden layers
• Transform the output into a continuous range
The “Learning” in Deep Learning
[Figure: an input X with its label passes through the network (weights 0.4, 0.3, 0.2, 0.9, ...) to produce the output X1. When X1 != X, backpropagation (gradient descent) adjusts the weights (0.4 ± δ, 0.3 ± δ, ...), giving new weights.]
Activation Function (Φ)
Inputs: Preprocessing, Batches, Epochs
Preprocessing
• Random separation of data into training, validation, and test sets
• Necessary for measuring the accuracy of the model
Batch
• Amount of data propagated through the network at every iteration
• Enables faster optimization through shorter iteration cycles
Epoch
• Complete pass through all the training data
• Optimization will have multiple epochs to reduce the error rate
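The three terms above fit together as follows. This is a minimal sketch in plain Python; the 80/10/10 split, the batch size of 16, and the helper names are assumptions chosen for illustration.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly separate samples into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def iterate_batches(samples, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


data = list(range(100))
train, val, test = split_dataset(data)

num_epochs = 3
batch_size = 16
iterations = 0
for epoch in range(num_epochs):          # one epoch = one full pass over train
    for batch in iterate_batches(train, batch_size):
        iterations += 1                  # one parameter update would happen here

print(len(train), len(val), len(test), iterations)
```

With 80 training samples and a batch size of 16, each epoch performs 5 iterations.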
Inputs: Encoding MNIST data
https://www.tensorflow.org/get_started/mnist/beginners
Inputs: Encoding Pictures into Data
7 x 7 x 3 Matrix
Classification with the Softmax Function
Softmax converts the output layer into probabilities – necessary for classification
Softmax Function
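As a concrete illustration of how raw output-layer scores become probabilities, here is a minimal softmax sketch in plain Python. Subtracting the maximum score before exponentiating is a standard numerical-stability step that is assumed here; the example scores are arbitrary.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Shifting by the maximum avoids overflow and does not change the result.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The largest score receives the largest probability, and the predicted class is the index of the maximum.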
Loss Function
• An objective function that quantifies how successful the model was in its predictions
• A measure of the difference between a neural net’s prediction and the actual value, that is, the error
• Typically, for classification we use cross-entropy loss, which mitigates the learning slowdown that a plain squared-error loss suffers when output units saturate
• Backpropagation is performed to calculate the error contribution of each neuron after processing one batch
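To make the loss concrete, the sketch below computes cross-entropy for a single example whose true class is known. The function name and the probability vectors are illustrative assumptions, and the `eps` clamp is an added safeguard against taking the logarithm of zero.

```python
import math


def cross_entropy(predicted_probs, true_index, eps=1e-12):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(max(predicted_probs[true_index], eps))


# Hypothetical predictions for an example whose true class is index 0.
confident_right = cross_entropy([0.9, 0.05, 0.05], 0)
confident_wrong = cross_entropy([0.05, 0.9, 0.05], 0)
print(confident_right, confident_wrong)
```

A confident wrong prediction is penalized far more heavily than a confident right one, which gives the optimizer a strong signal where the model is most mistaken.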
Gradient Descent
Iteratively update the parameters to approach the optimal value of the objective function
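A one-dimensional example shows the iteration. This sketch assumes a toy objective f(w) = (w − 3)², a learning rate of 0.1, and 50 steps; none of these values come from the slides.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient and record every iterate."""
    w = start
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * grad(w)
        history.append(w)
    return history


def objective(w):
    """Toy objective with its minimum at w = 3."""
    return (w - 3.0) ** 2


def objective_grad(w):
    """Derivative of the toy objective."""
    return 2.0 * (w - 3.0)


path = gradient_descent(objective_grad, start=0.0, learning_rate=0.1, steps=50)
print(path[0], path[-1])
```

Each update shrinks the distance to the minimum by a constant factor here, because the objective is a simple quadratic.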
Weight Initialization
https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
Stochastic Gradient Descent
Gradient Descent
• A single iteration for the parameter update runs through ALL of the training data
Stochastic Gradient Descent
• A single iteration for the parameter update runs through a BATCH of the training data
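The difference is easiest to see by counting updates. The sketch below fits a one-parameter model y = w·x to noise-free synthetic data, once with the full training set per update and once with batches of 4; the data, learning rate, and epoch count are illustrative assumptions.

```python
import random


def batch_gradient(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, learning_rate=0.05, epochs=20, seed=0):
    """Run gradient descent with the given batch size; return (w, update count)."""
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            w -= learning_rate * batch_gradient(w, batch)
            updates += 1
    return w, updates


# Synthetic data on the line y = 2x, for x = 0.1, 0.2, ..., 2.0.
data = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 21)]

w_full, updates_full = train(data, batch_size=len(data))  # gradient descent
w_sgd, updates_sgd = train(data, batch_size=4)            # batched SGD
print(w_full, updates_full, w_sgd, updates_sgd)
```

For the same number of epochs, the batched run performs five times as many parameter updates and ends closer to the true weight of 2.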
Optimizers
http://imgur.com/a/Hqolp
Learning Rates
• Learning rate: a real number that sets how far to move in the direction of steepest descent at each update
• Online learning: weights are updated after each training example (slow to learn)
• Batch learning: weights are updated after all the training data is processed (hard to optimize)
• Mini-batch: a combination of both; the training set is broken into smaller batches and the weights are updated after each mini-batch
Training and Validation Data
[Figure: training and validation curves, with the best model marked.]
When accuracy is evaluated only on the training set, overfitting goes undetected.
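One common way to pick the best model is to keep the checkpoint with the lowest validation loss. The sketch below assumes that approach; the per-epoch loss values are hypothetical numbers made up purely for illustration.

```python
def best_epoch(losses):
    """Index of the epoch with the lowest loss."""
    return min(range(len(losses)), key=losses.__getitem__)


# Hypothetical per-epoch losses: training loss keeps falling, while
# validation loss turns upward once the model starts to overfit.
train_losses = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15, 0.10]
val_losses = [0.95, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]

chosen = best_epoch(val_losses)
print("best epoch by validation loss:", chosen)
print("best epoch by training loss:", best_epoch(train_losses))
```

Selecting by training loss would always favor the last epoch, which is exactly the overfitting trap the slide warns about.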
Dropout
Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR 2014
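The idea can be sketched in a few lines. Note that this example uses "inverted" dropout, a common variant that rescales the surviving activations during training; the cited paper instead scales the weights at test time. The function name and the drop probability of 0.5 are assumptions for illustration.

```python
import random


def dropout(activations, drop_prob, training=True, rng=None):
    """Inverted dropout: zero each activation with probability drop_prob and
    scale the survivors by 1 / (1 - drop_prob) so the expected value is kept."""
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


hidden = [1.0] * 10000
dropped = dropout(hidden, drop_prob=0.5, rng=random.Random(0))
print(sum(dropped) / len(dropped))  # close to 1.0 on average
```

At inference time, calling the function with `training=False` returns the activations unchanged.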
MXNet
Computational Dependency/Graph
• 𝑧 = 𝑥 ⋅ 𝑦
• 𝑘 = 𝑎 ⋅ 𝑏
• 𝑡 = 𝜆𝑧 + 𝑘
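[Figure: computation graph. x and y feed a multiply node producing z; z and 𝜆 feed a multiply node producing u; a and b feed a multiply node producing k; u and k feed an add node producing t. Nodes are numbered by dependency level: z and k (1), u (2), t (3).]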
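The dependency structure of the three equations above can be evaluated with a small graph walker. This is a minimal sketch in plain Python, not MXNet's engine; the intermediate node u = 𝜆·z, the dictionary layout, and the input values are assumptions for illustration.

```python
import operator


def evaluate(graph, inputs):
    """Evaluate every node after its dependencies, caching intermediate values."""
    values = dict(inputs)

    def compute(name):
        if name not in values:
            op, left, right = graph[name]
            values[name] = op(compute(left), compute(right))
        return values[name]

    for name in graph:
        compute(name)
    return values


# Each node maps to (operation, left operand, right operand).
GRAPH = {
    "z": (operator.mul, "x", "y"),    # z = x * y
    "k": (operator.mul, "a", "b"),    # k = a * b
    "u": (operator.mul, "lam", "z"),  # u = lambda * z
    "t": (operator.add, "u", "k"),    # t = u + k
}

values = evaluate(GRAPH, {"x": 2.0, "y": 3.0, "a": 4.0, "b": 5.0, "lam": 0.5})
print(values["t"])
```

Because z and k do not depend on each other, a scheduler that sees the whole graph is free to compute them in parallel.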
Computational Dependency/Graph
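import mxnet as mx

# Declare the network symbolically; nothing is computed until data is bound.
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)   # hidden layer, 64 units
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)   # output layer, 26 units
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)  # draw the graph (requires graphviz)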
Scaling with MXNet
[Figure: scaling curves for Inception v3, ResNet, and AlexNet against the ideal line; x-axis from 1 to 256 in powers of two; annotated with 88% efficiency.]
Imperative vs Symbolic Programming
Imperative: execution flow is the same as the flow of the code. Flexible but inefficient:
• Memory: 4 * 10 * 8 = 320 bytes
• Interim values are available
• No operation folding
• Familiar coding paradigm
Symbolic: abstract functions are defined and compiled first; data binding happens next. Efficient:
• Memory: 2 * 10 * 8 = 160 bytes
• Interim values are not available
• Operation folding: multiple operations are folded into one, so we run one op instead of many on the GPU. This is possible because we have access to the whole computation graph.
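The contrast can be illustrated without any framework. The sketch below is a toy in plain Python, not MXNet's implementation: the `Symbol` class, `variable`, and `compile_in_place` are hypothetical helpers, and the byte counts assume 10-element arrays of 8-byte floats as in the memory figures above.

```python
# Imperative style: every statement runs immediately and allocates its result.
a = [1.0] * 10
b = [2.0] * 10
c = [x * y for x, y in zip(a, b)]  # interim value, available for inspection
d = [x + 1.0 for x in c]           # a, b, c, d: 4 arrays * 10 * 8 = 320 bytes


# Symbolic style: describe the computation first, bind data later.
class Symbol:
    """A deferred expression: fn maps a dict of scalar inputs to a scalar."""

    def __init__(self, fn, description):
        self.fn = fn
        self.description = description

    def __mul__(self, other):
        return Symbol(lambda env: self.fn(env) * other.fn(env),
                      f"({self.description} * {other.description})")

    def __add__(self, constant):
        return Symbol(lambda env: self.fn(env) + constant,
                      f"({self.description} + {constant})")


def variable(name):
    """Placeholder that is looked up by name when data is bound."""
    return Symbol(lambda env: env[name], name)


def compile_in_place(symbol, output_name):
    """Fold the whole expression into one pass that overwrites an input buffer."""
    def run(buffers):
        out = buffers[output_name]
        for i in range(len(out)):
            env = {name: buf[i] for name, buf in buffers.items()}
            out[i] = symbol.fn(env)
        return out
    return run


A, B = variable("A"), variable("B")
D = A * B + 1.0                      # nothing is computed yet
f = compile_in_place(D, output_name="B")

buffers = {"A": [1.0] * 10, "B": [2.0] * 10}  # 2 arrays * 10 * 8 = 160 bytes
result = f(buffers)                  # one fused pass; no interim array for A * B
print(D.description, result[:3])
```

The symbolic version never materializes the intermediate product, which is why interim values are unavailable but memory use is halved.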
Gluon
Evolution of DL Frameworks
Advantages of the Gluon API
Simple, Easy-to-Understand Code
• Neural networks can be defined using simple, clear, concise code
• Plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers
Flexible, Imperative Structure
• Eliminates rigidity of neural network model definition and brings together the model with the training algorithm
• Intuitive, easy-to-debug, familiar code
Dynamic Graphs
• Neural networks can change in shape or size during the training process to address advanced use cases where the size of data fed is variable
• Important area of innovation in Natural Language Processing (NLP)
High Performance
• There is no sacrifice with respect to training speed
• When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint
Code
https://github.com/cyrusmvahid/GluonBootcamp/tree/master/labs/fancy_mnist
What’s New
• GluonCV, a Deep Learning Toolkit for Computer Vision
• Features:
• Training scripts that reproduce SOTA results reported in the latest papers.
• A large set of pre-trained models.
• Carefully designed APIs and easy-to-understand implementations.
• Community support.
What’s New
• GluonNLP, a Deep Learning Toolkit for Natural
Language Processing
• Features:
• Training scripts to reproduce SOTA results reported in research
papers.
• Pre-trained models for common NLP tasks.
• Carefully designed APIs that greatly reduce the implementation
complexity.
• Community support.
What’s New
• MXNet backend for Keras: Keras is a high-level neural networks API, written in Python and capable of running on top of Apache MXNet, TensorFlow, CNTK, and Theano.
• Performance: the MXNet backend provides a fast, scalable backend for new projects and existing code, so it can improve the performance of existing models with little effort. For benchmarks, see:
https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
References
• MXNet: http://mxnet.incubator.apache.org/
• Gluon 60-min crash course: https://gluon-crash-course.mxnet.io/
• Deep Learning book based on Gluon: https://gluon.mxnet.io/
• GluonCV: https://gluon-cv.mxnet.io/
• GluonNLP: https://gluon-nlp.mxnet.io/
• Keras-MXNet: https://github.com/awslabs/keras-apache-mxnet
Thank you!
cyrusmv@amazon.com