1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Learning Fundamentals
Thomas Delteil, Machine Learning Scientist, Amazon AI
Soji Adeshina, Machine Learning Engineer, Amazon AI
©2018 Amazon Web Services, Inc. or its affiliates, All rights reserved
2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why do machine learning?
How many cats?
Complex tasks where you can’t code up explicit solutions
3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Types of Machine Learning
Supervised
• Data & labels
• Classification, Labeling
• Regression
Unsupervised
• Data, no labels
• Clustering
• Dimensionality reduction
Semi-supervised
• Data, some labels
• Active learning
• Reinforcement learning
4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Situating Deep Learning
[Figure: nested sets, with Deep Learning inside Machine Learning inside AI]
Can machines think? Can machines do what we can? (Turing, 1950)
[Figure: diagram relating Machine Learning to Data, Answers and Rules]
5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Linear and non-linear separability
6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What is “Deep” Learning?
7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep computational graph
Inception model
100+ million learnable parameters
• 𝑧 = 𝑥 ⋅ 𝑦
• 𝑘 = 𝑎 ⋅ 𝑏
• 𝑡 = 𝜆𝑧 + 𝑘
[Figure: computational graph of the operations above, with inputs x, y, a, b and 𝜆, intermediate nodes z, u and k, and output t]
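The small graph above can be evaluated forward and differentiated backward by hand. The following is a minimal sketch, not the presenters' code: the input values are arbitrary, and the node name u for the product of 𝜆 and z is an assumption read off the figure labels.

```python
# Forward and backward pass through t = lam * (x * y) + (a * b).
# All numeric values are arbitrary and chosen only for illustration.

def forward(x, y, a, b, lam):
    z = x * y      # first multiplication node
    k = a * b      # second multiplication node
    u = lam * z    # scaling node (the name u is an assumption)
    t = u + k      # final addition node
    return z, k, u, t

def backward(x, y, a, b, lam):
    """Gradients of t with respect to each input, via the chain rule."""
    z, k, u, t = forward(x, y, a, b, lam)
    dt_du, dt_dk = 1.0, 1.0   # an addition passes the gradient through unchanged
    dt_dz = dt_du * lam       # because u = lam * z
    dt_dlam = dt_du * z
    return {
        "x": dt_dz * y,       # because z = x * y
        "y": dt_dz * x,
        "a": dt_dk * b,       # because k = a * b
        "b": dt_dk * a,
        "lam": dt_dlam,
    }

values = dict(x=2.0, y=3.0, a=4.0, b=5.0, lam=0.5)
t = forward(**values)[3]
grads = backward(**values)
print(t, grads)
```

A deep network is the same idea at scale: many more nodes, with the gradients computed automatically by the framework.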
8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What can Deep Learning do?
9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
And many more
- Action recognition
- Image super resolution
- Pose estimation
- Image generation
- Text to speech
- Speech to text
- Text recognition
- Robotics policy learning
- …
12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sea/Land segmentation via satellite images
DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation, Ruirui Li et al., 2017
13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Automatic Galaxy classification
Deep Galaxy: Classification of Galaxies based on Deep Convolutional Neural Networks, Nour Eldeen M. Khalifa, 2017
14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Medical Imaging, MRI, X-ray, surgical cameras
Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods, Ali Işın et al., 2016
15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Stock market predictions
Deep Learning for Forecasting Stock Returns in the Cross-Section, Masaya Abe and Hideki Nakayama, 2017
16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How did it start?
17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ImageNet classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E.
Hinton, Advances in Neural Information Processing Systems, 2012
AlexNet architecture
2012 - ImageNet Classification with Deep
Convolutional Neural Networks
18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ImageNet
Classify images among 1000 classes:
AlexNet top-5 error rate: 25% => 16%!
19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Actual photo of the reaction from the computer vision community*
*might just be a stock photo
20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
I told you
so!
21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why now?
22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Nvidia V100, float16 ops:
~120 TFLOPS, 5000+ CUDA cores
(#1 supercomputer in 2005: 135 TFLOPS)
Source: Mathworks
Hardware: GPUs!
23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
And more specialized hardware being
developed as we speak
- AWS Inferentia
- Intel Movidius
- FPGAs
- Apple A11 “neural engine”
- …
24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Software
- Deep learning frameworks:
  - MXNet
  - TensorFlow
  - PyTorch
- Deep learning accelerators:
  - TensorRT
  - MKL-DNN
- Deep learning compilers:
  - TVM
  - Glow
- Deep learning APIs:
  - ONNX
  - Keras
25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How does it work?
26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Basic Terminology
Predict if a person earns >$50K
Age | Education | Years of education | Marital status | Occupation   | Sex    | Label
39  | Bachelors | 16                 | Single         | Adm-clerical | Male   | -1
31  | Masters   | 18                 | Married        | Engineering  | Female | +1
Training examples (rows)
Input features / x
Label / ground truth / y
27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Basic Terminology
Age | Education | Years of education | Marital status | Occupation   | Sex    | Label
39  | Bachelors | 16                 | Single         | Adm-clerical | Male   | -1
31  | Masters   | 18                 | Married        | Engineering  | Female | +1
One-hot encoding to convert categorical features
Age | Edu_Bachelors | Edu_Masters | Years of education | Marital_Single | … | Label
39  | 1             | 0           | 16                 | 1              | … | -1
31  | 0             | 1           | 18                 | 0              | … | +1
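One-hot encoding as described above can be sketched in a few lines. This is an illustrative version only: the helper name one_hot_encode and the shortened column names (Edu, Years, Marital) are assumptions, and real projects usually rely on a library routine instead.

```python
def one_hot_encode(rows, categorical):
    """Replace each categorical column with one 0/1 column per observed value."""
    # Collect the distinct values of each categorical column, in first-seen order.
    values = {col: [] for col in categorical}
    for row in rows:
        for col in categorical:
            if row[col] not in values[col]:
                values[col].append(row[col])

    encoded = []
    for row in rows:
        out = {}
        for col, val in row.items():
            if col in categorical:
                # One new column per category; exactly one of them is set to 1.
                for v in values[col]:
                    out[f"{col}_{v}"] = 1 if val == v else 0
            else:
                out[col] = val   # numeric columns are copied unchanged
        encoded.append(out)
    return encoded

# Two rows mirroring the table above (column names shortened for the sketch).
rows = [
    {"Age": 39, "Edu": "Bachelors", "Years": 16, "Marital": "Single", "Label": -1},
    {"Age": 31, "Edu": "Masters", "Years": 18, "Marital": "Married", "Label": 1},
]
encoded = one_hot_encode(rows, categorical={"Edu", "Marital"})
print(encoded[0])
```

Each category gets its own position in the feature vector, so no artificial ordering between categories is introduced.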
28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inspired by the brain’s neurons
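We have ~100 billion of them, and ~1 quadrillion synapses
(Artificial) Neural Networks (ANN)
[Figure: a neuron with inputs x1, x2, …, xn, weights w1, w2, …, wn and bias b, feeding a summation Σ followed by an activation φ, producing the output y]
$$y = \varphi\left(\sum_{j=1}^{n} w_j x_j + b\right)$$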
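A single artificial neuron, as drawn above, is a weighted sum of the inputs plus a bias, passed through an activation. The sketch below assumes a sigmoid activation and made-up input values; neither is prescribed by the slide.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute activation(sum_j w_j * x_j + b) for one neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

# Hypothetical inputs and weights: 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0
y = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)

# The same neuron with a positive bias: the weighted sum is shifted before activation.
y_shifted = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=2.0)
print(y, y_shifted)
```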
29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Bias term
• Each neuron has a bias associated with it
• Moves the activation left or right on x-axis
$$y = \varphi\left(\sum_{j=1}^{n} w_j x_j + b\right)$$
30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Learning
the Multi-Layer Perceptron (MLP)
[Figure: a network with an input layer, hidden layers and an output, with labels for the activation and the discriminator]
31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activation Functions
• Determine how the neuron fires
• Represent non-linearity
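$$y = \varphi\left(\sum_{j=1}^{n} w_j x_j + b\right)$$
• Sigmoid: $\Phi(x) = \frac{1}{1 + e^{-x}}$
• tanh: $\Phi(x) = \frac{2}{1 + e^{-2x}} - 1$
• relu: $\Phi(x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$
• softmax: $\Phi(x)_i = \frac{e^{x_i}}{\sum_{k=1}^{K} e^{x_k}}$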
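The four activation functions listed above translate almost directly into code. This is a plain-Python sketch for scalar inputs (and a list for softmax). Subtracting the maximum inside softmax is a common numerical-stability trick that is not on the slide and does not change the result.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Same form as on the slide: 2 / (1 + e^(-2x)) - 1
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def relu(x):
    return x if x >= 0 else 0.0

def softmax(xs):
    # Shift by the maximum so that exp() cannot overflow (added for stability).
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0), tanh(1.0), relu(-3.0), softmax([1.0, 2.0, 3.0]))
```

Softmax differs from the other three in that it acts on a whole vector and returns values that sum to 1, which is why it is typically used on the output layer of a classifier.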
32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
[Figure: an input X passes through a network with weights 0.4, 0.3, 0.2, 0.9, ... to produce a prediction ŷ, which is compared with the label y. When ŷ ≠ y, backpropagation (gradient descent) yields new weights 0.4 ± 𝛿, 0.3 ± 𝛿, ...]
$$h_i = \Phi\left(\sum_{j=1}^{n} w_j x_j + b\right)$$
The learning in Deep Learning
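The loop in the figure (predict, compare with the label, nudge the weights) can be sketched for a single sigmoid neuron. The squared-error loss, the training example and the learning rate below are assumptions made for illustration. The starting weights 0.4 and 0.3 simply echo the numbers shown in the figure.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_step(weights, bias, x, label, learning_rate):
    """One gradient-descent update for the loss (prediction - label)^2 / 2."""
    y_hat = predict(weights, bias, x)
    # Chain rule: dLoss/dsum = (y_hat - label) * sigmoid'(sum),
    # where sigmoid'(sum) = y_hat * (1 - y_hat).
    delta = (y_hat - label) * y_hat * (1.0 - y_hat)
    new_weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias

weights, bias = [0.4, 0.3], 0.0   # starting weights echo the figure
x, label = [1.0, 1.0], 1.0        # hypothetical training example
before = predict(weights, bias, x)
for _ in range(100):
    weights, bias = train_step(weights, bias, x, label, learning_rate=0.5)
after = predict(weights, bias, x)
print(before, after)
```

After the updates the prediction sits closer to the label than before. In a multi-layer network the same chain rule is applied layer by layer, from the output back to the input.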
33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Other layers: Convolutional Neural Networks
34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sharpening filter
Laplacian filter
Sobel x-axis filter
Used in Computer Vision for a long time
35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A convolution is the cross-channel sum of the element-wise
multiplication of a convolutional filter (kernel/mask) with
each sliding window of an input tensor, given a certain stride
and padding, plus a bias term. The result is called a feature map.
Input matrix (3x3), no padding, 1 channel:
 2  2  1
 3  1 -1
 4  3  2
Kernel (2x2), stride 1, bias = 2:
 1 -1
-1  0
Feature map (2x2):
-1  2
 0  1
1*2 + (-1)*2 + (-1)*3 + 0*1 + 2 = -1
1*2 + (-1)*1 + (-1)*1 + 0*(-1) + 2 = 2
1*3 + (-1)*1 + (-1)*4 + 0*3 + 2 = 0
1*1 + (-1)*(-1) + (-1)*3 + 0*2 + 2 = 1
How does it work?
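The worked example above can be checked with a short single-channel implementation. This is a sketch under simplifying assumptions (one channel, no padding, plain Python lists); the function name conv2d is illustrative and not any framework's API.

```python
def conv2d(image, kernel, bias=0, stride=1):
    """Single-channel 2D convolution without padding, as used in deep learning."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            # Element-wise multiply the kernel with the current window and sum.
            for di in range(kh):
                for dj in range(kw):
                    total += kernel[di][dj] * image[i * stride + di][j * stride + dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[2, 2, 1],
         [3, 1, -1],
         [4, 3, 2]]
kernel = [[1, -1],
          [-1, 0]]
feature_map = conv2d(image, kernel, bias=2)
print(feature_map)  # [[-1, 2], [0, 1]]
```

With several input channels, the same windowed sum is computed per channel and the results are added together before the bias, which is the "cross-channel sum" in the definition.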
36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Spatial data: Convolutional Layers (images, etc)
37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
- Detect patterns at larger and larger scales by stacking
convolution layers on top of each other to grow the
receptive field
- Applicable to spatially correlated data
Source: the first 96 filters learned by AlexNet, represented in RGB space
38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sharpening filter
Laplacian filter
Sobel x-axis filter
Used in Computer Vision for a long time,
now learnable!
39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Source: ML Review, A guide to receptive field arithmetic
Deeper in the
network
Hierarchical learning: growing receptive field
40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Another layer: Max pooling
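Max pooling keeps only the largest value in each window of a feature map, which shrinks the map while preserving the strongest responses. The sketch below assumes the common 2x2 window with stride 2. The input values are made up, and max_pool2d is an illustrative name.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Slide a size x size window over the map and keep the maximum of each window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map, reduced to 2x2.
pooled = max_pool2d([[1, 3, 2, 4],
                     [5, 6, 1, 0],
                     [7, 2, 9, 8],
                     [3, 4, 1, 5]])
print(pooled)  # [[6, 4], [7, 9]]
```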
41. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Many other layers:
- Batch normalization layer
- Dropout layer
- Pooling layer
- Attention layer
- Recurrent Layers (LSTM, GRU)
- …
42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
More deep learning concepts
43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Overfitting
• Model learns signal as well as noise in the training data
• Model doesn’t generalize to unseen data
• Typical causes: too few data points, noisy data, or too large a network
44. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Parameters and Hyperparameters
• Parameters
  • Numeric values in the model: weights and biases
  • Learned during training
• Hyperparameters
  • Values set for the training session
  • Numeric e.g. mini-batch size
  • Non-numeric e.g. which algorithm to use for optimization
• Hyperparameter optimization
  • Outer layer of learning / searching for hyperparameters
45. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Accuracy vs. Loss
• Accuracy: a percentage
  • Correct or not per example
• Loss: calculated during training
  • How far off is the current model?
  • Continuous value
• Common loss functions
  • Mean squared error (regression)
  • Cross entropy (classification): negative log of the probability assigned to the correct class
• During training, minimize loss with an optimizer
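The difference between accuracy and loss is easy to see numerically. In this sketch cross entropy is computed for a single example as the negative log of the probability given to the true class (the usual per-example form). All numbers are made up for illustration.

```python
import math

def accuracy(predicted_labels, true_labels):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(class_probabilities, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(class_probabilities[true_class])

acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])          # 3 of 4 correct
mse = mean_squared_error([2.0, 4.0], [1.0, 6.0])    # (1 + 4) / 2

# Both predictions below pick class 1, so both count as "correct" for accuracy,
# but the more confident one has the lower loss.
confident = cross_entropy([0.1, 0.9], true_class=1)
unsure = cross_entropy([0.4, 0.6], true_class=1)
print(acc, mse, confident, unsure)
```

This is why training minimizes a loss and not accuracy: the loss changes smoothly as the model improves, whereas accuracy only changes when a prediction flips.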
46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Stochastic Gradient Descent
• Take a series of steps
• Specify a learning rate:
• weight = weight - learning_rate * gradient
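A minimal sketch of stochastic gradient descent, using the descent form of the update (each step moves against the gradient of the loss). The one-weight model y ≈ weight * x, the data, the learning rate and the batch size are all assumptions chosen so that the example stays small.

```python
import random

def sgd_fit(data, learning_rate=0.05, steps=200, batch_size=2, seed=0):
    """Fit y ~ weight * x by minimising mean squared error with mini-batch SGD."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    weight = 0.0
    for _ in range(steps):
        # "Stochastic": each step looks at a random mini-batch, not the full dataset.
        batch = rng.sample(data, batch_size)
        # Gradient of the batch mean of (weight * x - y)^2 with respect to weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in batch) / batch_size
        weight = weight - learning_rate * gradient
    return weight

# Hypothetical data lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
weight = sgd_fit(data)
print(weight)
```

The learning rate controls the step size: too small and training is slow, too large and the steps overshoot and can diverge.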
47. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Other optimization rules:
48. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Conclusion
- Deep learning is a collection of techniques and algorithms
- Characterized by large computational graphs with learnable
parameters
- Trained using backward propagation of the gradients of a loss
- Usually requiring large amounts of data
- Made possible by advances in hardware and software
- Applied to a variety of tasks across a large number of domains
Editor's notes
ML was created to identify images of cats.
Semi-supervised?? Well, things that don’t neatly fit into one of those buckets or the other.
Sometimes here people say “ML and DL” as if they were two different things.
Good examples:
Imagine x = longitude, y = latitude
Linearly separable: blue dots = people in USA; green dots = Canada
Non-linearly separable: blue = people in a city, green = people outside city
https://shaoanlu.wordpress.com/2017/05/07/vihicle-detection-using-ssd-on-floybhub-udacity-self-driving-car-nano-degree/
https://arxiv.org/abs/1710.10196
Many fields of application, no need to be an expert
Common ML educational task on old USA census data.
All elements of input feature vector must be numbers.
Assign to positions in the vector, not just arbitrary numbers
There are more complex ways of encoding categorical features e.g. embeddings
Without these activation functions we would just have linear combinations of features and weights
Squinting analogy
0 Padding – reduction in size
Random parameters for demonstration purposes
Squinting analogy
First, imagine that you have two parameters