Convolutional Neural Networks
and Natural Language Processing
Thomas Delteil – github.com/thomasdelteil – linkedin.com/in/thomasdelteil
Applied Scientist @ AWS Deep Engine
Goals
§ Explain what convolutions are
§ Show how to handle textual data
§ Analyze a reference neural network
architecture for text classification
§ Demonstrate how to train and deploy a CNN for
Natural Language Processing
Learn Data Science, Vancouver – Deep Learning and NLP - CNNs and NLP - Thomas Delteil - github.com/thomasdelteil - linkedin.com/in/thomasdelteil
Convolutions
And where to find them
2012 - ImageNet Classification with Deep Convolutional Neural Networks
ImageNet classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E.
Hinton, Advances in Neural Information Processing Systems, 2012
AlexNet architecture
ImageNet competition
Classify images among 1000 classes:
AlexNet Top-5 error-rate, 25% => 16%!
Actual photo of the reaction from the computer vision community*
*might just be a stock photo
I told you
so!
What made Convolutional Neural Networks viable?
GPUs!
- Nvidia V100, float16 ops:
~120 TFLOPS, 5000+ CUDA cores
- #1 supercomputer in 2005: ~135 TFLOPS
Source: Mathworks
Sea/Land segmentation via satellite images
DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation, Ruirui Li et al, 2017
Automatic Galaxy classification
Deep Galaxy: Classification of Galaxies based on Deep Convolutional Neural Networks, Nour Eldeen M. Khalifa, 2017
Medical Imaging, MRI, X-ray, surgical cameras
Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods, Ali Işın et al., 2016
What is a convolution?
It is the cross-channel sum of the element-wise
product between a convolutional filter (kernel/mask)
and each sliding window of an input tensor,
for a given stride and padding, plus a bias term.
The result is called a feature map.
Input matrix (3x3), 1 channel, no padding:
 2  2  1
 3  1 -1
 4  3  2
Kernel (2x2), stride 1, bias = 2:
 1 -1
-1  0
Feature map (2x2):
-1  2
 0  1
1*2 - 1*2 - 1*3 + 0*1 + 2 = -1
1*2 - 1*1 - 1*1 + 0*(-1) + 2 = 2
1*3 - 1*1 - 1*4 + 0*3 + 2 = 0
1*1 - 1*(-1) - 1*3 + 0*2 + 2 = 1
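The worked example above can be checked with a few lines of code. This is a minimal sketch in plain Python, assuming a single input channel and no padding; the function name `conv2d_single_channel` is only illustrative.

```python
def conv2d_single_channel(image, kernel, bias=0, stride=1):
    """Slide the kernel over the image (no padding) and add the bias at each position."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = (len(image) - kernel_h) // stride + 1
    out_w = (len(image[0]) - kernel_w) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = bias
            # Element-wise multiplication of the kernel with the current window, then sum
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += kernel[i][j] * image[r * stride + i][c * stride + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [[2, 2, 1], [3, 1, -1], [4, 3, 2]]
kernel = [[1, -1], [-1, 0]]
feature_map = conv2d_single_channel(image, kernel, bias=2)
print(feature_map)  # [[-1, 2], [0, 1]]
```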
What is a convolution ? Padding
Source: Machine Learning guru - Neural Networks CNN
What is a convolution ? Stride = 2
Source: Machine Learning guru - Neural Networks CNN
What is a convolution ? Multi Channel
1 convolutional filter
(3)x(3x3)
Source: Machine Learning guru - Neural Networks CNN
What is a convolution ? Multi Channel
source: Convolutional Neural Networks on the iphone with vggnet
N: Number of input channels
W: Width of the kernel
H: Height of the kernel
M: Number of output channels
Kernel size = $N \times W \times H$
#Params = $M \times N \times W \times H + M$
256 convolutions of kernel (3,3) on 256 input channels:
256*256*3*3 + 256 = 590,080 (~0.6M)
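The parameter formula is easy to sanity-check in code. A small sketch, where `conv_param_count` is a hypothetical helper name and the bias contributes one value per output channel:

```python
def conv_param_count(in_channels, kernel_h, kernel_w, out_channels, use_bias=True):
    """#Params = M * N * W * H (+ M when each output channel has a bias)."""
    weights = out_channels * in_channels * kernel_h * kernel_w
    return weights + (out_channels if use_bias else 0)


weights_only = conv_param_count(256, 3, 3, 256, use_bias=False)
with_bias = conv_param_count(256, 3, 3, 256)
print(weights_only)  # 589824
print(with_bias)     # 590080
```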
Easily parallelizable
Convolution computations are:
- Independent (across filters and within
filter)
- Simple (multiplication and sums)
Why does it work?
Sharpening filter
Laplacian filter
Sobel x-axis filter
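To illustrate why hand-designed filters already extract useful patterns, here is a minimal sketch applying three classic kernels. The kernel values are common textbook choices and are an assumption, since the slide shows the filters only as images.

```python
# Common textbook 3x3 kernels (assumed values, not taken from the slide)
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def apply_filter(image, kernel):
    """Slide a square kernel over the image (stride 1, no padding, no bias)."""
    size = len(kernel)
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(len(image[0]) - size + 1)
        ]
        for r in range(len(image) - size + 1)
    ]


edge = [[0, 0, 10, 10, 10] for _ in range(3)]  # dark left half, bright right half
flat = [[7] * 5 for _ in range(3)]             # uniform image, no pattern

print(apply_filter(edge, SOBEL_X))    # [[40, 40, 0]]: strong response on the vertical edge
print(apply_filter(flat, SOBEL_X))    # [[0, 0, 0]]: no response on a uniform area
print(apply_filter(edge, LAPLACIAN))  # [[10, -10, 0]]: sign change across the edge
print(apply_filter(flat, SHARPEN))    # [[7, 7, 7]]: uniform areas are left unchanged
```

A convolutional network learns kernels like these from data instead of having them designed by hand.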
Why does it work?
- Detect patterns at larger and larger scales by stacking convolution
layers on top of each other to grow the receptive field
- Applicable to spatially correlated data
Source: AlexNet, the 96 first-layer filters learned (11x11), represented in RGB
space (3 input channels)
Growing receptive field
Source: ML Review, A guide to receptive field arithmetic
Deeper in the
network
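The growth of the receptive field can be computed layer by layer. A minimal sketch of the usual receptive field arithmetic, assuming each layer is described only by a (kernel size, stride) pair; `receptive_field` is an illustrative name.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, first layer first.

    Returns how many input positions (along one axis) influence one output value.
    """
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump  # each layer widens the field by (k - 1) input jumps
        jump *= stride                     # strides spread neighbouring outputs further apart
    return field


print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(7, 1), (3, 3), (7, 1)]))  # 27
```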
Visualize convolutions
http://scs.ryerson.ca/~aharley/vis/conv/flat.html
Visualize convolutions
Source: Neural Network 3D Simulation
(warning flashing lights)
State of the art networks are getting deeper and more complex
Source: Inception v3
High number of parameters => Requires a lot of data to train
Advanced types of convolutions
- Transposed Convolutions (deconvolution), e.g. EnhanceNet
- Dilated Convolutions, e.g. WaveNet
- Depth-wise separable Convolutions, e.g. MobileNet
Source: An introduction to different types of convolutions
On to Natural Language
Processing
NLP Domains
- Machine translation
- OCR
- Q&A
- Sentiment Analysis
- Speech Recognition
- TTS
- Topic Modelling
- Information Retrieval
- Natural Language Understanding
- Document Classification
NLP Industry Facts
- 8.4PB of information per second as of 2020 (source: business2community, 2016)
- 70% of companies use customer feedback (source: business2community, 2016)
- £1.3T value of company data (source: IDC, 2014)
- 10% of organizations expect to commercialise their data by 2020 (source: Gartner, 2016)
Source: Ticary, What is natural language processing
Convolutions and Natural Language Processing
Data Representation
?
Source: Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional Neural Networks for
Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014
Encoding Data – Word-level
- Word-level embedding (word2vec): word -> N-dimensional vector
- [Figure: sentence matrix with axes N (embedding dimension) and time (word position), shown for different embeddings]
Source: Convolutional Neural Networks for Sentence Classification, Yoon Kim, 2014
  V A N C O U V E R _ N L P …
_ 0 0 0 0 0 0 0 0 0 1 0 0 0
- 0 0 0 0 0 0 0 0 0 0 0 0 0
. 0 0 0 0 0 0 0 0 0 0 0 0 0
A 0 1 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 1 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 1 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 0 0 0 0 0 0 0
L 0 0 0 0 0 0 0 0 0 0 0 1 0
M 0 0 0 0 0 0 0 0 0 0 0 0 0
N 0 0 1 0 0 0 0 0 0 0 1 0 0
O 0 0 0 0 1 0 0 0 0 0 0 0 0
P 0 0 0 0 0 0 0 0 0 0 0 0 1
Q 0 0 0 0 0 0 0 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 1 0 0 0 0
S 0 0 0 0 0 0 0 0 0 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0 0 0 0
U 0 0 0 0 0 1 0 0 0 0 0 0 0
V 1 0 0 0 0 0 1 0 0 0 0 0 0
W 0 0 0 0 0 0 0 0 0 0 0 0 0
X 0 0 0 0 0 0 0 0 0 0 0 0 0
Y 0 0 0 0 0 0 0 0 0 0 0 0 0
Z 0 0 0 0 0 0 0 0 0 0 0 0 0
Encoding Data – Character-level
- One-hot encoding
- Alphabet
- Sparse representation
- Character embedding
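The one-hot table above can be reproduced in a few lines. A minimal sketch, assuming the 29-symbol alphabet shown in the table rows ("_", "-", ".", then A to Z); the reference architecture later in the deck uses a larger alphabet of 69 characters.

```python
import string

# Assumption: the alphabet and row order of the table above
ALPHABET = "_-." + string.ascii_uppercase


def one_hot_encode(text, alphabet=ALPHABET):
    """Return a len(alphabet) x len(text) matrix; unknown characters give an all-zero column."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = [[0] * len(text) for _ in alphabet]
    for position, ch in enumerate(text):
        row = index.get(ch)
        if row is not None:
            matrix[row][position] = 1
    return matrix


encoded = one_hot_encode("VANCOUVER_NLP")
for symbol, row in zip(ALPHABET, encoded):
    print(symbol, *row)
```

The representation is sparse: every column contains at most one 1.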
Text classification, N categories
Neural
Network
- Fiction: 0%
- Biography: 6%
…
- Play: 80%
…
- Documentation: 0%
Deep Neural Network: Crepe Model
Source: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015
Visualization with Netron
Temporal Convolution (256 filters, kernel 69*7, stride 1)
Intuition: convolutions act similarly to n-grams
Input: one-hot encoded text, 69 characters x 1014 positions (columns 0 to 1013), 69x1014x1 = ~70k
  V A N C O U V E R _ …
_ 0 0 0 0 0 0 0 0 0 1 …
- 0 0 0 0 0 0 0 0 0 0 …
. 0 0 0 0 0 0 0 0 0 0 …
A 0 1 0 0 0 0 0 0 0 0 …
… (remaining rows as in the character-level encoding table)
Output: 256 filters x 1008 positions (columns 0 to 1007), 1x1008x256 = ~258k
      0    1    2    3    4   …
  0  6.4  1.1  3.2  0.3 -0.4  …
  1 -2.1  0.2 -3.4  …
  …
254  …
255  1.2  3.4 -1    1.2  3.2  …
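The n-gram intuition can be made concrete with a hand-built filter. A minimal sketch, assuming a 29-symbol alphabet instead of 69 and a kernel set by hand to the one-hot pattern of the trigram "NLP"; in the real network the 256 kernels are learned, not designed.

```python
ALPHABET = "_-.ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def one_hot(text):
    """Alphabet x length matrix with a 1 where the row symbol matches the character."""
    return [[1 if ch == symbol else 0 for ch in text] for symbol in ALPHABET]


def temporal_conv(inputs, kernel, bias=0):
    """Slide one (channels x width) kernel along the time axis, stride 1, no padding."""
    width = len(kernel[0])
    length = len(inputs[0])
    return [
        bias + sum(kernel[ch][k] * inputs[ch][t + k]
                   for ch in range(len(inputs)) for k in range(width))
        for t in range(length - width + 1)
    ]


text = "VANCOUVER_NLP"
trigram_filter = one_hot("NLP")  # hand-built kernel of shape 29 x 3
response = temporal_conv(one_hot(text), trigram_filter)
print(response)      # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3]: peak where "NLP" starts
print(1014 - 7 + 1)  # 1008: output length for input length 1014 and kernel width 7
```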
Activation Function: Rectified Linear Unit (ReLU)
$f(x) = \begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases}$
Before, 1x1008x256 = ~258k:
      0    1    2    3    4    5   … 1007
  0  6.4  1.1  3.2  0.3 -0.4  0.2  …
  …
255  1.2  3.4 -1    1.2  3.2  2.8  …
After, 1x1008x256 = ~258k:
      0    1    2    3    4    5   … 1007
  0  6.4  1.1  3.2  0.3  0    0.2  …
  …
255  1.2  3.4  0    1.2  3.2  2.8  …
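In code, ReLU is a one-liner applied to every value independently. A small sketch using the two rows shown on the slide:

```python
def relu(value):
    """Keep positive values, replace negative ones with 0."""
    return max(0, value)


row_0 = [6.4, 1.1, 3.2, 0.3, -0.4, 0.2]
row_255 = [1.2, 3.4, -1, 1.2, 3.2, 2.8]

print([relu(v) for v in row_0])    # [6.4, 1.1, 3.2, 0.3, 0, 0.2]
print([relu(v) for v in row_255])  # [1.2, 3.4, 0, 1.2, 3.2, 2.8]
```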
Down-sampling: Max-Pooling (256 channels, window 1*3, stride 3)
Before, 1x1008x256 = ~258k:
      0    1    2    3    4    5   … 1007
  0  6.4  1.1  3.2  0.3  0    0.2  …
  …
255  1.2  3.4  0    1.2  3.2  2.8  …
After, 1x336x256 = ~86k:
      0    1   … 335
  0  6.4  0.3  …
  …
255  3.4  3.2  …
Source: Stanford's CS231n
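Max-pooling keeps the largest value of each window. A minimal sketch for one channel, with window 3 and stride 3 as on the slide; `max_pool_1d` is an illustrative name.

```python
def max_pool_1d(row, size=3, stride=3):
    """Return the maximum of each window of `size` values, moving by `stride`."""
    return [max(row[i:i + size]) for i in range(0, len(row) - size + 1, stride)]


print(max_pool_1d([6.4, 1.1, 3.2, 0.3, 0, 0.2]))  # [6.4, 0.3]
print(max_pool_1d([1.2, 3.4, 0, 1.2, 3.2, 2.8]))  # [3.4, 3.2]
print(len(max_pool_1d([0.0] * 1008)))             # 336
```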
Fast forward…
1x336x256 = ~86k <- after 1 convolution layer (69*7/1) and 1 max-pooling (1*3/3)
1x330x256 = ~84k <- after 1 convolution layer (1*7/1)
1x110x256 = ~28k <- after 1 max-pooling (1*3/3)
1x102x256 = ~26k <- after 4 convolution layers (1*3/1)
1x34x256 = ~9k <- after 1 max-pooling (1*3/3)
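The sequence lengths above follow from one formula, output = (input - kernel) // stride + 1, applied layer after layer. A short sketch that traces them; the `LAYERS` list is transcribed from the slides and the helper names are illustrative.

```python
def output_length(length, kernel, stride=1):
    """Length after a convolution or pooling layer without padding."""
    return (length - kernel) // stride + 1


# (layer type, kernel width, stride), in network order
LAYERS = [
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1),
    ("pool", 3, 3),
]


def trace_lengths(input_length=1014):
    lengths = [input_length]
    for _, kernel, stride in LAYERS:
        lengths.append(output_length(lengths[-1], kernel, stride))
    return lengths


lengths = trace_lengths()
print(lengths)            # [1014, 1008, 336, 330, 110, 108, 106, 104, 102, 34]
print(lengths[-1] * 256)  # 8704 values enter the flattening layer
```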
Flattening Layer
Input, 1x34x256 = ~9k (256 rows x 34 columns):
      0    1     … 33
  0  6.4  0.1    …
  1  2.1  24.9   …
  …
255  …           9.9
Output, 8704x1x1 = ~9k (rows concatenated one after another):
   0  6.4
   1  0.1
   …
  34  2.1
  35  24.9
   …
8703  9.9
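Flattening only changes the layout, not the values. A minimal sketch that rebuilds the indices shown on the slide, assuming the rows are concatenated one after another; all entries not shown on the slide are filled with 0.0 as placeholders.

```python
def flatten(feature_maps):
    """Concatenate the rows of a 2D list into one long vector."""
    return [value for row in feature_maps for value in row]


# 256 channels x 34 positions; only the values visible on the slide are set
feature_maps = [[0.0] * 34 for _ in range(256)]
feature_maps[0][0], feature_maps[0][1] = 6.4, 0.1
feature_maps[1][0], feature_maps[1][1] = 2.1, 24.9
feature_maps[255][33] = 9.9

flat = flatten(feature_maps)
print(len(flat))                    # 8704
print(flat[0], flat[1])             # 6.4 0.1
print(flat[34], flat[35])           # 2.1 24.9
print(flat[8703])                   # 9.9
```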
Fully Connected / Dense layer (1024)
Input, 8704x1x1 = ~9k:
   0  6.4
   1  0.1
   …
8703  9.9
1024 units (k = 0 … 1023), each computing:
$y_k(x) = \sum_{i=0}^{8703} w_{ki} \, x_i + b_k$
Output, 1024x1x1 = ~1k:
   0  8.7
   1 -2.1
   …
1023  32.1
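Each unit of a dense layer is a weighted sum of all inputs plus a bias. A minimal sketch on a toy layer with 3 inputs and 2 units (the weights are made-up illustration values), followed by the parameter count of the 8704 -> 1024 layer from the slide.

```python
def dense(inputs, weights, biases):
    """One output per unit: sum_i w[k][i] * x[i] + b[k]."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        for unit_weights, bias in zip(weights, biases)
    ]


toy_weights = [[1, 0, -1], [0.5, 0.5, 0.5]]  # 2 units x 3 inputs, illustration only
toy_biases = [0, 1]
toy_output = dense([2, 4, 6], toy_weights, toy_biases)
print(toy_output)  # [-4, 7.0]

# Parameters of the first dense layer: one weight per (unit, input) pair plus one bias per unit
dense_params = 8704 * 1024 + 1024
print(dense_params)  # 8913920
```

With close to 9 million parameters, this single layer holds far more weights than any of the convolution layers before it.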
Dropout (p=0.5) + Fully Connected Layer (1024)
Input, 1024x1x1 = ~1k (each value is ignored with probability p=0.5 during training):
   0  8.7
   1  0
   …
1023  32.1
1024 units (k = 0 … 1023), each computing:
$y_k(x) = \sum_{i=0}^{1023} w_{ki} \, x_i + b_k$
Output, 1024x1x1 = ~1k:
   0  9.2
   1  5.3
   …
1023  0.1
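Dropout can be sketched in a few lines. This version is the common "inverted dropout" variant, which rescales the kept values by 1 / (1 - p) so that nothing changes at prediction time; the slide does not say which variant is used, so treat the rescaling as an assumption.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each value with probability p during training; pass through unchanged otherwise."""
    if not training:
        return list(values)
    keep_probability = 1.0 - p
    return [
        value / keep_probability if rng.random() < keep_probability else 0.0
        for value in values
    ]


values = [8.7, 0, 32.1, 4.0]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
dropped = dropout(values, p=0.5, rng=rng)
print(dropped)                           # some entries zeroed, the others doubled
print(dropout(values, training=False))   # unchanged at prediction time
```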
Output: Dropout + Dense + Softmax for N categories
Input, 1024x1x1 = ~1k (some values ignored by dropout):
   0  6.4
   1  0.1
   …
1023  9.9
Dense layer with N units, output Nx1x1 = N:
  0  2.7
  1  0.1
  …
N-1  12.5
Softmax:
$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=0}^{N-1} e^{x_j}}$
Output, Nx1x1 = N:
  0  0.1
  1  0.01
  …
N-1  0.8
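Softmax turns the N raw scores into probabilities that sum to 1. A minimal sketch; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    """exp(x_i) / sum_j exp(x_j), computed in a numerically stable way."""
    highest = max(scores)
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


probabilities = softmax([2.7, 0.1, 12.5])
print(probabilities)       # the largest score gets almost all of the probability mass
print(sum(probabilities))  # approximately 1.0
```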
Text classification, N categories
Neural
Network
- Fiction: 0%
- Biography: 6%
…
- Play: 6%
…
- Documentation: 80%
How to train the network? Backward propagation!
Backward propagation – Efficient Gradient Descent
Category        Predicted   Target
Fiction           0%
Biography         6%          0%
…
Play              6%        100%
…
Documentation    80%          0%
Update the weights of the convolutional masks and fully
connected units so that the error will be minimized next time:
$w_{ij} = w_{ij} - \alpha \cdot \frac{\partial E}{\partial w_{ij}}$
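The update rule can be tried out on a tiny model. This is a hedged sketch, not the full network: it assumes a single dense + softmax layer and a cross-entropy error, for which the gradient with respect to each raw score is simply (predicted - target). The real network propagates this gradient further back through every layer.

```python
import math


def softmax(scores):
    highest = max(scores)
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def train_step(weights, biases, x, target, learning_rate):
    """One gradient-descent update on a single example; returns the error before the update."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    predicted = softmax(scores)
    error = -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)
    for i, (p, t) in enumerate(zip(predicted, target)):
        score_gradient = p - t  # dE/dscore_i for softmax + cross-entropy
        for j, xj in enumerate(x):
            # w_ij = w_ij - learning_rate * dE/dw_ij
            weights[i][j] -= learning_rate * score_gradient * xj
        biases[i] -= learning_rate * score_gradient
    return error


weights = [[0.0, 0.0] for _ in range(3)]  # 3 categories, 2 input features
biases = [0.0, 0.0, 0.0]
x = [1.0, 2.0]
target = [0.0, 1.0, 0.0]                  # the true category is the second one

errors = [train_step(weights, biases, x, target, learning_rate=0.1) for _ in range(5)]
print(errors)  # the error shrinks after every update
```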
Training Parameters: Learning Rate
Learning rate $\alpha$: how much to update the weights for every batch of documents?
Source: Towards Data Science, Gradient descent in a nutshell
Training parameters: Batch Size
Batch size: How many examples to learn from in one step?
Training parameters: Number of epochs
Number of epochs: How many times should we feed the network the entire training set?
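Batch size and number of epochs together decide how many weight updates happen. A minimal sketch of the loop structure, with the actual update step left out; a real training loop would usually also shuffle the examples at every epoch.

```python
import math


def iterate_minibatches(examples, batch_size):
    """Yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


examples = list(range(10))  # stand-in for 10 training documents
batch_size = 4
epochs = 3

updates = 0
for epoch in range(epochs):
    for batch in iterate_minibatches(examples, batch_size):
        updates += 1        # one weight update per batch would happen here

print(updates)                                           # 9
print(epochs * math.ceil(len(examples) / batch_size))    # 9
```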
Jupyter notebook demo – Crepe in Apache MXNet/Gluon
https://github.com/ThomasDelteil/CNN_NLP_MXNet
Results
Traditional approaches
Word-level CNN
Character-level CNN
Data Augmentation
For images
For text
- Humans to rephrase the examples
- Synonyms
- Similar semantic meaning
Data Augmentation
The quick brown fox jumps over the lazy dog
- quick: fast, swift, speedy
- brown: hazel, brunette, chestnut
- jumps: leaps, springs, bounds, hops
- lazy: idle, indolent, slothful
- dog: hound, pup, mutt
The swift brunette fox leaps over the slothful pup
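Synonym replacement is simple to prototype. A minimal sketch using the synonym lists from the slide as a hard-coded dictionary (the word-to-synonym pairing is inferred from the example sentence); a real pipeline would more likely draw synonyms from a thesaurus or from word embeddings.

```python
import random

# Synonym lists from the slide, grouped by the word they replace
SYNONYMS = {
    "quick": ["fast", "swift", "speedy"],
    "brown": ["hazel", "brunette", "chestnut"],
    "jumps": ["leaps", "springs", "bounds", "hops"],
    "lazy": ["idle", "indolent", "slothful"],
    "dog": ["hound", "pup", "mutt"],
}


def augment(sentence, synonyms=SYNONYMS, rng=random):
    """Replace every word that has known synonyms by one of them, picked at random."""
    return " ".join(
        rng.choice(synonyms[word]) if word in synonyms else word
        for word in sentence.split()
    )


original = "The quick brown fox jumps over the lazy dog"
rng = random.Random(42)  # fixed seed so the sketch is repeatable
augmented = augment(original, rng=rng)
print(augmented)  # one possible variant, e.g. "The swift brunette fox leaps over the slothful pup"
```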
You need a large dataset
…A very large dataset!
Live Demo – Classification of product category for Amazon Reviews
https://thomasdelteil.github.io/CNN_NLP_MXNet/
Workflow and Operationalization
- Develop model using a Jupyter notebook
- Train model on GPU instance
- Package model behind web API in a Docker container, e.g. using MXNet Model Server
- Upload container to container registry
- Deploy container to an elastic container service
- Enjoy quick and linear scaling
- Put the API behind a load balancer with SSL termination
- Enjoy :)
Pipeline: GPU instance -> Container Registry -> Elastic Container Service (auto-scaling containers) -> Load Balancer
HTTPS request: "Loved this book"
HTTPS response:
{
  "prediction": {
    "book": 0.99
  }
}
Advanced use-cases for
Convolutions and NLP
CNN + LSTM: Spatially and Temporally Deep Neural Networks
- CNN for feature extraction
- LSTM for temporal representation
Applications:
- Video (CNN for frames, LSTM to
combine them temporally)
- Text tasks
- Audio (Language detection)
Source: Combining CNN and RNN for spoken language detection
Advanced use-case: Speech Generation WaveNet
Source: DeepMind Wavenet generative model raw audio
WaveNet: Dilated Causal Convolution
Source: DeepMind Wavenet generative model raw audio
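A dilated causal convolution only looks at the current and past samples, skipping `dilation - 1` samples between taps. A minimal sketch, assuming kernels of width 2 with all weights set to 1 and the dilations 1, 2, 4, 8; it feeds a single impulse through the stack to make the receptive field visible.

```python
def dilated_causal_conv(signal, kernel, dilation=1):
    """output[t] = sum_k kernel[k] * signal[t - k * dilation], treating t < 0 as zero."""
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            source = t - k * dilation
            if source >= 0:  # causal: never read from the future or before the start
                total += weight * signal[source]
        output.append(total)
    return output


impulse = [1.0] + [0.0] * 19  # a single non-zero sample at t = 0
layer_output = impulse
for dilation in (1, 2, 4, 8):
    layer_output = dilated_causal_conv(layer_output, [1.0, 1.0], dilation)

print(layer_output)  # sixteen 1.0 values followed by four 0.0 values
```

With only four layers of width-2 kernels, the sample at t = 0 influences 16 output steps: the receptive field doubles with every layer instead of growing by one.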
Summary
§ Learned about convolutions
§ Applied them to textual data
§ Studied the Crepe architecture from
Zhang et al. in detail
§ Learned about advanced use cases and
operationalization
Thank you!
Connect here
github.com/thomasdelteil
linkedin.com/in/thomasdelteil
tdelteil@amazon.com
Photos credits: https://pexels.com and https://unsplash.com/

Contenu connexe

Tendances

Introduction to Named Entity Recognition
Introduction to Named Entity RecognitionIntroduction to Named Entity Recognition
Introduction to Named Entity RecognitionTomer Lieber
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Knowledge Representation & Reasoning
Knowledge Representation & ReasoningKnowledge Representation & Reasoning
Knowledge Representation & ReasoningSajid Marwat
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: ParsingRushdi Shams
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingSangwoo Mo
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)Weiwei Guo
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
LLMs_talk_March23.pdf
LLMs_talk_March23.pdfLLMs_talk_March23.pdf
LLMs_talk_March23.pdfChaoYang81
 

Tendances (20)

Introduction to Named Entity Recognition
Introduction to Named Entity RecognitionIntroduction to Named Entity Recognition
Introduction to Named Entity Recognition
 
Ngrams smoothing
Ngrams smoothingNgrams smoothing
Ngrams smoothing
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Knowledge Representation & Reasoning
Knowledge Representation & ReasoningKnowledge Representation & Reasoning
Knowledge Representation & Reasoning
 
NLP
NLPNLP
NLP
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: Parsing
 
Gpt models
Gpt modelsGpt models
Gpt models
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
First order logic
First order logicFirst order logic
First order logic
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
NLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State AutomataNLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State Automata
 
LLMs_talk_March23.pdf
LLMs_talk_March23.pdfLLMs_talk_March23.pdf
LLMs_talk_March23.pdf
 
What is langchain
What is langchainWhat is langchain
What is langchain
 

Similaire à Convolutional Neural Networks and Natural Language Processing

The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?Frank van Harmelen
 
Introduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersIntroduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersEmanuele Della Valle
 
Deep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text ClassificationDeep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text ClassificationHPCC Systems
 
Deep Learning for IoT : is there a shallow end of the pool?
Deep Learning for IoT : is there a shallow end of the pool?Deep Learning for IoT : is there a shallow end of the pool?
Deep Learning for IoT : is there a shallow end of the pool?Venu Vasudevan
 
The Empirical Turn in Knowledge Representation
The Empirical Turn in Knowledge RepresentationThe Empirical Turn in Knowledge Representation
The Empirical Turn in Knowledge RepresentationFrank van Harmelen
 
building intelligent systems with large scale deep learning
building intelligent systems with large scale deep learningbuilding intelligent systems with large scale deep learning
building intelligent systems with large scale deep learningmustafa sarac
 
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...MaRS Discovery District
 
Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning David Voyles
 
OK festival Lightning Talk - Collaborative Open Geospatial Data
OK festival Lightning Talk - Collaborative Open Geospatial DataOK festival Lightning Talk - Collaborative Open Geospatial Data
OK festival Lightning Talk - Collaborative Open Geospatial DataAndrew Turner
 
Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Amr Rashed
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning TutorialAmr Rashed
 
Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Nautral Langauge Processing - Basics / Non Technical
Nautral Langauge Processing - Basics / Non Technical Dhruv Gohil
 
Small, Medium and Big Data
Small, Medium and Big DataSmall, Medium and Big Data
Small, Medium and Big DataPierre De Wilde
 
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us Ab...Big Data Week
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningAmr Rashed
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson StudioSasha Lazarevic
 
Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6Paul Houle
 

Convolutional Neural Networks and Natural Language Processing

  • 1. Convolutional Neural Networks and Natural Language Processing Thomas Delteil – github.com/thomasdelteil – linkedin.com/in/thomasdelteil Applied Scientist @ AWS Deep Engine
  • 2. Goals § Explain what convolutions are § Show how to handle textual data § Analyze a reference neural network architecture for text classification § Demonstrate how to train and deploy a CNN for Natural Language Processing Learn Data Science, Vancouver – Deep Learning and NLP - CNNs and NLP - Thomas Delteil - github.com/thomasdelteil - linkedin.com/in/thomasdelteil
  • 3. Convolutions, and where to find them
  • 4. 2012: ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, Advances in Neural Information Processing Systems, 2012. Figure: AlexNet architecture.
  • 5. ImageNet competition: classify images among 1000 classes. AlexNet brought the top-5 error rate from 25% down to 16%!
  • 6. Actual photo of the reaction from the computer vision community* (*might just be a stock photo)
  • 7. I told you so!
  • 8. What made Convolutional Neural Networks viable?
  • 9. GPUs! Nvidia V100: ~120 TFLOPS for float16 operations, 5000+ CUDA cores. For comparison, the #1 supercomputer of 2005 reached ~135 TFLOPS. Source: MathWorks
  • 10. Sea/land segmentation via satellite images. DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation, Ruirui Li et al., 2017
  • 11. Automatic galaxy classification. Deep Galaxy: Classification of Galaxies based on Deep Convolutional Neural Networks, Nour Eldeen M. Khalifa, 2017
  • 12. Medical imaging: MRI, X-ray, surgical cameras. Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods, Ali Işın et al., 2016
  • 13. What is a convolution? It is the cross-channel sum of the element-wise multiplication of a convolutional filter (kernel/mask) with a sliding window over an input tensor, for a given stride and padding, plus a bias term. The result is called a feature map. Example: input matrix (3x3, 1 channel, no padding) [[2, 2, 1], [3, 1, -1], [4, 3, 2]]; kernel (2x2, stride 1) [[1, -1], [-1, 0]]; bias = 2; feature map (2x2) [[-1, 2], [0, 1]], computed as: 1*2 - 1*2 - 1*3 + 0*1 + 2 = -1; 1*2 - 1*1 - 1*1 + 0*(-1) + 2 = 2; 1*3 - 1*1 - 1*4 + 0*3 + 2 = 0; 1*1 - 1*(-1) - 1*3 + 0*2 + 2 = 1.
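The worked example on slide 13 can be checked with a few lines of plain Python. This is a minimal sketch for a single input channel without padding; the function name `conv2d_single_channel` is illustrative and does not come from the slides or the demo notebook.

```python
def conv2d_single_channel(image, kernel, bias=0, stride=1):
    """Slide a kernel over one 2D channel (no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the window with the kernel, sum, add the bias.
            total = bias
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Values taken from slide 13.
image = [[2, 2, 1],
         [3, 1, -1],
         [4, 3, 2]]
kernel = [[1, -1],
          [-1, 0]]
feature_map = conv2d_single_channel(image, kernel, bias=2)
print(feature_map)  # [[-1, 2], [0, 1]]
```

Deep learning frameworks compute the same quantity with heavily optimized kernels; the nested loops here only make the definition explicit.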
  • 14. What is a convolution? Padding. Source: Machine Learning guru - Neural Networks CNN
  • 15. What is a convolution? Stride = 2. Source: Machine Learning guru - Neural Networks CNN
  • 16. What is a convolution? Multi-channel: 1 convolutional filter of shape (3)x(3x3). Source: Machine Learning guru - Neural Networks CNN
  • 17. What is a convolution? Multi-channel. Source: Convolutional Neural Networks on the iPhone with VGGNet. N: number of input channels; W: width of the kernel; H: height of the kernel; M: number of output channels. Kernel size $= N \times W \times H$. Number of parameters $= M \times N \times W \times H + M$. Example: 256 convolutions with a (3,3) kernel on 256 input channels have 256*256*3*3 = 589,824 weights (~0.6M), plus 256 biases.
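The parameter count on slide 17 is simple enough to verify directly. The helper below is a small sketch; its name and the `use_bias` flag are illustrative choices, not part of the original deck.

```python
def conv_param_count(in_channels, kernel_w, kernel_h, out_channels, use_bias=True):
    """Number of learnable parameters of a convolution layer: M*N*W*H (+ M biases)."""
    weights = out_channels * in_channels * kernel_w * kernel_h
    return weights + (out_channels if use_bias else 0)


# 256 filters with a 3x3 kernel on 256 input channels, as on slide 17.
n_weights = conv_param_count(256, 3, 3, 256, use_bias=False)
n_params = conv_param_count(256, 3, 3, 256)
print(n_weights, n_params)  # 589824 590080
```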
  • 18. Easily parallelizable. Convolution computations are: - Independent (across filters and within a filter) - Simple (multiplications and sums)
  • 19. Why does it work? Examples of hand-crafted filters: sharpening filter, Laplacian filter, Sobel x-axis filter
  • 20. Why does it work? - Detect patterns at larger and larger scales by stacking convolution layers on top of each other to grow the receptive field - Applicable to spatially correlated data. Figure: the 96 first-layer filters (11x11) learned by AlexNet, represented in RGB space (3 input channels)
  • 21. Growing receptive field: deeper in the network, each unit sees a larger part of the input. Source: ML Review, A guide to receptive field arithmetic
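How the receptive field grows when layers are stacked can be computed with the usual recurrence: each layer adds (kernel size - 1) times the current spacing between neighbouring units, and that spacing is multiplied by the layer's stride. The sketch below assumes one spatial dimension and no dilation; the function name is illustrative.

```python
def receptive_field(layers):
    """Receptive field (in input units) of one output unit of a layer stack.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    size, jump = 1, 1  # jump = distance in the input between neighbouring units
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump
        jump *= stride
    return size


# Three stacked 3-wide convolutions with stride 1 see 7 input positions.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
# Inserting a stride-2 pooling layer makes later layers grow the field faster.
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```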
  • 22. Visualize convolutions: http://scs.ryerson.ca/~aharley/vis/conv/flat.html
  • 23. Visualize convolutions. Source: Neural Network 3D Simulation (warning: flashing lights)
  • 24. State-of-the-art networks are getting deeper and more complex. Source: Inception v3
  • 25. High number of parameters => requires a lot of data to train
  • 26. Advanced types of convolutions: transposed convolutions (deconvolution), e.g. EnhanceNet; dilated convolutions, e.g. WaveNet; depth-wise separable convolutions, e.g. MobileNet. Source: An introduction to different types of convolutions
  • 27. On to Natural Language Processing
  • 28. NLP domains: machine translation, OCR, Q&A, sentiment analysis, speech recognition, TTS, topic modelling, information retrieval, natural language understanding, document classification
  • 29. NLP industry facts: 8.4PB of information per second as of 2020 (source: business2community, 2016); 70% of companies use customer feedback (source: business2community, 2016); £1.3T value of company data (source: IDC, 2014); 10% of organizations expect to commercialise their data by 2020 (source: Gartner, 2016). Source: Ticary, What is natural language processing
  • 30. Convolutions and Natural Language Processing
  • 31. Data representation? Source: Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional Neural Networks for Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014
  • 32. Encoding data, word-level: word-level embedding (word2vec), word -> N-dimensional vector; the input can stack several different embeddings as channels. Source: Convolutional Neural Networks for Sentence Classification, Yoon Kim, 2014
  • 33. Encoding data, character-level: one-hot encoding over an alphabet; sparse representation; character embedding. Example: "VANCOUVER NLP…" is encoded as a matrix with one row per alphabet symbol (_, -, ., A to Z) and one column per character of the text. Each column holds a single 1, in the row of its character (the space is encoded as _), and 0 everywhere else: the V row has a 1 in columns 0 and 6, the N row in columns 2 and 10, and the _ row in column 9.
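The one-hot matrix on slide 33 can be rebuilt in a few lines. This sketch uses the reduced 29-symbol alphabet shown on the slide (_, -, ., A to Z) rather than the full alphabet of the Crepe model; the names `ALPHABET` and `one_hot_encode`, the upper-casing, and the mapping of spaces to `_` are assumptions made to reproduce the slide.

```python
import string

# Reduced alphabet from the slide: underscore (space), dash, dot, then A-Z.
ALPHABET = "_-." + string.ascii_uppercase


def one_hot_encode(text, alphabet=ALPHABET):
    """Return a len(alphabet) x len(text) matrix with one 1 per column."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    normalized = text.upper().replace(" ", "_")
    matrix = [[0] * len(normalized) for _ in alphabet]
    for col, ch in enumerate(normalized):
        row = index.get(ch)
        if row is not None:  # characters outside the alphabet stay all-zero
            matrix[row][col] = 1
    return matrix


encoded = one_hot_encode("Vancouver NLP")
print(len(encoded), len(encoded[0]))   # 29 13
print(encoded[ALPHABET.index("V")])    # [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

The matrix is very sparse (one non-zero entry per column), which is why a learned character embedding is listed as an alternative on the slide.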
  • 34. Text classification, N categories
  • 35. Text classification, N categories. Neural network output: Fiction: 0%; Biography: 6%; …; Play: 80%; …; Documentation: 0%
  • 36. Deep neural network: Crepe model. Source: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015. Visualization with Netron. Intuition: convolutions act similarly to n-grams.
  • 37. Temporal convolution (256 filters, kernel 69*7, stride 1). Input: the one-hot encoding of the text ("VANCOUVER …"), a 69 x 1014 matrix with one row per alphabet character and one column per character position (columns 0 to 1013): 69x1014x1 = ~70k values. Each of the 256 filters spans all 69 rows and 7 consecutive characters and slides along the text with stride 1, giving 1014 - 7 + 1 = 1008 output positions (columns 0 to 1007). Output: 256 feature maps of length 1008 (e.g. row 0: 6.4, 1.1, 3.2, 0.3, -0.4, …; row 1: -2.1, 0.2, -3.4, …; row 255: 1.2, 3.4, -1, 1.2, 3.2, …): 1x1008x256 = ~256k values.
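A temporal convolution of this kind can be sketched in plain Python. The example below uses deliberately tiny sizes (5 symbols, 12 characters, 2 filters) so that it runs instantly; the slide's layer has 69 symbols, 1014 characters and 256 filters. The function name and the toy filters are illustrative assumptions.

```python
def temporal_conv(x, filters, biases):
    """1D convolution along the text axis, stride 1, no padding.

    x:       [alphabet_size][length] input matrix (e.g. one-hot characters)
    filters: [n_filters][alphabet_size][width] weights
    biases:  [n_filters]
    Returns  [n_filters][length - width + 1] feature maps.
    """
    length = len(x[0])
    width = len(filters[0][0])
    out_len = length - width + 1
    out = []
    for f, b in zip(filters, biases):
        row = []
        for t in range(out_len):
            total = b
            for c in range(len(x)):        # every alphabet row
                for k in range(width):     # `width` consecutive characters
                    total += f[c][k] * x[c][t + k]
            row.append(total)
        out.append(row)
    return out


alphabet_size, length, width = 5, 12, 7
# Toy one-hot input: character t is symbol (t mod alphabet_size).
x = [[1 if c == t % alphabet_size else 0 for t in range(length)]
     for c in range(alphabet_size)]
ones = [[1] * width for _ in range(alphabet_size)]    # counts characters in the window
zeros = [[0] * width for _ in range(alphabet_size)]
features = temporal_conv(x, [ones, zeros], biases=[0.5, -1.0])
print(len(features), len(features[0]))  # 2 6
```

With a one-hot input, a filter simply sums the weights selected by the 7 characters in its window, which is the sense in which a width-7 filter behaves like a soft 7-gram detector.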
  • 38. Activation function: Rectified Linear Unit (ReLU), $f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}$. It is applied element-wise, so the shape is unchanged: 1x1008x256 = ~256k in, 1x1008x256 = ~256k out. Example: row 0 (6.4, 1.1, 3.2, 0.3, -0.4, 0.2, …) becomes (6.4, 1.1, 3.2, 0.3, 0, 0.2, …); row 255 (1.2, 3.4, -1, 1.2, 3.2, 2.8, …) becomes (1.2, 3.4, 0, 1.2, 3.2, 2.8, …).
  • 39. Down-sampling: max-pooling (256 channels, window 1*3, stride 3). Each channel keeps the maximum of every group of 3 consecutive values: row 0 (6.4, 1.1, 3.2, 0.3, 0, 0.2, …) becomes (6.4, 0.3, …); row 255 (1.2, 3.4, 0, 1.2, 3.2, 2.8, …) becomes (3.4, 3.2, …). Input: 1x1008x256 = ~256k; output: 1x336x256 = ~86k (columns 0 to 335). Source: Stanford's CS231n
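The ReLU and max-pooling steps of slides 38 and 39 can be reproduced on the example rows from the slides. This is a minimal sketch on plain lists; the function names are illustrative.

```python
def relu(row):
    """Element-wise ReLU: negative values become 0."""
    return [max(0.0, v) for v in row]


def max_pool_1d(row, size=3, stride=3):
    """Keep the maximum of each window of `size` values, moving by `stride`."""
    return [max(row[i:i + size]) for i in range(0, len(row) - size + 1, stride)]


# First six values of rows 0 and 255 on slide 38.
row_0 = [6.4, 1.1, 3.2, 0.3, -0.4, 0.2]
row_255 = [1.2, 3.4, -1, 1.2, 3.2, 2.8]

pooled_0 = max_pool_1d(relu(row_0))
pooled_255 = max_pool_1d(relu(row_255))
print(pooled_0)    # [6.4, 0.3]
print(pooled_255)  # [3.4, 3.2]
```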
  • 40. Fast forward… 1x336x256 = ~86k after 1 convolution layer (69*7/1) and 1 max-pooling (1*3/3); 1x330x256 = ~85k after 1 convolution layer (1*7/1); 1x110x256 = ~28k after 1 max-pooling (1*3/3); 1x102x256 = ~26k after 4 convolution layers (1*3/1); 1x34x256 = ~9k after 1 max-pooling (1*3/3).
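The sequence lengths quoted on slide 40 follow from applying (length - kernel) // stride + 1 layer by layer, since no layer uses padding. The sketch below hard-codes the layer list described on slides 37 to 40; the function name and the tuple layout are illustrative.

```python
def trace_lengths(input_length, layers):
    """Return the sequence length after each (name, kernel, stride) layer."""
    lengths = [input_length]
    for _name, kernel, stride in layers:
        lengths.append((lengths[-1] - kernel) // stride + 1)
    return lengths


# Convolution / pooling stack as described on slides 37-40 (no padding anywhere).
CREPE_LAYERS = [
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1),
    ("pool", 3, 3),
]

lengths = trace_lengths(1014, CREPE_LAYERS)
print(lengths)             # [1014, 1008, 336, 330, 110, 108, 106, 104, 102, 34]
print(lengths[-1] * 256)   # 8704 values once the 256 channels are flattened
```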
  • 41. Flattening layer: the 256 feature maps of length 34 (1x34x256 = ~9k) are concatenated channel after channel into a single vector of 8704 values (8704x1x1 = ~9k). Example: row 0 (6.4, 0.1, …) fills indices 0 to 33, row 1 (2.1, 24.9, …) starts at index 34, and the last value of row 255 (9.9) lands at index 8703.
  • 42. Fully connected / dense layer (1024 units). Input: the flattened vector (6.4, 0.1, …, 9.9), 8704x1x1 = ~9k. Each output unit k (0 to 1023) computes $y_k = \sum_{i=0}^{8703} w_{ki} x_i + b_k$. Output: (8.7, -2.1, …, 32.1), 1024x1x1 = ~1k.
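Flattening and the dense layer can be sketched together on a toy input. The sizes below (2 channels of length 2, 2 output units) are chosen for readability, not taken from the slides, and the function names are illustrative.

```python
def flatten(feature_maps):
    """Concatenate the channels one after another into a single vector."""
    return [value for channel in feature_maps for value in channel]


def dense(inputs, weights, biases):
    """Fully connected layer: y_k = sum_i w[k][i] * x[i] + b[k]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


feature_maps = [[6.4, 0.1],     # channel 0
                [2.1, 24.9]]    # channel 1
vector = flatten(feature_maps)
print(vector)  # [6.4, 0.1, 2.1, 24.9]

# Toy layer with 3 inputs and 2 output units.
outputs = dense([1, 2, 3],
                weights=[[1, 0, 0], [0.5, 0.5, 0.5]],
                biases=[0, 1])
print(outputs)  # [1, 4.0]
```

In the Crepe model the first dense layer maps 8704 inputs to 1024 units, so it holds 8704 * 1024 weights plus 1024 biases, far more than any of the convolution layers.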
  • 43. Dropout (p=0.5) + fully connected layer (1024 units). Input: (8.7, 0, …, 32.1), 1024x1x1 = ~1k; during training, dropout randomly ignores each input with probability 0.5. Each output unit k (0 to 1023) computes $y_k = \sum_{i=0}^{1023} w_{ki} x_i + b_k$ over the inputs that were kept. Output: (9.2, 5.3, …, 0.1), 1024x1x1 = ~1k.
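Dropout itself is a one-liner. The sketch below uses the common "inverted dropout" convention, where kept activations are scaled by 1 / (1 - p) during training so that nothing needs to change at inference time; the slides do not say which convention is used, so treat the scaling as an assumption. The function name and the `rng` parameter are illustrative.

```python
import random


def dropout(activations, p=0.5, rng=random, training=True):
    """Zero each activation with probability p during training (inverted dropout)."""
    if not training:
        return list(activations)  # inference: dropout is a no-op
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded so the example is repeatable
dropped = dropout([1.0] * 1000, p=0.5, rng=rng)
n_ignored = sum(1 for v in dropped if v == 0.0)
print(n_ignored)  # roughly half of the 1000 units
```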
  • 44. Output: dropout + dense + softmax for N categories. Input: (6.4, 0.1, …, 9.9), 1024x1x1 = ~1k, with some units ignored by dropout. The dense layer produces N scores (2.7, 0.1, …, 12.5), Nx1x1 = N. Softmax turns them into probabilities (0.1, 0.01, …, 0.8), Nx1x1 = N: $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=0}^{N-1} e^{x_j}}$.
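A softmax over the N scores can be written with the standard library alone. Subtracting the maximum score before exponentiating is a common numerical-stability trick that leaves the result unchanged; it is added here as an implementation detail and is not shown on the slide.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)  # avoids overflow in exp() for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.7, 0.1, 12.5])
print(probabilities)       # the largest score gets almost all the probability mass
print(sum(probabilities))  # 1.0 up to floating-point rounding
```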
  • 45. Text classification, N categories. Neural network output: Fiction: 0%; Biography: 6%; …; Play: 6%; …; Documentation: 80%
  • 46. How to train the network? Backward propagation!
  • 47. Backward propagation: efficient gradient descent. Neural network prediction (target in parentheses): Fiction: 0%; Biography: 6% (0%); …; Play: 6% (100%); …; Documentation: 80% (0%). Update the weights of the convolutional masks and fully connected units so that the error will be smaller next time: $w_{ij} = w_{ij} - \eta \cdot \frac{\partial E}{\partial w_{ij}}$
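The weight update on slide 47 is the plain gradient descent rule: move each weight against its gradient, scaled by the learning rate. The sketch below applies it to a toy error function with a known minimum instead of a neural network, so the gradient can be written by hand; the function names and the toy error are assumptions for illustration, and in a real network the gradients come from backward propagation.

```python
def sgd_step(weights, gradients, learning_rate):
    """One update: w = w - learning_rate * dE/dw for every weight."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


def error(w):
    """Toy error surface with its minimum at w = [3, -1]."""
    return (w[0] - 3) ** 2 + (w[1] + 1) ** 2


def gradient(w):
    """Analytical gradient of the toy error."""
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]


weights = [0.0, 0.0]
initial_error = error(weights)
for _ in range(100):
    weights = sgd_step(weights, gradient(weights), learning_rate=0.1)

print(weights)  # very close to [3, -1]
print(initial_error, error(weights))
```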
  • 48. Training parameters: learning rate. Learning rate $\eta$: how much to update the weights for every batch of documents? Source: Towards Data Science: Gradient descent in a nutshell
  • 49. Training parameters: batch size. Batch size: how many examples to learn from in one step?
  • 50. Training parameters: number of epochs. Number of epochs: how many times should we feed the network the entire training set?
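Batch size and number of epochs determine how the training set is walked through. The generator below is a framework-independent sketch of that loop structure; in the MXNet/Gluon demo a data loader plays this role, and the name `iterate_minibatches` and its parameters are illustrative.

```python
import random


def iterate_minibatches(examples, batch_size, num_epochs, seed=0):
    """Yield (epoch, batch) pairs; every epoch visits each example once, in shuffled order."""
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield epoch, [examples[i] for i in order[start:start + batch_size]]


examples = list(range(10))
steps = list(iterate_minibatches(examples, batch_size=4, num_epochs=2))
print(len(steps))  # 6 weight updates: 3 batches per epoch (4 + 4 + 2 examples)
```

Each yielded batch would trigger one forward pass, one backward pass and one weight update with the chosen learning rate.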
  • 51. Jupyter notebook demo: Crepe in Apache MXNet/Gluon. https://github.com/ThomasDelteil/CNN_NLP_MXNet
  • 52. Results: comparison of traditional approaches, word-level CNN, and character-level CNN
  • 53. Data augmentation. For images: [illustration]. For text: ask humans to rephrase the examples; replace words with synonyms; keep a similar semantic meaning.
  • 54. Data augmentation. Original sentence: The quick brown fox jumps over the lazy dog
  • 55. Data augmentation. The quick brown fox jumps over the lazy dog. Candidate synonyms: quick: fast, swift, speedy; brown: hazel, brunette, chestnut; jumps: leaps, springs, bounds, hops; lazy: idle, indolent, slothful; dog: hound, pup, mutt
  • 56. Data augmentation. The quick brown fox jumps over the lazy dog. Candidate synonyms: quick: fast, swift, speedy; brown: hazel, brunette, chestnut; jumps: leaps, springs, bounds, hops; lazy: idle, indolent, slothful; dog: hound, pup, mutt
  • 57. Data augmentation. The quick brown fox jumps over the lazy dog. Candidate synonyms: quick: fast, swift, speedy; brown: hazel, brunette, chestnut; jumps: leaps, springs, bounds, hops; lazy: idle, indolent, slothful; dog: hound, pup, mutt. Augmented sentence: The swift brunette fox leaps over the slothful pup
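Synonym replacement as shown on slides 54 to 57 can be sketched with a small lookup table. The synonym lists are the ones from the slides; the function name, the replacement probability and the use of a hand-written dictionary (instead of a thesaurus) are assumptions made for this example.

```python
import random

# Synonym lists copied from the slides.
SYNONYMS = {
    "quick": ["fast", "swift", "speedy"],
    "brown": ["hazel", "brunette", "chestnut"],
    "jumps": ["leaps", "springs", "bounds", "hops"],
    "lazy": ["idle", "indolent", "slothful"],
    "dog": ["hound", "pup", "mutt"],
}


def augment(sentence, synonyms=SYNONYMS, replace_prob=0.5, rng=random):
    """Replace each word that has synonyms with a random one, with probability replace_prob."""
    words = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < replace_prob:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)


sentence = "The quick brown fox jumps over the lazy dog"
augmented = augment(sentence, replace_prob=1.0, rng=random.Random(0))
print(augmented)  # same structure, with the five listed words swapped for synonyms
```

Calling `augment` several times with different random seeds yields several training examples from a single labelled sentence.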
  • 58. You need a large dataset
  • 59. …A very large dataset!
  • 60. Live demo: classification of product category for Amazon reviews. https://thomasdelteil.github.io/CNN_NLP_MXNet/
  • 61. Workflow and operationalization: (1) develop the model in a Jupyter notebook; (2) train the model on a GPU instance; (3) package the model behind a web API in a Docker container, e.g. using MXNet Model Server; (4) upload the container to a container registry; (5) deploy the container to an elastic container service and enjoy quick and linear scaling; (6) put the API behind a load balancer with SSL termination; (7) enjoy :) Diagram: GPU instance, container registry, elastic container service with auto-scaling, load balancer, container. HTTPS request: "Loved this book". HTTPS response: { "prediction" : { "book": 0.99 } }
  • 62. Advanced use cases for convolutions and NLP
  • 63. CNN + LSTM: spatially and temporally deep neural networks. CNN for feature extraction; LSTM for temporal representation. Applications: video (CNN for frames, LSTM to combine them temporally); text tasks; audio (language detection). Source: Combining CNN and RNN for spoken language detection
  • 64. Advanced use case: speech generation with WaveNet. Source: DeepMind, WaveNet: a generative model for raw audio
  • 65. WaveNet: dilated causal convolution. Source: DeepMind, WaveNet: a generative model for raw audio
  • 66. WaveNet: dilated causal convolution. Source: DeepMind, WaveNet: a generative model for raw audio
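A dilated causal convolution can be sketched on a plain list of samples: each output only looks at the current sample and at earlier samples spaced `dilation` steps apart, with implicit zeros before the start of the signal. This is a single-channel illustration of the idea, not WaveNet itself (which adds gated activations, residual connections and many channels); the function name is illustrative.

```python
def dilated_causal_conv1d(signal, kernel, dilation=1):
    """out[t] = sum_k kernel[k] * signal[t - k * dilation], never looking ahead."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start count as zero
                total += weight * signal[idx]
        out.append(total)
    return out


# Stack four layers with kernel width 2 and dilations 1, 2, 4, 8.
impulse = [1.0] + [0.0] * 19
response = impulse
for dilation in (1, 2, 4, 8):
    response = dilated_causal_conv1d(response, [1.0, 1.0], dilation)

print(response)  # sixteen 1.0 values followed by four 0.0 values
```

The impulse spreads over 16 time steps after only four layers: doubling the dilation at each layer makes the receptive field grow exponentially with depth, while the output at time t never depends on later samples.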
  • 67. Summary § Learned about convolutions § Applied them to textual data § Studied the Crepe architecture from Zhang et al. in detail § Learned about advanced use cases and operationalization