Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 10 - May 4, 2017
Lecture 10:
Recurrent Neural Networks
Administrative
A1 grades will go out soon
A2 is due today (11:59pm)
Midterm is in-class on Tuesday!
We will send out details on where to go soon
Extra Credit: Train Game
More details on Piazza
by early next week
Last Time: CNN Architectures
AlexNet
Figure copyright Kaiming He, 2016. Reproduced with permission.
VGG16, VGG19, GoogLeNet
[Figure: layer diagrams. The VGG networks stack 3x3 conv layers (64, 128, 256, then 512 filters) separated by Pool layers, followed by FC 4096, FC 4096, FC 1000, and Softmax.]
Figure copyright Kaiming He, 2016. Reproduced with permission.
Last Time: CNN Architectures
[Figure: ResNet layer diagram: Input; 7x7 conv, 64 / 2; Pool; stacked 3x3 conv layers with 64 and then 128 filters (3x3 conv, 128 / 2 at the transition); ...; Pool; FC 1000; Softmax.]
Residual block: the input X passes through conv, relu, conv to give F(x); an identity connection carries X around these layers, and the block outputs relu(F(x) + x).
Figure copyright Kaiming He, 2016. Reproduced with permission.
DenseNet, FractalNet
[Figure: a DenseNet built from Conv and Pool layers around Dense Block 1, Dense Block 2, and Dense Block 3, ending in Pool, FC, and Softmax. Dense Block detail: conv layers (1x1 conv, 64) whose outputs are joined by Concat.]
Figures copyright Larsson et al., 2017. Reproduced with permission.
Last Time: CNN Architectures
AlexNet and VGG have tons of parameters in the fully connected layers
AlexNet: ~62M parameters
FC6: 256x6x6 -> 4096: 38M params
FC7: 4096 -> 4096: 17M params
FC8: 4096 -> 1000: 4M params
~59M params in FC layers!
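The per-layer counts above are just products of each layer's input and output sizes. A quick check in Python, counting weights only (ignoring biases is an assumption of this sketch; the dictionary and variable names are illustrative):

```python
# Weight counts for AlexNet's fully connected layers (biases ignored),
# using the layer shapes listed on the slide.
fc_layers = {
    "FC6": (256 * 6 * 6, 4096),
    "FC7": (4096, 4096),
    "FC8": (4096, 1000),
}

fc_params = {name: n_in * n_out for name, (n_in, n_out) in fc_layers.items()}
total_fc = sum(fc_params.values())

for name, count in fc_params.items():
    print(f"{name}: {count / 1e6:.1f}M weights")
print(f"Total FC: {total_fc / 1e6:.1f}M weights")
```

The totals come out at roughly 38M, 17M, and 4M, or about 59M together, which matches the rounded figures on the slide.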
Today: Recurrent Neural Networks
Vanilla Neural Networks
“Vanilla” Neural Network
Recurrent Neural Networks: Process Sequences
e.g. Image Captioning: image -> sequence of words
e.g. Sentiment Classification: sequence of words -> sentiment
e.g. Machine Translation: sequence of words -> sequence of words
e.g. Video classification on frame level
Sequential Processing of Non-Sequence Data
Classify images by taking a series of “glimpses”
Generate images one piece at a time!
Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015.
Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015.
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.
Recurrent Neural Network
[Diagram: an input x feeds an RNN block with a recurrent connection; the RNN produces an output y]
We usually want to predict a vector at some time steps.
We can process a sequence of vectors x by applying a recurrence formula at every time step:
h_t = f_W(h_{t-1}, x_t)
where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.
Notice: the same function and the same set of parameters are used at every time step.
(Vanilla) Recurrent Neural Network
The state consists of a single “hidden” vector h:
h_t = tanh(Whh * h_{t-1} + Wxh * x_t)
y_t = Why * h_t
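A minimal sketch of one vanilla RNN step in plain Python, using the update h = tanh(Wxh * x + Whh * h) that appears later in this deck. The helper names, the toy sizes and weight values, and the linear readout y = Why * h are assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(h_prev, x, Wxh, Whh, Why):
    """One vanilla RNN step: returns (h_new, y)."""
    a = [p + q for p, q in zip(matvec(Wxh, x), matvec(Whh, h_prev))]
    h_new = [math.tanh(v) for v in a]
    y = matvec(Why, h_new)          # assumed linear readout
    return h_new, y

# Toy sizes (assumed): input dim 2, hidden dim 3, output dim 2.
Wxh = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
Whh = [[0.2, 0.0, -0.1], [0.1, 0.1, 0.0], [0.0, -0.3, 0.2]]
Why = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

h = [0.0, 0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0]):
    h, y = rnn_step(h, x, Wxh, Whh, Why)   # same weights at every step
```

The loop passes the same three matrices to every call, so only the hidden state changes from one time step to the next.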
RNN: Computational Graph
[Diagram: h0 -> fW -> h1 -> fW -> h2 -> fW -> h3 -> … -> hT, with inputs x1, x2, x3, … feeding the successive fW nodes]
Re-use the same weight matrix W at every time-step
RNN: Computational Graph: Many to Many
[Diagram: every hidden state h1, h2, h3, …, hT produces an output y1, y2, y3, …, yT; each output has its own loss L1, L2, L3, …, LT, and these feed a single total loss L]
RNN: Computational Graph: Many to One
[Diagram: a single output y is produced from the final hidden state hT]
RNN: Computational Graph: One to Many
[Diagram: a single input x feeds the graph, and the hidden states h1, h2, h3, …, hT produce outputs y1, y2, y3, …, yT]
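The unrolled graphs can be sketched with a scalar RNN so the weight sharing is easy to see. This is an illustration under stated assumptions: the scalar recurrence, the squared-error per-step loss, the summing of the per-step losses, and all numeric values are hypothetical choices, not taken from the slides.

```python
import math

def unroll(xs, h0, w_x, w_h):
    """Apply the same scalar recurrence h = tanh(w_x*x + w_h*h) at every step."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs

xs = [1.0, 0.5, -1.0, 0.25]
targets = [0.5, 0.5, -0.5, 0.0]   # hypothetical per-step targets
w_x, w_h, w_y = 0.8, 0.5, 1.5     # shared across all time steps

hs = unroll(xs, 0.0, w_x, w_h)

# Many to many: one output and one loss per step, summed into a total loss L.
ys = [w_y * h for h in hs]
losses = [(y - t) ** 2 for y, t in zip(ys, targets)]
L = sum(losses)

# Many to one: a single output read from the final hidden state hT.
y_final = w_y * hs[-1]
```

Because every step reuses w_x and w_h, the gradient of L with respect to either weight collects a contribution from each time step.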
Sequence to Sequence: Many-to-one + one-to-many
Many to one: Encode input sequence in a single vector
[Diagram: encoder with weights W1: h0 -> fW -> h1 -> fW -> h2 -> fW -> h3 -> … -> hT, with inputs x1, x2, x3, …]
One to many: Produce output sequence from single input vector
[Diagram: decoder with weights W2: starting from hT, fW -> h1 -> fW -> h2 -> fW -> …, producing outputs y1, y2, …]
Example: Character-level Language Model
Vocabulary: [h,e,l,o]
Example training sequence: “hello”
Example: Character-level Language Model Sampling
Vocabulary: [h,e,l,o]
At test-time sample characters one at a time, feed back to model
Softmax output at each step (probabilities for h, e, l, o) and the sampled character:
Step 1: h .03, e .13, l .00, o .84 -> sample “e”
Step 2: h .25, e .20, l .05, o .50 -> sample “l”
Step 3: h .11, e .17, l .68, o .03 -> sample “l”
Step 4: h .11, e .02, l .08, o .79 -> sample “o”
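A small sketch of the sampling step: turn output scores into a softmax distribution over the vocabulary and draw one character from it. The score values, the function names, and the fixed random seed are assumptions for illustration; in a real model the scores would come from the RNN's output layer.

```python
import math
import random

vocab = ["h", "e", "l", "o"]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_char(scores, rng):
    """Sample one character from the softmax distribution over the vocabulary."""
    probs = softmax(scores)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical output scores for one time step (not taken from the slide text).
scores = [1.0, 2.2, -3.0, 4.1]
probs = softmax(scores)
rng = random.Random(0)
next_char = sample_char(scores, rng)   # would be fed back as the next input
```

Sampling, rather than always taking the most likely character, is what lets a low-probability character such as “e” at .13 be chosen, as in the first step of the example.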
Backpropagation through time
Forward through entire sequence to compute loss, then backward through entire sequence to compute gradient
Truncated Backpropagation through time
Run forward and backward through chunks of the sequence instead of whole sequence
Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
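Truncated backpropagation through time can be sketched with a scalar RNN. The hidden state is carried from chunk to chunk, but each backward pass stops at the chunk boundary. The scalar recurrence, the squared-error loss, the plain SGD update, and the function names are all assumptions chosen to keep the example short.

```python
import math

def forward_backward_chunk(xs, ts, h0, w, u):
    """Forward and backward through one chunk of h_t = tanh(w*h_{t-1} + u*x_t).

    Loss is 0.5 * sum_t (h_t - t_t)^2. h0 is treated as a constant, so no
    gradient flows into earlier chunks. Returns (loss, dw, du, h_last).
    """
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    loss = 0.5 * sum((h - t) ** 2 for h, t in zip(hs[1:], ts))

    dw = du = 0.0
    dh_next = 0.0                       # gradient arriving from later steps
    for k in reversed(range(len(xs))):
        h, h_prev = hs[k + 1], hs[k]
        dh = (h - ts[k]) + dh_next      # local loss gradient + future gradient
        da = dh * (1.0 - h * h)         # backprop through tanh
        dw += da * h_prev
        du += da * xs[k]
        dh_next = da * w                # pass gradient to the previous step
    return loss, dw, du, hs[-1]

def train_truncated(xs, ts, chunk_len, w, u, lr=0.1):
    """One pass over the sequence with truncated backpropagation through time."""
    h = 0.0
    for start in range(0, len(xs), chunk_len):
        cx, ct = xs[start:start + chunk_len], ts[start:start + chunk_len]
        loss, dw, du, h = forward_backward_chunk(cx, ct, h, w, u)
        w -= lr * dw                    # update after every chunk
        u -= lr * du
    return w, u, h
```

Each call to `forward_backward_chunk` costs time and memory proportional to the chunk length, not the full sequence length, which is the practical reason for truncating.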
min-char-rnn.py gist: 112 lines of Python
(https://gist.github.com/karpathy/d4dee566867f8291f086)
[Figure: text sampled from the model at first, then after training more, and more, and more]
The Stacks Project: open source algebraic geometry textbook
LaTeX source: http://stacks.math.columbia.edu/
The Stacks Project is licensed under the GNU Free Documentation License
Generated C code
Searching for interpretable cells
[Figures: activations of individual hidden cells visualized over text, one example per cell type]
quote detection cell
line length tracking cell
if statement cell
quote/comment cell
code depth cell
Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016
Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission
Image Captioning
Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
Show and Tell: A Neural Image Caption Generator, Vinyals et al.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick
Figure from Karpathy et al, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015; figure copyright IEEE, 2015. Reproduced for educational purposes.
Convolutional Neural Network
Recurrent Neural Network
test image (This image is CC0 public domain)
[Diagram: the test image passes through the CNN to give a feature vector v; the RNN starts from the input x0 = <START>, and v enters the hidden state through Wih]
before:
h = tanh(Wxh * x + Whh * h)
now:
h = tanh(Wxh * x + Whh * h + Wih * v)
[Diagram, built up step by step: from x0 = <START>, h0 produces y0 and we sample “straw”; “straw” is fed back as the next input, h1 produces y1 and we sample “hat”; “hat” is fed back, h2 produces y2 and we sample the <END> token => finish.]
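A toy sketch of the captioning loop: each step uses the image-conditioned update h = tanh(Wxh * x + Whh * h + Wih * v), samples a word, and feeds it back until <END> is drawn. The weights are random and untrained, so the output is meaningless. The vocabulary, sizes, seed, the readout matrix Why, and applying the image term at every step are all assumptions for illustration.

```python
import math
import random

vocab = ["<START>", "<END>", "straw", "hat"]
H, V, D = 4, len(vocab), 3      # hidden, vocab, image-feature sizes (toy values)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Untrained random weights; Why is an assumed readout to vocabulary scores.
Wxh, Whh, Wih, Why = rand_matrix(H, V), rand_matrix(H, H), rand_matrix(H, D), rand_matrix(V, H)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def caption(v, max_len=10):
    """Sample words until <END> (or max_len), feeding each word back as input."""
    h = [0.0] * H
    word = vocab.index("<START>")
    words = []
    for _ in range(max_len):
        x = [1.0 if j == word else 0.0 for j in range(V)]    # one-hot input
        a = [p + q + r for p, q, r in
             zip(matvec(Wxh, x), matvec(Whh, h), matvec(Wih, v))]
        h = [math.tanh(t) for t in a]                        # image-conditioned update
        probs = softmax(matvec(Why, h))
        word = rng.choices(range(V), weights=probs, k=1)[0]  # sample!
        if vocab[word] == "<END>":
            break                                            # <END> token => finish
        words.append(vocab[word])
    return words

words = caption([0.2, -0.1, 0.5])   # hypothetical CNN feature vector v
```

The `max_len` cap is a safeguard added for the sketch, since an untrained model might never sample <END>.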
Image Captioning: Example Results
A cat sitting on a suitcase on the floor
A cat is sitting on a tree branch
A dog is running in the grass with a frisbee
A white teddy bear sitting in the grass
Two people walking on the beach with surfboards
A tennis player in action on the court
Two giraffes standing in a grassy field
A man riding a dirt bike on a dirt track
Captions generated using neuraltalk2
All images are CC0 Public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle
Image Captioning: Failure Cases
A woman is holding a cat in her hand
A woman standing on a beach holding a surfboard
A person holding a computer mouse on a desk
A bird is perched on a tree branch
A man in a baseball uniform throwing a ball
Captions generated using neuraltalk2
All images are CC0 Public domain: fur coat, handstand, spider web, baseball
Image Captioning with Attention
RNN focuses its attention at a different spatial location when generating each word
Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
[Diagram, built up step by step:]
- A CNN maps the image (H x W x 3) to a grid of features (L x D), from which the initial hidden state h0 is computed
- h0 produces a1, a distribution over L locations
- a1 gives a weighted combination of features, z1 (weighted features: D)
- z1 and the first word y1 feed the next hidden state h1
- h1 produces a2 (distribution over L locations) and d1 (distribution over vocab)
- z2 and y2 feed h2, which produces a3 and d2, and so on
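The weighted combination of features can be sketched as follows: a softmax turns per-location scores into the distribution a over L locations, and z is the a-weighted sum of the L feature vectors. How the scores are computed from the hidden state is not shown here; the scores, feature values, and function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, scores):
    """Return (a, z): attention weights over L locations and the weighted features.

    features: L rows of D numbers; scores: L unnormalized attention scores.
    """
    a = softmax(scores)
    D = len(features[0])
    z = [sum(a_i * f[d] for a_i, f in zip(a, features)) for d in range(D)]
    return a, z

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # L = 3 locations, D = 2
scores = [0.0, 0.0, 0.0]                          # hypothetical scores from the hidden state
a, z = soft_attention(features, scores)           # uniform attention -> plain average
```

With equal scores the attention is uniform and z is the plain average of the features. Sharper scores move z toward the features at the attended location.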
Image Captioning with Attention
Soft attention
Hard attention
Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
Visual Question Answering
Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015
Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016
Figures from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.
Visual Question Answering: RNNs with Attention
Multilayer RNNs
[Figure: recurrent layers stacked in depth (vertical axis) and unrolled over time (horizontal axis), shown next to the LSTM update equations]
Vanilla RNN Gradient Flow
[Diagram: h_{t-1} and x_t are stacked, multiplied by W, and passed through tanh to give h_t]
Backpropagation from h_t to h_{t-1} multiplies by W (actually Whh^T)
[Diagram: h0 -> h1 -> h2 -> h3 -> h4, with inputs x1, x2, x3, x4]
Computing gradient of h0 involves many factors of W (and repeated tanh)
Largest singular value > 1: Exploding gradients
Largest singular value < 1: Vanishing gradients
Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994
Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013
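A scalar illustration of why repeated factors of W matter. With a 1x1 weight matrix the largest singular value is just |w|, and the gradient reaching h0 after T steps is scaled by w multiplied T times. Ignoring the tanh derivative and using a scalar weight are simplifying assumptions of this sketch.

```python
def backprop_factor(w, steps):
    """Product of `steps` factors of a scalar recurrent weight w (tanh ignored)."""
    g = 1.0
    for _ in range(steps):
        g *= w
    return g

for w in (0.9, 1.0, 1.1):
    print(w, backprop_factor(w, 100))
```

After 100 steps the factor for w = 0.9 is on the order of 1e-5 (vanishing), while for w = 1.1 it is on the order of 1e4 (exploding).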
Vanilla RNN Gradient Flow
Exploding gradients (largest singular value > 1): Gradient clipping: scale gradient if its norm is too big
Vanishing gradients (largest singular value < 1): Change RNN architecture
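Gradient clipping by norm can be sketched in a few lines: if the gradient's L2 norm exceeds a threshold, rescale it so the norm equals the threshold. The function name, the flat-list representation of the gradient, and the threshold value are assumptions for illustration.

```python
import math

def clip_gradient(grad, threshold):
    """Rescale grad (a flat list) so its L2 norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return list(grad)

clipped = clip_gradient([3.0, 4.0], threshold=1.0)   # norm 5 -> rescaled to norm 1
```

Rescaling keeps the gradient's direction and only shrinks its length, so the update still points the same way but cannot take an arbitrarily large step.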
Long Short Term Memory (LSTM)
Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997
[Figure: the vanilla RNN update and the LSTM update shown side by side]
[Diagram: the vector from below (x) and the vector from before (h) are stacked into a vector of size 2h and multiplied by W (4h x 2h) to give a vector of size 4h (4*h), which is split into the gates i (sigmoid), f (sigmoid), o (sigmoid), and g (tanh)]
f: Forget gate, whether to erase cell
i: Input gate, whether to write to cell
g: Gate gate (?), how much to write to cell
o: Output gate, how much to reveal cell
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
[Diagram: h_{t-1} and x_t are stacked and multiplied by W to give the gates f, i, g, o; then c_t = f ☉ c_{t-1} + i ☉ g and h_t = o ☉ tanh(c_t)]
Long Short Term Memory (LSTM): Gradient Flow
Backpropagation from c_t to c_{t-1} only elementwise multiplication by f, no matrix multiply by W
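A minimal sketch of one LSTM step following the diagram: stack h_{t-1} and x_t, multiply by W, split the result into the four gates, then update the cell and hidden state elementwise. The row ordering of W (i, f, o, g blocks), the absence of bias terms, and the function names are assumptions of this sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step for hidden size n = len(h_prev).

    W has 4n rows and len(h_prev) + len(x) columns; its rows are ordered
    as the i, f, o, g blocks (an ordering assumed for this sketch).
    """
    n = len(h_prev)
    stacked = h_prev + x                                  # stack h_{t-1} and x_t
    a = [sum(w * s for w, s in zip(row, stacked)) for row in W]
    i = [sigmoid(v) for v in a[0:n]]                      # input gate
    f = [sigmoid(v) for v in a[n:2 * n]]                  # forget gate
    o = [sigmoid(v) for v in a[2 * n:3 * n]]              # output gate
    g = [math.tanh(v) for v in a[3 * n:4 * n]]            # candidate values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy call: n = 2, x of size 2, so W is 8 x 4 (4h x 2h). All-zero weights
# give i = f = o = 0.5 and g = 0, so the cell state is simply halved.
W = [[0.0] * 4 for _ in range(8)]
h, c = lstm_step([1.0, -1.0], [0.0, 0.0], [2.0, -4.0], W)
```

The cell update touches c_{t-1} only through the elementwise product with f, which is the path the gradient-flow slide refers to.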
Long Short Term Memory (LSTM): Gradient Flow
[Hochreiter et al., 1997]
[Diagram: cell states c0 -> c1 -> c2 -> c3]
Uninterrupted gradient flow!
Similar to ResNet!
[Figure: ResNet layer diagram: Input; 7x7 conv, 64 / 2; Pool; stacked 3x3 conv layers with 64 and then 128 filters; ...; Pool; FC 1000; Softmax]
In between: Highway Networks
Srivastava et al, “Highway Networks”, ICML DL Workshop 2015
Other RNN Variants
GRU [Learning phrase representations using RNN encoder-decoder for statistical machine translation, Cho et al., 2014]
[LSTM: A Search Space Odyssey, Greff et al., 2015]
[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]
Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don’t work very well
- Common to use LSTM or GRU: their additive interactions improve gradient flow
- Backward flow of gradients in RNN can explode or vanish. Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed.
Next time: Midterm!
Then Detection and Segmentation

Cs231n 2017 lecture10 Recurrent Neural Networks

  • 1. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20171 Lecture 10: Recurrent Neural Networks
  • 2. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20172 Administrative A1 grades will go out soon A2 is due today (11:59pm) Midterm is in-class on Tuesday! We will send out details on where to go soon
  • 3. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20173 Extra Credit: Train Game More details on Piazza by early next week
  • 4. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20174 Last Time: CNN Architectures AlexNet Figure copyright Kaiming He, 2016. Reproduced with permission.
  • 5. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20175 Last Time: CNN Architectures Figure copyright Kaiming He, 2016. Reproduced with permission. 3x3 conv, 128 Pool 3x3 conv, 64 3x3 conv, 64 Input 3x3 conv, 128 Pool 3x3 conv, 256 3x3 conv, 256 Pool 3x3 conv, 512 3x3 conv, 512 Pool 3x3 conv, 512 3x3 conv, 512 Pool FC 4096 FC 1000 Softmax FC 4096 3x3 conv, 512 3x3 conv, 512 Pool Input Pool Pool Pool Pool Softmax 3x3 conv, 512 3x3 conv, 512 3x3 conv, 256 3x3 conv, 256 3x3 conv, 128 3x3 conv, 128 3x3 conv, 64 3x3 conv, 64 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 FC 4096 FC 1000 FC 4096 VGG16 VGG19 GoogLeNet
  • 6. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20176 Last Time: CNN Architectures Figure copyright Kaiming He, 2016. Reproduced with permission. Input Softmax 3x3 conv, 64 7x7 conv, 64 / 2 FC 1000 Pool 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 128 3x3 conv, 128 / 2 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 ... 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 Pool relu Residual block conv conv X identity F(x) + x F(x) relu X
  • 7. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20177 Figures copyright Larsson et al., 2017. Reproduced with permission. Pool Conv Dense Block 1 Conv Input Conv Dense Block 2 Conv Pool Conv Dense Block 3 Softmax FC Pool Conv Conv 1x1 conv, 64 1x1 conv, 64 Input Concat Concat Concat Dense Block DenseNet FractalNet
  • 8. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20178 Last Time: CNN Architectures
  • 9. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 20179 Last Time: CNN Architectures AlexNet and VGG have tons of parameters in the fully connected layers AlexNet: ~62M parameters FC6: 256x6x6 -> 4096: 38M params FC7: 4096 -> 4096: 17M params FC8: 4096 -> 1000: 4M params ~59M params in FC layers!
  • 10. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201710 Today: Recurrent Neural Networks
  • 11. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201711 Vanilla Neural Networks “Vanilla” Neural Network
  • 12. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201712 Recurrent Neural Networks: Process Sequences e.g. Image Captioning image -> sequence of words
  • 13. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201713 Recurrent Neural Networks: Process Sequences e.g. Sentiment Classification sequence of words -> sentiment
  • 14. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201714 Recurrent Neural Networks: Process Sequences e.g. Machine Translation seq of words -> seq of words
  • 15. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201715 Recurrent Neural Networks: Process Sequences e.g. Video classification on frame level
  • 16. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201716 Sequential Processing of Non-Sequence Data Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015. Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015 Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission. Classify images by taking a series of “glimpses”
  • 17. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201717 Sequential Processing of Non-Sequence Data Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015 Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission. Generate images one piece at a time!
  • 18. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201718 Recurrent Neural Network x RNN
  • 19. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201719 Recurrent Neural Network x RNN y usually want to predict a vector at some time steps
  • 20. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201720 Recurrent Neural Network x RNN y We can process a sequence of vectors x by applying a recurrence formula at every time step: new state old state input vector at some time step some function with parameters W
  • 21. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201721 Recurrent Neural Network x RNN y We can process a sequence of vectors x by applying a recurrence formula at every time step: Notice: the same function and the same set of parameters are used at every time step.
  • 22. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201722 (Vanilla) Recurrent Neural Network x RNN y The state consists of a single “hidden” vector h:
  • 23. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201723 h0 fW h1 x1 RNN: Computational Graph
  • 24. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201724 h0 fW h1 fW h2 x2 x1 RNN: Computational Graph
  • 25. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201725 h0 fW h1 fW h2 fW h3 x3 … x2 x1 RNN: Computational Graph hT
  • 26. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201726 h0 fW h1 fW h2 fW h3 x3 … x2 x1 W RNN: Computational Graph Re-use the same weight matrix at every time-step hT
  • 27. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201727 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2y1
  • 28. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201728 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2y1 L1 L2 L3 LT
  • 29. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 201729 h0 fW h1 fW h2 fW h3 x3 yT … x2 x1 W RNN: Computational Graph: Many to Many hT y3 y2y1 L1 L2 L3 LT L
  • 30. RNN: Computational Graph: Many to One. Inputs x1, x2, x3, …; only the final hidden state hT produces the output y.
  • 31. RNN: Computational Graph: One to Many. A single input x; the hidden states h1, h2, h3, …, hT produce the outputs y1, y2, y3, …, yT.
  • 32. Sequence to Sequence: Many-to-one + one-to-many. Many to one: encode the input sequence x1, x2, x3, … in a single vector hT (weights W1).
  • 33. Sequence to Sequence: Many-to-one + one-to-many. Many to one: encode the input sequence in a single vector (weights W1). One to many: produce the output sequence y1, y2, … from that single input vector (weights W2).
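A rough sketch of the encoder/decoder split, under simplifying assumptions: the decoder here receives no per-step input and runs for a fixed number of steps, whereas practical systems usually feed the previous output back in. The names `encode`, `decode`, `W1_xh`, `W1_hh`, `W2_hh`, and `W2_hy` are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def encode(W1_xh, W1_hh, xs, h0):
    """Many to one: fold the whole input sequence into one vector hT."""
    h = h0
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W1_xh, x), matvec(W1_hh, h))]
        h = [math.tanh(p) for p in pre]
    return h

def decode(W2_hh, W2_hy, h, steps):
    """One to many: unroll a second RNN (its own weights) from the encoded vector."""
    ys = []
    for _ in range(steps):
        h = [math.tanh(p) for p in matvec(W2_hh, h)]
        ys.append(matvec(W2_hy, h))
    return ys

# Toy weights (assumed): 2-dim inputs, 2-dim hidden state, 3-dim outputs.
W1_xh = [[0.4, -0.2], [0.1, 0.3]]
W1_hh = [[0.2, 0.0], [-0.1, 0.5]]
W2_hh = [[0.3, 0.1], [0.0, -0.4]]
W2_hy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
h_T = encode(W1_xh, W1_hh, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0])
ys = decode(W2_hh, W2_hy, h_T, steps=2)
```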
  • 34. Example: Character-level Language Model. Vocabulary: [h,e,l,o]. Example training sequence: “hello”.
  • 37. Example: Character-level Language Model Sampling. Vocabulary: [h,e,l,o]. At test-time sample characters one at a time, feed back to model. Softmax output at each of the four steps: [.03 .13 .00 .84], [.25 .20 .05 .50], [.11 .17 .68 .03], [.11 .02 .08 .79]. Samples: “e”, “l”, “l”, “o”.
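Test-time sampling can be sketched as a loop that draws one character from the softmax distribution and feeds it back as the next input. The `step_fn(x, h) -> (scores, new_h)` interface and the `toy_step` stand-in are hypothetical; a trained RNN step would take their place.

```python
import math
import random

VOCAB = ["h", "e", "l", "o"]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Draw an index from a categorical distribution."""
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against rounding error

def generate(step_fn, first_char, h0, length, rng):
    """Sample `length` characters, feeding each sample back as the next input."""
    idx, h, out = VOCAB.index(first_char), h0, []
    for _ in range(length):
        x = [1.0 if i == idx else 0.0 for i in range(len(VOCAB))]
        scores, h = step_fn(x, h)
        idx = sample_index(softmax(scores), rng)
        out.append(VOCAB[idx])
    return "".join(out)

def toy_step(x, h):
    """Stand-in for a trained model: h is a step counter, scores favour 'ello'."""
    scores = [0.0] * len(VOCAB)
    scores[VOCAB.index("ello"[h % 4])] = 50.0
    return scores, h + 1

sample = generate(toy_step, "h", 0, 4, random.Random(0))
```

Sampling, rather than always taking the most likely character, is what lets the slide's first step produce “e” even though another entry has higher probability.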
  • 41. Backpropagation through time. Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.
  • 42. Truncated Backpropagation through time. Run forward and backward through chunks of the sequence instead of the whole sequence.
  • 43. Truncated Backpropagation through time. Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.
  • 44. Truncated Backpropagation through time.
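The chunking scheme can be sketched as follows. The helper names and the `chunk_step(inputs, targets, h) -> (loss, h_last)` callback are hypothetical: the callback is assumed to run forward and backward within one chunk, update the parameters, and return the final hidden state, which is then carried into the next chunk as a constant.

```python
def truncated_bptt_chunks(sequence, chunk_len):
    """Yield (inputs, targets) chunks for next-item prediction."""
    for start in range(0, len(sequence) - 1, chunk_len):
        targets = sequence[start + 1:start + chunk_len + 1]
        inputs = sequence[start:start + len(targets)]
        yield inputs, targets

def train_one_pass(sequence, chunk_len, h0, chunk_step):
    """Carry the hidden state across chunks; gradients stay inside each chunk."""
    h, total_loss = h0, 0.0
    for inputs, targets in truncated_bptt_chunks(sequence, chunk_len):
        loss, h = chunk_step(inputs, targets, h)
        total_loss += loss
    return total_loss, h

chunks = list(truncated_bptt_chunks("hello world", 4))
```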
  • 45. min-char-rnn.py gist: 112 lines of Python (https://gist.github.com/karpathy/d4dee566867f8291f086)
  • 46. x → RNN → y
  • 47. Samples at successive stages of training: at first, then train more, train more, train more.
  • 49. The Stacks Project: open source algebraic geometry textbook. LaTeX source: http://stacks.math.columbia.edu/. The Stacks Project is licensed under the GNU Free Documentation License.
  • 53. Generated C code
  • 56. Searching for interpretable cells. Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016.
  • 57. Searching for interpretable cells. Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission.
  • 58. Searching for interpretable cells: quote detection cell.
  • 59. Searching for interpretable cells: line length tracking cell.
  • 60. Searching for interpretable cells: if statement cell.
  • 61. Searching for interpretable cells: quote/comment cell.
  • 62. Searching for interpretable cells: code depth cell.
  • 63. Image Captioning. Explain Images with Multimodal Recurrent Neural Networks, Mao et al.; Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei; Show and Tell: A Neural Image Caption Generator, Vinyals et al.; Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.; Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick. Figure from Karpathy et al., “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015; figure copyright IEEE, 2015. Reproduced for educational purposes.
  • 64. Convolutional Neural Network → Recurrent Neural Network
  • 65. Test image. This image is CC0 public domain.
  • 69. Test image features v enter the RNN through Wih. First step: h0, input x0 = <START>, output y0. Before: h = tanh(Wxh * x + Whh * h). Now: h = tanh(Wxh * x + Whh * h + Wih * v).
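The modified recurrence can be sketched directly from the two formulas on the slide. The function name, the helper `matvec`, the one-dimensional toy values, and the omission of bias terms are assumptions for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def caption_rnn_step(Wxh, Whh, Wih, x, h, v):
    """h_new = tanh(Wxh * x + Whh * h + Wih * v), where v is the image feature vector."""
    pre = [a + b + c for a, b, c in
           zip(matvec(Wxh, x), matvec(Whh, h), matvec(Wih, v))]
    return [math.tanh(p) for p in pre]

# One-dimensional toy values (assumed) to make the arithmetic easy to follow.
h_new = caption_rnn_step([[1.0]], [[0.5]], [[2.0]], x=[0.1], h=[0.2], v=[0.3])
```

Setting `Wih` to zeros recovers the earlier update h = tanh(Wxh * x + Whh * h).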
  • 75. Image Captioning: Example Results. “A cat sitting on a suitcase on the floor”; “A cat is sitting on a tree branch”; “A dog is running in the grass with a frisbee”; “A white teddy bear sitting in the grass”; “Two people walking on the beach with surfboards”; “A tennis player in action on the court”; “Two giraffes standing in a grassy field”; “A man riding a dirt bike on a dirt track”. Captions generated using neuraltalk2. All images are CC0 public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle.
  • 76. Image Captioning: Failure Cases. “A woman is holding a cat in her hand”; “A woman standing on a beach holding a surfboard”; “A person holding a computer mouse on a desk”; “A bird is perched on a tree branch”; “A man in a baseball uniform throwing a ball”. Captions generated using neuraltalk2. All images are CC0 public domain: fur coat, handstand, spider web, baseball.
  • 77. Image Captioning with Attention. RNN focuses its attention at a different spatial location when generating each word. Xu et al, “Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015. Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.
  • 78. Image Captioning with Attention. CNN: Image (H x W x 3) → Features (L x D) → h0. Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015.
  • 79. Image Captioning with Attention. h0 produces a1: distribution over L locations.
  • 80. Image Captioning with Attention. a1 and the features give z1: weighted combination of features (weighted features: D).
  • 81. Image Captioning with Attention. z1 and the first word y1 are the inputs to h1.
  • 82. Image Captioning with Attention. h1 produces a2 (distribution over L locations) and d1 (distribution over vocab).
  • 83. Image Captioning with Attention. z2 and y2 are the inputs to h2.
  • 84. Image Captioning with Attention. h2 produces a3 (distribution over L locations) and d2 (distribution over vocab).
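A minimal sketch of the soft-attention read: a softmax gives the distribution a over the L locations, and z is the weighted combination of the L feature vectors (each of size D). How the location scores are computed from the hidden state is not shown on the slides, so `scores` is taken as a given input here; the function name and toy values are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, scores):
    """features: L rows of D values; scores: one score per location.
    Returns (a, z): the distribution over L locations and the weighted D-vector."""
    a = softmax(scores)
    D = len(features[0])
    z = [sum(a_i * f[d] for a_i, f in zip(a, features)) for d in range(D)]
    return a, z

# Toy example (assumed): L = 3 locations, D = 2 feature channels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a, z = soft_attention(features, [0.0, 0.0, 0.0])
```

With equal scores the result is simply the mean of the feature vectors; sharper scores pull z toward a single location.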
  • 85. Image Captioning with Attention: soft attention and hard attention. Xu et al, “Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015. Figure copyright Xu et al., 2015. Reproduced with permission.
  • 86. Image Captioning with Attention. Xu et al, “Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015. Figure copyright Xu et al., 2015. Reproduced with permission.
  • 87. Visual Question Answering. Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015. Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016. Figure from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.
  • 88. Visual Question Answering: RNNs with Attention. Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016. Figures from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.
  • 89. Multilayer RNNs. Figure axes: time and depth. Variant shown: LSTM.
  • 90. Vanilla RNN Gradient Flow. ht-1 and xt are stacked, multiplied by W, and passed through tanh to give ht. Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994. Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013.
  • 91. Vanilla RNN Gradient Flow. Backpropagation from ht to ht-1 multiplies by W (actually Whh^T).
  • 92. Vanilla RNN Gradient Flow (h0 … h4, inputs x1 … x4). Computing the gradient of h0 involves many factors of W (and repeated tanh).
  • 93. Vanilla RNN Gradient Flow. Largest singular value > 1: exploding gradients. Largest singular value < 1: vanishing gradients.
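A scalar illustration of why repeated factors of W matter. For a 1x1 weight the largest singular value is just |w|, so backpropagating through T steps multiplies the gradient by w each time. This sketch deliberately ignores the tanh derivative, which is at most 1 and can only shrink the product further; the function name and the chosen values are assumptions.

```python
def backprop_factor(w, steps):
    """Factor picked up by a gradient after `steps` multiplications by a scalar weight w."""
    g = 1.0
    for _ in range(steps):
        g *= w
    return g

exploding = backprop_factor(1.1, 50)  # |w| > 1: grows with the number of steps
vanishing = backprop_factor(0.9, 50)  # |w| < 1: shrinks toward zero
```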
  • 94. Vanilla RNN Gradient Flow. Exploding gradients (largest singular value > 1): gradient clipping, scale the gradient if its norm is too big.
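Gradient clipping by norm can be sketched in a few lines. The threshold `max_norm` is a hyperparameter whose value is an assumption here, and the gradient is treated as one flat list for simplicity.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; leave small gradients untouched."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return list(grad)

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)    # norm 5 is rescaled to norm 1
unchanged = clip_by_norm([0.3, 0.4], max_norm=1.0)  # norm 0.5 is left as-is
```

Clipping changes only the length of the gradient, not its direction.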
  • 95. Vanilla RNN Gradient Flow. Vanishing gradients (largest singular value < 1): change the RNN architecture.
  • 96. Long Short Term Memory (LSTM): vanilla RNN compared with LSTM. Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997.
  • 97. Long Short Term Memory (LSTM) [Hochreiter et al., 1997]. The vector from below (x) and the vector from before (h) are stacked (2h) and multiplied by W (4h x 2h) to give a vector of size 4h, split into i, f, o (sigmoid) and g (tanh). f: forget gate, whether to erase cell. i: input gate, whether to write to cell. g: gate gate (?), how much to write to cell. o: output gate, how much to reveal cell.
  • 98. Long Short Term Memory (LSTM) [Hochreiter et al., 1997]. ht-1 and xt are stacked and multiplied by W to give f, i, g, o; ct = f ☉ ct-1 + i ☉ g; ht = o ☉ tanh(ct).
  • 99. Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997]. Backpropagation from ct to ct-1 is only an elementwise multiplication by f, with no matrix multiply by W.
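A minimal sketch of one LSTM step following the gate descriptions above: stack the previous hidden state with the input, multiply by a single matrix W, and split the result into the four gates. The row order i, f, o, g inside W, the omission of biases, and the function names are assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(W, x, h_prev, c_prev):
    """W has 4H rows and (H + len(x)) columns; rows are assumed ordered i, f, o, g."""
    H = len(h_prev)
    a = matvec(W, list(h_prev) + list(x))      # stack h_{t-1} and x_t
    i = [sigmoid(v) for v in a[0:H]]           # input gate: whether to write
    f = [sigmoid(v) for v in a[H:2 * H]]       # forget gate: whether to erase
    o = [sigmoid(v) for v in a[2 * H:3 * H]]   # output gate: how much to reveal
    g = [math.tanh(v) for v in a[3 * H:4 * H]] # candidate values to write
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy check (assumed): H = 2, 2-dim input, all-zero weights.
W_zero = [[0.0] * 4 for _ in range(8)]
h1, c1 = lstm_step(W_zero, [1.0, -1.0], [0.0, 0.0], [1.0, -2.0])
```

The cell update uses only an elementwise product with f and an addition, which is the path the gradient-flow slide refers to.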
  • 100. Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997]. c0 → c1 → c2 → c3: uninterrupted gradient flow!
  • 101. Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997]. Uninterrupted gradient flow! Similar to ResNet!
  • 102. Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997]. Uninterrupted gradient flow! Similar to ResNet! In between: Highway Networks. Srivastava et al, “Highway Networks”, ICML DL Workshop 2015.
  • 103. Other RNN Variants. GRU [Learning phrase representations using RNN encoder-decoder for statistical machine translation, Cho et al., 2014]. [LSTM: A Search Space Odyssey, Greff et al., 2015]. [An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015].
  • 104. Summary
      - RNNs allow a lot of flexibility in architecture design
      - Vanilla RNNs are simple but don’t work very well
      - Common to use LSTM or GRU: their additive interactions improve gradient flow
      - Backward flow of gradients in RNN can explode or vanish. Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM)
      - Better/simpler architectures are a hot topic of current research
      - Better understanding (both theoretical and empirical) is needed
  • 105. Next time: Midterm! Then Detection and Segmentation