Recurrent Neural Networks (D2L8 Insight@DCU Machine Learning Workshop 2017)


https://telecombcn-dl.github.io/dlmm-2017-dcu/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.


  1. Recurrent Neural Networks #InsightDL2017. Xavier Giro-i-Nieto (xavier.giro@upc.edu), Associate Professor, Universitat Politecnica de Catalunya / Technical University of Catalonia.
  2. Acknowledgments: Santiago Pascual.
  3. Outline: 1. The importance of context; 2. Where is the memory?; 3. Vanilla RNN; 4. Problems; 5. Gating methodology (a. LSTM, b. GRU).
  4. The importance of context.
     ● Recall the 5th digit of your phone number.
     ● Sing your favourite song beginning at the third sentence.
     ● Recall the 10th character of the alphabet.
     You probably went through the stream from the beginning in each case, because in sequences order matters! Idea: retain the information while preserving the importance of order.
  5. Recall from Day 1 Lecture 4: FeedFORWARD. Every output y / hidden activation h_i is computed from the sequence of forward activations out of the input x.
  6. Where is the Memory? If we have a sequence of samples, predict sample x[t+1] knowing the previous values {x[t], x[t-1], x[t-2], …, x[t-τ]}.
  7. Where is the Memory? Feed-forward approach: a static window of size L, slid time-step wise. (Figure: the window x[t-L], …, x[t-1], x[t] is fed to the network to predict x[t+1].)
  8. Where is the Memory? Feed-forward approach, next step: the window x[t-L+1], …, x[t], x[t+1] predicts x[t+2]. (Figure.)
  9. Where is the Memory? Feed-forward approach, next step: the window x[t-L+2], …, x[t+1], x[t+2] predicts x[t+3]. (Figure.)
  10. Where is the Memory? Problems with the feed-forward + static window approach (a minimal sketch follows below):
      ● What happens when L is increased? → Fast growth of the number of parameters!
      ● Decisions are independent between time-steps: the network does not care about what happened at the previous time-step, only the present window matters.
      ● Cumbersome padding when there are not enough samples to fill a window of size L, so it cannot work with variable sequence lengths.
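  A minimal sketch of this static-window baseline (illustrative code, not from the slides; the helper name make_windows and the toy sine signal are assumptions), in NumPy:

      import numpy as np

      def make_windows(x, L):
          # Build (window, next-sample) pairs: predict x[t] from the previous L values.
          X, y = [], []
          for t in range(L, len(x)):
              X.append(x[t - L:t])   # static window of size L, slid time-step wise
              y.append(x[t])         # target: the next sample
          return np.array(X), np.array(y)

      x = np.sin(np.linspace(0, 20, 500))   # toy sequence
      X, y = make_windows(x, L=10)
      print(X.shape, y.shape)               # (490, 10) (490,)

  Note how every window becomes an independent training example, and how enlarging L directly enlarges the input layer (and hence the parameter count) of any feed-forward model trained on these pairs.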
  11. Recurrent Neural Network. Solution: build specific connections capturing the temporal evolution → shared weights in time. Give volatile memory to the network. (Figure: feed-forward fully-connected network.)
  12. Recurrent Neural Network. Solution: build specific connections capturing the temporal evolution → shared weights in time. Give volatile memory to the network. (Figure: the same feed-forward fully-connected network, now also fully connected in time through the recurrent matrix.)
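  To make "shared weights in time" concrete, here is a hedged NumPy sketch of a single vanilla RNN time-step; the names rnn_step, W_x and b are illustrative, with U playing the role of the recurrent matrix mentioned on slide 19:

      import numpy as np

      def rnn_step(x_t, h_prev, W_x, U, b):
          # The same weights are reused at every time-step (shared in time);
          # h carries the network's volatile memory forward.
          return np.tanh(W_x @ x_t + U @ h_prev + b)

      rng = np.random.default_rng(0)
      input_dim, hidden = 3, 5
      W_x = rng.normal(size=(hidden, input_dim))
      U = rng.normal(size=(hidden, hidden))    # recurrent matrix: fully connected in time
      b = np.zeros(hidden)

      h = np.zeros(hidden)                     # initial (empty) memory
      for x_t in rng.normal(size=(10, input_dim)):   # unroll over 10 time-steps
          h = rnn_step(x_t, h, W_x, U, b)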
  13. Recurrent Neural Network. (Figure: the unrolled network over time, shown as front view and side view, rotated 90°.) Slide credit: Xavi Giro.
  14. Recurrent Neural Network. Hence we have two data flows: forward in layers + time propagation. BEWARE: we have extra depth now! Every time-step is an extra level of depth (like a deeper stack of layers in a feed-forward fashion).
  15. Recurrent Neural Network. Hence we have two data flows: forward in layers + time propagation.
  16. Recurrent Neural Network. Hence we have two data flows: forward in layers + time propagation. The last time-step includes the context of our decisions recursively.
  17. Recurrent Neural Network. Hence we have two data flows: forward in layers + time propagation. The last time-step includes the context of our decisions recursively.
  18. Recurrent Neural Network. Back-Propagation Through Time (BPTT): the training method has to take the time operations into account → a cost function E is defined to train our RNN, and in this case the total error at the output of the network is the sum of the errors at each time-step, E = \sum_{t=1}^{T} E_t, where T is the maximum number of time-steps to back-propagate through. In Keras this is specified when defining the "input shape" of the RNN layer, by means of (batch size, sequence length (T), input_dim).
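  A small example of how that input shape is declared in Keras (a sketch assuming the tf.keras Sequential API; the layer sizes and the choice of SimpleRNN are arbitrary):

      from tensorflow.keras import Sequential
      from tensorflow.keras.layers import SimpleRNN, Dense

      T, input_dim = 20, 1                # back-propagate through T = 20 time-steps of 1 feature
      model = Sequential([
          SimpleRNN(32, input_shape=(T, input_dim)),  # batch size is left implicit
          Dense(1)                                    # e.g. predict the next sample x[t+1]
      ])
      model.compile(optimizer='adam', loss='mse')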
  19. Recurrent Neural Network. Main problems (illustrated numerically below):
      ● Long-term memory (remembering time-steps quite far back) vanishes quickly because of the recursive operation with the recurrent matrix U.
      ● During training, gradients explode or vanish easily because of the depth-in-time → exploding/vanishing gradients!
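  A rough numerical illustration of the depth-in-time effect (an assumption-laden sketch that ignores the tanh derivative, not from the slides): a gradient pushed back through 50 time-steps is repeatedly multiplied by the transpose of the recurrent matrix, so its norm shrinks or grows geometrically with the scale of U:

      import numpy as np

      rng = np.random.default_rng(0)
      for scale in (0.5, 1.5):                  # contractive vs expansive recurrent matrix
          U = scale * np.linalg.qr(rng.normal(size=(32, 32)))[0]
          g = np.ones(32)                       # gradient arriving at the last time-step
          for _ in range(50):                   # push it back through 50 time-steps
              g = U.T @ g
          print(scale, np.linalg.norm(g))       # vanishes for 0.5, explodes for 1.5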
  20. Long Short-Term Memory (LSTM). Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.
  21. Long Short-Term Memory (LSTM). The New York Times, "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad'" (November 2016).
  22. Long Short-Term Memory (LSTM). Jürgen Schmidhuber @ NIPS 2016, Barcelona.
  23. Long Short-Term Memory (LSTM). Jürgen Schmidhuber @ NIPS 2016, Barcelona.
  24. Long Short-Term Memory (LSTM). Jürgen Schmidhuber @ NIPS 2016, Barcelona.
  25. Gating method. Solutions:
      1. Change the way in which past information is kept → create the notion of a cell state, a memory unit that keeps long-term information in a safer way by protecting it from recursive operations.
      2. Make every RNN unit able to forget whatever may not be useful anymore by clearing that information from the cell state (optimized clearing mechanism).
      3. Make every RNN unit able to decide whether the current time-step information matters or not, to accept or discard it (optimized reading mechanism).
      4. Make every RNN unit able to output its decisions whenever it is ready to do so (optimized output mechanism).
  26. Long Short-Term Memory (LSTM). Three gates, each governed by a sigmoid unit (in [0, 1]), control the flow of information into and out of the cell. Figure: Christopher Olah, "Understanding LSTM Networks" (2015). Slide credit: Xavi Giro.
  27. Long Short-Term Memory (LSTM). Forget gate: the previous hidden state and the current input are concatenated and fed to a sigmoid, making every RNN unit able to forget whatever may not be useful anymore by clearing that information from the cell state (optimized clearing mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
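  In the standard formulation used by the cited Olah figures (the notation here is that standard one, not necessarily the slide's), the forget gate is
      f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),
  where [h_{t-1}, x_t] is the concatenation mentioned on the slide and f_t later multiplies the previous cell state element-wise.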
  28. Long Short-Term Memory (LSTM). Input gate layer and new contribution to the cell state (a classic neuron): make every RNN unit able to decide whether the current time-step information matters or not, to accept or discard it (optimized reading mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
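  In the same standard notation, the input gate layer and the candidate contribution (the "classic neuron") are
      i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
      \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C).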
  29. Long Short-Term Memory (LSTM). Update of the cell state (memory): make every RNN unit able to decide whether the current time-step information matters or not, to accept or discard it (optimized reading mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
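  The cell state update then combines both gates (standard notation matching the cited figures):
      C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,
  so the forget gate erases stale memory and the input gate writes the new contribution, with no recursive matrix multiplication acting directly on C.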
  30. Long Short-Term Memory (LSTM). Output gate layer, output to the next layer: make every RNN unit able to output its decisions whenever it is ready to do so (optimized output mechanism). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
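  Completing the standard set of equations behind the cited figure, the output gate and the exposed hidden state are
      o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
      h_t = o_t \odot \tanh(C_t).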
  31. Long Short-Term Memory (LSTM). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
  32. Long Short-Term Memory (LSTM) cell. An LSTM cell is defined by two groups of neurons plus the cell state (memory unit): 1. gates, 2. activation units, 3. cell state. Computation flow.
  33. Long Short-Term Memory (LSTM) cell. An LSTM cell is defined by two groups of neurons plus the cell state (memory unit): 1. gates, 2. activation units, 3. cell state. In total: 3 gates + the input activation.
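  Putting the four gate equations above together, a hedged NumPy sketch of one LSTM time-step (the fused weight matrix W and the helper names are implementation choices, not from the slides):

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def lstm_step(x_t, h_prev, c_prev, W, b):
          # W has shape (4*H, H + D) and b has shape (4*H,), stacking the
          # forget / input / candidate / output blocks.
          H = h_prev.shape[0]
          z = W @ np.concatenate([h_prev, x_t]) + b   # concatenate [h_{t-1}, x_t]
          f = sigmoid(z[0:H])              # forget gate: clear stale info from the cell state
          i = sigmoid(z[H:2*H])            # input gate: accept or discard the current input
          c_tilde = np.tanh(z[2*H:3*H])    # candidate contribution (the input activation)
          o = sigmoid(z[3*H:4*H])          # output gate: decide what to expose
          c = f * c_prev + i * c_tilde     # update the cell state (long-term memory)
          h = o * np.tanh(c)               # output to the next layer / next time-step
          return h, c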
  34. Gated Recurrent Unit (GRU). Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014. Similar performance to the LSTM with less computation. Slide credit: Xavi Giro.
  35. Gated Recurrent Unit (GRU). Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014. Similar performance to the LSTM with less computation. Slide credit: Xavi Giro.
  36. Gated Recurrent Unit (GRU). Figure: Christopher Olah, "Understanding LSTM Networks" (2015) / Slide: Alberto Montes.
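  For comparison, the GRU equations in the standard notation of the cited Olah figure (not necessarily the slide's): the GRU keeps only an update gate z_t and a reset gate r_t and merges the cell state into the hidden state, which is where the saving in computation comes from:
      z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
      r_t = \sigma(W_r \cdot [h_{t-1}, x_t])
      \tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])
      h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t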
  37. Thanks! Q&A? Follow me at https://imatge.upc.edu/web/people/xavier-giro, @DocXavi, /ProfessorXavi.
