Lecture conducted by me on Deep Learning concepts and applications. Discussed FNNs, CNNs, simple RNNs, and LSTM networks in detail. Finally, conducted a hands-on session on deep learning using Keras and scikit-learn.
4. Artificial Neural Network
• Computational model based on the structure and functions of biological neural networks
[Figures: the structure of a single artificial neuron; the structure of a basic biological neuron]
5. A Neuron - Function
• Receiving information: the processing unit obtains the information as inputs x1, x2, ..., xn.
• Weighting: each input is weighted by its corresponding weight, denoted w1, w2, ..., wn (w0 typically serves as the bias weight).
• Activation: an activation function f is applied to the sum z of all the weighted inputs.
• Output: an output y = f(z) is generated depending on z (see the code sketch below).
The structure of a single artificial neuron
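As a minimal sketch of this computation in NumPy (the sigmoid activation and all values here are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Activation function f: squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Weighted sum of the inputs plus bias, then activation
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([0.5, -1.0, 2.0])   # inputs x1..xn
w = np.array([0.1, 0.4, -0.2])   # weights w1..wn
b = 0.3                          # bias weight (w0)
y = neuron(x, w, b)              # output y = f(z)
```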
10. Issues with Hand-Engineered Features
• Most critical for accuracy
• Most time-consuming in development
• What is the best feature?
• What is next? Keep on crafting better features?
• Let's learn feature representations directly from data.
11. Learning Features and Classifier Together
• A non-linear mapping that takes raw pixels directly to labels
• How to build it? By combining simple building blocks (i.e. layers in a neural network)
[Figure: two candidate designs, Option 1 and Option 2 ("Hmmm… which is better?"); Option 2 is better]
12. Intuition behind Deep Neural Nets
• Each layer has parameters subject to learning
• Composition makes a highly non-linear system
• In the case of classification, the final layer outputs a probability distribution over categories (see the softmax sketch below)
[Figure: a stack of layers, with the final layer producing the category distribution]
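The standard way to turn the final layer's raw scores into a probability distribution is a softmax; a minimal sketch (the input values are illustrative):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalise
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw scores from the final layer
probs = softmax(logits)             # sums to 1: a distribution over categories
```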
13. Training a Deep Neural Network
• Compute the loss on small batches (forward propagation)
• Compute the gradient w.r.t. the parameters (backpropagation)
• Use the gradient to update the parameters (see the Keras sketch below)
[Diagram: a network mapping input 𝑋 to a prediction, with the error measured against the target 𝑦. Design choices: number of hidden units, number of hidden layers, type of layer, loss function]
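Since the hands-on session uses Keras, here is a minimal sketch of this training loop; the data, architecture, and hyperparameters are all illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

# Illustrative random data: 100 samples, 20 features, 3 classes
X = np.random.rand(100, 20)
y = keras.utils.to_categorical(np.random.randint(3, size=100), 3)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),    # hidden layer
    keras.layers.Dense(3, activation="softmax"),  # category distribution
])

# fit() runs the loop described above: forward pass on mini-batches,
# gradients via backpropagation, parameter updates via the optimizer
model.compile(optimizer="sgd", loss="categorical_crossentropy")
model.fit(X, y, batch_size=16, epochs=5)
```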
17. Reduce connections to local regions
Example: a 1000 x 1000 image with 1M hidden units
Filter size: 10 x 10
100M parameters: 10^6 units x 100 weights each (versus 10^12 weights for full connectivity); the arithmetic is sketched below
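The parameter count, as a quick sketch in Python (the image size, unit count, and filter size come straight from the slide):

```python
image_pixels = 1000 * 1000           # 10**6 input pixels
hidden_units = 10**6                 # 1M hidden units

fully_connected = hidden_units * image_pixels  # 10**12 weights: infeasible
locally_connected = hidden_units * (10 * 10)   # 10**8 = 100M weights
```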
18. Reuse the same kernel everywhere
Why? Because interesting features (e.g. edges) can occur anywhere in the image.
Share the same parameters across different locations: this is convolution with learned kernels (a code sketch follows below).
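A minimal sketch of sliding one shared kernel over an image (technically a cross-correlation, which is what CNN layers compute; the kernel and image values are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    # Apply the same kernel at every location (shared parameters)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(8, 8)
edge_kernel = np.array([[1.0, -1.0]])   # a crude horizontal edge detector
response = conv2d(image, edge_kernel)   # high where neighbouring pixels differ
```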
20. Handling Multiple Channels
• An image may contain multiple channels, e.g. a 3-channel (R, G, B) image
• A separate k x k filter is applied to each channel, and the per-channel responses are summed into a single output map (see the sketch below)
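Extending the earlier conv2d sketch to multiple channels: one k x k filter per channel, with the responses summed (the shapes are illustrative):

```python
import numpy as np

def conv2d_multichannel(image, kernels):
    # image: (H, W, C); kernels: (k, k, C), one k x k filter per channel
    k = kernels.shape[0]
    oh, ow = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Sum the per-channel filter responses into one output value
            out[i, j] = np.sum(image[i:i+k, j:j+k, :] * kernels)
    return out

rgb = np.random.rand(8, 8, 3)       # 3-channel (R, G, B) image
filters = np.random.rand(3, 3, 3)   # k = 3: one 3 x 3 filter per channel
feature_map = conv2d_multichannel(rgb, filters)   # single 6 x 6 output map
```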
21. Translation Invariance
Assume we are going to build an eye detector.
Problem: how do we make the detection robust to the exact eye location?
22. Translation Invariance
Solution: use pooling (max / average) on the filter responses
• Provides robustness to the exact spatial location of features
• Also sub-samples the image, allowing the next layer to look at larger spatial regions (see the sketch below)
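A minimal sketch of max pooling over non-overlapping 2 x 2 windows (the window size and input are illustrative):

```python
import numpy as np

def max_pool(feature_map, size=2):
    # Keep the maximum response in each non-overlapping size x size window
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = feature_map[i:i+size, j:j+size].max()
    return out

responses = np.random.rand(8, 8)   # filter responses from a conv layer
pooled = max_pool(responses)       # 4 x 4: sub-sampled and location-robust
```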
23. Summary of Complete CNN
• Doing all of this (convolution, pooling, normalization) constitutes one layer
• Pooling and normalization are optional
• Stack such layers up and train just like multilayer neural nets
• Multiple conv layers can be used to learn high-level features
• The final layer is usually a fully connected neural net with output size equal to the number of classes (see the Keras sketch below)
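Putting the pieces together in Keras, a minimal sketch of a complete CNN; the input shape, layer sizes, and class count are illustrative assumptions:

```python
from tensorflow import keras

num_classes = 10   # assumption for illustration

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                      # e.g. grayscale images
    keras.layers.Conv2D(32, (3, 3), activation="relu"),  # learned kernels
    keras.layers.MaxPooling2D((2, 2)),                   # pooling (optional)
    keras.layers.Conv2D(64, (3, 3), activation="relu"),  # higher-level features
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(num_classes, activation="softmax"),  # output size == classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```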
24. Recurrent neural network (RNN)
• Considers sequences
• Used in forecasting
• Applications:
  • Language modelling
  • Machine translation
  • Conversation bots
  • Image description
  • Image search
25. Structure of RNN
• Performs the same task for every element of a sequence, with the output depending on the previous computations
• Has a "memory" which captures information about what has been calculated so far
An unrolled recurrent neural network.
26. A Simple RNN
• The same computation is applied at every time step; each output depends on the current input and on the previous computations (the recurrence is written out below)
Unrolled RNN
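The per-step computation can be written as a recurrence; this is the standard simple-RNN formulation, with conventional symbol names rather than ones taken from the slides:

```latex
h_t = \tanh(W_h h_{t-1} + W_x x_t + b_h), \qquad y_t = g(W_y h_t + b_y)
```

The hidden state h_t is the "memory" carried forward from step to step, and the same weights W_h, W_x, W_y are reused at every step.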
27. The Problem of Long-Term Dependencies
• Consider a language model trying to predict the next word based on the previous ones
• The larger the gap between the relevant information and the point where it is needed, the less able a simple RNN is to learn that dependency
• Theoretically this should be possible, but in practice simple RNNs are not capable of representing long-term dependencies
"The clouds are in the sky"
"I grew up in France … I speak fluent French"
28. LSTM - Hochreiter & Schmidhuber (1997)
• A special kind of RNN
• Capable of learning long-term dependencies
• Remembering information for long periods of time is practically their default behaviour, not something they struggle to learn!
An unrolled LSTM
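In Keras, an LSTM layer is a drop-in replacement for a SimpleRNN layer; a minimal sketch for binary sequence classification (the sequence length, feature count, and unit sizes are illustrative assumptions):

```python
from tensorflow import keras

# Sequences of 50 time steps with 8 features each; one binary label per sequence
model = keras.Sequential([
    keras.Input(shape=(50, 8)),
    # A keras.layers.SimpleRNN(32) here would struggle with long-term
    # dependencies; the LSTM's gating handles them by design
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```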
31. Practical Session
• See https://online.mrt.ac.lk/mod/folder/view.php?id=65448
• Follow the instructions in Moodle to get started with Colab
• Then follow the instructions in the Python notebook
Similar to what we discussed in the past few slides. However, the traditional deep network, in which every unit takes all the inputs from the previous layer, is not scalable. Images, one of the things that led to this kind of network, are large and complex to learn from; that is, traditional fully connected neural networks do not scale to them.
In the second sentence, recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It’s entirely possible for the gap between the relevant information and the point where it is needed to become very large.