Introduction to Machine Learning (Neural networks)
Dmytro Fishman (dmytro@ut.ee)
The following material was adapted from:
Evolution of ML methods
Adapted from Y. Bengio, http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/
Rule-based systems: Input → fixed set of rules → Output
Classic Machine Learning: Input → hand-designed features → learning → Output
Representation Learning: Input → automated feature extraction → learning → Output
Deep Learning: Input → low-level features → high-level features → learning → Output
What is deep learning?
Many layers of adaptive non-linear processing to model complex relationships among data: the network learns a mapping from one space (Space 1) to another (Space 2).
Examples: a grid of pixel intensities (rows such as 0 0 0 0 0 0 0 / 0 0 0 0 0 0 0 / 0 155 255 255 255 155 0 / 255 255 255 255 255 255 255 / 255 155 78 78 155 255 255 / 255 0 0 0 0 155 255) mapped to one of the digits 0,1,2,3,…9; an image mapped to a species; an input mapped to the phrase “We love you”.
In practice
DL = Artificial Neural Networks with many layers
McCulloch & Pitts (1943): A Logical Calculus of the Ideas Immanent in Nervous Activity
Rosenblatt (1957): Perceptron. New York Times: “(The perceptron) is the embryo of an electronic computer that is expected to be able to walk, talk, see, write, reproduce itself and be conscious of its existence”
Minsky & Papert (1969): Perceptrons: an introduction to computational geometry
Rumelhart, Hinton & Williams (1986): Learning representations by back-propagating errors
Blum & Rivest (1992): Training a 3-node neural network is NP-complete
Artificial neural network
• A collection of simple trainable mathematical units, which collaborate to compute a complicated function
• Compatible with supervised, unsupervised, and reinforcement learning
• Brain inspired (loosely)
Artificial Neuron
(Diagram: each input x0, x1, x2 is multiplied by its weight w0, w1, w2 and the products are summed, Σ xi·wi, to produce the neuron's output.)
Feedforward Neural Network
(Diagram: a single neuron computes Σ xi·wi over its inputs x0, x1, x2; many such neurons are arranged into an input layer, a fully connected layer, and an output layer.)
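As an illustrative sketch (layer sizes and random weights are assumptions, not taken from the slides), a fully connected feedforward pass is just repeated matrix-vector products with an activation in between:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fully_connected(x, W, b):
    # every output neuron sees every input: out_j = sigmoid(sum_i x_i * W[i, j] + b_j)
    return sigmoid(x @ W + b)

rng = np.random.default_rng(0)
x = np.array([0.05, 0.10])            # input layer with 2 features
W_hidden = rng.normal(size=(2, 3))    # 2 inputs -> 3 hidden neurons
b_hidden = np.zeros(3)
W_out = rng.normal(size=(3, 2))       # 3 hidden neurons -> 2 outputs
b_out = np.zeros(2)

hidden = fully_connected(x, W_hidden, b_hidden)
output = fully_connected(hidden, W_out, b_out)
print(output)                         # one value per output neuron
```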
Learning algorithm
• while not done
• pick a random training instance (x, y)
• run the neural network on input x
• modify connection weights to make the prediction closer to y
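A minimal runnable sketch of that loop (the toy data, the single sigmoid neuron, and the learning rate are illustrative assumptions, not the network from the slides):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# toy training set: (x, y) pairs; x[0] is a constant 1 acting as a bias input
data = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
        ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w = [0.0, 0.0, 0.0]
lr = 0.5

for step in range(5000):                                   # while not done
    x, y = random.choice(data)                             # pick a random training instance
    pred = sigmoid(sum(xi * wi for xi, wi in zip(x, w)))   # run the network on input x
    grad = (pred - y) * pred * (1 - pred)                  # gradient of the squared error
    w = [wi - lr * grad * xi for wi, xi in zip(w, x)]      # nudge weights to bring pred closer to y

print([round(wi, 2) for wi in w])
```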
Let's backpropagate
INPUT: i1 = 0.05, i2 = 0.10    TARGET: o1 = 0.01, o2 = 0.99
1. The Forward pass - Compute total error
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775
f(x) = 1 / (1 + e^(-x))
out_h1 = 1 / (1 + e^(-net_h1)) = 1 / (1 + e^(-0.3775)) = 0.5933
Repeat for out_h2 = 0.596; out_o1 = 0.751; out_o2 = 0.773
With out_o1 and out_o2 we can compute the total error:
E_total = Σ ½ (target - output)²
E_o1 = ½ (target_o1 - out_o1)² = ½ (0.01 - 0.7514)² = 0.2748
E_o2 = 0.02356
E_total = E_o1 + E_o2 = 0.2748 + 0.02356 = 0.29836
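A few lines of Python reproduce these forward-pass numbers (the output activations 0.7514 and 0.773 are copied from the slides, since the hidden-to-output weights appear only in the slide figure):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, b1 = 0.15, 0.20, 0.35

net_h1 = w1 * i1 + w2 * i2 + b1 * 1       # 0.3775
out_h1 = sigmoid(net_h1)                  # ~0.5933

# output activations as reported on the slides
out_o1, out_o2 = 0.7514, 0.773
target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2    # ~0.2748
E_o2 = 0.5 * (target_o2 - out_o2) ** 2    # ~0.0235 (slides: 0.02356, from the unrounded output)
E_total = E_o1 + E_o2                     # ~0.2984
print(net_h1, round(out_h1, 4), round(E_total, 4))
```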
2. The Backward pass - Updating weights
We want to know how much a change in w5 affects the total error:
∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5
E_total = Σ ½ (target - output)², so
∂E_total/∂out_o1 = 2 * ½ (target_o1 - out_o1) * (-1) + 0 = -(0.01 - 0.7514) = 0.7414
out_o1 = 1 / (1 + e^(-net_o1)), so
∂out_o1/∂net_o1 = out_o1 (1 - out_o1) = 0.1868
net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1, so
∂net_o1/∂w5 = out_h1 = 0.5933
Putting the three factors together:
∂E_total/∂w5 = 0.7414 * 0.1868 * 0.5933 = 0.0821
With learning rate η = 0.5:
w5_new = w5_old - η * ∂E_total/∂w5 = 0.4 - 0.5 * 0.0821 = 0.3589
• Repeat for w6, w7, w8
• Update w1, w2, w3, w4 in an analogous way
• Compute the total error: before 0.298371109, now 0.291027924
• Repeat x10000: 0.000035085
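A short check of the backward-pass arithmetic above (values copied from the slides; this updates only w5 and is not a full training loop):

```python
out_o1, target_o1 = 0.7514, 0.01   # forward-pass output and its target
out_h1 = 0.5933                    # hidden activation feeding w5
w5, eta = 0.40, 0.5                # initial weight and learning rate

dE_dout   = -(target_o1 - out_o1)        # ~0.7414
dout_dnet = out_o1 * (1.0 - out_o1)      # ~0.1868
dnet_dw5  = out_h1                       # 0.5933

dE_dw5 = dE_dout * dout_dnet * dnet_dw5  # ~0.0821
w5_new = w5 - eta * dE_dw5               # ~0.3589
print(round(dE_dw5, 4), round(w5_new, 4))
```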
http://www.emergentmind.com/neural-network
Training Neural Networks
http://playground.tensorflow.org/
Training Neural Networks (part II)
Deep networks were difficult to train:
Overfitting, Dimensionality, Vanishing gradients, Complex landscape
(Figure: the error surface E(w1, w2) plotted over the weights w1 and w2.)
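A rough illustration of the vanishing-gradient problem mentioned above (it ignores the weight factors and only multiplies sigmoid derivatives, which are at most 0.25 per layer; the depth and the z value are arbitrary):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, grad = 1.0, 1.0
for layer in range(10):
    s = sigmoid(z)
    grad *= s * (1 - s)   # each sigmoid layer multiplies the gradient by sigma'(z) <= 0.25
print(grad)               # shrinks towards zero as depth grows
```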
Why didn't the DL revolution happen in 1986?
• Not enough data (datasets ~1000x too small)
• Computers were too slow (by a factor of ~1,000,000)
ImageNet: 1.2 million images, 1000 categories
Classification errors by year:
2010: 28%
2011: 26%
2012: 16% (AlexNet, A. Krizhevsky et al. 2012)
2013: 12%
2014: 7%
2015: 3%
2016: <3%
(For comparison, the chart also marks a hypothetical super-dedicated fine-grained expert ensemble of human labelers.)
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Convolutional Neural Network
Let's consider the following image.
A convolutional layer works as a filter applied to the original image. There are many filters in the convolutional layer; they detect different patterns (here, 4 filters).
Each filter, applied to all possible 2x2 patches of the original image, produces one output value per patch. This process is repeated for all filters in this layer, and then for the next layer.
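A small sketch of that operation (the toy image and the filter values are made up for illustration; a real layer learns its filter weights):

```python
import numpy as np

def conv2x2(image, kernel):
    # slide a 2x2 filter over every possible 2x2 patch; each patch gives one output value
    h, w = image.shape
    out = np.zeros((h - 1, w - 1))
    for i in range(h - 1):
        for j in range(w - 1):
            out[i, j] = np.sum(image[i:i + 2, j:j + 2] * kernel)
    return out

image = np.array([[0, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 1]], dtype=float)
kernel = np.array([[1, -1],
                   [-1, 1]], dtype=float)   # a hypothetical filter responding to one pattern
print(conv2x2(image, kernel))               # one feature map; a layer with 4 filters produces 4
```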
Flattening
The output of the last convolutional layer is flattened into a single vector (like we did with images). This vector is fed into a fully connected layer with as many neurons as there are possible classes (here, the digits 0, 1, 2, …, 7, 8, 9). Each neuron outputs the probability of its class.
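A sketch of the flatten-and-classify step (the feature-map shape and the random weights are placeholders; a softmax is assumed as the way each output neuron turns into a class probability):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
feature_maps = rng.random((4, 3, 3))   # pretend output of the last conv layer: 4 filters, 3x3 each
flat = feature_maps.reshape(-1)        # flatten into a single vector (length 36)

W = rng.normal(size=(flat.size, 10))   # fully connected layer: one neuron per class 0..9
b = np.zeros(10)
probs = softmax(flat @ W + b)          # each neuron outputs the probability of its class
print(probs.round(3), probs.sum())
```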
Training Neural Networks (part III)
http://scs.ryerson.ca/~aharley/vis/conv/
http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png
• Pre-training (weights initialization) (complex landscape)
• Efficient descent algorithms (complex landscape)
• Activation functions (vanishing gradient; sketched below)
• Dropout (overfitting; sketched below)
• Domain prior knowledge
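A minimal sketch of two of these fixes, ReLU-style activations and dropout (the array values, keep probability, and inverted-dropout scaling are illustrative assumptions; the slides only name the techniques):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=8)

# ReLU-style activation, max(0, x): its gradient does not saturate the way the sigmoid's does
relu_out = np.maximum(0.0, activations)

# dropout: randomly zero a fraction of activations during training to fight overfitting
keep_prob = 0.8
mask = rng.random(activations.shape) < keep_prob
dropout_out = relu_out * mask / keep_prob   # rescale so the expected activation is unchanged
print(relu_out.round(2))
print(dropout_out.round(2))
```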
Now that we are deep...
• Powerful function approximation
• Instead of hand-crafted features, let the algorithm
build the relevant features for your problem
• More representational power for learning
(Examples of generated image descriptions, including false positives and false negatives.)
Karpathy, Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions” (2014)
Style transfer
Texture Networks by Dmitry Ulyanov et al.
Visual and Textual Question Answering
http://cloudcv.org/vqa/
References
• Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-learning)
• Introduction to Machine Learning by Pascal Vincent, given at the Deep Learning Summer School, Montreal 2015 (http://videolectures.net/deeplearning2015_vincent_machine_learning/)
• Welcome to Machine Learning by Konstantin Tretyakov, delivered at the AACIMP Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf)
• Stanford CS class: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy (http://cs231n.github.io/)
• Data Mining Course by Jaak Vilo at the University of Tartu (https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf)
• Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin)
• From the brain to deep learning and back by Raul Vicente Zafra and Ilya Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
www.biit.cs.ut.ee www.ut.ee www.quretec.ee