Introduction to Machine Learning (Neural networks)
Dmytro Fishman (dmytro@ut.ee)
The following material was adapted from:
Evolution of ML methods
Adapted from Y. Bengio, http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/
Rule-based systems: Input → fixed set of rules → Output
Classic Machine Learning: Input → hand-designed features → learning → Output
Representation Learning: Input → automated feature extraction → learning → Output
Deep Learning: Input → low-level features → high-level features → learning → Output
What is deep learning?
Many layers of adaptive non-linear processing to model complex relationships among data: the network learns a mapping from one space (Space 1) to another (Space 2).
Examples: a grid of pixel intensities (rows such as 0 0 0 0 0 0 0 / 0 0 0 0 0 0 0 / 0 155 255 255 255 155 0 / 255 255 255 255 255 255 255 / 255 155 78 78 155 255 255 / 255 0 0 0 0 155 255) mapped to one of the digits 0,1,2,3,…9; an image mapped to a species; an input mapped to the phrase “We love you”.
In practice
DL = Artificial Neural Networks with many layers
McCulloch & Pitts (1943): A Logical Calculus of the Ideas Immanent in Nervous Activity
Rosenblatt (1957): Perceptron. New York Times: “(The perceptron) is the embryo of an electronic computer that is expected to be able to walk, talk, see, write, reproduce itself and be conscious of its existence”
Minsky & Papert (1969): Perceptrons: an introduction to computational geometry
Rumelhart, Hinton & Williams (1986): Learning representations by back-propagating errors
Blum & Rivest (1992): Training a 3-node neural network is NP-complete
Artificial neural network
• A collection of simple trainable mathematical units, which collaborate to compute a complicated function
• Compatible with supervised, unsupervised, and reinforcement learning
• Brain inspired (loosely)
Artificial Neuron
(Diagram: each input x0, x1, x2 is multiplied by its weight w0, w1, w2 and the products are summed, Σ xi·wi, to produce the neuron's output.)
Feedforward Neural Network
(Diagram: a single neuron computes Σ xi·wi over its inputs x0, x1, x2; many such neurons are arranged into an input layer, a fully connected layer, and an output layer.)
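As an illustrative sketch (layer sizes and random weights are assumptions, not taken from the slides), a fully connected feedforward pass is just repeated matrix-vector products with an activation in between:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fully_connected(x, W, b):
    # every output neuron sees every input: out_j = sigmoid(sum_i x_i * W[i, j] + b_j)
    return sigmoid(x @ W + b)

rng = np.random.default_rng(0)
x = np.array([0.05, 0.10])            # input layer with 2 features
W_hidden = rng.normal(size=(2, 3))    # 2 inputs -> 3 hidden neurons
b_hidden = np.zeros(3)
W_out = rng.normal(size=(3, 2))       # 3 hidden neurons -> 2 outputs
b_out = np.zeros(2)

hidden = fully_connected(x, W_hidden, b_hidden)
output = fully_connected(hidden, W_out, b_out)
print(output)                         # one value per output neuron
```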
Learning algorithm
• while not done
• pick a random training instance (x, y)
• run the neural network on input x
• modify connection weights to make the prediction closer to y
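A minimal runnable sketch of that loop (the toy data, the single sigmoid neuron, and the learning rate are illustrative assumptions, not the network from the slides):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# toy training set: (x, y) pairs; x[0] is a constant 1 acting as a bias input
data = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
        ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w = [0.0, 0.0, 0.0]
lr = 0.5

for step in range(5000):                                   # while not done
    x, y = random.choice(data)                             # pick a random training instance
    pred = sigmoid(sum(xi * wi for xi, wi in zip(x, w)))   # run the network on input x
    grad = (pred - y) * pred * (1 - pred)                  # gradient of the squared error
    w = [wi - lr * grad * xi for wi, xi in zip(w, x)]      # nudge weights to bring pred closer to y

print([round(wi, 2) for wi in w])
```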
Let's backpropagate
INPUT: i1 = 0.05, i2 = 0.10    TARGET: o1 = 0.01, o2 = 0.99
1. The Forward pass - Compute total error
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775
f(x) = 1 / (1 + e^(-x))
out_h1 = 1 / (1 + e^(-net_h1)) = 1 / (1 + e^(-0.3775)) = 0.5933
Repeat for out_h2 = 0.596; out_o1 = 0.751; out_o2 = 0.773
With out_o1 and out_o2 we can compute the total error:
E_total = Σ ½ (target - output)²
E_o1 = ½ (target_o1 - out_o1)² = ½ (0.01 - 0.7514)² = 0.2748
E_o2 = 0.02356
E_total = E_o1 + E_o2 = 0.2748 + 0.02356 = 0.29836
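A few lines of Python reproduce these forward-pass numbers (the output activations 0.7514 and 0.773 are copied from the slides, since the hidden-to-output weights appear only in the slide figure):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, b1 = 0.15, 0.20, 0.35

net_h1 = w1 * i1 + w2 * i2 + b1 * 1       # 0.3775
out_h1 = sigmoid(net_h1)                  # ~0.5933

# output activations as reported on the slides
out_o1, out_o2 = 0.7514, 0.773
target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2    # ~0.2748
E_o2 = 0.5 * (target_o2 - out_o2) ** 2    # ~0.0235 (slides: 0.02356, from the unrounded output)
E_total = E_o1 + E_o2                     # ~0.2984
print(net_h1, round(out_h1, 4), round(E_total, 4))
```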
2. The Backward pass - Updating weights
We want to know how much a change in w5 affects the total error:
∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5
E_total = Σ ½ (target - output)², so
∂E_total/∂out_o1 = 2 * ½ (target_o1 - out_o1) * (-1) + 0 = -(0.01 - 0.7514) = 0.7414
out_o1 = 1 / (1 + e^(-net_o1)), so
∂out_o1/∂net_o1 = out_o1 (1 - out_o1) = 0.1868
net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1, so
∂net_o1/∂w5 = out_h1 = 0.5933
Putting the three factors together:
∂E_total/∂w5 = 0.7414 * 0.1868 * 0.5933 = 0.0821
With learning rate η = 0.5:
w5_new = w5_old - η * ∂E_total/∂w5 = 0.4 - 0.5 * 0.0821 = 0.3589
• Repeat for w6, w7, w8
• Update w1, w2, w3, w4 in an analogous way
• Compute the total error: before 0.298371109, now 0.291027924
• Repeat x10000: 0.000035085
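A short check of the backward-pass arithmetic above (values copied from the slides; this updates only w5 and is not a full training loop):

```python
out_o1, target_o1 = 0.7514, 0.01   # forward-pass output and its target
out_h1 = 0.5933                    # hidden activation feeding w5
w5, eta = 0.40, 0.5                # initial weight and learning rate

dE_dout   = -(target_o1 - out_o1)        # ~0.7414
dout_dnet = out_o1 * (1.0 - out_o1)      # ~0.1868
dnet_dw5  = out_h1                       # 0.5933

dE_dw5 = dE_dout * dout_dnet * dnet_dw5  # ~0.0821
w5_new = w5 - eta * dE_dw5               # ~0.3589
print(round(dE_dw5, 4), round(w5_new, 4))
```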
http://www.emergentmind.com/neural-network
Training Neural Networks
http://playground.tensorflow.org/
Training Neural Networks (part II)
Deep networks were difficult to train:
Overfitting, Dimensionality, Vanishing gradients, Complex landscape
(Figure: the error surface E(w1, w2) plotted over the weights w1 and w2.)
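A rough illustration of the vanishing-gradient problem mentioned above (it ignores the weight factors and only multiplies sigmoid derivatives, which are at most 0.25 per layer; the depth and the z value are arbitrary):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, grad = 1.0, 1.0
for layer in range(10):
    s = sigmoid(z)
    grad *= s * (1 - s)   # each sigmoid layer multiplies the gradient by sigma'(z) <= 0.25
print(grad)               # shrinks towards zero as depth grows
```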
Why didn't the DL revolution happen in 1986?
• Not enough data (datasets ~1000x too small)
• Computers were too slow (by a factor of ~1,000,000)
ImageNet: 1.2 million images, 1000 categories
Classification errors by year:
2010: 28%
2011: 26%
2012: 16% (AlexNet, A. Krizhevsky et al. 2012)
2013: 12%
2014: 7%
2015: 3%
2016: <3%
(For comparison, the chart also marks a hypothetical super-dedicated fine-grained expert ensemble of human labelers.)
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Convolutional Neural Network
Let's consider the following image.
A convolutional layer works as a filter applied to the original image. There are many filters in the convolutional layer; they detect different patterns (here, 4 filters).
Each filter, applied to all possible 2x2 patches of the original image, produces one output value per patch. This process is repeated for all filters in this layer, and then for the next layer.
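A small sketch of that operation (the toy image and the filter values are made up for illustration; a real layer learns its filter weights):

```python
import numpy as np

def conv2x2(image, kernel):
    # slide a 2x2 filter over every possible 2x2 patch; each patch gives one output value
    h, w = image.shape
    out = np.zeros((h - 1, w - 1))
    for i in range(h - 1):
        for j in range(w - 1):
            out[i, j] = np.sum(image[i:i + 2, j:j + 2] * kernel)
    return out

image = np.array([[0, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 1, 0, 0],
                  [1, 0, 0, 1]], dtype=float)
kernel = np.array([[1, -1],
                   [-1, 1]], dtype=float)   # a hypothetical filter responding to one pattern
print(conv2x2(image, kernel))               # one feature map; a layer with 4 filters produces 4
```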
Flattening
The output of the last convolutional layer is flattened into a single vector (like we did with images). This vector is fed into a fully connected layer with as many neurons as there are possible classes (here, the digits 0, 1, 2, …, 7, 8, 9). Each neuron outputs the probability of its class.
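A sketch of the flatten-and-classify step (the feature-map shape and the random weights are placeholders; a softmax is assumed as the way each output neuron turns into a class probability):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
feature_maps = rng.random((4, 3, 3))   # pretend output of the last conv layer: 4 filters, 3x3 each
flat = feature_maps.reshape(-1)        # flatten into a single vector (length 36)

W = rng.normal(size=(flat.size, 10))   # fully connected layer: one neuron per class 0..9
b = np.zeros(10)
probs = softmax(flat @ W + b)          # each neuron outputs the probability of its class
print(probs.round(3), probs.sum())
```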
Training Neural Networks (part III)
http://scs.ryerson.ca/~aharley/vis/conv/
http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png
• Pre-training (weights initialization) (complex landscape)
• Efficient descent algorithms (complex landscape)
• Activation functions (vanishing gradient; sketched below)
• Dropout (overfitting; sketched below)
• Domain prior knowledge
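A minimal sketch of two of these fixes, ReLU-style activations and dropout (the array values, keep probability, and inverted-dropout scaling are illustrative assumptions; the slides only name the techniques):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=8)

# ReLU-style activation, max(0, x): its gradient does not saturate the way the sigmoid's does
relu_out = np.maximum(0.0, activations)

# dropout: randomly zero a fraction of activations during training to fight overfitting
keep_prob = 0.8
mask = rng.random(activations.shape) < keep_prob
dropout_out = relu_out * mask / keep_prob   # rescale so the expected activation is unchanged
print(relu_out.round(2))
print(dropout_out.round(2))
```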
Now that we are deep...
• Powerful function approximation
• Instead of hand-crafted features, let the algorithm
build the relevant features for your problem
• More representational power for learning
(Examples of generated image descriptions, including false positives and false negatives.)
Karpathy, Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions” (2014)
Style transfer
Texture Networks by Dmitry Ulyanov et al.
Visual and Textual Question Answering
http://cloudcv.org/vqa/
References
• Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-learning)
• Introduction to Machine Learning by Pascal Vincent, given at the Deep Learning Summer School, Montreal 2015 (http://videolectures.net/deeplearning2015_vincent_machine_learning/)
• Welcome to Machine Learning by Konstantin Tretyakov, delivered at the AACIMP Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf)
• Stanford CS class: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy (http://cs231n.github.io/)
• Data Mining Course by Jaak Vilo at the University of Tartu (https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf)
• Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin)
• From the brain to deep learning and back by Raul Vicente Zafra and Ilya Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
www.biit.cs.ut.ee www.ut.ee www.quretec.ee