The fifth lecture in the Machine Learning course series. It covers a short history, the basic types, and the most important principles of neural networks. Practicals that I have designed for this course, in both R and Python, are available on my GitHub: https://github.com/skyfallen/MachineLearningPracticals. I can also share the keynote files; contact me via e-mail: dmytro.fishman@ut.ee.
3. Evolution of ML methods
Rule-based systems: Input → fixed set of rules → Output
(Diagrams on this and the following slides adapted from Y. Bengio: http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/)
4. Evolution of ML methods
Classic machine learning adds a learned mapping on top of hand-designed features: Input → hand-designed features → learning → Output
5. Evolution of ML methods
Representation learning replaces hand-designed features with automated feature extraction: Input → automated feature extraction → learning → Output
6. Evolution of ML methods
Deep learning learns a hierarchy of features: Input → low-level features → high-level features → learning → Output
7. What is deep learning?
Many layers of adaptive non-linear processing used to model complex relationships among data: a learned mapping from one space (Space 1) to another (Space 2).
8. What is deep learning?
Example: Space 1 is a grid of pixel intensities (values 0-255 forming a handwritten digit); Space 2 is the digit label 0, 1, 2, 3, ..., 9.
9. What is deep learning?
Example: Space 1 is an image of an organism; Space 2 is its species.
10. What is deep learning?
Example: Space 1 is a raw signal (e.g. recorded speech); Space 2 is the text "We love you".
12. McCulloch & Pitts (1943): A Logical Calculus of the Ideas Immanent in Nervous Activity
13. Rosenblatt (1957): Perceptron
New York Times: "(The perceptron) is the embryo of an electronic computer that is expected to be able to walk, talk, see, write, reproduce itself and be conscious of its existence"
14. Minsky & Papert (1969): Perceptrons: An Introduction to Computational Geometry
15. Blum & Rivest (1992): Training a 3-node neural network is NP-complete
16. Rumelhart, Hinton & Williams (1986): Learning representations by back-propagating errors
17. Artificial neural network
• A collection of simple trainable mathematical units that collaborate to compute a complicated function
• Compatible with supervised, unsupervised, and reinforcement learning
• Brain-inspired (loosely)
29. Learning algorithm
• while not done:
• pick a random training instance (x, y)
• run the neural network on input x
• modify connection weights to make the prediction closer to y
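A minimal sketch of this loop in Python, assuming a single sigmoid unit trained by stochastic gradient descent; the toy dataset, learning rate, and step count are illustrative assumptions, not from the lecture:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))               # toy inputs
    y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy labels

    w, b = np.zeros(2), 0.0
    eta = 0.5                                   # learning rate (assumed)

    for step in range(1000):                    # "while not done"
        i = rng.integers(len(X))                # pick a random training instance (x, y)
        out = 1 / (1 + np.exp(-(X[i] @ w + b))) # run the network on input x
        delta = (out - y[i]) * out * (1 - out)  # error signal at the output
        w -= eta * delta * X[i]                 # modify weights to bring prediction closer to y
        b -= eta * delta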
39. Let's backpropagate
INPUT: i1 = 0.05, i2 = 0.10
TARGET: o1 = 0.01, o2 = 0.99
2. The backward pass: updating weights
We want to know how much a change in w5 affects the total error.
40. Let's backpropagate
By the chain rule:
\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}
42. Let's backpropagate
First factor: the total error is
E_{total} = \sum \frac{1}{2}(target - output)^2
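As a quick check, the total error can be computed directly in Python. The forward-pass outputs out_o1 ≈ 0.7514 and out_o2 ≈ 0.7729 are not shown on these slides and are assumed here (they are the standard values for this worked example):

    targets = [0.01, 0.99]
    outputs = [0.7514, 0.7729]   # assumed forward-pass outputs
    E_total = sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))
    print(E_total)               # ≈ 0.2984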
43. Let's backpropagate
Differentiating the total error with respect to out_{o1}:
\frac{\partial E_{total}}{\partial out_{o1}} = 2 \cdot \frac{1}{2}(target_{o1} - out_{o1}) \cdot (-1) + 0 = -(0.01 - 0.751) = 0.741
45. Let's backpropagate
Second factor: the output is the logistic (sigmoid) function of the net input,
out_{o1} = \frac{1}{1 + e^{-net_{o1}}}
46. Let's backpropagate
Its derivative has the convenient form
\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.1868
48. Let's backpropagate
Third factor: the net input to o1 is
net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1
49. Let's backpropagate
Differentiating with respect to w5:
\frac{\partial net_{o1}}{\partial w_5} = out_{h1} = 0.5933
51. Let's backpropagate
Multiplying the three factors together:
\frac{\partial E_{total}}{\partial w_5} = 0.7414 \cdot 0.1868 \cdot 0.5933 = 0.0821
52. Let's backpropagate
Finally, update the weight by taking a step of size \eta (the learning rate) against the gradient:
w_5^{new} = w_5^{old} - \eta \cdot \frac{\partial E_{total}}{\partial w_5} = 0.4 - 0.5 \cdot 0.0821 = 0.3589
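The whole calculation can be verified in a few lines of Python. This sketch reproduces the worked example above; the hidden-layer weights w1..w4, w6 and the biases b1, b2 do not appear on these slides, so the standard values of this classic example are assumed here:

    import math

    sigmoid = lambda x: 1 / (1 + math.exp(-x))

    i1, i2 = 0.05, 0.10                        # INPUT (from the slides)
    target_o1 = 0.01                           # TARGET (from the slides)
    w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30    # assumed hidden-layer weights
    w5, w6 = 0.40, 0.45                        # w5 = 0.4 matches the slides
    b1, b2 = 0.35, 0.60                        # assumed biases

    # 1. Forward pass
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)   # ≈ 0.5933
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)  # ≈ 0.7514

    # 2. Backward pass: the three chain-rule factors
    dE_dout = -(target_o1 - out_o1)            # ≈ 0.7414
    dout_dnet = out_o1 * (1 - out_o1)          # ≈ 0.1868
    dnet_dw5 = out_h1                          # ≈ 0.5933
    dE_dw5 = dE_dout * dout_dnet * dnet_dw5    # ≈ 0.0821

    eta = 0.5                                  # learning rate
    print(w5 - eta * dE_dw5)                   # ≈ 0.3589, the new w5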
72. Convolutional Neural Network
A convolutional layer works as a filter applied to the original image. There are many filters in a convolutional layer (here, 4 filters), and they detect different patterns.
73. Convolutional Neural Network
Each filter, applied to all possible 2x2 patches of the original image, produces one output value per patch.
77. Convolutional Neural Network
Repeat this process for all filters in this layer.
78. Convolutional Neural Network
Repeat this process for all filters in this layer and the next.
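A minimal sketch of this sliding-filter idea in Python, assuming one 2x2 filter moved over every patch of a small grayscale image with stride 1 and no padding; the image and filter values are made up for illustration:

    import numpy as np

    image = np.array([[  0, 155, 255, 255],
                      [255, 255, 255, 255],
                      [255, 155,  78,  78],
                      [255,   0,   0,   0]], dtype=float)

    filt = np.array([[1, -1],
                     [1, -1]], dtype=float)    # one of the layer's filters

    fh, fw = filt.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    feature_map = np.zeros((out_h, out_w))

    # each 2x2 patch of the image produces one output value
    for r in range(out_h):
        for c in range(out_w):
            feature_map[r, c] = np.sum(image[r:r+fh, c:c+fw] * filt)

    print(feature_map)  # repeat with each of the 4 filters to get 4 feature maps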
79. Flattening
The output of the last convolutional layer is flattened into a single vector (like we did with images).
80. Flattening
This vector is fed into a fully connected layer with as many neurons as there are possible classes (here, the digits 0 through 9).
81. Flattening
Each of these neurons outputs the probability of one class.
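A sketch of these two steps, assuming the last convolutional layer produced four small feature maps and there are ten classes; a softmax over the outputs is one standard way to turn them into probabilities, and all shapes and weights here are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    feature_maps = rng.random((4, 3, 3))        # output of the last convolutional layer
    flat = feature_maps.reshape(-1)             # flatten into a single vector of 36 values

    W = rng.random((10, flat.size))             # fully connected layer: one neuron per class
    b = rng.random(10)
    logits = W @ flat + b

    probs = np.exp(logits) / np.sum(np.exp(logits))  # each neuron outputs a probability
    print(probs.argmax())                            # the predicted digit (0-9)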
90. Now that we are deep...
• Powerful function approximation
• Instead of hand-crafted features, let the algorithm build the relevant features for your problem
• More representational power for learning
103. References
• Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-learning)
• Introduction to Machine Learning by Pascal Vincent, given at the Deep Learning Summer School, Montreal 2015 (http://videolectures.net/deeplearning2015_vincent_machine_learning/)
• Welcome to Machine Learning by Konstantin Tretyakov, delivered at the AACIMP Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf)
• Stanford CS class: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy (http://cs231n.github.io/)
• Data Mining Course by Jaak Vilo at the University of Tartu (https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf)
• Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin)
• From the Brain to Deep Learning and Back by Raul Vicente Zafra and Ilya Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)