Parveen Malik
Assistant Professor
KIIT University
Neural Networks
Backpropagation
Background
• The Perceptron Learning Algorithm and Hebbian Learning can classify input patterns only if the patterns are linearly separable.
• We need an algorithm that can train multiple layers of perceptrons and classify patterns that are not linearly separable.
• The algorithm should also be able to use non-linear activation functions.
[Figure: two scatter plots in the $(x_1, x_2)$ plane showing Class 1 and Class 2, one linearly separable and one linearly non-separable.]
• Non-linear decision boundaries are needed
• The perceptron algorithm can't be used
• A variation of the gradient-descent (GD) rule is used
• More layers are required
• A non-linear activation function is required
Perceptron Algorithm:
$W_{i+1}^{New} = W_i^{old} + (t - a)\,x_i$

Gradient Descent Algorithm:
$W_{i+1}^{New} = W_i^{old} - \eta\,\frac{\partial L}{\partial w_i}$

Loss function: $L = \frac{1}{2}(t - a)^2$
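Stated as code, the two rules differ only in where the output $a$ comes from and whether a learning rate and gradient are used. A minimal Python sketch (the weight and input values below are made up purely for illustration):

```python
import numpy as np

def perceptron_update(w, x, t, a):
    """Perceptron rule: w <- w + (t - a) * x, where t is the target and a is the hard-limit output."""
    return w + (t - a) * x

def gradient_descent_update(w, x, t, a, eta=0.1):
    """Gradient-descent rule for L = 1/2 * (t - a)^2 with a linear unit a = w . x:
    dL/dw = -(t - a) * x, so w <- w - eta * dL/dw."""
    grad = -(t - a) * x
    return w - eta * grad

# tiny illustration with made-up numbers
w = np.array([0.5, -0.5])
x = np.array([1.0, 2.0])
t = 1.0
a_hard = 1.0 if np.dot(w, x) >= 0 else 0.0   # hard-limit output used by the perceptron rule
a_lin = float(np.dot(w, x))                  # linear output used by the gradient-descent rule
print(perceptron_update(w, x, t, a_hard))
print(gradient_descent_update(w, x, t, a_lin))
```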
Background - Backpropagation
• The perceptron learning rule of Frank Rosenblatt and the LMS algorithm of Bernard Widrow and Marcian Hoff were designed to train single-layer perceptron-like networks.
• Single-layer networks suffer from the disadvantage that they are only able to solve linearly separable classification problems. Both Rosenblatt and Widrow were aware of these limitations and proposed multilayer networks that could overcome them, but they were not able to generalize their algorithms to train these more powerful networks.
• The first description of an algorithm to train multilayer networks was contained in the 1974 thesis of Paul Werbos. The thesis presented the algorithm in the context of general networks, with neural networks as a special case, but it was not disseminated in the neural network community.
• It was not until the mid-1980s that the backpropagation algorithm was rediscovered and widely publicized. It was rediscovered independently by David Rumelhart, Geoffrey Hinton and Ronald Williams (1986), David Parker (1985), and Yann LeCun (1985).
• The algorithm was popularized by its inclusion in the book Parallel Distributed Processing [RuMc86], which described the work of the Parallel Distributed Processing Group led by psychologists David Rumelhart and James McClelland.
• The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network.
Network Design
Problem: Decide whether you will watch a movie or not.
Step 1 (Design): The output can be Yes (1) or No (0), so a single neuron (perceptron) is sufficient.
Step 2: Choose a suitable activation function at the output along with a rule to update the weights (a hard-limit function for the perceptron learning algorithm, a sigmoid for the Widrow-Hoff or delta rule).
$W_{i+1}^{New} = W_i^{old} - \eta\,\frac{\partial L}{\partial w_i}$

$L = \frac{1}{2}(y - \hat{y})^2 = \frac{1}{2}\big(y - f(wx + b)\big)^2$

$\frac{\partial L}{\partial w} = 2 \cdot \frac{1}{2}\,\big(y - f(wx + b)\big)\cdot\Big(-\frac{\partial f(wx + b)}{\partial w}\Big) = -(y - \hat{y})\,f'(wx + b)\,x$
[Figure: a single neuron. A scalar input $x$ (Director, Actor, Genre, or IMDB rating) enters with weight $w$, and a constant input 1 enters with weight $w_0 = b$; the weighted sum $wx + b$ passes through the activation $f$ to produce the output $\hat{y}$, interpreted as Yes (1) or No (0).]

$\hat{y} = f(wx + b) = \frac{1}{1 + e^{-(wx + b)}}$
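A minimal sketch of this single sigmoid neuron and one delta-rule update (Python; the input value, target, and initial weights are illustrative, not from the slide):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(w, b, x, y, eta=0.1):
    """One delta-rule step for L = 1/2 (y - y_hat)^2 with y_hat = sigmoid(w*x + b)."""
    y_hat = sigmoid(w * x + b)
    # dL/dw = -(y - y_hat) * f'(wx + b) * x, with f'(v) = f(v) * (1 - f(v)) for the sigmoid
    grad_w = -(y - y_hat) * y_hat * (1.0 - y_hat) * x
    grad_b = -(y - y_hat) * y_hat * (1.0 - y_hat)      # bias treated as a weight on a constant input of 1
    return w - eta * grad_w, b - eta * grad_b, y_hat

# illustrative values (not from the slide)
w, b = 0.2, -0.1
x, y = 0.8, 1.0          # e.g. an IMDB score scaled to [0, 1], target "Yes"
for _ in range(3):
    w, b, y_hat = train_step(w, b, x, y)
    print(round(y_hat, 4), round(w, 4), round(b, 4))
```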
Network Design
Problem: Sort students into 4 houses based on three qualities: lineage, choice, and ethics.
Step 1 (Design): Here the input vector is 3-D, i.e. for each student,
$\text{Student 1} = \begin{bmatrix} L_1 \\ C_1 \\ E_1 \end{bmatrix}, \quad \text{Student 2} = \begin{bmatrix} L_2 \\ C_2 \\ E_2 \end{bmatrix}$
[Figure: first, a single neuron N with inputs $x_1, x_2, x_3$, bias input $x_0 = 1$, weights $w_1, w_2, w_3$ and $w_0 = b$, giving a Yes (1) / No (0) output. Since one binary output cannot distinguish four houses, the design uses two output neurons $N_1, N_2$, so that each house A, B, C, D receives a distinct 2-bit target code $(y_1, y_2)$.]

$\hat{y}_1 = f(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1)$
$\hat{y}_2 = f(w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2)$

The table on the slide lists the target outputs $(y_1, y_2)$ assigned to the four houses and the actual outputs $(\hat{y}_1, \hat{y}_2)$ produced by the network.
Network Design
Step 2: Choosing the activation function and the rule to update the weights.

Loss function: $L = \frac{1}{2}(y - \hat{y})^2$

[Figure: the same two-neuron network, with its actual outputs $(\hat{y}_1, \hat{y}_2)$ compared against the target code $(y_1, y_2)$ of the desired house.]
$W_{ij}(t + 1) = W_{ij}(t) - \eta\,\frac{\partial L}{\partial w_{ij}}$

$\frac{\partial L}{\partial w_{11}} = -(y_1 - \hat{y}_1)\,f'(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1)\,x_1$
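A sketch of this two-neuron output layer with the delta-rule update applied to both neurons at once (NumPy; the input, target code, and initial weights are made up, since the slide leaves them unspecified):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Illustrative 3-D input (lineage, choice, ethics) and a 2-bit target code for one house.
x = np.array([0.9, 0.2, 0.7])
y = np.array([1.0, 0.0])

# W has one row per output neuron: row k holds (w_k1, w_k2, w_k3); b holds (b_1, b_2).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(2, 3))
b = np.zeros(2)

eta = 0.25
for _ in range(5):
    y_hat = sigmoid(W @ x + b)                      # forward pass, both neurons at once
    delta = -(y - y_hat) * y_hat * (1 - y_hat)      # dL/da_k for L = 1/2 * sum_k (y_k - y_hat_k)^2
    W -= eta * np.outer(delta, x)                   # dL/dW_kj = delta_k * x_j
    b -= eta * delta
    print(np.round(y_hat, 3))
```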
Network Architectures (Complex)
[Figure: a fully connected network with an input layer $x_1, x_2, \dots, x_i, \dots, x_n$, a hidden layer $h_1, h_2, \dots, h_j, \dots, h_m$, and an output layer $y_1, y_2, \dots, y_l, \dots, y_k$. The weight matrix $W^{(1)}$ connects the input layer to the hidden layer and $W^{(2)}$ connects the hidden layer to the output layer.]
Network Architectures (More Complex)
[Figure: a deeper network with four inputs, two hidden layers of three units each ($h_1^{(1)}, h_2^{(1)}, h_3^{(1)}$ and $h_1^{(2)}, h_2^{(2)}, h_3^{(2)}$), and two outputs $y_1, y_2$. The weight matrices $W^{(1)}, W^{(2)}, W^{(3)}$ connect successive layers.]
Back-propagation Algorithm (Generalized Expression)

[Figure: three consecutive layers. Unit $l$ has pre-activation $a_l$ and output $z_l$, unit $i$ in the next layer has $a_i, z_i$, and unit $j$ in the following layer has $a_j, z_j$. $W_{il}$ connects unit $l$ to unit $i$ and $W_{ji}$ connects unit $i$ to unit $j$. The error is propagated backward from the output layer toward the input layer.]

Cost function: $L = \frac{1}{2}(y - \hat{y})^2$

Pre-activations and activations: $a_i = \sum_l W_{il}\, z_l$, $z_i = \sigma(a_i)$, and for the sigmoid $\sigma'(a_i) = \sigma(a_i)\,\big(1 - \sigma(a_i)\big)$.

Define $\delta_i = \frac{\partial L}{\partial a_i}$ and $\delta_j = \frac{\partial L}{\partial a_j}$.

For the output layer: $\delta_j = \frac{\partial L}{\partial a_j} = -(y - \hat{y})\,\frac{\partial \hat{y}}{\partial a_j}$

For a hidden unit: $\delta_i = \frac{\partial L}{\partial a_i} = \sum_j \frac{\partial L}{\partial a_j}\,\frac{\partial a_j}{\partial a_i}$, where $\frac{\partial a_j}{\partial a_i} = \frac{\partial a_j}{\partial z_i}\,\frac{\partial z_i}{\partial a_i} = W_{ji}\,\sigma'(a_i)$

Hence $\delta_i = \sigma'(a_i)\sum_j \delta_j W_{ji}$, and the weight gradient is $\frac{\partial L}{\partial W_{il}} = \delta_i\, z_l$.
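The two relations $\delta_i = \sigma'(a_i)\sum_j \delta_j W_{ji}$ and $\frac{\partial L}{\partial W_{il}} = \delta_i z_l$ translate directly into a layer-by-layer backward sweep. A minimal NumPy sketch for a fully connected sigmoid network (the weight shapes and bias handling are my own convention, not from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights, biases):
    """Return the activations z of every layer (z[0] is the input)."""
    zs = [x]
    for W, b in zip(weights, biases):
        a = W @ zs[-1] + b              # a_i = sum_l W_il * z_l + b_i
        zs.append(sigmoid(a))
    return zs

def backward(y, zs, weights):
    """Gradients of L = 1/2 * ||y - y_hat||^2 via the delta recursion."""
    y_hat = zs[-1]
    delta = -(y - y_hat) * y_hat * (1 - y_hat)             # output-layer delta_j
    grads_W, grads_b = [], []
    for layer in reversed(range(len(weights))):
        grads_W.insert(0, np.outer(delta, zs[layer]))      # dL/dW_il = delta_i * z_l
        grads_b.insert(0, delta.copy())
        if layer > 0:
            sig_prime = zs[layer] * (1 - zs[layer])        # sigma'(a_i)
            delta = sig_prime * (weights[layer].T @ delta) # delta_i = sigma'(a_i) * sum_j delta_j * W_ji
    return grads_W, grads_b

# usage on a tiny 2-2-1 network (random weights, just for illustration)
rng = np.random.default_rng(1)
weights = [rng.normal(size=(2, 2)), rng.normal(size=(1, 2))]
biases = [np.zeros(2), np.zeros(1)]
zs = forward(np.array([0.0, 1.0]), weights, biases)
gW, gb = backward(np.array([1.0]), zs, weights)
print([g.shape for g in gW])
```

The 2-2-1 worked example that follows applies exactly this recursion by hand.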
Back-propagation Algorithm

[Figure: the worked example network. Inputs $x_1, x_2$ plus a bias input of 1 feed two hidden sigmoid units with pre-activations $a_1, a_2$ and outputs $\sigma(a_1), \sigma(a_2)$; these, plus a bias input of 1, feed a single sigmoid output unit with pre-activation $b_1$ and output $\sigma(b_1) = \hat{y}$.]

Initial weights (hidden layer): $W_{11}^{(1)} = 0.6$, $W_{12}^{(1)} = -0.1$, $W_{10}^{(1)} = 0.3$ (bias), $W_{21}^{(1)} = -0.3$, $W_{22}^{(1)} = 0.4$, $W_{20}^{(1)} = 0.5$ (bias).
Initial weights (output layer): $W_{11}^{(2)} = 0.4$, $W_{12}^{(2)} = 0.1$, $W_{10}^{(2)} = -0.2$ (bias).

Inputs: $x_1 = 0$, $x_2 = 1$. Target: $y = 1$.

Loss function: $L = \frac{1}{2}(y - \hat{y})^2$

Weight update rule: $W^{New} = W^{old} - \eta\,\frac{\partial L}{\partial W^{old}}$
Back-propagation Algorithm
Step 1: Forward pass

$a_1 = W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{10}^{(1)} = 0.6(0) - 0.1(1) + 0.3 = 0.2, \qquad \sigma(a_1) = \frac{1}{1 + e^{-0.2}} = 0.5498$

$a_2 = W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{20}^{(1)} = -0.3(0) + 0.4(1) + 0.5 = 0.9, \qquad \sigma(a_2) = \frac{1}{1 + e^{-0.9}} = 0.7109$

$b_1 = W_{11}^{(2)} \sigma(a_1) + W_{12}^{(2)} \sigma(a_2) + W_{10}^{(2)} = 0.4(0.5498) + 0.1(0.7109) - 0.2 = 0.09101$

$\hat{y} = \sigma(b_1) = 0.5227$
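The same forward pass in a few lines of Python (the weight-to-connection assignment follows the labels used in the update slides that follow):

```python
import math

sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

# inputs and target from the example
x1, x2, y = 0.0, 1.0, 1.0

# layer-1 weights (W1_ij: hidden unit i, input j; j = 0 is the bias input of 1)
W1_11, W1_12, W1_10 = 0.6, -0.1, 0.3
W1_21, W1_22, W1_20 = -0.3, 0.4, 0.5
# layer-2 weights (output unit; inputs z1, z2 and bias)
W2_11, W2_12, W2_10 = 0.4, 0.1, -0.2

a1 = W1_11 * x1 + W1_12 * x2 + W1_10          # = 0.2
a2 = W1_21 * x1 + W1_22 * x2 + W1_20          # = 0.9
z1, z2 = sigmoid(a1), sigmoid(a2)             # ~ 0.5498, 0.7109
b_out = W2_11 * z1 + W2_12 * z2 + W2_10       # ~ 0.0910 (the slide's 0.09101 rounds z1, z2 first)
y_hat = sigmoid(b_out)                        # ~ 0.5227
print(round(a1, 4), round(a2, 4), round(z1, 4), round(z2, 4), round(b_out, 5), round(y_hat, 4))
```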
Back-propagation Algorithm
Step 2: Backpropagation of error

Imagine the example network redrawn in the generic form of the earlier slide, with units $(a_l, z_l) \to (a_i, z_i) \to (a_j, z_j)$ and weights $W_{il}$, $W_{ji}$. The generalized expressions then give

$\frac{\partial L}{\partial W_{il}} = \delta_i z_l, \qquad \delta_i = \sigma'(a_i)\sum_j \delta_j W_{ji}$
Back-propagation Algorithm

[Figure: the example network in generic notation. $x_1$ feeds hidden unit $(a_1, z_1)$ through $W_{11}^{(1)}$ and $x_2$ feeds hidden unit $(a_2, z_2)$ through $W_{22}^{(1)}$; both hidden outputs feed the output unit $(b_{out}, z_{out})$ through $W_{11}^{(2)}$ and $W_{12}^{(2)}$, with $z_1 = \sigma(a_1)$, $z_2 = \sigma(a_2)$, $z_{out} = \sigma(b_{out}) = \hat{y}$, and $\sigma'(a_i) = \sigma(a_i)\big(1 - \sigma(a_i)\big)$.]

Cost/error function: $L = \frac{1}{2}(y - \hat{y})^2$

Output delta:
$\delta_{out} = \frac{\partial L}{\partial b_{out}} = -(y - \hat{y})\,\frac{\partial \hat{y}}{\partial b_{out}} = -(y - z_{out})\,\frac{\partial z_{out}}{\partial b_{out}} = -(y - z_{out})\,\sigma'(b_{out}) = -(y - z_{out})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big)$

Hidden deltas:
$\delta_1 = \sigma'(a_1)\,\delta_{out} W_{11}^{(2)} = \sigma(a_1)\big(1 - \sigma(a_1)\big)\,\delta_{out} W_{11}^{(2)} = -\sigma(a_1)\big(1 - \sigma(a_1)\big)\,(y - z_{out})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big)\, W_{11}^{(2)}$

$\delta_2 = \sigma'(a_2)\,\delta_{out} W_{12}^{(2)} = \sigma(a_2)\big(1 - \sigma(a_2)\big)\,\delta_{out} W_{12}^{(2)} = -\sigma(a_2)\big(1 - \sigma(a_2)\big)\,(y - z_{out})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big)\, W_{12}^{(2)}$
Back-propagation Algorithm - error propagation (Update of Layer 1 weights)

With $x_1 = 0$, $x_2 = 1$, $y = 1$, $\sigma(a_1) = 0.5498$, $\sigma(a_2) = 0.7109$, $z_{out} = 0.5227$, $W_{11}^{(2)} = 0.4$, $W_{12}^{(2)} = 0.1$:

$\delta_1 = -(0.5498)(1 - 0.5498)(1 - 0.5227)(0.5227)(1 - 0.5227)(0.4) = -0.011789$

$\delta_2 = -(0.7109)(1 - 0.7109)(1 - 0.5227)(0.5227)(1 - 0.5227)(0.1) = -0.002447$
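The same hidden-layer deltas, computed in code from the forward-pass values:

```python
import math

sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

y = 1.0
z1, z2 = sigmoid(0.2), sigmoid(0.9)        # ~ 0.5498, 0.7109
z_out = 0.5227                             # sigma(b_out) from the forward pass
W2_11, W2_12 = 0.4, 0.1

delta_out = -(y - z_out) * z_out * (1 - z_out)          # ~ -0.11908
delta_1 = z1 * (1 - z1) * delta_out * W2_11             # ~ -0.011789
delta_2 = z2 * (1 - z2) * delta_out * W2_12             # ~ -0.002447
print(round(delta_out, 5), round(delta_1, 6), round(delta_2, 6))
```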
Back-propagation Algorithm (Update of Layer 1 weights)

$W_{ij}^{New} = W_{ij}^{old} - \eta\,\frac{\partial L}{\partial W_{ij}^{old}}, \qquad \eta = 0.25, \quad \delta_1 = -0.011789, \quad \delta_2 = -0.002447$

Gradients (with $x_1 = 0$, $x_2 = 1$, and bias input $x_0 = 1$):

$\frac{\partial L}{\partial W_{11}^{(1)}} = \delta_1 x_1 = -0.011789 \times 0 = 0$

$\frac{\partial L}{\partial W_{12}^{(1)}} = \delta_1 x_2 = -0.011789 \times 1 = -0.011789$

$\frac{\partial L}{\partial W_{10}^{(1)}} = \delta_1 x_0 = -0.011789 \times 1 = -0.011789$

$\frac{\partial L}{\partial W_{21}^{(1)}} = \delta_2 x_1 = -0.002447 \times 0 = 0$

$\frac{\partial L}{\partial W_{22}^{(1)}} = \delta_2 x_2 = -0.002447 \times 1 = -0.002447$

$\frac{\partial L}{\partial W_{20}^{(1)}} = \delta_2 x_0 = -0.002447 \times 1 = -0.002447$

Updates:

$W_{11}^{(1)}(t+1) = W_{11}^{(1)}(t) - 0.25\,\frac{\partial L}{\partial W_{11}^{(1)}(t)} = 0.6 - 0.25 \times 0 = 0.6$

$W_{12}^{(1)}(t+1) = -0.1 + 0.25 \times 0.011789 = -0.09705$

$W_{10}^{(1)}(t+1) = 0.3 + 0.25 \times 0.011789 = 0.30295$

$W_{21}^{(1)}(t+1) = -0.3 - 0.25 \times 0 = -0.3$

$W_{22}^{(1)}(t+1) = 0.4 + 0.25 \times 0.002447 = 0.40061$

$W_{20}^{(1)}(t+1) = 0.5 + 0.25 \times 0.002447 = 0.50061$
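The six layer-1 gradients and updates, as a short sketch in Python (values carried over from the previous steps):

```python
eta = 0.25
x0, x1, x2 = 1.0, 0.0, 1.0           # bias input and the two inputs
delta_1, delta_2 = -0.011789, -0.002447

# gradients: dL/dW1_ij = delta_i * x_j
grads = {
    "W1_11": delta_1 * x1, "W1_12": delta_1 * x2, "W1_10": delta_1 * x0,
    "W1_21": delta_2 * x1, "W1_22": delta_2 * x2, "W1_20": delta_2 * x0,
}
old = {"W1_11": 0.6, "W1_12": -0.1, "W1_10": 0.3,
       "W1_21": -0.3, "W1_22": 0.4, "W1_20": 0.5}

new = {name: old[name] - eta * grads[name] for name in old}
for name, value in new.items():
    print(name, round(value, 5))     # 0.6, -0.09705, 0.30295, -0.3, 0.40061, 0.50061
```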
Back-propagation Algorithm (Update of Layer 2 weights)

$W_{ij}^{New} = W_{ij}^{old} - \eta\,\frac{\partial L}{\partial W_{ij}^{old}}, \qquad \eta = 0.25$

$b_{out} = z_1 W_{11}^{(2)} + z_2 W_{12}^{(2)} + W_{10}^{(2)}, \qquad z_{out} = \sigma(b_{out}) = \hat{y}$

$\frac{\partial L}{\partial W_{11}^{(2)}} = -(y - \hat{y})\,\frac{\partial \hat{y}}{\partial W_{11}^{(2)}} = -(y - \hat{y})\,\sigma'(b_{out})\,z_1 = \delta_{out} z_1$

$\frac{\partial L}{\partial W_{12}^{(2)}} = -(y - \hat{y})\,\frac{\partial \hat{y}}{\partial W_{12}^{(2)}} = -(y - \hat{y})\,\sigma'(b_{out})\,z_2 = \delta_{out} z_2$

$\frac{\partial L}{\partial W_{10}^{(2)}} = -(y - \hat{y})\,\frac{\partial \hat{y}}{\partial W_{10}^{(2)}} = -(y - \hat{y})\,\sigma'(b_{out}) = \delta_{out}$

with $\sigma'(b_{out}) = \sigma(b_{out})\big(1 - \sigma(b_{out})\big)$, so

$\delta_{out} = -(y - \hat{y})\,\sigma(b_{out})\big(1 - \sigma(b_{out})\big) = -(1 - 0.5227)(0.5227)(1 - 0.5227) = -0.11908$

Updates:

$W_{11}^{(2)}(t+1) = W_{11}^{(2)}(t) - \eta\,\frac{\partial L}{\partial W_{11}^{(2)}} = 0.4 + 0.25 \times 0.11908 \times 0.5498 = 0.4164$

$W_{12}^{(2)}(t+1) = 0.1 + 0.25 \times 0.11908 \times 0.7109 = 0.1212$

$W_{10}^{(2)}(t+1) = -0.2 + 0.25 \times 0.11908 = -0.17023$
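And the three layer-2 updates, again as a short sketch using the values computed above:

```python
eta = 0.25
z1, z2 = 0.5498, 0.7109              # hidden-layer outputs from the forward pass
delta_out = -0.11908                 # output delta computed above

# gradients: dL/dW2_11 = delta_out * z1, dL/dW2_12 = delta_out * z2, dL/dW2_10 = delta_out
W2_11 = 0.4 - eta * delta_out * z1    # ~ 0.4164
W2_12 = 0.1 - eta * delta_out * z2    # ~ 0.1212
W2_10 = -0.2 - eta * delta_out        # ~ -0.17023
print(round(W2_11, 4), round(W2_12, 4), round(W2_10, 5))
```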
