The document discusses backpropagation, an algorithm used to train neural networks. It begins with background on perceptron learning and the need for an algorithm that can train multilayer perceptrons to perform nonlinear classification. It then describes the development of backpropagation, from early work in the 1970s to its popularization in the 1980s. The document provides examples of using backpropagation to design networks for binary classification and multi-class problems. It also outlines the generalized mathematical expressions and steps involved in backpropagation, including calculating the error derivative with respect to weights and updating weights to minimize loss.
2. Background
• The Perceptron Learning Algorithm and Hebbian Learning can classify input patterns only if the patterns are linearly separable.
• We need an algorithm that can train multiple layers of perceptrons and classify patterns that are not linearly separable.
• The algorithm should also be able to use non-linear activation functions.
[Figure: two plots in the (x1, x2) plane, each showing Class 1 and Class 2 samples. Left: Linearly Separable (a straight line divides the classes). Right: Linearly Non-separable (no straight line can divide them).]
For the non-separable case:
• Non-linear decision boundaries are needed.
• The perceptron algorithm cannot be used.
• A variation of the gradient descent (GD) rule is used instead.
• More layers are required.
• A non-linear activation function is required.
Perceptron Algorithm:
$$W_{i+1}^{\text{new}} = W_i^{\text{old}} + (t - a)\,x_i$$
Gradient Descent Algorithm:
$$W_{i+1}^{\text{new}} = W_i^{\text{old}} - \eta\,\frac{\partial L}{\partial w_i}, \qquad \text{Loss function } L = \frac{1}{2}\,(t - a)^2$$
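To make the contrast concrete, here is a minimal sketch of the two update rules in Python. The 2-D example, the variable names, and the use of a linear neuron for the gradient descent branch are illustrative assumptions, not part of the slides.

```python
import numpy as np

def perceptron_update(w, x, t, a):
    """Perceptron rule: W_new = W_old + (t - a) * x,
    where t is the target and a the actual (hard-limit) output."""
    return w + (t - a) * x

def gradient_descent_update(w, x, t, a, eta=0.1):
    """GD rule on L = 1/2 (t - a)^2, assuming a linear neuron a = w.x,
    so dL/dw = -(t - a) * x and W_new = W_old - eta * dL/dw."""
    return w - eta * (-(t - a) * x)

# Hypothetical 2-D example; the data and target are made up.
w = np.zeros(2)
x = np.array([1.0, -0.5])
t = 1.0
a = float(np.dot(w, x) >= 0)   # hard-limit output for the perceptron rule
w = perceptron_update(w, x, t, a)
```

Both rules move the weights so as to reduce the error (t − a); the GD rule does so through an explicit derivative of a loss function, which is what generalizes to multiple layers.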
3. Background – Backpropagation
• The perceptron learning rule of Frank Rosenblatt and the LMS algorithm of Bernard Widrow and
Marcian Hoff were designed to train single-layer perceptron-like networks.
• Single-layer networks suffer from the disadvantage that they are only able to solve linearly separable
classification problems. Both Rosenblatt and Widrow were aware of these limitations and proposed
multilayer networks that could overcome them, but they were not able to generalize their algorithms
to train these more powerful networks.
• The first description of an algorithm to train multilayer networks was contained in the 1974 thesis of Paul Werbos. The thesis presented the algorithm in the context of general networks, with neural networks as a special case, and was not disseminated in the neural network community.
• It was not until the mid-1980s that the backpropagation algorithm was rediscovered and widely publicized. It was rediscovered independently by David Rumelhart, Geoffrey Hinton, and Ronald Williams (1986), David Parker (1985), and Yann Le Cun (1985).
• The algorithm was popularized by its inclusion in the book Parallel Distributed Processing [RuMc86], which described the work of the Parallel Distributed Processing Group led by psychologists David Rumelhart and James McClelland.
• The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely
used neural network.
4. Network Design
Problem: Will you watch a movie or not?
Step 1: Design – The output can be Yes (1) or No (0), so a single neuron (perceptron) is sufficient.
Step 2: Choose a suitable activation function for the output, along with a rule to update the weights (a hard-limit function for the perceptron learning algorithm; a sigmoid for the Widrow–Hoff or delta rule).
$$W_{i+1}^{\text{new}} = W_i^{\text{old}} - \eta\,\frac{\partial L}{\partial w_i}$$

$$L = \frac{1}{2}\,(y - \hat{y})^2 = \frac{1}{2}\,\big(y - f(wx + b)\big)^2$$

$$\frac{\partial L}{\partial w} = 2 \cdot \frac{1}{2}\,\big(y - f(wx + b)\big)\left(-\frac{\partial f(wx + b)}{\partial w}\right) = -\,(y - \hat{y})\,f'(wx + b)\,x$$
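One step the slides leave implicit: when f is the sigmoid used below, its derivative has a standard closed form, so the f′(wx + b) factor is cheap to evaluate:

$$f(z) = \frac{1}{1 + e^{-z}} \quad\Longrightarrow\quad f'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = f(z)\,\big(1 - f(z)\big)$$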
[Figure: single-neuron network for the movie decision. One input x (Director, Actor, Genre, or IMDB rating) with weight w, plus a bias b carried on a constant input 1 (so w0 = b), feeds the net input wx + b into the activation f, producing the output Yes (1) or No (0).]

$$\hat{y} = f(wx + b) = \frac{1}{1 + e^{-(wx + b)}}$$
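Putting the pieces together, here is a minimal Python sketch of training this single sigmoid neuron with the update rule derived above. The feature values, learning rate, and epoch count are made-up assumptions for illustration; only the gradient itself comes from the derivation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_neuron(x, y, eta=0.5, epochs=1000):
    """Gradient descent on L = 1/2 (y - y_hat)^2 with y_hat = f(w*x + b),
    using dL/dw = -(y - y_hat) * f'(wx + b) * x and f' = f * (1 - f)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            y_hat = sigmoid(w * xi + b)
            delta = (yi - y_hat) * y_hat * (1.0 - y_hat)  # (y - y_hat) f'(wx + b)
            w += eta * delta * xi  # w_new = w_old - eta * dL/dw
            b += eta * delta       # bias update: its "input" is the constant 1
    return w, b

# Hypothetical data: x is a single normalized feature (say, an IMDB rating
# scaled to [0, 1]); y is watch (1) / don't watch (0). Values are made up.
x = np.array([0.9, 0.8, 0.3, 0.2])
y = np.array([1.0, 1.0, 0.0, 0.0])
w, b = train_neuron(x, y)
print(sigmoid(w * 0.85 + b))  # predicted probability for a new movie
```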
5. Network Design
Problem: Sort students into the 4 houses based on three qualities: lineage, choice, and ethics.
Step 1: Design – Here the input vector is 3-D, i.e., for each student,
$$\text{Student 1} = \begin{bmatrix} L_1 \\ C_1 \\ E_1 \end{bmatrix}, \qquad \text{Student 2} = \begin{bmatrix} L_2 \\ C_2 \\ E_2 \end{bmatrix}$$
[Figure: left, a single neuron N with inputs x1, x2, x3, weights w1, w2, w3, and a bias input x0 = 1 with weight w0 = b, producing a Yes (1) / No (0) output; right, two output neurons N1 and N2 sharing the same three inputs, with weights w11 … w23 and biases b1, b2.]

$$\hat{y}_1 = f(w_{11}x_1 + w_{12}x_2 + w_{13}x_3 + b_1)$$

$$\hat{y}_2 = f(w_{21}x_1 + w_{22}x_2 + w_{23}x_3 + b_2)$$

Two binary outputs can encode four classes, so the houses are assigned via the target table below.
Target outputs (y1, y2) encoding the four houses, with (ŷ1, ŷ2) denoting the actual network outputs:

House   A   B   C   D
y1      0   1   1   0
y2      1   1   0   0
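A minimal Python sketch of this two-output network's forward pass and house decoding. The weight and bias values are placeholders (in practice they come from training the network), and the house-to-code mapping assumes the row-wise reading of the table above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder parameters: W is 2x3 (row k holds w_k1, w_k2, w_k3), b = (b1, b2).
# Real values would come from training; these are made up for illustration.
W = np.array([[0.4, -0.2, 0.7],
              [-0.5, 0.9, 0.1]])
b = np.array([0.1, -0.3])

def forward(x):
    """y_hat_k = f(w_k1*x1 + w_k2*x2 + w_k3*x3 + b_k) for k = 1, 2."""
    return sigmoid(W @ x + b)

# Decode the two thresholded outputs into a house, following the target
# table above: A = (0,1), B = (1,1), C = (1,0), D = (0,0).
codes = {(0, 1): "A", (1, 1): "B", (1, 0): "C", (0, 0): "D"}

x = np.array([0.8, 0.2, 0.5])               # (lineage, choice, ethics), made up
y1, y2 = (forward(x) >= 0.5).astype(int)    # threshold each sigmoid output
print(codes[(int(y1), int(y2))])            # the predicted house
```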