Today’s Topics
Backpropagation Neural Network
1. Introduction
2. Architecture
3. Training Algorithm
4. Choice of Initial Weights & Biases
(i) Random Initialization
(ii) Nguyen-Widrow Initialization
5. A Particular Case Study (Matlab Program)
Introduction
• The mathematical basis for the backpropagation algorithm is the
optimization technique known as gradient descent.
• The gradient of a function (here the function is the error and the
variables are the weights of the net) gives the direction in which
the function increases most rapidly.
• The negative of the gradient gives the direction in which the function
decreases most rapidly.
[Figure: contour plot of the error surface over the weights w1 and w2; gradient descent follows the negative gradient downhill toward the minimum.]
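To make this concrete, here is a minimal gradient-descent sketch in Python/NumPy (the slides' own case study uses Matlab; the quadratic error surface, starting point, and learning rate below are illustrative assumptions, not part of the slides):

```python
import numpy as np

def error(w):
    # Toy quadratic error surface E(w1, w2); purely illustrative
    return (w[0] - 3.0) ** 2 + (w[1] + 2.0) ** 2

def gradient(w):
    # Analytic gradient of the toy error
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 2.0)])

w = np.array([-2.0, -10.0])       # starting point (assumed)
alpha = 0.1                       # learning rate (assumed)
for _ in range(100):
    w = w - alpha * gradient(w)   # step opposite the gradient: steepest descent

print(w)                          # approaches the minimum at (3, -2)
```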
Training Algorithm
Step 0: Initialize weights (set to small random values).
Step 1: While the stopping condition is false, do Steps 2–9.
Step 2: For each training pair, do Steps 3–8.
Feedforward
Step 3: Each input unit (Xi, i=1,2,…,n) receives the input signal xi and
broadcasts this signal to all units in the layer above (the hidden units).
Step 4: Each hidden unit (Zj, j=1,2,…,p) sums its weighted input signals,
applies its activation function to compute its output signal, and sends
this signal to all units in the layer above (the output units):
$$z\_in_j = v_{0j} + \sum_{i=1}^{n} x_i\, v_{ij}, \qquad z_j = f(z\_in_j) \quad \text{(Step 4)}$$
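A minimal sketch of this feedforward step in Python/NumPy; the names v, v0, x, and z follow the slide's notation, while the layer sizes, random seed, and sample input are assumptions:

```python
import numpy as np

def f(x):
    # Binary sigmoid activation (defined on the Activation Functions slide)
    return 1.0 / (1.0 + np.exp(-x))

n, p = 3, 4                               # numbers of input and hidden units (assumed)
rng = np.random.default_rng(0)
v  = rng.uniform(-0.5, 0.5, size=(n, p))  # input-to-hidden weights v_ij
v0 = rng.uniform(-0.5, 0.5, size=p)       # hidden biases v_0j

x = rng.uniform(size=n)                   # one training input (assumed)
z_in = v0 + x @ v                         # Step 4: z_in_j = v_0j + sum_i x_i v_ij
z = f(z_in)                               # z_j = f(z_in_j)
```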
Training Algorithm (cont.)
Step 5: Each output unit (Yk, k=1,2,…,m) sums its weighted input
signals and applies its activation function to compute its output signal.
Backpropagation of Errors
Step 6: Each output unit (Yk, k=1,2,…,m) receives a target pattern
corresponding to the input training pattern, computes its error
information term, calculates its weight correction term (used to update
wjk later), calculates its bias correction term (used to update w0k later),
and sends the error term δk to units in the layer below:
$$y\_in_k = w_{0k} + \sum_{j=1}^{p} z_j\, w_{jk}, \qquad y_k = f(y\_in_k) \quad \text{(Step 5)}$$
$$\delta_k = (t_k - y_k)\, f'(y\_in_k), \qquad \Delta w_{jk} = \alpha\, \delta_k\, z_j, \qquad \Delta w_{0k} = \alpha\, \delta_k \quad \text{(Step 6)}$$
where $\alpha$ is the learning rate.
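Continuing the NumPy sketch from Step 4 above (m, w, w0, t, and alpha are assumed stand-ins for the number of output units, the hidden-to-output weights and biases, the target pattern, and the learning rate):

```python
m  = 2                                     # number of output units (assumed)
w  = rng.uniform(-0.5, 0.5, size=(p, m))   # hidden-to-output weights w_jk
w0 = rng.uniform(-0.5, 0.5, size=m)        # output biases w_0k
t  = np.array([1.0, 0.0])                  # target pattern (assumed)
alpha = 0.2                                # learning rate (assumed)

# Step 5: feedforward through the output layer
y_in = w0 + z @ w                          # y_in_k = w_0k + sum_j z_j w_jk
y = f(y_in)                                # y_k = f(y_in_k)

# Step 6: error information and correction terms
delta_k = (t - y) * y * (1.0 - y)          # delta_k = (t_k - y_k) f'(y_in_k); f' = y(1-y) for the binary sigmoid
Dw  = alpha * np.outer(z, delta_k)         # weight corrections: alpha * delta_k * z_j
Dw0 = alpha * delta_k                      # bias corrections: alpha * delta_k
```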
Training Algorithm (cont.)
Step 7: Each hidden unit (Zj, j=1,2,…,p) sums its delta inputs (from units
in the layer above), multiplies by the derivative of its activation function
to calculate its error information term, calculates its weight correction
term (used to update vij later), and calculates its bias correction term
(used to update v0j later).
Update Weights and Biases
Step 8: Each output unit (Yk, k=1,2,…,m) updates its bias and weights
(j=0,…,p), and each hidden unit (Zj, j=1,2,…,p) updates its bias and
weights (i=0,…,n):
$$\delta\_in_j = \sum_{k=1}^{m} \delta_k\, w_{jk}, \qquad \delta_j = \delta\_in_j\, f'(z\_in_j), \qquad \Delta v_{ij} = \alpha\, \delta_j\, x_i, \qquad \Delta v_{0j} = \alpha\, \delta_j \quad \text{(Step 7)}$$
$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}, \qquad v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij} \quad \text{(Step 8)}$$
Step 9: Test the stopping condition.
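And the matching continuation of the NumPy sketch for Steps 7 and 8, reusing the variables defined above:

```python
# Step 7: backpropagate the error to the hidden layer
delta_in = delta_k @ w.T                   # delta_in_j = sum_k delta_k w_jk
delta_j = delta_in * z * (1.0 - z)         # delta_j = delta_in_j f'(z_in_j); f' = z(1-z)
Dv  = alpha * np.outer(x, delta_j)         # weight corrections: alpha * delta_j * x_i
Dv0 = alpha * delta_j                      # bias corrections: alpha * delta_j

# Step 8: update all weights and biases
w  += Dw;  w0 += Dw0
v  += Dv;  v0 += Dv0
```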
Activation Functions
The activation function for a backpropagation net should be:
– continuous
– differentiable (with a derivative that is easy to compute)
– monotonically increasing
The most commonly used activation functions are:
– the binary sigmoid function (with range (0, 1)):
– the bipolar sigmoid function (with range (−1, 1)):
– the hyperbolic tangent function (closely related to the
bipolar sigmoid):
$$f_1(x) = \frac{1}{1 + \exp(-x)}, \qquad f_1'(x) = f_1(x)\,\bigl[1 - f_1(x)\bigr]$$
$$f_2(x) = \frac{2}{1 + \exp(-x)} - 1, \qquad f_2'(x) = 0.5\,\bigl[1 + f_2(x)\bigr]\,\bigl[1 - f_2(x)\bigr]$$
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
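Written out in Python/NumPy, these formulas become (a direct transcription of the definitions above, nothing added beyond them):

```python
import numpy as np

def f1(x):
    # Binary sigmoid, range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def f1_prime(x):
    return f1(x) * (1.0 - f1(x))

def f2(x):
    # Bipolar sigmoid, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f2_prime(x):
    return 0.5 * (1.0 + f2(x)) * (1.0 - f2(x))

# tanh(x) = (e^x - e^-x) / (e^x + e^-x) is available as np.tanh;
# it relates to the bipolar sigmoid via tanh(x) = f2(2x).
```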
Choice of Initial Weights & Biases
The choice of initial weights influences whether the net reaches a
global (or only a local) minimum of the error, and how quickly it
converges.
The following two choices are generally used:
1. Random Initialization
Weights are initialized to random values between −0.5 and 0.5
(or between −1 and 1).
The values may be positive or negative, because the final weights
after training may be of either sign.
The initial weights must not be too large. Large initial weights,
multiplied by the signals from the input or hidden layer neurons,
produce net inputs that fall in the saturation region of the activation
function, where its derivative is nearly zero. The weight corrections
then vanish and the weights stop changing.
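A minimal sketch of this random initialization in Python/NumPy (the layer sizes are assumed):

```python
import numpy as np

rng = np.random.default_rng()
n, p, m = 3, 4, 2                          # layer sizes (assumed)

# Uniform random initialization in [-0.5, 0.5] for all weights and biases
v  = rng.uniform(-0.5, 0.5, size=(n, p))   # input-to-hidden weights
v0 = rng.uniform(-0.5, 0.5, size=p)        # hidden biases
w  = rng.uniform(-0.5, 0.5, size=(p, m))   # hidden-to-output weights
w0 = rng.uniform(-0.5, 0.5, size=m)        # output biases
```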
2. Nguyen-Widrow Initialization
– This initialization scheme typically leads to faster learning.
– The approach is based on a geometrical analysis of the response
of the hidden layer neurons to a single input; the analysis is
extended to several inputs by means of Fourier transforms.
– Weights from the hidden units to the output units are initialized
to random values between −0.5 and 0.5, as is commonly the case.
– The weights from the input units to the hidden units are initialized as follows:
• For each hidden unit ( j = 1,…, p ):
– Initialize the weight vector: $v_{ij}(\text{old})$ = random number between −0.5 and 0.5.
– Compute its norm $\lVert \mathbf{v}_j(\text{old}) \rVert$.
– Reinitialize the weights: $v_{ij} = \dfrac{\beta\, v_{ij}(\text{old})}{\lVert \mathbf{v}_j(\text{old}) \rVert}$.
– Set the bias: $v_{0j}$ = random number between $-\beta$ and $\beta$.
– The scale factor is given by $\beta = 0.7\, p^{1/n}$,
– n being the number of input units and p being the number of hidden
units.
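A sketch of the Nguyen-Widrow procedure in Python/NumPy, following the steps above (the function name and layer sizes are assumptions):

```python
import numpy as np

def nguyen_widrow_init(n, p, rng=None):
    """Nguyen-Widrow initialization of the input-to-hidden weights v (n x p)
    and hidden biases v0 (p,), following the steps above."""
    rng = np.random.default_rng() if rng is None else rng
    beta = 0.7 * p ** (1.0 / n)              # scale factor: beta = 0.7 p^(1/n)
    v = rng.uniform(-0.5, 0.5, size=(n, p))  # v_ij(old)
    norms = np.linalg.norm(v, axis=0)        # ||v_j(old)|| for each hidden unit j
    v = beta * v / norms                     # v_ij = beta * v_ij(old) / ||v_j(old)||
    v0 = rng.uniform(-beta, beta, size=p)    # biases drawn from [-beta, beta]
    return v, v0

v, v0 = nguyen_widrow_init(n=3, p=4)
```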