Today’s Topics
Backpropagation Neural Network
1. Introduction
2. Architecture
3. Training Algorithm
4. Choice of Initial Weights & Biases
(i) Random Initialization
(ii) Nguyen-Widrow Initialization
5. A Particular Case Study (Matlab Program)
Introduction
• The mathematical basis for the backpropagation algorithm is the
optimization technique known as gradient descent.
• The gradient of a function (here the function is the error and the
variables are the weights of the net) gives the direction in which
the function increases most rapidly.
• The negative of the gradient gives the direction in which the function
decreases most rapidly.
[Figure: contour plot of the error surface over the weights w1 and w2; gradient descent follows the negative gradient downhill toward the minimum.]
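To make this concrete, here is a minimal gradient-descent sketch in Python/NumPy (the slides' own case study uses Matlab; the quadratic error surface, starting point, and learning rate below are illustrative assumptions, not part of the slides):

```python
import numpy as np

def error(w):
    # Toy quadratic error surface E(w1, w2); purely illustrative
    return (w[0] - 3.0) ** 2 + (w[1] + 2.0) ** 2

def gradient(w):
    # Analytic gradient of the toy error
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 2.0)])

w = np.array([-2.0, -10.0])       # starting point (assumed)
alpha = 0.1                       # learning rate (assumed)
for _ in range(100):
    w = w - alpha * gradient(w)   # step opposite the gradient: steepest descent

print(w)                          # approaches the minimum at (3, -2)
```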
Training Algorithm
Step 0: Initialize weights (set to small random values).
Step 1: While the stopping condition is false, do Steps 2–9.
Step 2: For each training pair, do Steps 3–8.
Feedforward
Step 3: Each input unit (Xi, i=1,2,…,n) receives the input signal xi and
broadcasts this signal to all units in the layer above (the hidden units).
Step 4: Each hidden unit (Zj, j=1,2,…,p) sums its weighted input signals,
applies its activation function to compute its output signal, and sends
this signal to all units in the layer above (the output units):
$$z\_in_j = v_{0j} + \sum_{i=1}^{n} x_i\, v_{ij}, \qquad z_j = f(z\_in_j) \quad \text{(Step 4)}$$
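A minimal sketch of this feedforward step in Python/NumPy; the names v, v0, x, and z follow the slide's notation, while the layer sizes, random seed, and sample input are assumptions:

```python
import numpy as np

def f(x):
    # Binary sigmoid activation (defined on the Activation Functions slide)
    return 1.0 / (1.0 + np.exp(-x))

n, p = 3, 4                               # numbers of input and hidden units (assumed)
rng = np.random.default_rng(0)
v  = rng.uniform(-0.5, 0.5, size=(n, p))  # input-to-hidden weights v_ij
v0 = rng.uniform(-0.5, 0.5, size=p)       # hidden biases v_0j

x = rng.uniform(size=n)                   # one training input (assumed)
z_in = v0 + x @ v                         # Step 4: z_in_j = v_0j + sum_i x_i v_ij
z = f(z_in)                               # z_j = f(z_in_j)
```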
Training Algorithm (cont.)
Step 5: Each output unit (Yk, k=1,2,…,m) sums its weighted input
signals and applies its activation function to compute its output signal.
Backpropagation of Errors
Step 6: Each output unit (Yk, k=1,2,…,m) receives a target pattern
corresponding to the input training pattern, computes its error
information term, calculates its weight correction term (used to update
wjk later), calculates its bias correction term (used to update w0k later),
and sends the error term δk to units in the layer below:
$$y\_in_k = w_{0k} + \sum_{j=1}^{p} z_j\, w_{jk}, \qquad y_k = f(y\_in_k) \quad \text{(Step 5)}$$
$$\delta_k = (t_k - y_k)\, f'(y\_in_k), \qquad \Delta w_{jk} = \alpha\, \delta_k\, z_j, \qquad \Delta w_{0k} = \alpha\, \delta_k \quad \text{(Step 6)}$$
where $\alpha$ is the learning rate.
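Continuing the NumPy sketch from Step 4 above (m, w, w0, t, and alpha are assumed stand-ins for the number of output units, the hidden-to-output weights and biases, the target pattern, and the learning rate):

```python
m  = 2                                     # number of output units (assumed)
w  = rng.uniform(-0.5, 0.5, size=(p, m))   # hidden-to-output weights w_jk
w0 = rng.uniform(-0.5, 0.5, size=m)        # output biases w_0k
t  = np.array([1.0, 0.0])                  # target pattern (assumed)
alpha = 0.2                                # learning rate (assumed)

# Step 5: feedforward through the output layer
y_in = w0 + z @ w                          # y_in_k = w_0k + sum_j z_j w_jk
y = f(y_in)                                # y_k = f(y_in_k)

# Step 6: error information and correction terms
delta_k = (t - y) * y * (1.0 - y)          # delta_k = (t_k - y_k) f'(y_in_k); f' = y(1-y) for the binary sigmoid
Dw  = alpha * np.outer(z, delta_k)         # weight corrections: alpha * delta_k * z_j
Dw0 = alpha * delta_k                      # bias corrections: alpha * delta_k
```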
Training Algorithm (cont.)
Step 7: Each hidden unit (Zj, j=1,2,…,p) sums its delta inputs (from units
in the layer above), multiplies by the derivative of its activation function
to calculate its error information term, calculates its weight correction
term (used to update vij later), and calculates its bias correction term
(used to update v0j later).
Update Weights and Biases
Step 8: Each output unit (Yk, k=1,2,…,m) updates its bias and weights
(j=0,…,p), and each hidden unit (Zj, j=1,2,…,p) updates its bias and
weights (i=0,…,n):
$$\delta\_in_j = \sum_{k=1}^{m} \delta_k\, w_{jk}, \qquad \delta_j = \delta\_in_j\, f'(z\_in_j), \qquad \Delta v_{ij} = \alpha\, \delta_j\, x_i, \qquad \Delta v_{0j} = \alpha\, \delta_j \quad \text{(Step 7)}$$
$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}, \qquad v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij} \quad \text{(Step 8)}$$
Step 9: Test the stopping condition.
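And the matching continuation of the NumPy sketch for Steps 7 and 8, reusing the variables defined above:

```python
# Step 7: backpropagate the error to the hidden layer
delta_in = delta_k @ w.T                   # delta_in_j = sum_k delta_k w_jk
delta_j = delta_in * z * (1.0 - z)         # delta_j = delta_in_j f'(z_in_j); f' = z(1-z)
Dv  = alpha * np.outer(x, delta_j)         # weight corrections: alpha * delta_j * x_i
Dv0 = alpha * delta_j                      # bias corrections: alpha * delta_j

# Step 8: update all weights and biases
w  += Dw;  w0 += Dw0
v  += Dv;  v0 += Dv0
```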
Activation Functions
The activation function for a backpropagation net should be:
– continuous
– differentiable (with a derivative that is easy to compute)
– monotonically increasing
The most commonly used activation functions are:
– the binary sigmoid function (with range (0, 1)):
– the bipolar sigmoid function (with range (−1, 1)):
– the hyperbolic tangent function (closely related to the
bipolar sigmoid):
$$f_1(x) = \frac{1}{1 + \exp(-x)}, \qquad f_1'(x) = f_1(x)\,\bigl[1 - f_1(x)\bigr]$$
$$f_2(x) = \frac{2}{1 + \exp(-x)} - 1, \qquad f_2'(x) = 0.5\,\bigl[1 + f_2(x)\bigr]\,\bigl[1 - f_2(x)\bigr]$$
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
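Written out in Python/NumPy, these formulas become (a direct transcription of the definitions above, nothing added beyond them):

```python
import numpy as np

def f1(x):
    # Binary sigmoid, range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def f1_prime(x):
    return f1(x) * (1.0 - f1(x))

def f2(x):
    # Bipolar sigmoid, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f2_prime(x):
    return 0.5 * (1.0 + f2(x)) * (1.0 - f2(x))

# tanh(x) = (e^x - e^-x) / (e^x + e^-x) is available as np.tanh;
# it relates to the bipolar sigmoid via tanh(x) = f2(2x).
```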
Choice of Initial Weights & Biases
The choice of initial weights influences whether the net reaches a
global (or only a local) minimum of the error, and how quickly it
converges.
The following two choices are generally used:
1. Random Initialization
Weights are initialized to random values between −0.5 and 0.5
(or between −1 and 1).
The values may be positive or negative, because the final weights
after training may be of either sign.
The initial weights must not be too large. Large initial weights,
multiplied by the signals from the input or hidden layer neurons,
produce net inputs that fall in the saturation region of the activation
function, where its derivative is nearly zero. The weight corrections
then vanish and the weights stop changing.
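A minimal sketch of this random initialization in Python/NumPy (the layer sizes are assumed):

```python
import numpy as np

rng = np.random.default_rng()
n, p, m = 3, 4, 2                          # layer sizes (assumed)

# Uniform random initialization in [-0.5, 0.5] for all weights and biases
v  = rng.uniform(-0.5, 0.5, size=(n, p))   # input-to-hidden weights
v0 = rng.uniform(-0.5, 0.5, size=p)        # hidden biases
w  = rng.uniform(-0.5, 0.5, size=(p, m))   # hidden-to-output weights
w0 = rng.uniform(-0.5, 0.5, size=m)        # output biases
```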
2. Nguyen-Widrow Initialization
– This initialization scheme typically leads to faster learning.
– The approach is based on a geometrical analysis of the response
of the hidden layer neurons to a single input; the analysis is
extended to several inputs by means of Fourier transforms.
– Weights from the hidden units to the output units are initialized
to random values between −0.5 and 0.5, as is commonly the case.
– The weights from the input units to the hidden units are initialized as follows:
• For each hidden unit ( j = 1,…, p ):
– Initialize the weight vector: $v_{ij}(\text{old})$ = random number between −0.5 and 0.5.
– Compute its norm $\lVert \mathbf{v}_j(\text{old}) \rVert$.
– Reinitialize the weights: $v_{ij} = \dfrac{\beta\, v_{ij}(\text{old})}{\lVert \mathbf{v}_j(\text{old}) \rVert}$.
– Set the bias: $v_{0j}$ = random number between $-\beta$ and $\beta$.
– The scale factor is given by $\beta = 0.7\, p^{1/n}$,
– n being the number of input units and p being the number of hidden
units.
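A sketch of the Nguyen-Widrow procedure in Python/NumPy, following the steps above (the function name and layer sizes are assumptions):

```python
import numpy as np

def nguyen_widrow_init(n, p, rng=None):
    """Nguyen-Widrow initialization of the input-to-hidden weights v (n x p)
    and hidden biases v0 (p,), following the steps above."""
    rng = np.random.default_rng() if rng is None else rng
    beta = 0.7 * p ** (1.0 / n)              # scale factor: beta = 0.7 p^(1/n)
    v = rng.uniform(-0.5, 0.5, size=(n, p))  # v_ij(old)
    norms = np.linalg.norm(v, axis=0)        # ||v_j(old)|| for each hidden unit j
    v = beta * v / norms                     # v_ij = beta * v_ij(old) / ||v_j(old)||
    v0 = rng.uniform(-beta, beta, size=p)    # biases drawn from [-beta, beta]
    return v, v0

v, v0 = nguyen_widrow_init(n=3, p=4)
```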