World Academy of Science, Engineering and Technology
International Journal of Computer, Information Science and Engineering Vol:1 No:1, 2007

A Comparison of First and Second Order Training
Algorithms for Artificial Neural Networks


Syed Muhammad Aqil Burney, Tahseen Ahmed Jilani, Cemal Ardil

Abstract— Minimization methods for training feed-forward networks with backpropagation are compared. Feedforward network training is a special case of functional minimization, where no explicit model of the data is assumed. Therefore, due to the high dimensionality of the data, linearization of the training problem through the use of orthogonal basis functions is not desirable. The focus is on functional minimization on any basis. A number of methods based on local gradient and Hessian matrices are discussed. Modifications of several first- and second-order training methods are considered. Using share rates data, it is shown experimentally that the conjugate gradient and quasi-Newton methods outperform the gradient descent methods, and that the Levenberg-Marquardt algorithm is of special interest in financial forecasting.

Keywords— Backpropagation algorithm, conjugacy condition, line search, matrix perturbation
I. INTRODUCTION

Feedforward neural networks are composed of neurons in which the input layer of neurons is connected to the output layer through one or more layers of intermediate neurons. The training process of neural networks involves adjusting the weights until a desired input/output relationship is obtained [12], [19], [34]. The majority of adaptation learning algorithms are based on the Widrow-Hoff algorithm [4]. The mathematical characterization of a multilayer feedforward network is that of a composite application of functions [36]. Each of these functions represents a particular layer and may be specific to individual units in the layer, e.g., all the units in the layer are required to have the same activation function. The overall mapping is thus characterized by a composite function relating feedforward network inputs to outputs. That is,

O = f_composite(x)

Using p mapping layers in a (p+1)-layer feedforward net yields

O = f^{L_p}( f^{L_{p-1}}( ... f^{L_1}(x) ... ) )

Manuscript received December 03, 2004.
Dr. S. M. Aqil Burney is Professor in the Department of Computer Science, University of Karachi, Pakistan. Phone: 0092-21-9243131 ext. 2447, fax: 0092-21-9243203, Burney@computer.Org, aqil_burney@yahoo.com.
Tahseen Ahmed Jilani is lecturer in the Department of Computer Science, University of Karachi, and is a Ph.D. research associate in the Department of Statistics, University of Karachi, Pakistan. tahseenjilani@yahoo.com
Cemal Ardil is with the Azerbaijan National Academy of Aviation, Baku, Azerbaijan. cemalardil@gmail.com

Thus the interconnection weights from unit k in L1 to unit i in L2 are w_{ik}^{L1→L2}. If the hidden units have a sigmoidal activation function, denoted f^{sig}, then

O_i^{L2} = Σ_{k=1}^{H1} w_{ik}^{L1→L2} f_k^{sig}( Σ_{j=1}^{I} w_{kj}^{L0→L1} i_j )

The above equation illustrates a neural network with supervision and the composition of non-linear functions.
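As an illustration only (not code from the paper), here is a minimal Python sketch of this composite mapping for one hidden layer of sigmoidal units feeding a linear output layer; the array names W1, W2, the toy dimensions, and the random initialization are assumptions made for the example.

import numpy as np

def sigmoid(z):
    # logistic activation used for the hidden units
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    # hidden layer: net_k = sum_j W1[k, j] * x[j], then f_sig(net_k)
    hidden = sigmoid(W1 @ x)
    # output layer: O_i = sum_k W2[i, k] * hidden[k] (linear output here)
    return W2 @ hidden

# toy dimensions: 3 inputs, H1 = 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 3))
W2 = rng.normal(scale=0.1, size=(2, 4))
x = np.array([0.5, -1.0, 2.0])
print(forward(x, W1, W2))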

II. LEARNING IN NEURAL NETWORKS

Learning is a process by which the free parameters of a neural network are adapted through a continuing process of stimulation by the environment in which the network is embedded [2]. The type of learning is determined by the manner in which the parameter changes take place. A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm [6]. Learning algorithms differ from each other in the way in which the adjustment ∆w_kj to the synaptic weight w_kj is formulated.
The basic approach in learning is to start with an untrained network. The network outputs are compared with target values, which provides an error. Suppose that t_k(n) denotes the desired outcome (response) for the k-th neuron at time n, and let the actual response of the neuron be y_k(n), produced when x(n) is applied to the network. If the actual response y_k(n) is not the same as t_k(n), we may define an error signal as

e_k(n) = t_k(n) − y_k(n)
The purpose of error-correction learning is to minimize a cost function based on the error signal e_k(n). Once a cost function is selected, error-correction learning is strictly an optimization problem. A cost function usually used in neural networks is the mean-square-error criterion, called LMS learning,

J = E[ (1/2) Σ_k e_k^2(n) ]   (1)

where the summation runs over all neurons in the output layer of the network. The method continually searches for the bottom of the cost function in an iterative manner. Minimization of the cost function J with respect to the free

parameters of the network leads to the so-called method of gradient descent,

w_k(n+1) = w_k(n) − η ∇J(n)   (2)

Here η is called the learning rate and it defines the proportion of the error used for weight updating (correction). The learning parameter has a profound impact on the convergence of learning [21].
A plot of the cost function versus the synaptic weights of the neural network is a multidimensional surface called the error surface. The error-correction learning algorithm starts from an arbitrary point on the error surface (the initial weights) and then moves towards a global minimum in a step-by-step fashion.
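The following hedged sketch (not the authors' implementation) applies the update of equation (2) to a single-layer linear network under the cost of equation (1); the data X, targets T, and the learning rate value are invented for the example.

import numpy as np

def lms_cost(W, X, T):
    # J = E[ 1/2 * sum_k e_k^2 ], estimated over the training set (equation 1)
    E = T - X @ W.T                    # error signals e_k(n) for every pattern
    return 0.5 * np.mean(np.sum(E ** 2, axis=1))

def gradient_descent_step(W, X, T, eta):
    # w(n+1) = w(n) - eta * grad J(n)  (equation 2), for a linear single-layer net
    E = T - X @ W.T
    grad = -(E.T @ X) / X.shape[0]     # dJ/dW averaged over the patterns
    return W - eta * grad

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))               # 50 patterns, 3 inputs
T = X @ np.array([[1.0, -2.0, 0.5]]).T     # targets from a known linear map
W = np.zeros((1, 3))
for _ in range(200):
    W = gradient_descent_step(W, X, T, eta=0.1)
print(lms_cost(W, X, T), W)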

Fig. 1 Quadratic error surface with local and global minima

We can gain understanding and intuition about the algorithm by studying the error surfaces themselves, i.e. the function J(w). Such an error surface depends upon the task at hand, but there are some general properties of error surfaces that seem to hold over a broad range of real-world pattern recognition problems. In the case of a non-linear neural network, the error surface may have troughs, valleys, canyons, and a host of other shapes, whereas low dimensional data contains many minima; when many local minima plague the error landscape, it is unlikely that the network will find the global minimum. Another issue is the presence of plateaus, regions where the error varies only slightly as a function of the weights, see [9], [15]. In the presence of many plateaus, training becomes slow. To overcome this situation, momentum is introduced, which forces the iterative process to cross saddle points and small plateaus [16]. Neural network training begins with small weights; the error surface in the neighborhood of w ≈ 0 will determine the general direction of descent. A high dimensional space may afford more ways (dimensions) for the system to 'get around' a barrier or local maximum during training. For large networks, the error varies quite gradually as a single weight is changed.
III. BACKPROPAGATION
For networks having differentiable activation functions, there exists a powerful and computationally efficient method, called error backpropagation, for finding the derivatives of an error function with respect to the weights and biases in the network. The gradient descent algorithm is the most commonly used error backpropagation method [13], [14].
Standard backpropagation is a gradient descent algorithm, in which the network weights are moved along the negative of the gradient of the performance function. There are a number of variations on the basic algorithm that are based on other standard optimization techniques, such as conjugate gradient [22], [23].

Fig. 2 Gradient descent algorithm showing minimization of the cost function

Backpropagation was created by generalizing the Widrow-Hoff learning rule to multiple-layered networks with nonlinear differentiable transfer functions. If the scalar product of the connections (weights) and the stimulus is

net = w^T x

and the unit has a differentiable activation function, then, given a training set of input-target pairs of the form

H = { (x^1, t^1), (x^2, t^2), ..., (x^n, t^n) }

the basic idea is to compute d(e^p)^2 / dw and use the error-correction method to adjust w, see [8]. For the k-th element of w we have

d(e^p)^2/dw_k = [ d(e^p)^2/dO^p ] [ dO^p/d net ] [ d net/dw_k ] = (O^p − t^p) [ d f(net^p)/d net^p ] x_k^p

which is the gradient of the pattern error with respect to the weight w_k and forms the basis of the gradient descent training algorithm. Specifically, we assign the weight correction, ∆w_k, such that


w_k^{j+1} = w_k^j − η (O^p − t^p) [ d f(net^p) / d net^p ] x_k^p   (3)

and for a linear activation function

w_k^{j+1} = w_k^j + η e^p x_k^p
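As a hedged illustration of equation (3), the sketch below computes the pattern-error gradient for a single sigmoidal output unit by the chain rule and applies the weight correction; the input x, target t, and learning rate are made-up values, and the constant factor from differentiating the squared error is absorbed into eta.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_update(w, x, t, eta):
    # single output unit: net = w.x, O = f(net)
    net = w @ x
    O = sigmoid(net)
    dfdnet = O * (1.0 - O)            # derivative of the sigmoid at net
    # chain rule of equation (3): grad_k = (O - t) * f'(net) * x_k
    grad = (O - t) * dfdnet * x
    return w - eta * grad             # w^(j+1) = w^j - eta * grad

w = np.zeros(3)
x = np.array([1.0, 0.5, -0.5])
t = 1.0
for _ in range(100):
    w = backprop_update(w, x, t, eta=0.5)
print(sigmoid(w @ x))                 # output moves towards the target t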

IV. LINE SEARCH
Most learning algorithms involve a sequence of steps through weight space. We can consider each of these steps in two parts: first, decide the direction in which to move, and secondly, how far to move in that direction. The direction of search should provide the minimum of the error function in that direction in weight space [18], [25], [27]. This procedure is referred to as a line search, and it forms the basis for several algorithms which are considerably more powerful than gradient descent. Suppose at step n in some algorithm the current weight vector is w_n and we wish to obtain a particular search direction p_n through weight space. The minimum along the search direction then gives the next value of the weight vector as

w_{n+1} = w_n + λ_n p_n   (4)

where the parameter λ_n is chosen to minimize

E(λ) = E(w_n + λ p_n)   (5)

This gives us an automatic procedure for setting the step length once we have chosen the search direction. The line search represents a one-dimensional minimization problem. This minimization can be performed in a number of ways. A simple approach would be to proceed along the search direction in small steps, evaluating the error function at each step (position), and stop when the error starts to increase. It is possible, however, to find much more efficient approaches. This includes the issue of whether local gradient information should be used in the line search. The search direction towards the minimum is obtained (possibly) through proper weight adjustments; this involves searching along the negative direction of the gradient.
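A minimal sketch of the simple line search described above, assuming we proceed along the search direction in small steps and stop when the error starts to increase; the quadratic error surface and step size used here are illustrative, not from the paper.

import numpy as np

def coarse_line_search(E, w, p, step=0.01, max_steps=1000):
    # Walk along the search direction p in small steps and stop as soon as
    # the error starts to increase; returns the step length lambda_n.
    lam, best_lam = 0.0, 0.0
    best = E(w)
    for _ in range(max_steps):
        lam += step
        val = E(w + lam * p)
        if val >= best:
            break
        best, best_lam = val, lam
    return best_lam

# toy quadratic error surface E(w) = 0.5 * w.A.w
A = np.array([[3.0, 0.0], [0.0, 1.0]])
E = lambda w: 0.5 * w @ A @ w
w = np.array([2.0, 2.0])
p = -A @ w                          # search along the negative gradient
lam = coarse_line_search(E, w, p)
print(lam, E(w + lam * p))          # w_{n+1} = w_n + lambda_n * p_n (equation 4)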
V. CONJUGATE GRADIENT

To apply line search to the problem of error function minimization, we need to choose a suitable search direction at each stage of the algorithm. Note that at the minimum of the line search

∂E(w_n + λ p_n)/∂λ = 0   (6)

which gives

g(w_{n+1})^T p_n = 0,  where g_n = ∂E/∂w_{n+1}   (7)

Thus the gradient at the new minimum is orthogonal to the search direction. Choosing successive search directions to be the local (negative) gradient directions can lead to the problem of slow learning. The algorithm can then take many steps to converge, even for a quadratic error function. The solution to this problem lies in choosing the successive search directions d_n to minimize the cost function such that, at each step of the algorithm, the component of the gradient parallel to the previous search direction, which has just been made zero, is unaltered [17], [24]. Suppose we have already performed a line minimization along the direction d_n, starting from the point w_n, to give the new point w_{n+1}. Then at the point w_{n+1} we have

g_{n+1}(w)^T p_n = 0   (8)

We now choose the new search direction such that

g(w_{n+1} + λ p_{n+1})^T p_n = 0

which gives

p_{n+1}^T p_n = 0   (9)

that is, along the new direction the component of the gradient parallel to the previous search direction remains zero [31].
VI. CONJUGATE GRADIENT TECHNIQUES

The conjugate gradient algorithms have been derived on the assumption of a quadratic error function with a positive definite Hessian matrix. For a non-quadratic error function, the Hessian matrix will depend on the current weight vector, and so will need to be re-evaluated at each step of the algorithm. Since the evaluation of the Hessian is computationally costly for non-linear neural networks, and since its evaluation would have to be done repeatedly, we would like to avoid having to use the Hessian. In fact, it turns out that the search direction and learning rate can be found without explicit knowledge of H. This leads to the conjugate gradient algorithm (CGA).
The conjugate gradient method avoids this problem by incorporating an intricate relationship between the direction and gradient vectors. The conjugate gradient method is guaranteed to locate the minimum of any quadratic function of N variables in at most N steps. But for a non-quadratic function, such as the cost function in an M.L.P., the process is iterative rather than N-step, and a criterion for convergence is required. The weight vector of the network is updated in accordance with the rule in [4].
In the conjugate gradient approach, successive steps ∆w_n are chosen such that the weight correction at the previous step is preserved. A direction for the n-th weight correction is computed and line minimization in this direction is performed, generating w_{n+1}. Successive weight corrections are constrained to be conjugate to those used previously. Interestingly, this is achieved without inversion of the Hessian matrix, in contrast with Newton's method. There exist several such techniques; most often the gradient is involved. Suppose the initial weight correction is p(0) = −g(0). Line minimization in the direction p(0) is performed from the weight vector w(0). Beginning with k = 0, we require the


subsequent gradient to be orthogonal to the previous weight correction direction; that is, from [8],

p_n^T g_{n+i} = 0,   i = 1, 2, ..., n

p_n^T [ g_{n+2} − g_{n+1} ] = 0   (10)

where g_{n+2} − g_{n+1} is the change of the gradient of E from w_{n+1} to w_{n+2}. Therefore

p_n^T H p_{n+1} = 0   (11)

which is called the conjugacy condition, see Fig. 3. The weight correction directions p_{n+2} and p_{n+1} are said to be conjugate to one another in the context of H, or H-conjugate. It is important to note that there exist several methods of finding the conjugate vector p(n+1), see [29] and [31]. Each search direction vector is then computed as a linear combination of the current gradient vector and the previous direction vector,

p_{n+1} = − g_{n+1} + β_{n+1} p_n   (12)

which defines a path, or sequence of search directions p_n, n = 1, 2, ..., in the weight space, where β_n is a time-varying parameter. There are various rules for determining β_n in terms of the gradient vectors g_n and g_{n+1}. Each weight correction p_n is computed sequentially, starting with p_0, and is formed as the sum of the current negative gradient direction and a scaled version (i.e., β_n) of the previous correction. Using (11) and (12), we have

p_n^T H p_{n+1} = p_n^T H [ − g_{n+1} + β_{n+1} p_n ] = 0

β_{n+1} = (p_n^T H g_{n+1}) / (p_n^T H p_n)   (13)

In most of the conjugate gradient algorithms, the step size is adjusted at each iteration, and a search is made along the conjugate gradient direction to determine the step size which minimizes the performance function along that line [10]. There are a number of different choices for β. Some of these are:

Fletcher-Reeves:   β_{n+1} = (g_{n+1}^T g_{n+1}) / (g_n^T g_n)

Polak-Ribiere:   β_{n+1} = (g_{n+1}^T (g_{n+1} − g_n)) / (g_n^T g_n)

Fig. 3 Conjugate gradient descent in weight space employs a sequence of line searches. If ∆w_1 is the first descent direction, the second direction obeys the conjugacy condition; as such, the second descent does not "spoil" the contribution due to the previous line search. In the case where the Hessian is diagonal (right), the directions of the line searches are orthogonal [12].

Using Taylor's series, we have

E(w) ≈ E(w_0) + ∂E/∂w |_{w=w_0} (w − w_0) + (1/2!) (w − w_0)^T ∂²E/∂w² |_{w=w_0} (w − w_0)   (14)

where J = ∂E/∂w and H = ∂²E/∂w² = [ ∂²E/(∂w_i ∂w_j) ]   (15)

Differentiating with respect to w, we have

g(n) = F + H(w − w_0)

Powell-Beale restarts:   β_{n+1} = (g_{n+1}^T (g_{n+1} − g_n)) / (p_n^T (g_{n+1} − g_n))   (16)

The conjugate gradient algorithms are usually much faster than the variable learning rate backpropagation algorithm, although the results will vary from one problem to another. The conjugate gradient algorithms require only a little more storage than the simpler algorithms, so they are often a good choice for networks with a large number of weights. It is important to note that the above three expressions are equivalent provided the error function is exactly quadratic. For non-quadratic error functions, the Polak-Ribiere form is generally found to give slightly better results than the other expressions, probably due to the fact that, if the algorithm is making little progress, so that successive gradient vectors are very similar, the Polak-Ribiere form gives a small value for β so that the


search direction p_{n+1} tends to be reset to the negative gradient direction, which is equivalent to restarting the conjugate gradient procedure. For a quadratic error function the conjugate gradient algorithm finds the minimum after at most w line minimizations, without calculating the Hessian matrix [39]. This clearly represents a significant improvement on the simple gradient descent approach, which could take a very large number of steps to minimize even a quadratic error function.
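The sketch below is one possible implementation of the nonlinear conjugate gradient iteration of equation (12), with either the Fletcher-Reeves or the Polak-Ribiere choice of β; the crude grid line search, the restart rule (resetting a negative β to zero), and the toy quadratic are assumptions made for the example, not the authors' code.

import numpy as np

def conjugate_gradient(E, grad, w, n_iter=100, beta_rule="polak-ribiere"):
    # Nonlinear conjugate gradient: p_{n+1} = -g_{n+1} + beta_{n+1} * p_n  (equation 12)
    g = grad(w)
    p = -g
    for _ in range(n_iter):
        # crude grid line search for lambda_n along p (equation 4)
        lams = np.linspace(0.0, 1.0, 101)
        lam = min(lams, key=lambda a: E(w + a * p))
        w = w + lam * p
        g_new = grad(w)
        if beta_rule == "fletcher-reeves":
            beta = (g_new @ g_new) / (g @ g)
        else:  # Polak-Ribiere
            beta = (g_new @ (g_new - g)) / (g @ g)
        beta = max(beta, 0.0)          # restart: fall back to steepest descent if beta < 0
        p = -g_new + beta * p
        g = g_new
    return w

# toy quadratic E(w) = 0.5 * w.A.w - b.w, minimum at A^{-1} b
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
E = lambda w: 0.5 * w @ A @ w - b @ w
grad = lambda w: A @ w - b
print(conjugate_gradient(E, grad, np.zeros(2)))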

Fig. 4 A comparison between GDA and CGA minimization

VII. TRUST REGION MODELS

A large class of methods for unconstrained optimization is based on a slightly different model algorithm. In particular, the step size α_j is nearly always taken as unity. In order to ensure that the descent condition holds, it may thus be necessary to compute several trial vectors. Methods of this type are often termed trust-region methods. For large values of λ_n the step size becomes small. Techniques such as this are well known in standard optimization theory, where they are called model trust region methods [35], [37] and [38], because the model is effectively trusted only in a small region around the current search point. The size of the trust region is governed by the parameter λ_n, so that a large λ_n gives a small trust region, see [33].

VIII. QUASI NEWTON METHODS

Using the local quadratic approximation, we can obtain directly an expression for the location of the minimum, or more generally the stationary point, of the error function

E(w) = E(w*) + (1/2) (w − w*)^T H (w − w*)

which gives

w* = w − H^{-1} g

The vector −H^{-1} g is known as the Newton step or the Newton direction [20], [29]. Unlike the local gradient vector, the Newton location for a quadratic error surface, evaluated at any w, points directly at the minimum of the error function, provided H is positive definite. Since the quadratic approximation used above is not exact, it would be necessary to apply w* = w − H^{-1} g iteratively, with the Hessian being re-evaluated at each new search point [40]. The Hessian for non-quadratic networks is computationally demanding, since it requires O(N w^2) steps, where w is the number of weights in the network and N is the number of patterns in the data, and O(w^3) operations for the inversion in each iteration [5], [32]. The Newton step may also move towards a maximum or a saddle point rather than a minimum. This occurs if the Hessian is not positive definite. Thus the error is not guaranteed to be reduced at each iteration, see [6], [11].
One of the drawbacks of Newton's method is that it requires the analytical derivative of the Hessian matrix at each iteration. This is a problem if the derivative is very expensive or difficult to compute. In such cases it may be convenient to iterate according to

w_{n+1} = w_n − G^{-1} g_n   (17)

where G is an easily computed approximation to H. For example, in one dimension, the secant method approximates the derivative with the difference quotient

a(n) = [ g(w_{n+1}) − g(w_n) ] / [ w_{n+1} − w_n ]

Such an iteration is called a quasi-Newton method. If G is positive definite, as it usually is, an alternative name is the variable metric method. One advantage of using an approximation in place of H is that G can be chosen to be positive definite, ensuring that the step will not be attracted to a maximum of f when one wants a minimum. Another advantage is that −G(n)^{-1} g(w_n) is a descent direction from w_n, allowing the use of line searches.
Among general purpose quasi-Newton algorithms, the best is probably the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The BFGS algorithm builds upon the earlier and similar Davidon-Fletcher-Powell (DFP) algorithm. The BFGS algorithm starts with a positive definite matrix approximation to H(w_0), usually the identity matrix I. At each iteration it makes a minimal modification to gradually approximate H^{-1}.
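As a hedged sketch of the quasi-Newton iteration (17), the code below uses the standard BFGS inverse-Hessian update starting from the identity matrix; the fixed step length standing in for a line search and the toy quadratic objective are assumptions made for this example.

import numpy as np

def bfgs(grad, w, n_iter=50, eta=1.0):
    # Quasi-Newton iteration w_{n+1} = w_n - G^{-1} g_n  (equation 17),
    # with G^{-1} built up by the standard BFGS inverse-Hessian update.
    n = w.size
    Hinv = np.eye(n)                     # start from the identity, as in the text
    g = grad(w)
    for _ in range(n_iter):
        step = -eta * Hinv @ g           # a fixed step length stands in for a line search
        w_new = w + step
        g_new = grad(w_new)
        s, y = w_new - w, g_new - g      # secant pair
        sy = s @ y
        if sy > 1e-12:                   # keep Hinv positive definite
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            Hinv = V @ Hinv @ V.T + rho * np.outer(s, s)
        w, g = w_new, g_new
    return w

# toy quadratic: minimize 0.5 * w.A.w - b.w, minimum at A^{-1} b
A = np.array([[4.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(bfgs(lambda w: A @ w - b, np.zeros(2)))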

IX. LEVENBERG-MARQUARDT ALGORITHM

Levenberg-Marquardt is a trust region based method with a hyper-spherical trust region. The method works extremely well in practice, and is considered the most efficient algorithm for training medium sized artificial neural networks. Like the quasi-Newton methods, the Levenberg-Marquardt algorithm was designed to approach second order training speed without having to compute the Hessian matrix [1]. When the performance

function has the form of a sum of squares, then from (17) we have

w_{n+1} = w_n − H^{-1} ∇F(w)   (18)

where the Hessian matrix can be approximated as H ≈ J^T J and the gradient can be computed as g = J^T e, where the Jacobian matrix J contains the first derivatives of the network errors with respect to the weights and biases, and e is a vector of network errors. The Gauss-Newton update formula can be written as

W_{n+1} = W_n − (J_n^T J_n)^{-1} J_n^T e_n

where (J^T J) is positive definite; if it is not, we introduce a perturbation that controls the risk of it being non-positive definite, so that the recursion equation becomes

W_{n+1} = W_n − (J_n^T J_n + λ I)^{-1} J_n^T e_n   (19)

In the neural computing context the quantity λ is called the learning parameter; it ensures that J^T J + λ I is positive definite. It is always possible to choose λ sufficiently large to ensure a descent step. The learning parameter is decreased as the iterative process approaches a minimum. Thus, in Levenberg-Marquardt optimization, the inversion of the square matrix J^T J + λ I is involved, which requires large memory to store the Jacobian matrix and the approximated Hessian.
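A minimal sketch of the Levenberg-Marquardt recursion (19) for a sum-of-squares error, with the common heuristic of increasing λ when a step fails and decreasing it when it succeeds; the function names, the factor-of-10 schedule, and the toy curve-fitting problem are assumptions, not the paper's implementation.

import numpy as np

def lm_step(w, jacobian, errors, lam):
    # W_{n+1} = W_n - (J^T J + lambda I)^{-1} J^T e   (equation 19)
    J = jacobian(w)                      # first derivatives of the network errors
    e = errors(w)                        # vector of network errors
    A = J.T @ J + lam * np.eye(w.size)   # perturbed Gauss-Newton matrix, positive definite for lam > 0
    return w - np.linalg.solve(A, J.T @ e)

def levenberg_marquardt(w, jacobian, errors, lam=1e-2, n_iter=50):
    cost = lambda v: 0.5 * np.sum(errors(v) ** 2)
    for _ in range(n_iter):
        w_new = lm_step(w, jacobian, errors, lam)
        if cost(w_new) < cost(w):
            w, lam = w_new, lam * 0.1    # success: trust the quadratic model more
        else:
            lam *= 10.0                  # failure: shrink the trust region
    return w

# toy least-squares fit of y = a*x + b to noisy data
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x - 1.0 + 0.01 * rng.normal(size=20)
errors = lambda w: (w[0] * x + w[1]) - y
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])
print(levenberg_marquardt(np.zeros(2), jacobian, errors))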

X. NETWORK TRAINING RESULTS

Daily share rates data is considered and presented in concurrent form. For programming in C/C++, we used the approach of [3], [7], [26] and [30]. The average number of iterations needed to converge to the targets is indicated with each algorithm.

Fig. 5 Time series plot of share rates

The gradient descent method with momentum and learning rate is considered. We trained the network with each method and the average of 50 successful results is taken and presented in the table. The performance goal is 10^-5. Finally, the results of forecasting for 10 days are shown in the table. It is best to use the forecast for only one day due to the possibility of error growth.

Table 1. A comparison of neural network training algorithms (in m iterations)

Target values | GD (25000) | BFGS (45) | L.M (21) | CGFP (400) | CGPR (240) | SCG (1700)
15.95 | 15.92 | 15.95 | 15.95 | 15.95 | 15.95 | 15.95
16.05 | 15.92 | 16.05 | 16.05 | 16.05 | 16.05 | 16.05
15.95 | 19.93 | 15.95 | 15.95 | 15.95 | 15.95 | 15.95
15.80 | 15.77 | 15.80 | 15.80 | 15.80 | 15.80 | 15.80
15.35 | 15.44 | 15.35 | 15.35 | 15.35 | 15.35 | 15.35
15.45 | 15.58 | 15.45 | 15.45 | 15.45 | 15.45 | 15.45
15.55 | 15.65 | 15.55 | 15.55 | 15.56 | 15.55 | 15.55
15.60 | 15.65 | 15.60 | 15.60 | 15.60 | 15.60 | 15.60
15.75 | 15.77 | 15.75 | 15.75 | 15.75 | 15.75 | 15.75
15.80 | 15.78 | 15.80 | 15.80 | 15.80 | 15.80 | 15.80

REFERENCES

[1] Aqil Burney S. M., Jilani A. Tahseen and Cemal Ardil, 2004. Levenberg-Marquardt algorithm for Karachi Stock Exchange share rates forecasting. International Journal of Computational Intelligence, pp. 168-173.
[2] Aqil Burney S. M., Jilani A. Tahseen, 2002. Time series forecasting using artificial neural network methods for Karachi Stock Exchange. A project at the Department of Computer Science, University of Karachi.
[3] Aqil Burney S. M. Measures of central tendency in News-MCB stock exchange price index, pp. 12-31. Developed by Vital Information Services (Pvt) Limited, Karachi, c/o Department of Computer Science, University of Karachi.
[4] B. Widrow and M. A. Lehr, 1990. 30 years of adaptive neural networks: Perceptron, Madaline, and Backpropagation. IEEE Proceedings 78, pp. 1415-1442.
[5] Barak A. Pearlmutter, 1993. Fast exact multiplication by the Hessian. Siemens Corporate Research, Neural Computation.
[6] Battiti, 1992. First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation 4, pp. 141-166.
[7] Blum, Adam, 1992. Neural Networks in C++. John Wiley and Sons, New York.
[8] Bortoletti, A., Di Fiore, C., Fanelli, S. and Zellini, P. A new class of quasi-Newtonian methods for optimal learning in MLP-networks. IEEE Transactions on Neural Networks, vol. 14, issue 2.
[9] Duda, R. O., Hart, P. E. and Stork, D. G., 2001. Pattern Classification. John Wiley & Sons.
[10] C. M. Bishop, 1995. Neural Networks for Pattern Recognition. Clarendon Press.
[11] Chauvin, Y., & Rumelhart, D., 1995. Backpropagation: Theory, Architectures, and Applications. Lawrence Erlbaum Associates.
[12] Chung-Ming Kuan and Tung Liu, 1995. Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics, pp. 347-364.
[13] De Villiers and E. Barnard, 1993. Backpropagation neural nets with one and two hidden layers. IEEE Transactions on Neural Networks 4, pp. 136-141.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, 1986. Learning internal representations by error propagation. Parallel Distributed Processing, Vol. I, MIT Press, Cambridge, MA, pp. 318-362.
[15] Ergezinger and E. Thomsen, 1995. An accelerated learning algorithm for multilayer perceptrons: optimization layer by layer. IEEE Transactions on Neural Networks 6, pp. 31-42.
[16] M. F. Møller, 1997. Efficient Training of Feedforward Neural Networks. Ph.D. thesis, Computer Science Department, Aarhus University.
[17] M. F. Møller, 1993. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6(4), pp. 525-533.

[18] Gill, P. E., Murray, W. and Wright, M. H., 1993. Practical Optimization. Academic Press.
[19] Iebeling Kaastra and Milton S. Boyd, 1995. Forecasting future trading volume using neural networks. Journal of Futures Markets, pp. 953-970.
[20] J. Shepherd, 1997. Second-order optimization methods. In Second-Order Methods for Neural Networks. Springer-Verlag, Berlin, New York, pp. 43-72.
[21] Jacobs, 1988. Increased rates of convergence through learning rate adaptation. Neural Networks 1, pp. 295-307.
[22] K. Hornik, M. Stinchcombe, and H. White, 1989. Multi-layer feedforward networks are universal approximators. Neural Networks 2, pp. 359-366.
[23] Karayiannis and A. N. Venetsanopoulos, 1993. Artificial Neural Networks: Learning Algorithms, Performance Evaluation, and Applications. Kluwer Academic, Boston, MA.
[24] M. R. Hestenes and E. Stiefel, 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, pp. 409-436.
[25] Michael Hüsken, Jens E. Gayko and Bernhard Sendhoff, 2000. Optimization for problem classes - neural networks that learn to learn. IEEE Symposium on Evolutionary Computation and Neural Networks (ECNN-2000), pp. 98-109, IEEE Press, 2000.
[26] Amir F. Atiya, Suzan M. El-Shoura, Samir I. Shaheen, 1999. A comparison between neural network forecasting techniques - case study: river flow forecasting. IEEE Transactions on Neural Networks, Vol. 10, No. 2.
[27] N. N. Schraudolph, Thore Graepel, 2003. Combining conjugate direction methods with stochastic approximation of gradients. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics.
[28] N. N. Schraudolph, 2002. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7), pp. 1723-1728.
[29] N. N. Schraudolph, 1999. Local gain adaptation in stochastic gradient descent. In Proceedings of the Ninth International Conference on Artificial Neural Networks, pp. 569-574, Edinburgh, Scotland, 1999. IEE, London.
[30] Neural Network Toolbox, 2002. For use with Matlab. MathWorks, Natick, MA.
[31] Patrick van der Smagt, 1994. Minimization methods for training feedforward neural networks. Neural Networks 7(1), pp. 1-11.
[32] Pearlmutter, 1994. Fast exact multiplication by the Hessian. Neural Computation, 6(1), pp. 147-160.
[33] Phua, P. K. H., Daohua Ming, 2003. Parallel nonlinear optimization techniques for training neural networks. IEEE Transactions on Neural Networks, vol. 14, issue 6, pp. 1460-1468.
[34] Scarselli and A. C. Tsoi, 1998. Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results. Neural Networks, pp. 15-37.
[35] Ripley, B. D., 1994. Neural networks and related methods for classification. Journal of the Royal Statistical Society, B 56(3), pp. 409-456.
[36] S. Haykin, 1994. Neural Networks: A Comprehensive Foundation. IEEE Press, Piscataway, NJ.
[37] S. D. Hunt, J. R. Deller, 1995. Selective training of feedforward artificial neural networks using matrix perturbation theory. Neural Networks, 8(6), pp. 931-944.
[38] Welstead, Stephen T., 1994. Neural Network and Fuzzy Logic Applications in C/C++. John Wiley and Sons, Inc., N.Y.
[39] Z. Strakoš and P. Tichý, 2002. On error estimation in the conjugate gradient method and why it works in finite precision computations. ETNA 13, pp. 56-80.
[40] Zhou and J. Si, 1998. Advanced neural-network training algorithm with reduced complexity based on Jacobian deficiency. IEEE Transactions on Neural Networks 9, pp. 448-453.

S. M. Aqil Burney received the B.Sc. (Physics, Mathematics, Statistics) from D. J. College affiliated with the University of Karachi in 1970; first class first M.Sc. (Statistics) from Karachi University in 1972, with M.Phil. in 1983 with specialization in computing, simulation and stochastic modeling in risk management. Dr. Burney received the Ph.D. degree in Mathematics from Strathclyde University, Glasgow, with specialization in estimation, modeling and simulation of multivariate time series models using an algorithmic approach with software development.
He is currently professor and approved supervisor in Computer Science and Statistics by the Higher Education Commission, Government of Pakistan. He is also the project director of the Umair Basha Institute of Information Technology (UBIT). He is also a member of various higher academic boards of different universities of Pakistan. His research interests include AI, soft computing, neural networks, fuzzy logic, data mining, statistics, simulation and stochastic modeling of mobile communication systems and networks, and network security. He is the author of three books and various technical reports, has supervised more than 100 software/information technology projects of Masters level degree programs, and is project director of various projects funded by the Government of Pakistan.
He is a member of IEEE (USA), ACM (USA), a fellow of the Royal Statistical Society, United Kingdom, and also a member of the Islamic Society of Statistical Sciences. He has been teaching since 1973 in various universities in the fields of econometrics, bio-statistics, statistics, mathematics and computer science. He has vast education management experience at the university level. Dr. Burney has received appreciations and awards for his research and as an educationist.

Tahseen A. Jilani received the B.Sc. (Computer Science, Mathematics, Statistics) from Islamia Science College affiliated with the University of Karachi in 1998, and a first class second M.Sc. (Statistics) from Karachi University in 2001. Since 2003, he is a Ph.D. research fellow in the Department of Computer Science, University of Karachi.
His research interests include AI, neural networks, soft computing, fuzzy logic, statistical data mining and simulation. He has been teaching since 2002 in the fields of statistics, mathematics and computer science.
140

Contenu connexe

Tendances

Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionMostafa G. M. Mostafa
 
CS221: HMM and Particle Filters
CS221: HMM and Particle FiltersCS221: HMM and Particle Filters
CS221: HMM and Particle Filterszukun
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)James McMurray
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesKeyon Vafa
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian ProcessHa Phuong
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmMostafa G. M. Mostafa
 
Classification of handwritten characters by their symmetry features
Classification of handwritten characters by their symmetry featuresClassification of handwritten characters by their symmetry features
Classification of handwritten characters by their symmetry featuresAYUSH RAJ
 
Radial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and DhanashriRadial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and Dhanashrisheetal katkar
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)neeraj7svp
 
Co-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approachCo-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approachAllen Wu
 
Bridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphsBridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphsWoen Yon Lai
 
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationDongmin Choi
 
Performance Analysis of Various Activation Functions in Generalized MLP Archi...
Performance Analysis of Various Activation Functions in Generalized MLP Archi...Performance Analysis of Various Activation Functions in Generalized MLP Archi...
Performance Analysis of Various Activation Functions in Generalized MLP Archi...Waqas Tariq
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machinesMostafa G. M. Mostafa
 

Tendances (20)

Clustering lect
Clustering lectClustering lect
Clustering lect
 
Zoooooohaib
ZoooooohaibZoooooohaib
Zoooooohaib
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear Regression
 
CS221: HMM and Particle Filters
CS221: HMM and Particle FiltersCS221: HMM and Particle Filters
CS221: HMM and Particle Filters
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)
 
Bistablecamnets
BistablecamnetsBistablecamnets
Bistablecamnets
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian Processes
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process
 
Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) Algorithm
 
Classification of handwritten characters by their symmetry features
Classification of handwritten characters by their symmetry featuresClassification of handwritten characters by their symmetry features
Classification of handwritten characters by their symmetry features
 
Radial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and DhanashriRadial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and Dhanashri
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
PSF_Introduction_to_R_Package_for_Pattern_Sequence (1)
 
Co-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approachCo-clustering of multi-view datasets: a parallelizable approach
Co-clustering of multi-view datasets: a parallelizable approach
 
Bridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphsBridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphs
 
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
 
G013124354
G013124354G013124354
G013124354
 
Performance Analysis of Various Activation Functions in Generalized MLP Archi...
Performance Analysis of Various Activation Functions in Generalized MLP Archi...Performance Analysis of Various Activation Functions in Generalized MLP Archi...
Performance Analysis of Various Activation Functions in Generalized MLP Archi...
 
poster-lowe-6
poster-lowe-6poster-lowe-6
poster-lowe-6
 
Neural Networks: Support Vector machines
Neural Networks: Support Vector machinesNeural Networks: Support Vector machines
Neural Networks: Support Vector machines
 

En vedette

Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...
Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...
Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...Cemal Ardil
 
A simplified-single-correlator-rake-receiver-for-cdma-communications
A simplified-single-correlator-rake-receiver-for-cdma-communicationsA simplified-single-correlator-rake-receiver-for-cdma-communications
A simplified-single-correlator-rake-receiver-for-cdma-communicationsCemal Ardil
 
Development of-new-control-techniques-for-vibration-isolation-of-structures-u...
Development of-new-control-techniques-for-vibration-isolation-of-structures-u...Development of-new-control-techniques-for-vibration-isolation-of-structures-u...
Development of-new-control-techniques-for-vibration-isolation-of-structures-u...Cemal Ardil
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Cemal Ardil
 
Identification-of-aircraft-gas-turbine-engines-temperature-condition-
 Identification-of-aircraft-gas-turbine-engines-temperature-condition- Identification-of-aircraft-gas-turbine-engines-temperature-condition-
Identification-of-aircraft-gas-turbine-engines-temperature-condition-Cemal Ardil
 
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...Cemal Ardil
 
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...Cemal Ardil
 
Adaptive non-linear-filtering-technique-for-image-restoration
Adaptive non-linear-filtering-technique-for-image-restorationAdaptive non-linear-filtering-technique-for-image-restoration
Adaptive non-linear-filtering-technique-for-image-restorationCemal Ardil
 
E collaborative-learning-circles
E collaborative-learning-circlesE collaborative-learning-circles
E collaborative-learning-circlesCemal Ardil
 
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logicApproximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logicCemal Ardil
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_applicationCemal Ardil
 
Mimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmMimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmCemal Ardil
 
Application of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatchApplication of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatchCemal Ardil
 
Aircraft gas-turbine-engines-technical-condition-identification-system
Aircraft gas-turbine-engines-technical-condition-identification-systemAircraft gas-turbine-engines-technical-condition-identification-system
Aircraft gas-turbine-engines-technical-condition-identification-systemCemal Ardil
 
Electricity consumption-prediction-model-using neuro-fuzzy-system
Electricity consumption-prediction-model-using neuro-fuzzy-systemElectricity consumption-prediction-model-using neuro-fuzzy-system
Electricity consumption-prediction-model-using neuro-fuzzy-systemCemal Ardil
 
Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...
Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...
Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...Cemal Ardil
 
Energy efficient-resource-allocation-in-distributed-computing-systems
Energy efficient-resource-allocation-in-distributed-computing-systemsEnergy efficient-resource-allocation-in-distributed-computing-systems
Energy efficient-resource-allocation-in-distributed-computing-systemsCemal Ardil
 

En vedette (17)

Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...
Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...
Automatic authentication-of-handwritten-documents-via-low-density-pixel-measu...
 
A simplified-single-correlator-rake-receiver-for-cdma-communications
A simplified-single-correlator-rake-receiver-for-cdma-communicationsA simplified-single-correlator-rake-receiver-for-cdma-communications
A simplified-single-correlator-rake-receiver-for-cdma-communications
 
Development of-new-control-techniques-for-vibration-isolation-of-structures-u...
Development of-new-control-techniques-for-vibration-isolation-of-structures-u...Development of-new-control-techniques-for-vibration-isolation-of-structures-u...
Development of-new-control-techniques-for-vibration-isolation-of-structures-u...
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
 
Identification-of-aircraft-gas-turbine-engines-temperature-condition-
 Identification-of-aircraft-gas-turbine-engines-temperature-condition- Identification-of-aircraft-gas-turbine-engines-temperature-condition-
Identification-of-aircraft-gas-turbine-engines-temperature-condition-
 
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
A black-box-approach-for-response-quality-evaluation-of-conversational-agent-...
 
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
 
Adaptive non-linear-filtering-technique-for-image-restoration
Adaptive non-linear-filtering-technique-for-image-restorationAdaptive non-linear-filtering-technique-for-image-restoration
Adaptive non-linear-filtering-technique-for-image-restoration
 
E collaborative-learning-circles
E collaborative-learning-circlesE collaborative-learning-circles
E collaborative-learning-circles
 
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logicApproximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_application
 
Mimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmMimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithm
 
Application of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatchApplication of-computational-intelligence-techniques-for-economic-load-dispatch
Application of-computational-intelligence-techniques-for-economic-load-dispatch
 
Aircraft gas-turbine-engines-technical-condition-identification-system
Aircraft gas-turbine-engines-technical-condition-identification-systemAircraft gas-turbine-engines-technical-condition-identification-system
Aircraft gas-turbine-engines-technical-condition-identification-system
 
Electricity consumption-prediction-model-using neuro-fuzzy-system
Electricity consumption-prediction-model-using neuro-fuzzy-systemElectricity consumption-prediction-model-using neuro-fuzzy-system
Electricity consumption-prediction-model-using neuro-fuzzy-system
 
Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...
Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...
Bip based-alarm-declaration-and-clearing-in-sonet-networks-employing-automati...
 
Energy efficient-resource-allocation-in-distributed-computing-systems
Energy efficient-resource-allocation-in-distributed-computing-systemsEnergy efficient-resource-allocation-in-distributed-computing-systems
Energy efficient-resource-allocation-in-distributed-computing-systems
 

Similaire à A comparison-of-first-and-second-order-training-algorithms-for-artificial-neural-networks-

Design of airfoil using backpropagation training with mixed approach
Design of airfoil using backpropagation training with mixed approachDesign of airfoil using backpropagation training with mixed approach
Design of airfoil using backpropagation training with mixed approachEditor Jacotech
 
Design of airfoil using backpropagation training with mixed approach
Design of airfoil using backpropagation training with mixed approachDesign of airfoil using backpropagation training with mixed approach
Design of airfoil using backpropagation training with mixed approachEditor Jacotech
 
Web spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithmsWeb spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithmsaciijournal
 
Web Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network AlgorithmsWeb Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network Algorithmsaciijournal
 
Black-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelBlack-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelIJECEIAES
 
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdfNEURALNETWORKS_DM_SOWMYAJYOTHI.pdf
NEURALNETWORKS_DM_SOWMYAJYOTHI.pdfSowmyaJyothi3
 
Advanced topics in artificial neural networks
Advanced topics in artificial neural networksAdvanced topics in artificial neural networks
Advanced topics in artificial neural networksswapnac12
 
A COMPARATIVE STUDY OF BACKPROPAGATION ALGORITHMS IN FINANCIAL PREDICTION
A COMPARATIVE STUDY OF BACKPROPAGATION ALGORITHMS IN FINANCIAL PREDICTIONA COMPARATIVE STUDY OF BACKPROPAGATION ALGORITHMS IN FINANCIAL PREDICTION
A COMPARATIVE STUDY OF BACKPROPAGATION ALGORITHMS IN FINANCIAL PREDICTIONIJCSEA Journal
 
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...IJCSEA Journal
 
Adaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeAdaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeESCOM
 
ML_Unit_2_Part_A
ML_Unit_2_Part_AML_Unit_2_Part_A
ML_Unit_2_Part_ASrimatre K
 
A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm [6]. Learning algorithms differ from one another in the way in which the adjustment Δw_kj to the synaptic weight w_kj is formulated. The basic approach is to start with an untrained network, present training patterns, and compare the network outcomes with target values; the difference provides an error. Suppose that t_k(n) denotes the desired response for the k-th neuron at time n, and that y_k(n) is the actual response produced when the input x(n) is applied to the network. If the actual response y_k(n) is not the same as t_k(n), we may define an error signal

$$ e_k(n) = t_k(n) - y_k(n). $$

The purpose of error-correction learning is to minimize a cost function based on the error signal e_k(n). Once a cost function is selected, error-correction learning becomes strictly an optimization problem. The cost function usually used in neural networks is the mean-square-error criterion, also called LMS learning,

$$ J = E\!\left[ \frac{1}{2} \sum_{k} e_k^2(n) \right] \qquad (1) $$

where the summation runs over all neurons in the output layer of the network. The learning procedure then searches iteratively for the bottom of this cost surface.
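To make the error signal and the cost (1) concrete, the following minimal C++ sketch applies error-correction (LMS-style) learning to a single linear output unit. The input pattern, target and learning rate are illustrative values chosen here and do not come from the paper.

#include <cstdio>

int main() {
    double x[3] = {0.5, -1.0, 2.0};   // one input pattern
    double w[3] = {0.1, 0.1, 0.1};    // initial weights
    double t = 1.0, eta = 0.05;       // desired response t_k(n) and learning rate

    for (int n = 0; n < 100; ++n) {
        double y = 0.0;                                       // actual response y_k(n)
        for (int j = 0; j < 3; ++j) y += w[j] * x[j];
        double e = t - y;                                     // error signal e_k(n)
        double J = 0.5 * e * e;                               // instantaneous cost, Eq. (1)
        for (int j = 0; j < 3; ++j) w[j] += eta * e * x[j];   // error-correction update
        if (n % 25 == 0) std::printf("n = %3d   J = %.6f\n", n, J);
    }
    return 0;
}

Each weight adjustment is proportional to the error signal and the corresponding input component, so the cost J shrinks monotonically for a sufficiently small learning rate.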
Minimization of the cost function J with respect to the free parameters of the network leads to the so-called method of gradient descent,

$$ w_k(n+1) = w_k(n) - \eta\, \nabla J(n) \qquad (2) $$

where η is the learning rate; it defines the proportion of the error used for weight updating (correction). The learning rate has a profound impact on the convergence of learning [21]. A plot of the cost function versus the synaptic weights is a multidimensional surface called the error surface. Error-correction learning starts from an arbitrary point on this surface (the initial weights) and then moves towards a global minimum in a step-by-step fashion.

Fig. 1. Quadratic error surface with local and global minima.

We can gain understanding and intuition about the algorithm by studying the error surfaces themselves, i.e. the function J(w). Such an error surface depends on the task at hand, but there are some general properties that seem to hold over a broad range of real-world pattern recognition problems. For non-linear neural networks the error surface may have troughs, valleys, canyons and a host of other shapes; even for low-dimensional data it contains many local minima, and when many local minima plague the error landscape it is unlikely that the network will find the global minimum. Another issue is the presence of plateaus, regions where the error varies only slightly as a function of the weights; see [9], [15]. In the presence of many plateaus, training becomes slow. To overcome this, a momentum term is introduced that forces the iterative process to cross saddle points and small flat regions [16]. Neural network training begins with small weights, so the error surface in the neighbourhood of w ≈ 0 determines the general direction of descent. A high-dimensional weight space may afford more ways (dimensions) for the system to get around a barrier or local maximum during training, and for large networks the error varies quite gradually as a single weight is changed.
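The update (2) and the momentum remedy for plateaus can be sketched as follows. The ill-conditioned quadratic cost is only a stand-in for the network cost J(w), and the learning rate and momentum coefficient are illustrative assumptions, not settings used in the experiments.

#include <cstdio>

int main() {
    double w[2] = {4.0, -3.0};    // free parameters
    double v[2] = {0.0, 0.0};     // previous update (momentum memory)
    double eta = 0.1, alpha = 0.9;

    for (int n = 0; n < 200; ++n) {
        // gradient of the stand-in cost J(w) = 0.5*(w0^2 + 10*w1^2)
        double g[2] = {w[0], 10.0 * w[1]};
        for (int i = 0; i < 2; ++i) {
            v[i] = alpha * v[i] - eta * g[i];   // momentum-smoothed step
            w[i] += v[i];                       // w(n+1) = w(n) - eta*gradJ + alpha*v(n), cf. Eq. (2)
        }
        if (n % 50 == 0)
            std::printf("n = %3d   J = %.6f\n", n, 0.5 * (w[0]*w[0] + 10.0*w[1]*w[1]));
    }
    return 0;
}

The momentum term accumulates a running average of past gradients, which keeps the iteration moving across the flat directions of the surface while damping oscillations along the steep ones.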
III. BACKPROPAGATION

For networks having differentiable activation functions there exists a powerful and computationally efficient method, called error backpropagation, for finding the derivatives of the error function with respect to the weights and biases in the network. The gradient descent algorithm is the most commonly used error backpropagation method [13], [14]. Standard backpropagation is a gradient descent algorithm in which the network weights are moved along the negative of the gradient of the performance function; there are a number of variations on the basic algorithm that are based on other standard optimization techniques, such as conjugate gradient [22], [23]. Backpropagation was created by generalizing the Widrow-Hoff learning rule to multiple-layered networks with nonlinear differentiable transfer functions. For a unit with a differentiable activation function, the scalar product of the connections (weights) and the stimulus defines the net input net = w^T x. Given a training set of input-target pairs of the form

$$ H = \{ (x_1, t_1), (x_2, t_2), \ldots, (x_n, t_n) \} $$

we use the error-correction method to adjust w; see [8]. The basic idea is to compute the derivative of the pattern error (1/2) e_p^2 with respect to the k-th element of w, i.e.

$$ \frac{\partial}{\partial w_k}\!\left( \tfrac{1}{2} e_p^2 \right) = \frac{\partial \left( \tfrac{1}{2} e_p^2 \right)}{\partial O^p} \cdot \frac{\partial O^p}{\partial\, net^p} \cdot \frac{\partial\, net^p}{\partial w_k} = \left( O^p - t^p \right) \frac{d f(net^p)}{d\, net}\, x_k^p $$

which is the gradient of the pattern error with respect to the weight w_k and forms the basis of the gradient descent training algorithm. Specifically, we assign the weight correction Δw_k such that

$$ w_k^{j+1} = w_k^j - \eta \left( O^p - t^p \right) \frac{d f(net^p)}{d\, net}\, x_k^p \qquad (3) $$

and, for a linear activation function,

$$ w_k^{j+1} = w_k^j + \eta\, e^p x_k^p. $$

Fig. 2. The gradient descent algorithm showing minimization of the cost function.
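A minimal C++ sketch of the per-pattern correction (3) for a single sigmoidal unit is given below. The two-pattern training set, targets and learning rate are hypothetical and serve only to show the update in executable form; they are not the data used in the paper.

#include <cmath>
#include <cstdio>

double sigmoid(double net) { return 1.0 / (1.0 + std::exp(-net)); }

int main() {
    // training pairs (x, t); the first input component is a bias fixed at 1.0
    double X[2][3] = {{1.0, 0.0, 1.0}, {1.0, 1.0, 0.0}};
    double T[2]    = {0.9, 0.1};
    double w[3]    = {0.0, 0.0, 0.0};
    double eta     = 0.5;

    for (int epoch = 0; epoch < 2000; ++epoch) {
        for (int p = 0; p < 2; ++p) {
            double net = 0.0;
            for (int k = 0; k < 3; ++k) net += w[k] * X[p][k];
            double O      = sigmoid(net);
            double dfdnet = O * (1.0 - O);          // derivative of the sigmoid
            for (int k = 0; k < 3; ++k)             // weight correction of Eq. (3)
                w[k] -= eta * (O - T[p]) * dfdnet * X[p][k];
        }
    }
    for (int p = 0; p < 2; ++p) {
        double net = 0.0;
        for (int k = 0; k < 3; ++k) net += w[k] * X[p][k];
        std::printf("pattern %d: target %.2f  output %.3f\n", p, T[p], sigmoid(net));
    }
    return 0;
}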
IV. LINE SEARCH

Most learning algorithms involve a sequence of steps through weight space. We can consider each of these steps in two parts: first, decide the direction in which to move, and second, decide how far to move in that direction. Searching for the minimum of the error function along the chosen direction in weight space [18], [25], [27] is referred to as a line search, and it forms the basis for several algorithms that are considerably more powerful than gradient descent. Suppose that at step n of some algorithm the current weight vector is w_n and we wish to search along a particular direction p_n through weight space. The minimum along the search direction then gives the next weight vector as

$$ w_{n+1} = w_n + \lambda_n p_n \qquad (4) $$

where the parameter λ_n is chosen to minimize

$$ E(\lambda) = E(w_n + \lambda p_n). \qquad (5) $$

This gives an automatic procedure for setting the step length once the search direction has been chosen. The line search represents a one-dimensional minimization problem, and it can be performed in a number of ways. A simple approach would be to proceed along the search direction in small steps, evaluating the error function at each position, and stop when the error starts to increase. It is possible, however, to find much more efficient approaches, including the question of whether local gradient information should be used in the line search. The search direction towards the minimum is typically obtained through proper weight adjustments, which involves searching along the negative direction of the gradient.
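The simple "step until the error rises" strategy described above can be sketched as follows. The quadratic E(w) is merely a placeholder for the network error, and the starting point, direction and step size are assumptions made for illustration.

#include <cstdio>

// placeholder error function standing in for the network error E(w)
double E(const double w[2]) { return 0.5 * (w[0]*w[0] + 10.0*w[1]*w[1]); }

// walk along p from w in small steps and stop when E starts to rise
double lineSearch(const double w[2], const double p[2], double step) {
    double lambda = 0.0, best = E(w);
    for (;;) {
        double trial[2] = {w[0] + (lambda + step) * p[0],
                           w[1] + (lambda + step) * p[1]};
        double Etrial = E(trial);
        if (Etrial >= best) break;     // error started to increase
        best = Etrial;
        lambda += step;                // accept the longer step
    }
    return lambda;                     // approximate minimizer of Eq. (5)
}

int main() {
    double w[2] = {4.0, 1.0};
    double p[2] = {-4.0, -10.0};       // search direction (negative gradient at w)
    double lambda = lineSearch(w, p, 0.01);
    double wNew[2] = {w[0] + lambda * p[0], w[1] + lambda * p[1]};   // Eq. (4)
    std::printf("lambda_n = %.2f   E(w_n) = %.3f   E(w_n+1) = %.3f\n",
                lambda, E(w), E(wNew));
    return 0;
}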
V. CONJUGATE GRADIENT

To apply line search to the problem of error function minimization, we need to choose a suitable search direction at each stage of the algorithm. Note that at the minimum of the line search

$$ \frac{\partial}{\partial \lambda} E(w_n + \lambda p_n) = 0 \qquad (6) $$

which gives

$$ g(w_{n+1})^T p_n = 0, \qquad (7) $$

where g ≡ ∂E/∂w is evaluated at w_{n+1}. Thus the gradient at the new minimum is orthogonal to the search direction. Choosing successive search directions along the local (negative) gradient can lead to slow learning: the algorithm can then take many steps to converge, even for a quadratic error function. The solution to this problem lies in choosing the successive search directions such that, at each step of the algorithm, the component of the gradient parallel to the previous search direction, which has just been made zero, remains unaltered [17], [24]. Suppose we have already performed a line minimization along the direction p_n, starting from the point w_n, to give the new point w_{n+1}. Then at the point w_{n+1} we have

$$ g_{n+1}^T p_n = 0. \qquad (8) $$

We now choose the new search direction p_{n+1} such that

$$ g(w_{n+1} + \lambda p_{n+1})^T p_n = 0, \qquad (9) $$

that is, along the new direction the component of the gradient parallel to the previous search direction remains zero [31].

VI. CONJUGATE GRADIENT TECHNIQUES

The conjugate gradient algorithms are derived on the assumption of a quadratic error function with a positive definite Hessian matrix. For a non-quadratic error function the Hessian matrix depends on the current weight vector and would have to be re-evaluated at each step of the algorithm. Since the evaluation of the Hessian is computationally costly for non-linear neural networks, and since it would have to be done repeatedly, we would like to avoid using the Hessian. In fact, it turns out that the search direction and learning rate can be found without explicit knowledge of H; this leads to the conjugate gradient algorithm (CGA). The conjugate gradient method avoids the problem by incorporating an intricate relationship between the direction and gradient vectors. It is guaranteed to locate the minimum of any quadratic function of N variables in at most N steps; for a non-quadratic function, such as the cost function of a multilayer perceptron, the process is iterative rather than N-step and a criterion for convergence is required. The weight vector of the network is updated in accordance with the rule in [4]: successive steps Δw_n are chosen such that the weight correction made at the previous step is preserved. A direction for the n-th weight correction is computed and a line minimization in this direction is performed, generating w_{n+1}; successive weight corrections are constrained to be conjugate to those used previously. Interestingly, this is achieved without inversion of the Hessian matrix, in contrast with Newton's method. Several such techniques exist, and most involve the gradient. Suppose the initial weight correction is p(0) = -g(0); a line minimization in the direction p(0) then gives the next weight vector.
Beginning with k = 0, we require subsequent gradients to be orthogonal to the previous weight-correction direction; that is, from [8],

$$ p_n^T g_{n+i} = 0, \qquad i = 1, 2, \ldots \qquad (10) $$

and in particular

$$ p_n^T \left( g_{n+2} - g_{n+1} \right) = 0, $$

where g_{n+2} - g_{n+1} is the change in the gradient of E from w_{n+1} to w_{n+2}. Using a Taylor series expansion about a point w_0,

$$ E(w) \approx E(w_0) + \left. \frac{\partial E}{\partial w} \right|_{w = w_0} (w - w_0) + \frac{1}{2!}\, (w - w_0)^T \left. \frac{\partial^2 E}{\partial w^2} \right|_{w = w_0} (w - w_0), $$

where J = ∂E/∂w and H = ∂²E/∂w² = [∂²E/∂w_i ∂w_j]. Differentiating with respect to w gives

$$ g(w) = J + H (w - w_0), $$

so the change of gradient between successive points is H times the weight correction. Therefore

$$ p_n^T H\, p_{n+1} = 0, \qquad (11) $$

which is called the conjugacy condition; see Fig. 3. Weight-correction directions p_{n+1} and p_{n+2} satisfying this relation are said to be conjugate to one another in the context of H, or H-conjugate. It is important to note that there exist several methods of finding the conjugate vector p_{n+1}; see [29] and [31]. Each search direction vector is computed as a linear combination of the current gradient vector and the previous direction vector,

$$ p_{n+1} = -g_{n+1} + \beta_{n+1}\, p_n, \qquad (12) $$

which defines a path or sequence of search directions p_n, n = 1, 2, ..., in weight space, where β_n is a time-varying parameter. There are various rules for determining β_n in terms of the gradient vectors g_n and g_{n+1}. Each weight correction p_n is computed sequentially, starting from p_0, and is formed as the sum of the current negative gradient direction and a scaled version (by β_n) of the previous correction. Using (11) and (12), we have

$$ p_n^T H\, p_{n+1} = p_n^T H \left[ -g_{n+1} + \beta_{n+1} p_n \right] = 0 \;\;\Longrightarrow\;\; \beta_{n+1} = \frac{p_n^T H\, g_{n+1}}{p_n^T H\, p_n}. \qquad (13) $$

In most conjugate gradient algorithms the step size is adjusted at each iteration, and a search is made along the conjugate gradient direction to determine the step size which minimizes the performance function along that line [10]. There are a number of different formulae for β; some of these are

Fletcher-Reeves:
$$ \beta_{n+1} = \frac{g_{n+1}^T g_{n+1}}{g_n^T g_n} \qquad (14) $$

Polak-Ribiere:
$$ \beta_{n+1} = \frac{g_{n+1}^T (g_{n+1} - g_n)}{g_n^T g_n} \qquad (15) $$

Powell-Beale restarts:
$$ \beta_{n+1} = \frac{g_{n+1}^T (g_{n+1} - g_n)}{p_n^T (g_{n+1} - g_n)} \qquad (16) $$

Fig. 3. Conjugate gradient descent in weight space employs a sequence of line searches. If Δw_1 is the first descent direction, the second direction obeys the conjugacy condition; as such, the second descent does not "spoil" the contribution of the previous line search. In the case where the Hessian is diagonal (right), the directions of the line searches are orthogonal [12].

The conjugate gradient algorithms are usually much faster than variable-learning-rate backpropagation, although the results will vary from one problem to another. They require only a little more storage than the simpler algorithms, so they are often a good choice for networks with a large number of weights. The three expressions above are equivalent provided the error function is exactly quadratic. For non-quadratic error functions the Polak-Ribiere form is generally found to give slightly better results than the other expressions, probably because, if the algorithm is making little progress so that successive gradient vectors are very similar, the Polak-Ribiere form gives a small value for β_{n+1}, so that the search direction p_{n+1} tends to be reset to the negative gradient direction, which is equivalent to restarting the conjugate gradient procedure. For a quadratic error function the conjugate gradient algorithm finds the minimum after at most W line minimizations, where W is the number of weights, without calculating the Hessian matrix [39]. This clearly represents a significant improvement on the simple gradient descent approach, which can take a very large number of steps to minimize even a quadratic error function.
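A minimal sketch of the iteration (12) with the Polak-Ribiere choice (15) is shown below, on a small quadratic stand-in for the error function. Because the test function is quadratic, the line minimization along p can be done analytically here; for a real network cost a numerical line search such as the one in Section IV would be used instead. The test matrix and starting point are illustrative assumptions.

#include <cstdio>

const double A[2][2] = {{3.0, 1.0}, {1.0, 2.0}};   // Hessian of the stand-in cost 0.5*w'Aw

void grad(const double w[2], double g[2]) {        // gradient g = A*w
    g[0] = A[0][0] * w[0] + A[0][1] * w[1];
    g[1] = A[1][0] * w[0] + A[1][1] * w[1];
}

int main() {
    double w[2] = {2.0, -3.0}, g[2], gOld[2], p[2];
    grad(w, g);
    p[0] = -g[0];  p[1] = -g[1];                   // first direction: negative gradient

    for (int n = 0; n < 10; ++n) {
        // exact line minimization for a quadratic: lambda = -(g.p) / (p.A.p)
        double Ap[2] = {A[0][0]*p[0] + A[0][1]*p[1], A[1][0]*p[0] + A[1][1]*p[1]};
        double lambda = -(g[0]*p[0] + g[1]*p[1]) / (p[0]*Ap[0] + p[1]*Ap[1]);
        w[0] += lambda * p[0];  w[1] += lambda * p[1];

        gOld[0] = g[0];  gOld[1] = g[1];
        grad(w, g);
        double gg = g[0]*g[0] + g[1]*g[1];
        std::printf("n = %d   w = (%.6f, %.6f)\n", n, w[0], w[1]);
        if (gg < 1e-16) break;
        // Polak-Ribiere: beta = g_new.(g_new - g_old) / (g_old.g_old), Eq. (15)
        double beta = (g[0]*(g[0]-gOld[0]) + g[1]*(g[1]-gOld[1]))
                    / (gOld[0]*gOld[0] + gOld[1]*gOld[1]);
        p[0] = -g[0] + beta * p[0];                 // new search direction, Eq. (12)
        p[1] = -g[1] + beta * p[1];
    }
    return 0;
}

On this two-dimensional quadratic the iteration terminates in two line minimizations, in line with the at-most-W-steps property noted above.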
Fig. 4. A comparison between gradient descent (GDA) and conjugate gradient (CGA) minimization.

VII. TRUST REGION MODELS

A large class of methods for unconstrained optimization is based on a slightly different model algorithm. In particular, the step size α_j is nearly always taken as unity. In order to ensure that the descent condition holds, it may thus be necessary to compute several trial vectors. Methods of this type are often termed trust-region methods. For large values of λ_n the step size becomes small. Techniques such as this are well known in standard optimization theory, where they are called model trust region methods [35], [37], [38], because the model is effectively trusted only in a small region around the current search point. The size of the trust region is governed by the parameter λ_n, so that a large λ_n gives a small trust region; see [33].

VIII. QUASI NEWTON METHODS

Using the local quadratic approximation, we can obtain directly an expression for the location of the minimum, or more generally the stationary point, of the error function,

$$ E(w) = E(w^*) + \frac{1}{2} (w - w^*)^T H (w - w^*), $$

which gives

$$ w^* = w - H^{-1} g. $$

The vector -H^{-1}g is known as the Newton step or Newton direction [20], [29]. Unlike the local gradient vector, the Newton direction for a quadratic error surface, evaluated at any w, points directly at the minimum of the error function, provided H is positive definite. Since the quadratic approximation is not exact, it would be necessary to apply w* = w - H^{-1}g iteratively, with the Hessian being re-evaluated at each new search point [40]. The Hessian for non-quadratic networks is computationally demanding, since it requires O(Nw^2) operations, where w is the number of weights in the network and N is the number of patterns in the data, plus O(w^3) operations for the inversion at each iteration [5], [32]. Moreover, the Newton step may move towards a maximum or a saddle point rather than a minimum if the Hessian is not positive definite, so the error is not guaranteed to be reduced at each iteration; see [6], [11].

Another drawback of Newton's method is that it requires the analytical Hessian at each iteration, which is a problem if this derivative is very expensive or difficult to compute. In such cases it may be convenient to iterate according to

$$ w_{n+1} = w_n - G^{-1} g_n, \qquad (17) $$

where G is an easily computed approximation to H. For example, in one dimension the secant method approximates the derivative by the difference quotient

$$ a(n) = \frac{g(w_{n+1}) - g(w_n)}{w_{n+1} - w_n}. $$

Such an iteration is called a quasi-Newton method; if G is positive definite, as it usually is, an alternative name is variable metric method. One advantage of using an approximation in place of H is that G can be chosen to be positive definite, ensuring that the step will not be attracted to a maximum of the error function when one wants a minimum. Another advantage is that -G(n)^{-1} g(w_n) is a descent direction from w_n, allowing the use of line searches. Among general-purpose quasi-Newton algorithms, the best is probably the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which builds upon the earlier and similar Davidon-Fletcher-Powell (DFP) algorithm. The BFGS algorithm starts with a positive definite matrix approximation to H(w_0), usually the identity matrix I, and at each iteration it makes a minimal modification to gradually approximate H.
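The following sketch illustrates a quasi-Newton iteration in the BFGS family. It maintains an approximation G to the inverse Hessian (a common variant of the scheme described above, which builds an approximation to H itself) and uses a crude backtracking rule in place of a proper line search. The quadratic test cost and all numerical settings are assumptions for illustration only.

#include <cstdio>

// quadratic stand-in for the network error and its gradient
double E(const double w[2]) { return 1.5*w[0]*w[0] + w[0]*w[1] + w[1]*w[1]; }
void grad(const double w[2], double g[2]) { g[0] = 3.0*w[0] + w[1]; g[1] = w[0] + 2.0*w[1]; }

int main() {
    double w[2] = {2.0, -3.0}, g[2], G[2][2] = {{1.0, 0.0}, {0.0, 1.0}};
    grad(w, g);
    for (int n = 0; n < 25; ++n) {
        double d[2] = {-(G[0][0]*g[0] + G[0][1]*g[1]),      // search direction -G*g
                       -(G[1][0]*g[0] + G[1][1]*g[1])};
        double lam = 1.0, wNew[2];
        do {                                                // backtrack until E decreases
            wNew[0] = w[0] + lam*d[0];  wNew[1] = w[1] + lam*d[1];
            lam *= 0.5;
        } while (E(wNew) > E(w) && lam > 1e-8);
        double gNew[2];  grad(wNew, gNew);
        double s[2] = {wNew[0]-w[0], wNew[1]-w[1]};         // step taken
        double y[2] = {gNew[0]-g[0], gNew[1]-g[1]};         // observed gradient change
        double sy = s[0]*y[0] + s[1]*y[1];
        if (sy > 1e-12) {                                   // curvature check keeps G positive definite
            double Gy[2] = {G[0][0]*y[0] + G[0][1]*y[1], G[1][0]*y[0] + G[1][1]*y[1]};
            double yGy = y[0]*Gy[0] + y[1]*Gy[1];
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j)                 // BFGS update of G ~ H^(-1)
                    G[i][j] += (sy + yGy)*s[i]*s[j]/(sy*sy) - (Gy[i]*s[j] + s[i]*Gy[j])/sy;
        }
        w[0] = wNew[0]; w[1] = wNew[1]; g[0] = gNew[0]; g[1] = gNew[1];
        std::printf("n = %2d   E = %.3e\n", n, E(w));
        if (g[0]*g[0] + g[1]*g[1] < 1e-18) break;
    }
    return 0;
}

The curvature condition s^T y > 0 is what keeps G positive definite, so -Gg remains a descent direction as discussed above, and the true Hessian is never formed or inverted.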
IX. LEVENBERG-MARQUARDT ALGORITHM

Levenberg-Marquardt is a trust-region based method with a hyper-spherical trust region. The method works extremely well in practice and is considered the most efficient algorithm for training medium-sized artificial neural networks. Like the quasi-Newton methods, the Levenberg-Marquardt algorithm was designed to approach second-order training speed without having to compute the Hessian matrix [1]. When the performance function has the form of a sum of squares, then from (17) we have

$$ w_{n+1} = w_n - H^{-1} \nabla F(w), \qquad (18) $$

where the Hessian matrix can be approximated as H ≈ J^T J and the gradient can be computed as g = J^T e; here the Jacobian matrix J contains the first derivatives of the network errors with respect to the weights and biases, and e is the vector of network errors. The Gauss-Newton update formula is then

$$ W_{n+1} = W_n - \left( J_n^T J_n \right)^{-1} J_n^T e_n, $$

which requires J^T J to be positive definite. If it is not, we introduce a perturbation that controls the possibility of the matrix being non-positive definite, giving the recursion

$$ W_{n+1} = W_n - \left( J_n^T J_n + \lambda I \right)^{-1} J_n^T e_n, \qquad (19) $$

where, in the neural computing context, the quantity λ is called the learning parameter; it ensures that J^T J + λI is positive definite. It is always possible to choose λ sufficiently large to ensure a descent step. The learning parameter is decreased as the iterative process approaches a minimum. Thus, in Levenberg-Marquardt optimization, inversion of the square matrix J^T J + λI is involved, which requires large memory space to store the Jacobian matrix and the approximated Hessian.
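A minimal sketch of the recursion (19) on a tiny nonlinear least-squares problem is given below. The exponential model, the five data points and the factor-of-ten schedule for λ are illustrative assumptions; they are not the share-rate data or the settings used in the paper.

#include <cmath>
#include <cstdio>

const int N = 5;
const double X[N] = {0.0, 0.5, 1.0, 1.5, 2.0};
const double T[N] = {2.000, 1.213, 0.736, 0.446, 0.271};   // roughly 2*exp(-x)

double cost(const double w[2], double e[N]) {
    double c = 0.0;
    for (int i = 0; i < N; ++i) {
        e[i] = T[i] - w[0] * std::exp(w[1] * X[i]);          // network error e_i = t_i - O_i
        c += 0.5 * e[i] * e[i];
    }
    return c;
}

int main() {
    double w[2] = {1.0, -0.2}, e[N], lambda = 0.01;
    double c = cost(w, e);
    for (int n = 0; n < 50; ++n) {
        double J[N][2];                                      // J[i][j] = d e_i / d w_j
        for (int i = 0; i < N; ++i) {
            double m = std::exp(w[1] * X[i]);
            J[i][0] = -m;
            J[i][1] = -w[0] * X[i] * m;
        }
        // A = J'J + lambda*I  and  b = J'e
        double A00 = lambda, A01 = 0.0, A11 = lambda, b0 = 0.0, b1 = 0.0;
        for (int i = 0; i < N; ++i) {
            A00 += J[i][0]*J[i][0];  A01 += J[i][0]*J[i][1];  A11 += J[i][1]*J[i][1];
            b0  += J[i][0]*e[i];     b1  += J[i][1]*e[i];
        }
        // step = -(J'J + lambda*I)^(-1) J'e, Eq. (19), solved by Cramer's rule (2x2)
        double det = A00*A11 - A01*A01;
        double dw0 = -( A11*b0 - A01*b1) / det;
        double dw1 = -(-A01*b0 + A00*b1) / det;
        double wTrial[2] = {w[0] + dw0, w[1] + dw1}, eTrial[N];
        double cTrial = cost(wTrial, eTrial);
        if (cTrial < c) {            // successful step: accept and trust the model more
            w[0] = wTrial[0];  w[1] = wTrial[1];  c = cTrial;
            lambda *= 0.1;
        } else {                     // failed step: shrink the trust region
            lambda *= 10.0;
        }
        std::printf("n = %2d   cost = %.6f   w = (%.4f, %.4f)   lambda = %.1e\n",
                    n, c, w[0], w[1], lambda);
    }
    return 0;
}

Accepting a step only when the cost decreases, and enlarging λ otherwise, is what gives the method the trust-region character noted in Section VII: a large λ shrinks the step towards a small, safe gradient-descent-like move, while a small λ lets the iteration approach the Gauss-Newton step.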
X. NETWORK TRAINING RESULTS

Daily share rates data is considered and presented in concurrent form (Fig. 5). For programming in C/C++ we used the approach of [3], [7], [26] and [30]. The gradient descent method with momentum and a variable learning rate is considered for comparison. The performance goal is 10^-5. We trained the network with each method; the average of 50 successful runs is taken and presented in the table, and the average number of iterations needed to converge to the targets is indicated with each algorithm. Finally, the results of forecasting for 10 days are shown in Table 1; it is best to use the forecast for only one day ahead because of the possibility of error growth.

Fig. 5. Time series plot of share rates.

Table 1. A comparison of neural network training algorithms (average number of iterations to converge shown in parentheses with each algorithm).

Target    GD (2500)   BFGS (45)   L.M. (21)   CGFP (400)   CGPR (240)   SCG (1700)
15.95     15.92       15.95       15.95       15.95        15.95        15.95
16.05     15.92       16.05       16.05       16.05        16.05        16.05
15.95     19.93       15.95       15.95       15.95        15.95        15.95
15.80     15.77       15.80       15.80       15.80        15.80        15.80
15.35     15.44       15.35       15.35       15.35        15.35        15.35
15.45     15.58       15.45       15.45       15.45        15.45        15.45
15.55     15.65       15.55       15.55       15.56        15.55        15.55
15.60     15.65       15.60       15.60       15.60        15.60        15.60
15.75     15.77       15.75       15.75       15.75        15.75        15.75
15.80     15.78       15.80       15.80       15.80        15.80        15.80

REFERENCES

[1] Aqil Burney S.M., Jilani A. Tahseen and Cemal Ardil, 2004. Levenberg-Marquardt algorithm for Karachi Stock Exchange share rates forecasting. International Journal of Computational Intelligence, pp. 168-173.
[2] Aqil Burney S.M., Jilani A. Tahseen, 2002. Time series forecasting using artificial neural network methods for Karachi Stock Exchange. A project at the Department of Computer Science, University of Karachi.
[3] Aqil Burney S.M. Measures of central tendency in News-MCB stock exchange price index, pp. 12-31. Developed by Vital Information Services (Pvt) Limited, Karachi, c/o Department of Computer Science, University of Karachi.
[4] B. Widrow and M. A. Lehr, 1990. 30 years of adaptive neural networks: Perceptron, Madaline, and Backpropagation. Proceedings of the IEEE 78, pp. 1415-1442.
[5] Barak A. Pearlmutter, 1993. Fast exact multiplication by the Hessian. Siemens Corporate Research, Neural Computation.
[6] R. Battiti, 1992. First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation 4, pp. 141-166.
[7] Blum, Adam, 1992. Neural Networks in C++. John Wiley and Sons, New York.
[8] Bortoletti, A., Di Fiore, C., Fanelli, S. and Zellini, P. A new class of quasi-Newtonian methods for optimal learning in MLP-networks. IEEE Transactions on Neural Networks, vol. 14, issue 2.
[9] Duda, R. O., Hart, P. E. and Stork, D. G. 2001. Pattern Classification. John Wiley & Sons.
[10] C. M. Bishop, 1995. Neural Networks for Pattern Recognition. Clarendon Press.
[11] Chauvin, Y., and Rumelhart, D. 1995. Backpropagation: Theory, Architecture, and Applications. Lawrence Erlbaum Associates.
[12] Chung-Ming Kuan and Tung Liu, 1995. Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics, pp. 347-364.
[13] De Villiers and E. Barnard, 1993. Backpropagation neural nets with one and two hidden layers. IEEE Transactions on Neural Networks 4, pp. 136-141.
[14] E. Rumelhart, G. E. Hinton, and R. J. Williams, 1986. Learning internal representations by error propagation. Parallel Distributed Processing, Vol. I, MIT Press, Cambridge, MA, pp. 318-362.
[15] Ergezinger and E. Thomsen, 1995. An accelerated learning algorithm for multilayer perceptrons: optimization layer by layer. IEEE Transactions on Neural Networks 6, pp. 31-42.
[16] M. F. Møller, 1997. Efficient Training of Feedforward Neural Networks. Ph.D. thesis, Computer Science Department, Aarhus University.
[17] M. F. Møller, 1993. A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6(4), pp. 525-533.
[18] Gill, P. E., Murray, W. and Wright, M. H. 1993. Practical Optimization. Academic Press.
[19] Iebeling Kaastra and Milton S. Boyd, 1995. Forecasting future trading volume using neural networks. Journal of Futures Markets, pp. 953-970.
[20] A. J. Shepherd, 1997. Second-order optimization methods. In Second-Order Methods for Neural Networks. Springer-Verlag, Berlin, New York, pp. 43-72.
[21] Jacobs, 1988. Increased rates of convergence through learning rate adaptation. Neural Networks 1, pp. 295-307.
[22] K. Hornik, M. Stinchcombe, and H. White, 1989. Multi-layer feedforward networks are universal approximators. Neural Networks 2, pp. 359-366.
[23] Karayiannis and A. N. Venetsanopoulos, 1993. Artificial Neural Networks: Learning Algorithms, Performance Evaluation, and Applications. Kluwer Academic, Boston, MA.
[24] M. R. Hestenes and E. Stiefel, 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49, pp. 409-436.
[25] Michael Hüsken, Jens E. Gayko and Bernhard Sendhoff, 2000. Optimization for problem classes - neural networks that learn to learn. IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks (ECNN-2000), pp. 98-109, IEEE Press, 2000.
[26] Amir F. Atiya, Suzan M. El-Shoura, Samir I. Shaheen, 1999. A comparison between neural-network forecasting techniques - case study: river flow forecasting. IEEE Transactions on Neural Networks, Vol. 10, No. 2.
[27] N. N. Schraudolph and Thore Graepel, 2003. Combining conjugate direction methods with stochastic approximation of gradients. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics.
[28] N. N. Schraudolph, 2002. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7), pp. 1723-1738.
[29] N. N. Schraudolph, 1999. Local gain adaptation in stochastic gradient descent. In Proceedings of the Ninth International Conference on Artificial Neural Networks, pp. 569-574, Edinburgh, Scotland, 1999. IEE, London.
[30] Neural Network Toolbox, 2002. For use with MATLAB. MathWorks, Natick, MA.
[31] Patrick van der Smagt, 1994. Minimization methods for training feedforward neural networks. Neural Networks 7(1), pp. 1-11.
[32] Pearlmutter, 1994. Fast exact multiplication by the Hessian. Neural Computation, 6(1), pp. 147-160.
[33] Phua, P. K. H. and Daohua Ming, 2003. Parallel nonlinear optimization techniques for training neural networks. IEEE Transactions on Neural Networks, vol. 14, issue 6, pp. 1460-1468.
[34] Scarselli and A. C. Tsoi, 1998. Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results. Neural Networks, pp. 15-37.
[35] Ripley, B. D. 1994. Neural networks and related methods for classification. Journal of the Royal Statistical Society, B 56(3), pp. 409-456.
[36] S. Haykin, 1994. Neural Networks: A Comprehensive Foundation. IEEE Press, Piscataway, NJ.
[37] S. D. Hunt and J. R. Deller, 1995. Selective training of feedforward artificial neural networks using matrix perturbation theory. Neural Networks, 8(6), pp. 931-944.
[38] Welstead, Stephen T. 1994. Neural Network and Fuzzy Logic Applications in C/C++. John Wiley and Sons, Inc., New York.
[39] Z. Strakoš and P. Tichý, 2002. On error estimation in the conjugate gradient method and why it works in finite precision computations. ETNA 13, pp. 56-80.
[40] Zhou and J. Si, 1998. Advanced neural-network training algorithm with reduced complexity based on Jacobian deficiency. IEEE Transactions on Neural Networks 9, pp. 448-453.

S. M. Aqil Burney received the B.Sc. (Physics, Mathematics, Statistics) from D. J. College, affiliated with the University of Karachi, in 1970, the first-class first M.Sc. (Statistics) from Karachi University in 1972, and the M.Phil. in 1983 with specialization in computing, simulation and stochastic modeling in risk management. Dr. Burney received the Ph.D. degree in Mathematics from Strathclyde University, Glasgow, with specialization in estimation, modeling and simulation of multivariate time series models using an algorithmic approach with software development. He is currently professor and approved supervisor in Computer Science and Statistics by the Higher Education Commission, Government of Pakistan. He is also the project director of the Umair Basha Institute of Information Technology (UBIT) and a member of various higher academic boards of different universities of Pakistan. His research interests include AI, soft computing, neural networks, fuzzy logic, data mining, statistics, simulation and stochastic modeling of mobile communication systems and networks, and network security. He is the author of three books and various technical reports, has supervised more than 100 software/information technology projects at the Masters level, and is project director of various projects funded by the Government of Pakistan. He is a member of the IEEE (USA) and ACM (USA), a fellow of the Royal Statistical Society, United Kingdom, and a member of the Islamic Society of Statistical Sciences. He has been teaching since 1973 in various universities in the fields of econometrics, bio-statistics, statistics, mathematics and computer science, and has vast education management experience at the university level. Dr. Burney has received appreciations and awards for his research and as an educationist.

Tahseen A. Jilani received the B.Sc. (Computer Science, Mathematics, Statistics) from Islamia Science College, affiliated with the University of Karachi, in 1998, and the first-class second M.Sc. (Statistics) from Karachi University in 2001. Since 2003 he has been a Ph.D. research fellow in the Department of Computer Science, University of Karachi. His research interests include AI, neural networks, soft computing, fuzzy logic, statistical data mining and simulation. He has been teaching since 2002 in the fields of statistics, mathematics and computer science.