SWAPNA.C
Asst.Prof.
IT Dept.
SriDevi Women’s Engineering College
Multilayer Networks
 Multilayer networks can be trained with the gradient descent algorithm.
 In the perceptron, the discontinuous threshold makes the unit non-differentiable and hence unsuitable for gradient descent.
Differentiable Threshold Unit:
 The sigmoid unit is very much like a perceptron, but is based on a smoothed, differentiable threshold function.
 Like the perceptron, the sigmoid unit first computes a
linear combination of its inputs, then applies a threshold
to the result. In the case of the sigmoid unit, however, the
threshold output is a continuous function of its input.
 More precisely, the sigmoid unit computes its output o as
o = σ(w · x), where σ(y) = 1 / (1 + e^(−y)).
 σ is often called the sigmoid function or, alternatively, the logistic function. Its output ranges between 0 and 1, increasing monotonically with its input.
 Because it maps a very large input domain to a small range of outputs, it is often referred to as the squashing function of the unit. The sigmoid function has the useful property that its derivative is easily expressed in terms of its output: dσ(y)/dy = σ(y) · (1 − σ(y)).
 The function tanh is also sometimes used in place of the sigmoid function.
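As a minimal sketch (the function names are illustrative, not from the slides), the sigmoid unit and its derivative property can be written in Python:

```python
import numpy as np

def sigmoid(y):
    """The logistic (sigmoid) function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_derivative(y):
    """The derivative is easily expressed in terms of the output itself."""
    s = sigmoid(y)
    return s * (1.0 - s)

def sigmoid_unit(w, x):
    """Linear combination of the inputs, then the sigmoid threshold: o = sigma(w . x)."""
    return sigmoid(np.dot(w, x))
```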
 The BACKPROPAGATION ALGORITHM:
 The BACKPROPAGATION algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections.
 It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs.
 Because we are considering networks with multiple output units rather than single units as before, we begin by redefining E to sum the errors over all of the network output units:
E(w) = ½ Σ_{d∈D} Σ_{k∈outputs} (t_kd − o_kd)²
 where outputs is the set of output units in the network, and t_kd and o_kd are the target and output values associated with the kth output unit and training example d.
 One major difference in the case of multilayer networks is that the error surface can have multiple local minima, in contrast to the single-minimum parabolic error surface of the single-unit case.
 The version presented here is the incremental, or stochastic, gradient descent version of BACKPROPAGATION.
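A sketch of this redefined error in Python (the array shapes are assumptions for illustration):

```python
import numpy as np

def network_error(targets, outputs):
    """E(w) = 1/2 * sum over examples d and output units k of (t_kd - o_kd)^2.

    targets, outputs: arrays of shape (num_examples, num_output_units).
    """
    return 0.5 * np.sum((targets - outputs) ** 2)
```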
 The algorithm applies to layered feedforward networks containing two layers of sigmoid units, with units at each layer connected to all units from the preceding layer.
 Notation for BACKPROPAGATION: x_ji denotes the input from unit i into unit j, w_ji denotes the corresponding weight, and δ_n denotes the error term associated with unit n.
 The algorithm begins by constructing a network with the desired number of hidden and output units and initializing all network weights to small random values.
 Given this fixed network structure, the main loop of the algorithm then repeatedly iterates over the training examples.
 For each training example, it calculates the error of the network output, computes the gradient with respect to the error on this example, and then updates all weights in the network.
 This gradient descent step is iterated until the network performs acceptably well (a minimal sketch of the loop follows).
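The following is a minimal sketch of this loop for a two-layer network, assuming the update rules derived later in this section; all names (train_backprop, eta, and so on) are illustrative, not prescribed by the slides:

```python
import numpy as np

def train_backprop(examples, n_in, n_hidden, n_out, eta=0.05, n_iter=5000):
    """Minimal stochastic-gradient BACKPROPAGATION for one hidden layer.

    examples: list of (x, t) pairs, x of shape (n_in,), t of shape (n_out,).
    Returns the hidden-layer and output-layer weight matrices.
    """
    rng = np.random.default_rng(0)
    # Initialize all network weights to small random values, e.g. in [-0.05, 0.05].
    W_hidden = rng.uniform(-0.05, 0.05, (n_hidden, n_in))
    W_out = rng.uniform(-0.05, 0.05, (n_out, n_hidden))

    sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

    for _ in range(n_iter):                      # main loop over the training set
        for x, t in examples:
            # 1. Propagate the input forward through the network.
            h = sigmoid(W_hidden @ x)            # hidden-unit outputs o_h
            o = sigmoid(W_out @ h)               # output-unit outputs o_k

            # 2. Propagate the errors backward.
            delta_out = o * (1 - o) * (t - o)                    # output-unit error terms
            delta_hidden = h * (1 - h) * (W_out.T @ delta_out)   # hidden-unit error terms

            # 3. Update all network weights: w_ji <- w_ji + eta * delta_j * x_ji.
            W_out += eta * np.outer(delta_out, h)
            W_hidden += eta * np.outer(delta_hidden, x)
    return W_hidden, W_out
```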
 The gradient descent weight-update rule is similar to the delta training rule.
 Like the delta rule, it updates each weight in proportion to the learning rate η, the input value x_ji to which the weight is applied, and the error in the output of the unit:
Δw_ji = η δ_j x_ji
The only difference is that the error (t − o) in the delta rule is replaced by a more complex error term δ_j.
 The weight-update loop in the BACKPROPAGATION algorithm may be iterated thousands of times in a typical application.
Derivation of the BACKPROPAGATION Rule
 Recall that stochastic gradient descent involves iterating through the training examples one at a time, for each training example d descending the gradient of the error E_d with respect to this single example. In other words, for each training example d every weight w_ji is updated by adding to it
Δw_ji = −η ∂E_d/∂w_ji
 where E_d is the error on training example d, summed over all output units in the network:
E_d(w) = ½ Σ_{k∈outputs} (t_k − o_k)²
 Here outputs is the set of output units in the network, t_k is the target value of unit k for training example d, and o_k is the output of unit k given training example d.
 The derivation of the stochastic gradient descent rule is conceptually straightforward, but requires keeping track of a number of subscripts and variables.
 We will follow the notation shown earlier, adding a subscript j to denote the jth unit of the network, as follows:
net_j = Σ_i w_ji x_ji (the weighted sum of inputs for unit j)
o_j = the output computed by unit j
t_j = the target output for unit j
σ = the sigmoid function
Downstream(j) = the set of units whose immediate inputs include the output of unit j
 Noting that w_ji can influence the rest of the network only through net_j, we can use the chain rule to write
∂E_d/∂w_ji = (∂E_d/∂net_j) · (∂net_j/∂w_ji) = (∂E_d/∂net_j) · x_ji
 Our remaining task is to derive a convenient expression for ∂E_d/∂net_j. We consider two cases in turn: the case where unit j is an output unit for the network, and the case where j is an internal unit.
 Case 1: Training Rule for Output Unit Weights. Just as w_ji can influence the rest of the network only through net_j, net_j can influence the network only through o_j. Therefore, we can invoke the chain rule again to write
∂E_d/∂net_j = (∂E_d/∂o_j) · (∂o_j/∂net_j)
 Substituting ∂E_d/∂o_j = −(t_j − o_j) and ∂o_j/∂net_j = o_j (1 − o_j), we have the stochastic gradient descent rule for output units:
Δw_ji = η δ_j x_ji, where δ_j = (t_j − o_j) o_j (1 − o_j)
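Written as code, this output-unit error term might look like the following (a sketch; t_k and o_k are scalars here):

```python
def delta_output(t_k, o_k):
    # delta_k = (t_k - o_k) * o_k * (1 - o_k): the familiar (t - o) error of the
    # delta rule, scaled by the sigmoid derivative o_k * (1 - o_k).
    return (t_k - o_k) * o_k * (1.0 - o_k)
```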
Case 2: Training Rule for Hidden Unit Weights.
 In the case where j is an internal, or hidden, unit in the network, the derivation of the training rule for w_ji must take into account the indirect ways in which w_ji can influence the network outputs and hence E_d.
 Notice that net_j can influence the network output only through the units in Downstream(j). Summing their contributions gives the hidden-unit error term
δ_j = o_j (1 − o_j) Σ_{k∈Downstream(j)} δ_k w_kj
and the same weight-update rule Δw_ji = η δ_j x_ji applies.
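A sketch of the hidden-unit error term in Python (the argument names are illustrative):

```python
import numpy as np

def delta_hidden(o_j, downstream_deltas, downstream_weights):
    """delta_j = o_j * (1 - o_j) * sum over k in Downstream(j) of w_kj * delta_k.

    downstream_deltas[k] is delta_k; downstream_weights[k] is w_kj.
    """
    return o_j * (1.0 - o_j) * np.dot(downstream_weights, downstream_deltas)
```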
 A variety of termination conditions can be used to halt
the procedure:
 One may choose to halt after a fixed number of iterations
through the loop, or once the error on the training
examples falls below some threshold, or once the error on
a separate validation set of examples meets some criterion.
 The choice of termination criterion is an important one,
because too few iterations can fail to reduce error
sufficiently, and too many can lead to overfitting
the training data.
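These three criteria might be combined as in the following sketch, where step(), train_error(), and val_error() are assumed helper functions supplied by the surrounding training code:

```python
def train_until_done(step, train_error, val_error,
                     max_iters=10000, train_tol=0.01, patience=50):
    """Halt on whichever termination criterion triggers first (an illustrative sketch)."""
    best_val, since_best = float("inf"), 0
    for _ in range(max_iters):              # 1. fixed number of iterations
        step()                              # one weight-update pass
        if train_error() < train_tol:       # 2. training error below a threshold
            break
        v = val_error()
        if v < best_val:
            best_val, since_best = v, 0
        else:
            since_best += 1
        if since_best >= patience:          # 3. validation error stops improving
            break
```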
ADDING MOMENTUM
 Momentum is added to the algorithm by making the weight update on the nth iteration depend partially on the update that occurred during the (n − 1)th iteration, as follows:
Δw_ji(n) = η δ_j x_ji + α Δw_ji(n − 1)
 Here Δw_ji(n) is the weight update performed during the nth iteration through the main loop of the algorithm, and 0 ≤ α < 1 is a constant called the momentum.
 The first term on the right-hand side is the usual weight-update rule, and the second term is the momentum term.
 The effect of α is to add momentum that tends to keep the ball rolling in the same direction from one iteration to the next.
 This can sometimes have the effect of keeping the ball rolling through small local minima in the error surface, or along flat regions in the surface where the ball would stop if there were no momentum.
 It also has the effect of gradually increasing the step size of the search in regions where the gradient is unchanging, thereby speeding convergence.
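A sketch of the momentum update in Python, continuing the training-loop sketch above (alpha and the helper name are illustrative):

```python
def momentum_update(grad_term, prev_update, eta=0.05, alpha=0.9):
    """Delta_w(n) = eta * delta_j * x_ji + alpha * Delta_w(n-1)."""
    return eta * grad_term + alpha * prev_update

# Inside the training loop, one would keep the previous update around, e.g.:
#   update_out = momentum_update(np.outer(delta_out, h), update_out)
#   W_out += update_out
```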
LEARNING IN ARBITRARY ACYCLIC NETWORKS
 The definition of BACKPROPAGATION given above applies only to two-layer networks, but it is easily generalized to feedforward networks of arbitrary depth.
 In general, the δ_r value for a unit r in layer m is computed from the δ values at the next deeper layer m + 1 according to
δ_r = o_r (1 − o_r) Σ_{s∈layer m+1} δ_s w_sr
 What we are really saying here is that this step may be repeated for any number of hidden layers in the network.
 It is equally straightforward to generalize the algorithm to any directed acyclic graph, regardless of whether the network units are arranged in uniform layers as we have assumed up to now. In the case that they are not, the rule for calculating δ for any internal unit is
δ_r = o_r (1 − o_r) Σ_{s∈Downstream(r)} δ_s w_sr
 where Downstream(r) is the set of units immediately downstream from unit r in the network: that is, all units whose inputs include the output of unit r.
REMARKS ON THE BACKPROPAGATION ALGORITHM
 Convergence and Local Minima:
 BACKPROPAGATION over multilayer networks is only guaranteed to converge toward some local minimum in E, and not necessarily to the global minimum error.
 However, networks with large numbers of weights correspond to error surfaces in very high dimensional spaces; when gradient descent falls into a local minimum with respect to one of these weights, it will not necessarily be in a local minimum with respect to the other weights, which can provide escape routes.
 A second perspective on local minima can be gained by considering the manner in which network weights evolve as the number of training iterations increases.
Common heuristics to attempt to alleviate the
problem of local minima include:
 Add a momentum term to the weight-update rule.
Momentum can sometimes carry the gradient descent
procedure through narrow local minima.
 Use stochastic gradient descent rather than true gradient
descent.
 Train multiple networks using the same data, but
initializing each network with different random weights.
 If the different training efforts lead to different local minima, then the network with the best performance over a separate validation data set can be selected (see the sketch after this list).
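A sketch of this multiple-restarts heuristic, where train_one and validation_error are assumed helpers:

```python
def train_with_restarts(train_one, validation_error, n_restarts=5):
    """Train several networks from different random initial weights and keep
    the one with the lowest error on a separate validation set."""
    candidates = [train_one(seed) for seed in range(n_restarts)]
    return min(candidates, key=validation_error)
```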
Representational Power of FeedForward Networks
 Three general results describe the representational power of feedforward networks:
 Boolean functions. Every Boolean function can be represented exactly by some network with two layers of units, although the number of hidden units required grows exponentially in the worst case with the number of network inputs.
 Continuous functions. Every bounded continuous function can be approximated with arbitrarily small error (under a finite norm) by a network with two layers of units: sigmoid units at the hidden layer and (unthresholded) linear units at the output layer.
 Arbitrary functions. Any function can be approximated to
arbitrary accuracy by a network with three layers of units
(Cybenko 1988). Again, the output layer uses linear units,
the two hidden layers use sigmoid units, and the number
of units required at each layer is not known in general.
 The proof of this involves showing that any function can
be approximated by a linear combination of many
localized functions that have value 0 everywhere except
for some small region, and then showing that two layers of
sigmoid units are sufficient to produce good local
approximations.
Hypothesis Space Search and Inductive Bias
 The hypothesis space is the n-dimensional Euclidean space of the n network weights. This hypothesis space is continuous, in contrast to the hypothesis spaces of decision tree learning and other methods based on discrete representations.
 The inductive bias depends on the interplay between the gradient descent search and the way in which the weight space spans the space of representable functions. However, one can roughly characterize it as smooth interpolation between data points.
Hidden Layer Representations
 One intriguing property of BACKPROPAGATION is its ability to discover useful intermediate representations at the hidden unit layers inside the network.
Generalization, Overfitting, and Stopping
Criterion
 BACKPROPAGATION is susceptible to overfitting the
training examples at the cost of decreasing generalization
accuracy over other unseen examples.
 BACKPROPAGATION will often be able to create
overly complex decision surfaces that fit noise in the
training data or unrepresentative characteristics of the
particular training sample.
 Several techniques are available to address the overfitting problem for BACKPROPAGATION learning. One approach, known as weight decay, is to decrease each weight by some small factor during each iteration; this is equivalent to modifying the definition of E to include a penalty term corresponding to the total magnitude of the network weights, biasing learning toward small weights and against overly complex decision surfaces.
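Using the weight matrices from the earlier training-loop sketch, weight decay could be applied once per iteration with a helper like this (decay is an assumed illustrative constant):

```python
def apply_weight_decay(weights, decay=1e-4):
    """Shrink every weight in a NumPy weight matrix by a small factor,
    biasing the search toward small weights (an illustrative sketch)."""
    weights *= (1.0 - decay)      # in-place update of the weight matrix
    return weights
```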
 One of the most successful methods for overcoming the
overfitting problem is to simply provide a set of validation
data to the algorithm in addition to the training data.
 The algorithm monitors the error with respect to this
validation set, while using the training set to drive the
gradient descent search.
 The problem of overfitting is most severe for small
training sets.
 In these cases, a k-fold cross-validation approach is
sometimes used, in which cross validation is performed k
different times, each time using a different partitioning of
the data into training and validation sets, and the results
are then averaged.
 In one version of this approach, the m available examples are partitioned into k disjoint subsets, each of size m/k.
 The cross validation procedure is then run k times, each
time using a different one of these subsets as the
validation set and combining the other subsets for the
training set.
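A sketch of this k-fold procedure in Python, where train_on and error_on are assumed helpers:

```python
import numpy as np

def cross_validate(train_on, error_on, m, k=10, seed=0):
    """Partition the m examples into k disjoint subsets of size ~m/k, run
    training k times with a different validation subset each time, and
    average the resulting validation errors."""
    folds = np.array_split(np.random.default_rng(seed).permutation(m), k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        net = train_on(train_idx)              # train on the other k-1 subsets
        errors.append(error_on(net, val_idx))  # validate on the held-out subset
    return float(np.mean(errors))
```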