2. Mid Term Syllabus
• Introduction:
– Brain and Machine, Biological neurons and its
mathematical model, Artificial Neural Networks, Benefits
and Applications, Architectures, Learning Process
(paradigms and algorithms), Correlation Matrix Memory,
Adaptation.
• Supervised learning – I:
– Pattern space and weight space, Linearly and non-linearly
separable classes, decision boundary, Hebbian learning
and limitation, Perceptron, Perceptron convergence
theorem, Logic Functions implementations
• LMS Algorithm:
– LMS Algorithm
• Supervised Learning – II:
– Multilayer Perceptrons, XOR problem
3. End Sem Syllabus
• Introduction: Brain and Machine, Biological neurons and its mathematical model,
Artificial Neural Networks, Benefits and Applications, Architectures, Learning
Process (paradigms and algorithms), Correlation Matrix Memory, Adaptation.
• Supervised learning – I: Pattern space and weight space, Linearly and non-linearly
separable classes, decision boundary, Hebbian learning and limitation, Perceptron,
Perceptron convergence theorem, Logic Functions implementations
• LMS Algorithm: Wiener-Hopf equations, Steepest Descent search method, LMS
Algorithm, Convergence consideration in mean and mean square, Adaline,
Learning curve, Learning rate annealing schedules
• Supervised Learning – II: Multilayer Perceptrons, Backpropagation algorithm, XOR
problem, Training modes, Optimum learning, Local minima, Network pruning
techniques
• Unsupervised learning: Clustering, Hamming networks, Maxnet, Simple
competitive learning, Winner-take-all networks, Learning Vector Quantizers,
Counterpropagation Networks, Self Organizing Maps (Kohonen Networks),
Adaptive Resonance Theory
• Associative Models: Hopfield Networks (Discrete and Continuous), Storage
Capacity, Energy function and minimization, Brain-state-in-a-box neural network
• Applications of ANN and MATLAB Simulation: Character Recognition, Control
Applications, Data Compression, Self Organizing Semantic Maps
4. References
• Neural Networks: A Comprehensive
Foundation – Simon Haykin (Pearson
Education)
• Neural Networks: A Classroom Approach –
Satish Kumar (Tata McGraw Hill)
• Fundamentals of Neural Networks – Laurene
Fausett (Pearson Education)
• ftp://ftp.sas.com/pub/neural/FAQ.html
• MATLAB neural network toolbox and related
help notes
5. Inputs to Neural Networks
• Biology
• Graph Theory
• Algorithms
• Artificial Intelligence
• Control Systems
• Signal Theory
6. Minsky’s challenge
(adapted from Minsky, Singh and Sloman (2004))
[Figure: chart of reasoning methods. Horizontal axis: Number of Causes (Few to Many). Vertical axis: Number of Effects (Few to Many).]
• Many effects: Symbolic Logical Reasoning; Case-based Reasoning; Intractable
• Intermediate: Ordinary Qualitative Reasoning; Analogy-based Reasoning; Classical AI
• Few effects: Easy; Linear, Statistical; Connectionist, Neural Network, Fuzzy Logic
7. Who uses Neural Networks
(Area: Use)
• Computer Scientists: To understand properties of non-symbolic information processing; learning systems
• Engineers: In many areas including signal processing and automatic control
• Statisticians: As flexible, non-linear regression and classification models
• Physicists: To model phenomena in statistical mechanics and other tasks
• Cognitive Scientists: To describe models of thinking and consciousness and other high-level brain functions
• Neuro-physiologists: To describe and explore memory, sensory functions, motor functions and other mid-level brain functions
• Biologists: To interpret nucleotide sequences
• Philosophers etc.: For their own reasons
8. Brain vs the computer
http://scienceblogs.com/developingintelligence/2007/03/why_the_brain_is_not_like_a_co.php
(Brain | Computer)
• Brains are analogue (neuronal firing rate, asynchronous, leakiness) | Computers are digital
• Brain uses content-addressable memory | Computers use byte addressable memory
• Brain is a massively parallel machine | Computers are modular and serial
• Processing speed is not fixed in the brain; there is no system clock | Processing speed is fixed; there is a system clock
• Short-term memory only holds pointers to long term memory | RAM has isomorphic data
• No hardware/software distinction can be made with respect to the brain or mind | Computers have a clear distinction between hardware and software
• Synapses are far more complex than electrical logic gates | Electrical gates are simpler in function and mechanism
• Processing and memory are performed by the same components in the brain | Processing and memory are performed by different components in the computer
• The brain is a self-organizing system | Computers are usually not self organizing
• Brains have bodies and use them | Computers do not usually use their bodies
• The brain capacity is much larger than any computer | Computer capacities though large are still not comparable with those of the brain
9. Neuro products and application areas
• Academia Research • Market Segmentation
• Automotive Industry • Medical Diagnosis
• Bio Informatics • Meteorological Research
• Cancer Detection • Optical Character Recognition
• Computer Gaming • Pattern Recognition
• Credit Ratings • Predicting Business Expenses
• Drug Interaction Prediction • Real Estate Evaluations
• Electrical Load Balancing • Robotics
• Financial Forecasting • Sales Forecasting
• Fraud Detection • Search Engines
• Human Resources • Software Security
• Image Recognition • Speech Recognition
• Industrial Plant Modeling • Sports Betting
• Machine Control • Sports Handicap Predictions
• Machine Diagnostics
10. Applications of ANN (Sample Examples)
• Non linear statistical data modeling tools
• Function Approximation/ Mapping
• Pattern recognition in data
• Noise Cancellation (LMS) in signaling systems
• Time Series Predictions
• Control and Steering of Autonomous Vehicles (Feedforward)
• Protein structure prediction and RNA splice junction identification
• Sonar/radar/image/astronomy/handwriting target recognition/ classification
• Call admission control for improving QOS in telecommunications (ATM) networks
• Software engineering project management
• Reinforcement Learning in Robotics (Backpropagation)
• Pattern Completion (Hopfield)
• Object recognition (Hopfield)
• Clustering and Character Recognition (ART)
• Neural Information Retrieval System (Machine parts retrieval at Boeing) (ART)
• Neural Phonetic Typewriter (SOM)
• Control of Robot Arms (SOM)
• Vector Quantization (SOM)
• Radar based classification (SOM)
• Brain Modeling (SOM)
• Feature mapping of language data (SOM)
• Organization of massive document collection (SOM)
11. Neuroscience basics I
• 100 B (10**11) neurons in brain
• Each neuron has 10K (10**4) synapses on
average
• Thus 10**15 connections
• A lifetime of 80 years is 2.5B seconds.
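The figures on this slide can be checked with a few lines of arithmetic. This is only a rough sketch: the variable names and the 365.25-day year are assumptions made for illustration, not part of the slides.

```python
# Back-of-the-envelope check of the numbers on the slide.
neurons = 10**11                 # about 100 billion neurons
synapses_per_neuron = 10**4      # about 10K synapses per neuron on average
connections = neurons * synapses_per_neuron

# Assumption: a year is taken as 365.25 days.
lifetime_seconds = 80 * 365.25 * 24 * 3600

print(connections)                      # total number of connections
print(lifetime_seconds)                 # seconds in an 80-year lifetime
print(connections / lifetime_seconds)   # connections per second of life
```

Under these assumptions the last ratio comes out at roughly 4 x 10**5 connections for every second of an 80-year life.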
12. Structural organization of levels in brain
Central Nervous System
Interregional Circuits (Systems)
Local Circuits (Maps/Networks)
Neurons
Dendritic Trees
Neural microcircuits
Synapses
Molecules
28. Types of Neural Networks
• Based on learning algorithms: Supervised and Unsupervised
• Associativity in supervised learning: Auto Associative and Hetero Associative
• Based on network topology: Feed forward and feedback / recurrent
• Based on kind of data accepted: Categorical variables, Quantitative variables
• Based on transfer function used: Linear, Non-linear
• Based on number of layers: Single Layer, Multilayer
29. ANN Architecture Taxonomy
• Supervised, Feed forward:
– Linear: Hebbian, Perceptron, Adaline, Higher Order, Functional Link
– MLP (Multilayer Perceptron): Back Propagation, Cascade Correlation, Quick Prop, RPROP
– RBF Networks: Orthogonal Least Squares
– CMAC: Cerebellar Model Articulation Controller
– Classification Only: LVQ (Learning Vector Quantization), PNN (Probabilistic Neural Network)
– Regression Only: GNN (General Regression Neural Network)
• Supervised, Feedback:
– BAM (Bidirectional Associative Memory)
– Boltzmann Machine
– Recurrent Time Series: Back Propagation through time, Elman, FIR, Jordan, Real time recurrent network, Recurrent Back propagation, TDNN (Time Delay Neural Nets)
• Supervised, Competitive: ARTMAP, Fuzzy ARTMAP, Gaussian ARTMAP, Counter propagation, Neocognitron
• Unsupervised, Competitive:
– Vector Quantization: Grossberg, Kohonen, Conscience
– Self Organizing Map: Kohonen, GTM, Local Linear
– Adaptive Resonance Theory: ART1, ART2, ART2A, ART3, Fuzzy ART
– DCL (Differential Competitive Learning)
• Unsupervised, Dimension Reduction: Hebbian, Oja, Sanger, Differential Hebbian
• Unsupervised, Auto Association: Linear Auto Associator, BSB (Brain State in a Box), Hopfield
• Non learning: Hopfield, various networks for optimization
31. Error Correction Learning
• Error signal: $e_k(n) = d_k(n) - y_k(n)$
• Control mechanism to apply a series of corrective
adjustments
• Index of performance or instantaneous value of
error energy: $E(n) = \frac{1}{2} e_k^2(n)$
• Delta rule or Widrow-Hoff rule
– Thus $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$
• And $w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$
• Using unit delay operator: $w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$
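The update on this slide can be sketched for a single linear neuron as follows. This is a minimal illustration, not a reference implementation: the function name, the training data (generated from the hypothetical target d = 2*x1 - x2) and the learning rate of 0.1 are all assumptions.

```python
def delta_rule_step(w, x, d, eta):
    """One error-correction update for a single linear neuron."""
    y = sum(wj * xj for wj, xj in zip(w, x))               # actual response y(n)
    e = d - y                                              # error signal e(n) = d(n) - y(n)
    w_next = [wj + eta * e * xj for wj, xj in zip(w, x)]   # w(n+1) = w(n) + eta*e(n)*x(n)
    return w_next, e

# Hypothetical training set: desired response d = 2*x1 - x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

w = [0.0, 0.0]
for _ in range(200):                 # repeated passes over the examples
    for x, d in samples:
        w, e = delta_rule_step(w, x, d, eta=0.1)

print(w)   # approaches [2.0, -1.0]
```

With consistent examples and a small enough learning rate, the weights settle at the values that drive the error signal to zero.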
32. Euclidean Distance
• Ordinary distance between two points that
can be measured with a ruler.
• In multi dimensional case it is the distance
between two vectors.
33. Memory based learning
• Binary pattern classification :
– with input-output pairs $\{(x_i, d_i)\}_{i=1}^{N}$
• Nearest Neighbor Rule
– $x_{N'} \in \{x_1, x_2, \ldots, x_N\}$ is the nearest neighbor of $x_{test}$ if
– $\min_i d(x_i, x_{test}) = d(x_{N'}, x_{test})$
– where $d(x_i, x_{test})$ is the Euclidean distance between the vectors $x_i$ and $x_{test}$; $x_{test}$ is assigned the class of $x_{N'}$.
• Cover and Hart (1967): nearest neighbor rule for pattern classification.
Assumptions are:
– The classified examples $(x_i, d_i)$ are independently and identically distributed
according to the joint probability distribution of the example $(x, d)$
– The sample size N is infinitely large
Then the probability of classification error is bounded above by twice the Bayes
probability of error, i.e. the minimum probability of error over all decision rules.
• Radial basis function network for curve fitting (approximation problem in
higher dimensional space)
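The nearest neighbor rule can be sketched in a few lines. The function name and the four stored examples below are hypothetical, chosen only to illustrate the rule; `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math

def nearest_neighbor_label(train, x_test):
    """Return the label d_i of the stored vector x_i closest to x_test."""
    x_near, d_near = min(train, key=lambda pair: math.dist(pair[0], x_test))
    return d_near

# Hypothetical stored examples (x_i, d_i) for a binary classification task.
train = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((3.0, 3.0), 1), ((4.0, 3.0), 1)]

label = nearest_neighbor_label(train, (2.5, 2.0))
print(label)   # the closest stored vector is (3.0, 3.0), so the label is 1
```

No training step is needed: all the examples are simply kept in memory, which is why this is called memory based learning.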
34. Hebbian Learning
• Repeated or persistent firing changes synaptic weight due to
increased efficiency
• Associative learning at cellular level
– Time dependent mechanism
– Local mechanism
– Interactive mechanism
– Conjunctional or correlational mechanism
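– Here $\Delta w_{kj}(n) = F(y_k(n), x_j(n))$
– Hebb’s hypothesis: $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$
– Covariance hypothesis: $\Delta w_{kj}(n) = \eta\, (y_k(n) - \bar{y})(x_j(n) - \bar{x})$, where $\bar{x}$ and $\bar{y}$ are the average values of the pre- and post-synaptic signals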
• Synaptic modifications can be Hebbian, Anti-Hebbian, or non-
Hebbian.
• Evidence for Hebbian learning in the Hippocampus which plays an
important role in learning and memory
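The two update rules on this slide can be compared with a small sketch for a single synapse. The function names and the numeric values are assumptions chosen for illustration.

```python
def hebb_update(w, x, y, eta):
    """Hebb's hypothesis: the change is proportional to the product y*x."""
    return w + eta * y * x

def covariance_update(w, x, y, x_av, y_av, eta):
    """Covariance hypothesis: uses deviations of x and y from their averages."""
    return w + eta * (x - x_av) * (y - y_av)

# Both signals active: the Hebbian weight grows.
print(hebb_update(0.5, x=1.0, y=1.0, eta=0.1))

# Presynaptic signal below its average while the postsynaptic signal is
# above its average: the covariance rule weakens the synapse.
print(covariance_update(0.5, x=0.2, y=0.9, x_av=0.5, y_av=0.5, eta=0.1))
```

For non-negative signals, the plain Hebbian rule can only strengthen a synapse, whereas the covariance form allows both strengthening and weakening.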
35. Competitive Learning
• The O/P neurons compete among themselves to become active
• Elements of competitive learning rule (Rumelhart and Zipser
(1985))
– A set of neurons that are identical except for randomly distributed
synaptic weights
– A limit imposed on the strength of each neuron
– Winner-takes-all mechanism
• Use as feature detectors
• Has feed forward (excitatory connections)
• Has lateral (inhibitory) connections
• Here $\Delta w_{kj}(n) = \eta\,(x_j - w_{kj})$ if neuron k wins the competition
• and $\Delta w_{kj}(n) = 0$ if neuron k loses
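A single winner-takes-all update might look like the sketch below. It assumes the winner is the neuron whose weight vector is closest to the input in Euclidean distance, which is a common way to pick the winner but is not stated on the slide; the function name and numbers are hypothetical.

```python
def competitive_step(weights, x, eta):
    """Update only the winning neuron; return the index of the winner."""
    def sq_dist(w):
        return sum((xj - wj) ** 2 for xj, wj in zip(x, w))

    # Assumption: the winner is the neuron with the closest weight vector.
    k = min(range(len(weights)), key=lambda i: sq_dist(weights[i]))
    # The winner moves towards the input; all losers are left unchanged.
    weights[k] = [wj + eta * (xj - wj) for wj, xj in zip(weights[k], x)]
    return k

weights = [[0.0, 0.0], [1.0, 1.0]]      # two output neurons
winner = competitive_step(weights, x=[0.9, 0.8], eta=0.5)
print(winner, weights)                   # neuron 1 wins and moves towards x
```

Repeating this step over many inputs pulls each weight vector towards a cluster of inputs, which is how the neurons come to act as feature detectors.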
36. Boltzmann Learning
• Stochastic model of a neuron
– $x = +1$ with probability $P(v)$
– $x = -1$ with probability $1 - P(v)$
– $P(v) = 1/(1 + \exp(-v/T))$
– T is a pseudo temperature used to control the uncertainty in firing (noise
level)
• Stochastic learning algorithm rooted in statistical mechanics
• Neurons in recurrent structure
• Operate in binary manner
• Energy function
– Here $E = -\frac{1}{2} \sum_j \sum_{k,\, k \neq j} w_{kj}\, x_k\, x_j$
• Flip a random neuron from state $x_k$ to state $-x_k$ at some
temperature T with probability
• $P(x_k \to -x_k) = 1/(1 + \exp(-\Delta E_k / T))$
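The stochastic neuron model at the top of this slide can be sketched as follows. The function names, the seed and the chosen values of v and T are assumptions for illustration only.

```python
import math
import random

def firing_probability(v, T):
    """P(v) = 1 / (1 + exp(-v/T)); adequate for moderate values of v/T."""
    return 1.0 / (1.0 + math.exp(-v / T))

def stochastic_neuron(v, T, rng):
    """Return +1 with probability P(v) and -1 otherwise."""
    return 1 if rng.random() < firing_probability(v, T) else -1

rng = random.Random(0)        # seeded so that the run is repeatable
outputs = [stochastic_neuron(v=1.0, T=1.0, rng=rng) for _ in range(10000)]
fraction_on = outputs.count(1) / len(outputs)

print(firing_probability(1.0, 1.0))   # theoretical probability of firing
print(fraction_on)                    # empirical fraction of +1 outputs
```

Lowering T makes the neuron behave almost deterministically, while raising T pushes the firing probability towards one half.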
37. Credit Assignment Problem in Distributed Systems
• Assignment of credit or blame for overall
outcome to internal decisions
• Credit assignment problem has two parts:
– Temporal Credit Assignment Problem
– Structural Credit Assignment Problem
• Credit Assignment problem becomes more
complex in multilayer feed forward neural
nets.
38. Supervised Learning
• Knowledge is represented by a series of input-output examples
• Environment provides training vector to both teacher and Neural
Network
• Teacher or Trainer provides Desired response
• Neural Network provides Actual response
• Error Signal = Desired response – Actual response
• Adjustment is carried out iteratively to make the neural network
emulate the teacher.
• The mean square error function can be visualized as a
multidimensional error-performance surface with the free
parameters as coordinates.
• Identification of local or global minimum is done using steepest
gradient descent method.
39. Reinforcement learning/ Neuro-dynamic
Programming
(Learning with a Critic)
• Critic converts a primary reinforcement signal from
environment to a heuristic reinforcement signal
• System learns under delayed reinforcement after
observation of temporal sequences
• Goal is to minimize the cumulative cost of actions over
a sequence of steps
• Problems:
– No teacher to provide desired response
– Learning machine must solve temporal credit assignment
problem
• Reinforcement learning is related to Dynamic
Programming
40. Unsupervised Learning
(Self Organized Learning)
• No external teacher or critic
• Provision for task independent measure of quality of
learning
• Free parameters are optimized with respect to that
measure
• Network becomes tuned to statistical regularities in data
• It develops ability to form internal representations for
encoding features of input and create new classes
automatically
• Competitive Learning rule is used for Unsupervised learning
• Two layers: input layer and competitive layer
41. Learning Applications
• Pattern Association
• Pattern Recognition
• Function Approximation
• Control
• Filtering
• Beam forming
42. Pattern Association
• Cognition uses association in distributed memory :
– $x_k \to y_k$; key pattern → memorized pattern
– Two phases:
• storage phase (training)
• recall phase (noisy or distorted version of key pattern presented)
• $y = y_j$ for $x = x_j$ (perfect recall)
• $y \neq y_j$ for $x = x_j$ (error in recall)
• Two types:
– Auto associative memory:
• Output set of patterns is the same as input set: $y_k = x_k$
• Used for pattern retrieval
• Input and output spaces have same dimensionality
• Uses unsupervised learning
– Hetero associative memory:
• Output set of patterns is different from the input set: $y_k \neq x_k$
• Used in other Pattern Association
• Input and output spaces may or may not have same dimensionality
• Uses supervised learning
43. Pattern Recognition
• Process whereby a received pattern is assigned to one of a prescribed number of
classes (categories)
• Two stages:
– Training Session
– New patterns
• Patterns can be considered as points in multidimensional decision space
(MDS)
• MDS is divided into regions, each associated with a class
• Decision boundaries are determined by the training process
• Boundary definition is by a statistical mechanism due to variability
between classes
• Machine has two parts:
– Feature Extraction (Unsupervised network)
– Classification (Supervised network)
– m-dimensional observation (data) space -> q-dimensional feature space -> r
dimensional decision space
• Approaches:
– A single multilayer feed forward network using a supervised learning algorithm
– Feature extraction is then done in the hidden layer(s)
44. Function Approximation
• I/O mapping: d=f(x)
• Function f(.) is unknown
• Set of labeled examples are available
– $T = \{(x_i, d_i)\}_{i=1}^{N}$
• Goal: $\|F(x) - f(x)\| < \varepsilon$ for all x, where F(.) is the mapping realized by the network
• Used in
– System model identification
– Inverse system model identification
45. Control
• Ref signal is compared with feedback signal
• Error signal e is fed to neural network controller
• O/P of NNC u is fed to plant as input
• Plant output is y (part of which is sent as
feedback)
• Jacobian of the plant: $J = \{\partial y_k / \partial u_j\}$ (partial derivatives of plant outputs with respect to plant inputs)
• Two approaches:
– Indirect Learning
– Direct Learning
46. Filtering
• To extract information from noisy data
• Filter used for:
– Filtering (for getting current data based on past data)
– Smoothing (for getting current data based on future data)
– Prediction (for forecasting future data based on current and past data)
• In filtering
– Cocktail party problem
– Blind signal separation
– Here $x(n) = A\,u(n)$, where A is the unknown mixing matrix
– Need a demixing matrix W to recover the original signal $u(n)$
• In prediction
– Error correction learning
– x(n) provides the desired response and used for training
– A form of model building, where network acts as model
– When prediction is non-linear; NNs are a powerful method because non-linear
processing units can be used for its construction
– However if dynamic range of the time series is unknown, linear output unit is
the most reasonable choice
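The relation between the mixing and demixing matrices in the filtering bullets can be illustrated with a two-source toy case. In blind signal separation A is unknown and W has to be learned from x(n) alone; the sketch below instead assumes A is known, purely to show that a W equal to the inverse of A recovers the sources. All names and numbers are hypothetical.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.5], [0.25, 1.0]]    # assumed (normally unknown) mixing matrix
u = [2.0, -1.0]                  # original source signals at time n
x = matvec(A, u)                 # observed mixture x(n) = A u(n)

W = inverse_2x2(A)               # ideal demixing matrix
u_recovered = matvec(W, x)
print(x, u_recovered)            # the recovered vector matches u
```

A learning algorithm for blind separation has to arrive at such a W (up to scaling and ordering of the sources) without ever seeing A or u(n).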
47. Beam forming
• Spatial form of filtering
• To provide attentional selectivity in the presence of noise
• Used in radar and sonar systems
• Detect and track a target of interest in the presence of receiver
noise and interfering signals (e.g. from jammers)
• Task is complicated by:
– Target signal can be from an unknown direction
– No prior information about interfering signals
• Generalized Side Lobe Canceller (GSLC) consisting of:
– Array of antenna elements: which samples the observed signals
– A linear combiner: acts as a spatial filter and provides the desired
response (i.e. for main lobe)
– A signal blocking matrix: to cancel leakage from side lobes
– A neural network : to accommodate variations in interfering signals
• Neural network adjusts its free parameters and acts as an
attentional neurocomputer.
48. Associative Memory
• Memory is relatively enduring neural alterations induced by the
interaction of an organism with its environment.
• Activity must be stored in memory through a learning process
• Memory may be short term or long term
• Associative memory
– Distributed
– Stimulus (key) pattern and response (stored) pattern vectors
– Information is stored in memory by setting up a spatial pattern of
neural activities across a large number of neurons
– Information in stimulus also contains storage location and address for
retrieval
– High degree of resistance to noise and damage of a diffusive kind
– May be interactions between different patterns stored in memory and
thus errors in recall process
49. Memory and noise
• For a linear network $y_k = W(k)\,x_k$
• Total experience gained $M = \sum_{k=1}^{q} W(k)$
• Memory matrix $M_k = M_{k-1} + W(k)$; $k = 1, \ldots, q$
• Estimate of memory matrix $\hat{M} = \sum_{k=1}^{q} y_k x_k^T$
• Correlation matrix memory $\hat{M} = Y X^T$
• X = key matrix; Y = memorized matrix
• Recall: $y = \hat{M} x_j$
• $y = y_j + v_j$; $v_j$ = noise vector due to cross talk
between key vector $x_j$ and all other key vectors
stored in memory
• For a linear signal space, the cosine of the angle between
vectors $x_j$ and $x_k$ is $\cos(x_k, x_j) = x_k^T x_j / (\|x_k\|\,\|x_j\|)$
• Noise vector (for unit-norm keys) $v_j = \sum_{k=1,\, k \neq j}^{q} \cos(x_k, x_j)\, y_k$
50. Orthogonality, Community and Errors
• The memory associates perfectly (noise vector is
zero) when the key vectors are orthonormal, i.e.
$x_k^T x_j = 1$ when $k = j$ and $0$ when $k \neq j$
• If key patterns are not orthogonal or highly
separated, recall suffers from confusion and errors
• Community of the set of patterns $\{x_{key}\}$ can be such
that $x_k^T x_j \geq \gamma$ for $k \neq j$
• If the lower bound $\gamma$ is large enough, the
memory may fail to distinguish the response y
from that of any other key pattern contained in the set
$\{x_{key}\}$
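The effect of orthogonality on recall can be seen in a small correlation matrix memory built from outer products. This is a sketch under stated assumptions: the helper names, the unit-norm key vectors and the stored patterns are all hypothetical.

```python
def outer(y, x):
    """Outer product y x^T as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]

def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def store(pairs):
    """Correlation matrix memory: sum of outer products y_k x_k^T."""
    x0, y0 = pairs[0]
    M = [[0.0] * len(x0) for _ in y0]
    for x, y in pairs:
        M = [[m + o for m, o in zip(rm, ro)] for rm, ro in zip(M, outer(y, x))]
    return M

y1, y2 = [5.0, 1.0], [-2.0, 4.0]          # memorized patterns

# Orthonormal keys: recall of y1 is perfect.
M_ortho = store([([1.0, 0.0, 0.0], y1), ([0.0, 1.0, 0.0], y2)])
recall_ortho = matvec(M_ortho, [1.0, 0.0, 0.0])

# Second key overlaps the first (inner product 0.6): cross talk appears.
M_overlap = store([([1.0, 0.0, 0.0], y1), ([0.6, 0.8, 0.0], y2)])
recall_overlap = matvec(M_overlap, [1.0, 0.0, 0.0])

print(recall_ortho)     # equals y1
print(recall_overlap)   # equals y1 + 0.6 * y2
```

In the second case the recalled vector is the stored pattern plus a noise term weighted by the overlap between the two keys, which is the cross talk described above.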
51. Adaptation
• Spatiotemporal nature of learning
• Temporal structure of experience from insects to humans, thus animal can
adapt its behavior
• In time-stationary environment,
– supervised learning possible,
– synaptic weights can be frozen after learning
– learning system relies on memory
• In non-time-stationary environments
– supervised learning inadequate
– network needs a way to track the statistical variations in environment with
time
– desirable for neural network to continually adapt its free parameters to
respond in real time
– this requires continuous learning
– Linear adaptive filters perform continuous learning
• Used in radar, sonar, communications, seismology, biomedical signal processing
• In a mature state of development
• Nonlinear adaptive filters: development not yet mature.
52. Pseudo stationary process
• Neural network requires stable time for computation
• How can it adapt to signals varying in time?
• Many non stationary processes change slowly enough for the process to
be considered pseudo stationary over a window of short enough duration.
– Speech signal: 10 – 30 ms
– Radar returns from ocean surface: few seconds
– Long range weather forecasting: few minutes
– Long range stock market trends: few days
• Retrain network at regular intervals, dynamic approach
– Select a window short enough for data to be considered pseudo stationary
– Use the sampled data to train the network
– Keep data samples in a FIFO, add new sample and drop oldest data sample
– Use updated data window to retrain and repeat
• Network undergoes continual training with time ordered examples
• Non linear filter : a generalization of linear adaptive filters
• Computing resources must be fast enough to complete the computation within one
sampling period.
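The windowed retraining loop described on this slide can be sketched with a fixed-length FIFO. Here `retrain` is a hypothetical stand-in for training the network (it just averages the window), and the sample stream is made up; only the windowing logic is the point.

```python
from collections import deque

def retrain(window):
    """Stand-in for network training: here simply the mean of the window."""
    return sum(window) / len(window)

def track(stream, window_size):
    """Retrain on a sliding window as each new sample arrives."""
    window = deque(maxlen=window_size)   # FIFO: appending drops the oldest sample
    estimates = []
    for sample in stream:
        window.append(sample)
        if len(window) == window_size:   # wait until the window is full
            estimates.append(retrain(window))
    return estimates

# Hypothetical slowly changing signal: its level shifts from 1 to 5.
stream = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
estimates = track(stream, window_size=3)
print(estimates)    # the estimate follows the shift as old samples drop out
```

The window length sets the trade-off: it must be short enough for the data inside it to look stationary, yet long enough to train on.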
56. Perceptron Convergence Theorem
• 1: Initialization : set w(0) = 0
• 2: Activation: at time step n, activate the perceptron by applying
continuous valued input vector x(n) and desired response d(n)
• 3: Computation of Actual Response: Compute the actual response
of the perceptron
– $y(n) = \mathrm{sgn}(w^T(n)\,x(n))$
• 4: Adaptation of weight vector: Update the weight vector of the
perceptron:
– $w(n+1) = w(n) + \eta\,[d(n) - y(n)]\,x(n)$
– where
– $d(n) = +1$ if $x(n)$ belongs to class $C_1$
– $d(n) = -1$ if $x(n)$ belongs to class $C_2$
• 5: Continuation: Increment time step n by one and go back to step 2
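The steps above can be sketched for the logical AND function, one of the linearly separable logic functions. The function names, the learning rate of 0.5, the epoch limit and the convention of treating a zero activation as -1 are assumptions for illustration; the bias is handled as a fixed first input of +1.

```python
def sgn(v):
    # Assumption: v == 0 is mapped to -1 so the output is always +1 or -1.
    return 1 if v > 0 else -1

def train_perceptron(samples, eta=0.5, max_epochs=100):
    """samples: list of (x, d) with d in {+1, -1}; x includes the bias input."""
    w = [0.0] * len(samples[0][0])                  # step 1: w(0) = 0
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:                        # step 2: present x(n), d(n)
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)))           # step 3
            if y != d:
                errors += 1
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]   # step 4
        if errors == 0:                             # a full pass with no mistakes
            break
    return w

# Logical AND; the first component of each input is the bias input +1.
and_samples = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
w = train_perceptron(and_samples)
predictions = [sgn(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in and_samples]
print(w, predictions)
```

Because the two classes are linearly separable, the loop stops after a finite number of corrections, as the theorem guarantees; for XOR the same loop would run until the epoch limit without reaching zero errors.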
57. LMS Rule
• Also known as:
– Delta rule
– Adaline rule
– Widrow-Hoff rule
58. Neural Network Hardware
• Hardware runs orders of magnitude faster than software
• Two approaches:
– General, but probably expensive, system that can be
reprogrammed for many kinds of tasks
• e.g. Adaptive Solutions CNAPS
– Specialized but cheap chip to do one thing very quickly and
efficiently.
• e.g. IBM ZISC
• Number of neurons varies from 10 to 10**6
• Precision is mostly limited to 16 bit fixed point for weights
and 8 bit fixed point for outputs
• Recurrent NNs may require output of >16 bits
• Performance is measured in
– number of multiply and accumulate operations in unit time
(MCPS: millions of connections per second)
– Rate of weight updates (MCUPS: millions of connection updates
per second)
59. NN Hardware categories
• Neurocomputers
– Standard chips
• Sequential + Accelerator
• Multiprocessor
– Neuro chips
• Analog
• Digital
• Hybrid
60. Hardware Implementation
(Accelerator Boards)
• Accelerator boards
– Most frequently used neural commercial hardware
• Relatively cheap
• Widely available
• Simple to connect to PCs, workstations
• Have user friendly software tools
• However usually specialized for certain tasks and may lack flexibility
– Based on neural network chips
• IBM ZISC036: 36 neurons; RBF network; RCE (or ROI) algorithm
• PCI card: 19 chips, 684 prototypes
• Can process 165,000 patterns per second, where patterns are vectors of 64 8-bit
elements
• SAIC Sigma-1
• Neuro Turbo
• HNC
– Some use just fast DSPs
61. Hardware Implementation
(General Purpose Processors)
• Neuro computers built from general purpose
Processors
– BSP400
– COKOS
– RAP (Ring Array Processor)
• Used for development of connectionist algorithms for
speech recognition
• 4 to 40 TMS320C20 DSPs
• Connected via ring of Xilinx FPGAs
• VME bus to connect to host computer
• 57 MCPS in feed forward mode
• 13.2 MCPS in back propagation training