Radial Basis Function Networks
Computer Science,
KAIST
contents
• Introduction
• Architecture
• Designing
• Learning strategies
• MLP vs RBFN
introduction
• A completely different approach (compared with the MLP):
the design of a neural network is viewed as a curve-fitting
(approximation) problem in a high-dimensional space
In MLP
introduction
In RBFN
introduction
Radial Basis Function Network
• A kind of supervised neural network
• Design of the NN as a curve-fitting problem
• Learning
– find the surface in multidimensional space that best fits the
training data
• Generalization
– use this multidimensional surface to interpolate the
test data
introduction
Radial Basis Function Network
• Approximate a function with a linear combination of
radial basis functions:
F(x) = Σ_i w_i h_i(x)
• h_i(x) is usually a Gaussian function
introduction
architecture
[Network diagram: input-layer nodes x1 … xn feed hidden units h1 … hm; the hidden responses are combined with weights W1 … Wm in the output layer to produce f(x).]
Three layers
• Input layer
– Source nodes that connect the network to its environment
• Hidden layer
– Hidden units provide a set of basis functions
– High dimensionality
• Output layer
– Linear combination of the hidden functions
architecture
Radial basis function
h_j(x) = exp( -||x - c_j||^2 / r_j^2 )
f(x) = Σ_{j=1}^{m} w_j h_j(x)
where c_j is the center of a region and
r_j is the width of its receptive field
architecture
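A minimal Python/NumPy sketch of the network just described (the function name and the toy numbers are illustrative, not from the slides): each hidden unit evaluates a Gaussian basis function around its center, and the output layer forms the weighted sum f(x) = Σ_j w_j h_j(x).

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """f(x) = sum_j w_j * exp(-||x - c_j||^2 / r_j^2) for a single input vector x."""
    # Hidden layer: one Gaussian response per center c_j with receptive-field width r_j
    h = np.exp(-np.sum((centers - x) ** 2, axis=1) / widths ** 2)
    # Output layer: linear combination of the hidden responses
    return weights @ h

# Toy example: 3 hidden units on a 2-D input space
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
widths = np.array([0.5, 0.5, 0.5])
weights = np.array([1.0, -2.0, 0.5])
print(rbf_forward(np.array([0.2, 0.1]), centers, widths, weights))
```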
designing
• Require
– Selection of the radial basis function width parameter
– Number of radial basis neurons
Selection of the RBF width parameter
• Not required for an MLP
• Smaller width
– the network can signal ("alert") when test data lie outside the training data
• Larger width
– a network of smaller size & faster execution
designing
Number of radial basis neurons
• Chosen by the designer
• Maximum number of neurons = number of input (training) vectors
• Minimum number of neurons = (experimentally determined)
• More neurons
– a more complex network, but a smaller tolerance
designing
learning strategies
• Two levels of learning
– Center and spread learning (or determination)
– Output-layer weight learning
• Make the number of parameters as small as possible
– Curse of dimensionality
Various learning strategies
• The strategies differ in how the centers of the radial-basis functions of the
network are specified:
• Fixed centers selected at random
• Self-organized selection of centers
• Supervised selection of centers
learning strategies
Fixed centers selected at random(1)
• Fixed RBFs of the hidden units
• The locations of the centers may be chosen
randomly from the training data set.
• We can use different values of centers and widths
for each radial basis function -> experimentation
with training data is needed.
learning strategies
Fixed centers selected at random(2)
• Only the output-layer weights need to be learned.
• Obtain the output-layer weights by the pseudo-inverse method
• Main problem
– Requires a large training set for a satisfactory level of
performance
learning strategies
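A hedged sketch of this closed-form step (the helper names and toy data are illustrative): with the centers and widths fixed, the hidden-unit design matrix H is built once and the output weights follow from its pseudo-inverse, w = H⁺t.

```python
import numpy as np

def hidden_matrix(X, centers, widths):
    """H[n, j] = exp(-||x_n - c_j||^2 / r_j^2) for all N training inputs and m centers."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / widths ** 2)

def fit_output_weights(X, t, centers, widths):
    """Least-squares output weights via the pseudo-inverse: w = pinv(H) @ t."""
    return np.linalg.pinv(hidden_matrix(X, centers, widths)) @ t

# Toy 1-D regression: centers fixed by random selection from the training data
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
t = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=50)
centers = X[rng.choice(len(X), size=8, replace=False)]
w = fit_output_weights(X, t, centers, np.full(8, 0.4))
print(w.shape)  # (8,)
```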
Self-organized selection of centers(1)
• Hybrid learning
– self-organized learning to estimate the centers of RBFs
in hidden layer
– supervised learning to estimate the linear weights of the
output layer
• Self-organized learning of centers by means of
clustering.
• Supervised learning of output weights by LMS
algorithm.
learning strategies
Self-organized selection of centers(2)
• k-means clustering
1. Initialization
2. Sampling
3. Similarity matching
4. Updating
5. Continuation
learning strategies
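A sketch of a batch variant of this k-means procedure in NumPy (the slides list only the steps; the code below is an illustrative implementation, not the course's reference code). The resulting cluster centers are then used as the RBF centers.

```python
import numpy as np

def kmeans_centers(X, k, n_iters=100, seed=0):
    """Batch k-means: returns k cluster centers to be used as RBF centers."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k distinct training points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2./3. Sampling + similarity matching: assign every point to its nearest center
        d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d2, axis=1)
        # 4. Updating: move each center to the mean of the points assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # 5. Continuation: stop when the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers
```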
Supervised selection of centers
• All free parameters of the network are changed by a
supervised learning process.
• Error-correction learning using LMS algorithm.
learning strategies
Learning formula
learning strategies
• Linear weights (output layer)
$$\frac{\partial E(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n)\, G\!\left(\left\|\mathbf{x}_j - \mathbf{t}_i(n)\right\|_{C_i}\right), \qquad
w_i(n+1) = w_i(n) - \eta_1 \frac{\partial E(n)}{\partial w_i(n)}, \quad i = 1, 2, \ldots, M$$
• Positions of centers (hidden layer)
$$\frac{\partial E(n)}{\partial \mathbf{t}_i(n)} = 2\, w_i(n) \sum_{j=1}^{N} e_j(n)\, G'\!\left(\left\|\mathbf{x}_j - \mathbf{t}_i(n)\right\|_{C_i}\right) \boldsymbol{\Sigma}_i^{-1} \left[\mathbf{x}_j - \mathbf{t}_i(n)\right], \qquad
\mathbf{t}_i(n+1) = \mathbf{t}_i(n) - \eta_2 \frac{\partial E(n)}{\partial \mathbf{t}_i(n)}, \quad i = 1, 2, \ldots, M$$
• Spreads of centers (hidden layer)
$$\frac{\partial E(n)}{\partial \boldsymbol{\Sigma}_i^{-1}(n)} = -\, w_i(n) \sum_{j=1}^{N} e_j(n)\, G'\!\left(\left\|\mathbf{x}_j - \mathbf{t}_i(n)\right\|_{C_i}\right) \mathbf{Q}_{ji}(n), \qquad
\mathbf{Q}_{ji}(n) = \left[\mathbf{x}_j - \mathbf{t}_i(n)\right]\left[\mathbf{x}_j - \mathbf{t}_i(n)\right]^{T}$$
$$\boldsymbol{\Sigma}_i^{-1}(n+1) = \boldsymbol{\Sigma}_i^{-1}(n) - \eta_3 \frac{\partial E(n)}{\partial \boldsymbol{\Sigma}_i^{-1}(n)}$$
where e_j(n) is the output error for training pattern j.
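A hedged sketch of one such gradient step, simplified to scalar widths σ_i (so the weighted norm ‖·‖_{C_i} and the spread matrix Σ_i⁻¹ in the update rules above reduce to the scalar case σ_i² I); the learning rates η1, η2, η3 and the function name are illustrative choices.

```python
import numpy as np

def supervised_step(X, d, w, centers, sigma, eta1=0.01, eta2=0.01, eta3=0.01):
    """One supervised update of weights, centers and widths for a Gaussian RBF network.

    Simplification: G_ij = exp(-||x_j - c_i||^2 / (2 sigma_i^2)) with scalar widths.
    """
    diff = X[None, :, :] - centers[:, None, :]          # diff[i, j] = x_j - c_i
    dist2 = np.sum(diff ** 2, axis=2)                   # (m, N) squared distances
    G = np.exp(-dist2 / (2.0 * sigma[:, None] ** 2))    # hidden responses, shape (m, N)
    e = d - w @ G                                       # per-pattern errors e_j

    # Gradients of E = 0.5 * sum_j e_j^2; signs follow from e_j = d_j - f(x_j)
    grad_w = -G @ e
    grad_c = (-(w[:, None] * G * e[None, :] / sigma[:, None] ** 2)[:, :, None] * diff).sum(axis=1)
    grad_s = -(w[:, None] * G * e[None, :] * dist2 / sigma[:, None] ** 3).sum(axis=1)

    return w - eta1 * grad_w, centers - eta2 * grad_c, sigma - eta3 * grad_s
```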
MLP vs RBFN
MLP                                 RBFN
Global hyperplane                   Local receptive field
EBP (error back-propagation)        LMS
Local minima                        Serious local minima
Smaller number of hidden neurons    Larger number of hidden neurons
Shorter computation time            Longer computation time
Longer learning time                Shorter learning time
Approximation
• MLP : Global network
– All inputs cause an output
• RBF : Local network
– Only inputs near a receptive field produce an activation
– Can give “don’t know” output
MLP vs RBFN
Gaussian Mixture
• Given a finite number of data points x_n, n = 1, …, N, drawn
from an unknown distribution, the probability density p(x)
of this distribution can be modeled by
– Parametric methods
• Assuming a known density function (e.g., Gaussian) to start
with, then
• estimate its parameters by maximum likelihood
• For a data set of N vectors c = {x_1, …, x_N} drawn independently
from the distribution p(x|θ), the joint probability density of
the whole data set c is given by
$$L(\theta) = p(c \mid \theta) = \prod_{n=1}^{N} p(\mathbf{x}_n \mid \theta)$$
Gaussian Mixture
• L(θ) can be viewed as a function of θ for fixed c; in other
words, it is the likelihood of θ for the given c.
• The technique of maximum likelihood sets the value of θ by
maximizing L(θ).
• In practice, the negative logarithm of the likelihood is often
considered, and the minimum of E is found:
$$E = -\ln L(\theta) = -\sum_{n=1}^{N} \ln p(\mathbf{x}_n \mid \theta)$$
• For a normal distribution, the estimated parameters can be found
by analytic differentiation of E:
$$\hat{\boldsymbol{\mu}} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n, \qquad
\hat{\Sigma} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n - \hat{\boldsymbol{\mu}})(\mathbf{x}_n - \hat{\boldsymbol{\mu}})^{T}$$
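These two closed-form estimates translate directly into a short NumPy sketch (the variable names are illustrative):

```python
import numpy as np

def gaussian_mle(X):
    """Maximum-likelihood mean and covariance for a data set X of shape (N, d)."""
    mu = X.mean(axis=0)               # (1/N) sum_n x_n
    diff = X - mu
    cov = diff.T @ diff / len(X)      # (1/N) sum_n (x_n - mu)(x_n - mu)^T
    return mu, cov
```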
Gaussian Mixture
• Non-parametric methods
– Histograms
An illustration of the histogram
approach to density estimation. The
set of 30 sample data points is drawn
from the sum of two normal
distributions, with means 0.3 and 0.8,
standard deviations 0.1, and
amplitudes 0.7 and 0.3 respectively.
The original distribution is shown
by the dashed curve, and the
histogram estimates are shown by
the rectangular bins. The number M
of histogram bins within the given
interval determines the width of the
bins, which in turn controls the
smoothness of the estimated density.
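A minimal sketch of the histogram estimator described in the caption (the random seed and the bin count M are illustrative): bin counts divided by N times the bin width give a piecewise-constant density estimate.

```python
import numpy as np

# 30 samples from the mixture 0.7 * N(0.3, 0.1^2) + 0.3 * N(0.8, 0.1^2)
rng = np.random.default_rng(1)
from_first = rng.random(30) < 0.7
x = np.where(from_first, rng.normal(0.3, 0.1, 30), rng.normal(0.8, 0.1, 30))

M = 10                                           # the number of bins controls the smoothness
counts, edges = np.histogram(x, bins=M, range=(0.0, 1.0))
density = counts / (len(x) * np.diff(edges))     # normalize so the estimate integrates to 1
print(density)
```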
Gaussian Mixture
–Density estimation by basis functions, e.g., kernel
functions, or k-NN
(a) kernel function, (b) K-nn
Examples of kernel and K-nn approaches to density estimation.
• Discussions
• Parametric approach assumes a specific form for the
density function, which may be different from the true
density, but
• The density function can be evaluated rapidly for new input
vectors
• Non-parametric methods allow very general forms of
density functions; thus the number of variables in the
model grows directly with the number of training data
points.
• The model cannot be rapidly evaluated for new input vectors.
• The mixture model combines the advantages of both: (1) it is not restricted to a
specific functional form, and (2) the size of the
model grows only with the complexity of the problem
being solved, not with the size of the data set.
Gaussian Mixture
Gaussian Mixture
• The mixture model is a linear combination of component
densities p(x| j ) in the form
$$p(\mathbf{x}) = \sum_{j=1}^{M} p(\mathbf{x} \mid j)\, P(j)$$
where the P(j) are the mixing parameters of the data points, with
$$\sum_{j=1}^{M} P(j) = 1, \qquad 0 \le P(j) \le 1,$$
and the component density functions are normalized,
$$\int p(\mathbf{x} \mid j)\, d\mathbf{x} = 1,$$
hence p(x|j) can be regarded as a class-conditional density.
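A sketch of evaluating this mixture density with the isotropic Gaussian components introduced two slides later (the function names are illustrative):

```python
import numpy as np

def component_density(X, mu, sigma):
    """Isotropic Gaussian p(x|j) with mean mu and covariance sigma^2 * I, for X of shape (N, d)."""
    d = X.shape[1]
    dist2 = np.sum((X - mu) ** 2, axis=1)
    return np.exp(-dist2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2) ** (d / 2.0)

def mixture_density(X, P, mus, sigmas):
    """p(x) = sum_j P(j) p(x|j)."""
    return sum(P[j] * component_density(X, mus[j], sigmas[j]) for j in range(len(P)))
```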
Gaussian Mixture
• The key difference between the mixture model representation and a
true classification problem lies in the nature of the training data, since
in this case we are not provided with any “class labels” to say which
component was responsible for generating each data point.
• This is the so-called "incomplete data" representation.
• However, the technique of mixture modeling can be applied separately
to each class-conditional density p(x|Ck) in a true classification
problem.
• In this case, each class-conditional density p(x|Ck) is represented by an
independent mixture model of the form
$$p(\mathbf{x}) = \sum_{j=1}^{M} p(\mathbf{x} \mid j)\, P(j)$$
Gaussian Mixture
• Analogous to class-conditional densities, and using Bayes' theorem, the posterior
probabilities of the component densities can be derived as
$$P(j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid j)\, P(j)}{p(\mathbf{x})}, \qquad \sum_{j=1}^{M} P(j \mid \mathbf{x}) = 1.$$
• The value of P(j|x) represents the probability that component j was
responsible for generating the data point x.
• Restricting attention to the Gaussian distribution, each individual component
density is given by
$$p(\mathbf{x} \mid j) = \frac{1}{(2\pi\sigma_j^2)^{d/2}} \exp\!\left( -\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2} \right),$$
with mean μ_j and covariance matrix Σ_j = σ_j² I.
• Determine the parameters of the Gaussian mixture by:
(1) maximum likelihood, (2) the EM algorithm.
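The posterior responsibilities follow directly from Bayes' theorem; a sketch reusing the illustrative component_density helper from the earlier block:

```python
import numpy as np

def responsibilities(X, P, mus, sigmas):
    """P(j|x_n) = p(x_n|j) P(j) / p(x_n), returned as an (N, M) matrix whose rows sum to 1."""
    joint = np.stack([P[j] * component_density(X, mus[j], sigmas[j])
                      for j in range(len(P))], axis=1)
    return joint / joint.sum(axis=1, keepdims=True)
```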
Gaussian Mixture
Representation of the mixture model in terms of a
network diagram. For a component density p(x|j), the lines
connecting the inputs x_i to the component p(x|j) represent
the elements μ_ji of the corresponding mean vector μ_j of
component j.
Maximum likelihood
• The mixture density contains adjustable parameters P(j), μ_j and σ_j, where
j = 1, …, M.
• The negative log-likelihood for the data set {x_n} is given by
$$E = -\ln L = -\sum_{n=1}^{N} \ln p(\mathbf{x}_n) = -\sum_{n=1}^{N} \ln\left( \sum_{j=1}^{M} p(\mathbf{x}_n \mid j)\, P(j) \right).$$
• Maximizing the likelihood is then equivalent to minimizing E.
• Differentiating E with respect to
– the centres μ_j:
$$\frac{\partial E}{\partial \boldsymbol{\mu}_j} = \sum_{n=1}^{N} P(j \mid \mathbf{x}_n)\, \frac{\boldsymbol{\mu}_j - \mathbf{x}_n}{\sigma_j^2}$$
– the variances σ_j:
$$\frac{\partial E}{\partial \sigma_j} = \sum_{n=1}^{N} P(j \mid \mathbf{x}_n) \left\{ \frac{d}{\sigma_j} - \frac{\|\mathbf{x}_n - \boldsymbol{\mu}_j\|^2}{\sigma_j^3} \right\}$$
• Minimizing E with respect to the mixing parameters P(j) must respect the
constraints Σ_j P(j) = 1 and 0 < P(j) < 1. This
can be handled by expressing P(j) in terms of a set of M
auxiliary variables {γ_j} such that
$$P(j) = \frac{\exp(\gamma_j)}{\sum_{k=1}^{M} \exp(\gamma_k)}.$$
• This transformation is called the softmax function, and
• the minimization of E with respect to γ_j
• uses the chain rule in the form
$$\frac{\partial E}{\partial \gamma_j} = \sum_{k=1}^{M} \frac{\partial E}{\partial P(k)}\, \frac{\partial P(k)}{\partial \gamma_j},
\qquad \frac{\partial P(k)}{\partial \gamma_j} = \delta_{jk} P(j) - P(j)\, P(k),$$
• then
$$\frac{\partial E}{\partial \gamma_j} = -\sum_{n=1}^{N} \left\{ P(j \mid \mathbf{x}_n) - P(j) \right\}.$$
Maximum likelihood
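A brief sketch of the softmax re-parameterization and the resulting gradient (with R[n, j] holding P(j|x_n); the names are illustrative):

```python
import numpy as np

def softmax(gamma):
    """P(j) = exp(gamma_j) / sum_k exp(gamma_k); the constraints on P(j) hold automatically."""
    e = np.exp(gamma - gamma.max())     # shift for numerical stability
    return e / e.sum()

def grad_gamma(R, P):
    """dE/dgamma_j = -sum_n { P(j|x_n) - P(j) }."""
    return -(R - P).sum(axis=0)
```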
• Setting ∂E/∂μ_j = 0, we obtain
$$\hat{\boldsymbol{\mu}}_j = \frac{\sum_n P(j \mid \mathbf{x}_n)\, \mathbf{x}_n}{\sum_n P(j \mid \mathbf{x}_n)}$$
• Setting ∂E/∂σ_j = 0,
$$\hat{\sigma}_j^2 = \frac{1}{d}\, \frac{\sum_n P(j \mid \mathbf{x}_n)\, \|\mathbf{x}_n - \hat{\boldsymbol{\mu}}_j\|^2}{\sum_n P(j \mid \mathbf{x}_n)}$$
• Setting ∂E/∂γ_j = 0,
$$\hat{P}(j) = \frac{1}{N} \sum_{n=1}^{N} P(j \mid \mathbf{x}_n)$$
• These formulae give some insight into the maximum likelihood
solution, but they do not provide a direct method for calculating
the parameters, i.e., they are expressed in terms of P(j|x_n).
• They do, however, suggest an iterative scheme for finding a minimum
of E.
Maximum likelihood
Maximum likelihood
• We can make some initial guess for the parameters, and use these
formulae to compute a revised value of the parameters:
– using μ_j, σ_j and P(j) to compute p(x_n|j), and
– using p(x_n|j), P(j) and Bayes' theorem to compute P(j|x_n).
• Then, use P(j|x_n) to estimate new parameters.
• Repeat this process until it converges.
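A hedged end-to-end sketch of this iterative scheme for an isotropic Gaussian mixture, combining the re-estimation formulae of the previous two slides and reusing the illustrative responsibilities helper from above (the initial guesses and the fixed iteration count are arbitrary choices, not prescribed by the slides):

```python
import numpy as np

def fit_mixture(X, M, n_iters=100, seed=0):
    """Iteratively re-estimate P(j), mu_j, sigma_j from the responsibilities P(j|x_n)."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(N, size=M, replace=False)]   # initial guess: random data points
    sigmas = np.full(M, X.std())
    P = np.full(M, 1.0 / M)

    for _ in range(n_iters):
        # Use the current parameters and Bayes' theorem to compute P(j|x_n)
        R = responsibilities(X, P, mus, sigmas)     # (N, M), rows sum to 1
        Nj = R.sum(axis=0)                          # effective number of points per component
        # Plug the responsibilities into the re-estimation formulae
        mus = (R.T @ X) / Nj[:, None]
        dist2 = np.sum((X[:, None, :] - mus[None, :, :]) ** 2, axis=2)
        sigmas = np.sqrt((R * dist2).sum(axis=0) / (d * Nj))
        P = Nj / N
    return P, mus, sigmas
```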
The EM algorithm
• The iteration process consists of (1) an expectation step and (2) a
maximization step, and is therefore called the EM algorithm.
• We can write the change in the error E, in terms of the old and
new parameters, as
$$E^{new} - E^{old} = -\sum_n \ln \frac{p^{new}(\mathbf{x}_n)}{p^{old}(\mathbf{x}_n)}.$$
• Using $p(\mathbf{x}) = \sum_{j=1}^{M} p(\mathbf{x} \mid j)\, P(j)$ we can rewrite this as
$$E^{new} - E^{old} = -\sum_n \ln \left[ \sum_j \frac{P^{new}(j)\, p^{new}(\mathbf{x}_n \mid j)}{p^{old}(\mathbf{x}_n)} \cdot \frac{P^{old}(j \mid \mathbf{x}_n)}{P^{old}(j \mid \mathbf{x}_n)} \right].$$
• Using Jensen's inequality: given a set of numbers λ_j ≥ 0
such that Σ_j λ_j = 1,
$$\ln\left( \sum_j \lambda_j x_j \right) \ge \sum_j \lambda_j \ln x_j.$$
• Considering P^old(j|x_n) as λ_j, the change in E gives
$$E^{new} - E^{old} \le -\sum_n \sum_j P^{old}(j \mid \mathbf{x}_n)\,
\ln\left[ \frac{P^{new}(j)\, p^{new}(\mathbf{x}_n \mid j)}{P^{old}(j \mid \mathbf{x}_n)\, p^{old}(\mathbf{x}_n)} \right].$$
• Let Q denote the right-hand side, so that $E^{new} \le E^{old} + Q$; then $E^{old} + Q$ is an
upper bound on E^new.
• As shown in the figure, minimizing Q will lead to a decrease of
E^new, unless E^new is already at a local minimum.

Schematic plot of the error function E as a
function of the new value θ^new of one of the
parameters of the mixture model. The curve
E^old + Q(θ^new) provides an upper bound on the
value of E(θ^new), and the EM algorithm
involves finding the minimum value of this
upper bound.
The EM algorithm
• Let's drop the terms in Q that depend only on the old parameters, and
rewrite Q as
$$\tilde{Q} = -\sum_n \sum_j P^{old}(j \mid \mathbf{x}_n)\,
\ln\left[ P^{new}(j)\, p^{new}(\mathbf{x}_n \mid j) \right].$$
• The smallest value of the upper bound is found by
minimizing this quantity.
• For the Gaussian mixture model, $\tilde{Q}$ can be written as
$$\tilde{Q} = -\sum_n \sum_j P^{old}(j \mid \mathbf{x}_n)
\left\{ \ln P^{new}(j) - d \ln \sigma_j^{new}
- \frac{\|\mathbf{x}_n - \boldsymbol{\mu}_j^{new}\|^2}{2\,(\sigma_j^{new})^2} \right\} + \text{const.}$$
• We can now minimize this function with respect to the 'new'
parameters, and they are:
$$\boldsymbol{\mu}_j^{new} = \frac{\sum_n P^{old}(j \mid \mathbf{x}_n)\, \mathbf{x}_n}{\sum_n P^{old}(j \mid \mathbf{x}_n)}, \qquad
(\sigma_j^{new})^2 = \frac{1}{d}\, \frac{\sum_n P^{old}(j \mid \mathbf{x}_n)\, \|\mathbf{x}_n - \boldsymbol{\mu}_j^{new}\|^2}{\sum_n P^{old}(j \mid \mathbf{x}_n)}.$$
The EM algorithm
• For the mixing parameters P^new(j), the constraint Σ_j P^new(j) = 1
can be taken into account by using a Lagrange multiplier λ and
minimizing the combined function
$$Z = \tilde{Q} + \lambda \left( \sum_j P^{new}(j) - 1 \right).$$
• Setting the derivative of Z with respect to P^new(j) to zero,
$$-\sum_n \frac{P^{old}(j \mid \mathbf{x}_n)}{P^{new}(j)} + \lambda = 0.$$
• Using Σ_j P^new(j) = 1 and Σ_j P^old(j|x_n) = 1, we obtain λ = N, thus
$$P^{new}(j) = \frac{1}{N} \sum_n P^{old}(j \mid \mathbf{x}_n).$$
• Since only P^old(j|x_n) terms appear on the right-hand side, these
results are ready for iterative computation.
• Exercise 2: shown on the nets
The EM algorithm
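As a closing usage sketch, the EM updates derived here are the same re-estimation formulae used in the illustrative fit_mixture loop above; a hypothetical toy run on the two-component sample from the histogram sketch:

```python
X = x.reshape(-1, 1)              # the 30 one-dimensional samples from the histogram sketch
P, mus, sigmas = fit_mixture(X, M=2)
print(P, mus.ravel(), sigmas)     # fitted mixing weights, means and widths
```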