Chapter 9
Reviewer : Sunwoo Kim
Christopher M. Bishop
Pattern Recognition and Machine Learning
Yonsei University
Department of Applied Statistics
Chapter 9. Mixture models and EM
2
Use of latent variable & clustering
Suppose we have a joint distribution 𝑝(𝑥, 𝑧).
We can obtain the marginal distribution 𝑝(𝑥) by marginalizing the latent variable 𝑧 out of the full distribution.
Sometimes it is useful and more convenient to work with the latent variable 𝒛.
In this chapter, we cover mixture models, i.e. distributions with discrete latent variables.
Keywords are…
1. K-Means Clustering
2. Gaussian Mixture model
3. Expectation Maximization
In fact, we optimize the parameters of the Gaussian mixture by using the EM algorithm.
Details will be covered soon!
We are all familiar with the idea of clustering, so let's go straight to K-means!
Chapter 9.1. K-Means Clustering
3
Theoretical Idea
The procedure of K-Means is very familiar to us; it is covered in multivariate analysis, data mining, and so on.
Here, let's look at K-Means from the perspective of optimization.
Let 𝑋_1, …, 𝑋_N be the data points and k the cluster index, with 𝜇_k the center of cluster k.
If 𝑋_n belongs to the k-th cluster, then r_nk = 1; otherwise r_nj = 0 for j ≠ k.
The sum-of-squares distance (distortion measure) is then J = Σ_n Σ_k r_nk ||𝑋_n − 𝜇_k||².
We want to minimize the overall distortion J. For fixed assignments r_nk, J is quadratic in 𝜇_k, so setting its derivative to zero gives a closed-form minimizer.
In fact, we know neither r_nk nor 𝜇_k, so we estimate both by minimizing the sum-of-squares distance 𝑱.
Finding r_nk is easy: for each data point, the 𝒌 that yields the minimum value of ||𝑿_n − 𝝁_k|| is the optimal assignment.
Now, let's find the optimal value of 𝜇_k by taking the derivative of J with respect to 𝜇_k and setting it to zero.
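To make this alternating minimization concrete, here is a minimal NumPy sketch of K-Means (not from the slides; the initialization, variable names, and convergence test are my own assumptions):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal K-Means: alternately update assignments r_nk and means mu_k."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]            # random initial centers
    for _ in range(n_iter):
        # assignment step: r_nk = 1 for the closest center
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K) squared distances
        r = d2.argmin(axis=1)
        # update step: each center becomes the mean of its assigned points
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    J = d2[np.arange(len(X)), r].sum()                            # distortion measure J
    return r, mu, J
```

Each pass first minimizes J over the assignments with the centers fixed, then over the centers with the assignments fixed, so J never increases.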
Chapter 9.1. K-Means Clustering
4
Implementation
1. It is important to choose the appropriate initial values for 𝜇𝑘.
2. We can also use a sequential (online) update, 𝜇_k^new = 𝜇_k^old + 𝜂_n (𝑥_n − 𝜇_k^old).
3. There is a generalized version of K-Means, the K-medoids algorithm.
4. We assign each data point to exactly one specific cluster. This is called a 'hard assignment'!
5. Furthermore, we can apply K-Means to the image segmentation task!
** Image segmentation: the process of partitioning a digital image into multiple segments in order to simplify the image into something more meaningful.
Chapter 9.2. Mixtures of Gaussians
5
Implementation
https://towardsdatascience.com/gaussian-mixture-models-explained-6986aaf5a95
Consider the distribution of a multi-modal Gaussian (a Gaussian mixture)!
We have already seen this distribution in chapter 2.
Here, let's focus on parameter optimization!
𝒑(𝑿) = Σ_𝒁 𝒑(𝑿|𝒁) 𝒑(𝒁)
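Written out explicitly for the Gaussian mixture (the standard form from chapter 2, restated here for reference):

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \pi_k \le 1.$$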
Chapter 9.2. Mixtures of Gaussians
6
Implementation
Then how can we use these probabilities for clustering?
We have to assign each data point to a specific cluster once the data are given. Thus, the posterior probability becomes
This 𝛾(𝑧𝑘) can be viewed as the responsibility!
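Concretely, the responsibility is the posterior probability that component k generated the point (standard form; the slide shows the equation as an image):

$$\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x \mid \mu_j, \Sigma_j)}.$$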
Furthermore, we can generate random samples from this distribution.
Details will be covered in chapter 11!
Note that the figure on the left illustrates how the estimated clusters fit the original ground truth.
Chapter 9.2. Mixtures of Gaussians
7
Maximum likelihood
Suppose we have a data set of observations {𝑋1, 𝑋2, … , 𝑋𝑁}.
Then, from 𝑝(𝑋), we can define the joint probability of the dataset. To turn the product into a summation, we take the log!
In this case, we have to worry about a component collapsing onto a single data point, which leads to a singularity.
Suppose a certain data point exactly matches one of the component means, that is, 𝑋_n = 𝜇_j. Then the probability becomes
When only one data point belongs to that component, 𝜎_j → 0 because 𝑋_n = 𝜇_j, and 𝑵(𝑿_n|𝑿_n, 𝝈_j²𝑰) goes to infinity.
The log likelihood then also goes to infinity, and we cannot derive an appropriate solution!
This does not occur for a single Gaussian (the usual uni-modal case), because all the data belong to one distribution: as the variance shrinks onto one point, the factors contributed by the other data points go to zero even faster, so the likelihood cannot blow up.
To overcome this issue, we can use heuristic fixes such as…
1. Resetting the mean to a random value whenever such a collapse occurs.
2. Resetting the covariance to some value bounded away from 0.
Chapter 9.2. Mixtures of Gaussians
8
EM for Gaussian mixtures
We have defined the parameters and the likelihood. Now we have to figure out how to optimize those parameters!
Here we use EM (Expectation Maximization) to obtain the desired parameters. The general version of EM will be covered soon; for now, let's look at how it is applied to the Gaussian mixture.
First, let's find the estimate of 𝜇_k from ∂ ln 𝑝(𝑋|𝜋, 𝜇, Σ) / ∂𝜇_k = 0.
Second, let's find the estimate of Σ_k from ∂ ln 𝑝(𝑋|𝜋, 𝜇, Σ) / ∂Σ_k = 0.
Last, let's find the estimate of 𝜋_k from ∂ ln 𝑝(𝑋|𝜋, 𝜇, Σ) / ∂𝜋_k = 0, subject to Σ_k 𝜋_k = 1 (enforced with a Lagrange multiplier).
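Setting each derivative to zero gives the familiar closed-form expressions (standard results; the slide equations are images):

$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad \mu_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\, x_n,$$
$$\Sigma_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^{\top}, \qquad \pi_k = \frac{N_k}{N}.$$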
Chapter 9.2. Mixtures of Gaussians
9
EM for Gaussian mixtures
Okay! We've got the solution of the Gaussian mixture model! Finished! (Is it???)
Not really, because the right-hand side of each equation contains the parameters themselves!
Here, 𝜸(𝒛_nk) itself contains 𝝁_k. It is as if 𝜽_i = (1/𝑵) Σ_j 𝜽_j 𝒙_j.
So the EM algorithm proceeds iteratively!
EM consists of an expectation step (E-Step)
and a maximization step (M-Step).
Suppose we are now at iteration 𝑡, so we have 𝜇_k^(t), Σ_k^(t), 𝜋_k^(t).
In the E-Step, we compute each 𝛾(𝑧_nk)^(t) and the other distributional quantities by plugging in the values 𝜇_k^(t), Σ_k^(t), 𝜋_k^(t).
In the M-Step, we update each parameter by the aforementioned equations.
For example, 𝜇_k^(t+1) = (1/𝑁_k^(t)) Σ_n 𝛾(𝑧_nk)^(t) 𝑋_n.
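A minimal sketch of one full E-step/M-step cycle in NumPy (illustrative only; the function name and the use of scipy's multivariate_normal are my own choices, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a Gaussian mixture; returns updated (pi, mu, Sigma)."""
    N, K = len(X), len(pi)
    # E-step: responsibilities gamma[n, k] proportional to pi_k * N(x_n | mu_k, Sigma_k)
    gamma = np.stack([pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                      for k in range(K)], axis=1)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: closed-form re-estimation using the current responsibilities
    Nk = gamma.sum(axis=0)                        # effective number of points per component
    mu_new = (gamma.T @ X) / Nk[:, None]
    Sigma_new = np.stack([(gamma[:, k, None] * (X - mu_new[k])).T @ (X - mu_new[k]) / Nk[k]
                          for k in range(K)])
    pi_new = Nk / N
    return pi_new, mu_new, Sigma_new
```

Iterating em_step until the log likelihood stops increasing gives the fit; in practice the E-step is usually computed in log space for numerical stability.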
Chapter 9.2. Mixtures of Gaussians
10
EM for Gaussian mixtures
The overall process can be written as…
Initial values 𝜇_k^(0), Σ_k^(0), 𝜋_k^(0) can be computed by running K-Means first.
Note that the GMM usually needs many more iterations than K-Means.
Now, let's focus more on the fundamental notion of the EM algorithm!
Chapter 9.3. An Alternative View of EM
11
General equation of likelihood
Here, the summation sits inside the log, which makes the calculation much harder than when the summation is outside the log.
Now, there are terms called ‘complete dataset’ and ‘incomplete dataset’.
If we observe 𝑿 and 𝒁 together, it is called a 'complete dataset'. That is, we know exactly which data point belongs to which cluster!
Obviously, this rarely happens: we only observe 𝑿, which is called an 'incomplete dataset'.
For now, suppose we observed both 𝑍 and 𝑋. Then we would not need to compute cluster assignments for the data, etc…
Our goal is to estimate a general parameter 𝜃. We do not assume any particular distribution or model form; the treatment is fully general!
Here again we obtain the estimate in an iterative way!
Since 𝜃 affects the latent variable 𝑍 through its posterior, the expectation of ln 𝑝(𝑋, 𝑍|𝜃) can be defined as
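In symbols, this is the standard general EM objective (the slide shows it as an image):

$$Q(\theta, \theta^{\text{old}}) = \mathbb{E}_{Z \mid X, \theta^{\text{old}}}\big[\ln p(X, Z \mid \theta)\big] = \sum_{Z} p(Z \mid X, \theta^{\text{old}})\, \ln p(X, Z \mid \theta),$$

with the E-step evaluating $p(Z \mid X, \theta^{\text{old}})$ and the M-step setting $\theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}})$.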
Chapter 9.3. An Alternative View of EM
12
General equation of likelihood
In fact, we do not observe 𝑍, so we use its expectation under the posterior instead.
Here, we can incorporate a prior and obtain MAP estimates by simply changing the quantity being maximized from 𝑄(𝜃, 𝜃_old) to 𝑄(𝜃, 𝜃_old) + ln 𝑝(𝜃).
Now, let's move on to examples of the EM algorithm applied to various models.
First, take a look at the Gaussian mixture!
Complete data can be expressed as
Chapter 9.3. An Alternative View of EM
13
Gaussian mixtures revisited
The joint pdf of the Gaussian mixture with complete data can be expressed as
shown above. The left equation shows the incomplete-data likelihood, and the upper one shows the complete-data likelihood.
As you can see, the summation and the log have been interchanged!
Since the summation sits outside the log, parameter estimation becomes extremely easy!
Now, let's think about the expectation of the latent variable.
Here, we can express the joint pdf in a much simpler form:
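For reference, the complete-data likelihood and its log are (standard forms, since the slide equations are images):

$$p(X, Z \mid \mu, \Sigma, \pi) = \prod_{n=1}^{N}\prod_{k=1}^{K} \big[\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\big]^{z_{nk}},$$
$$\ln p(X, Z \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\,\big\{\ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\big\}.$$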
Chapter 9.3. An Alternative View of EM
14
Gaussian mixtures revisited
That is, 𝑍_k depends only on 𝑋_k; the remaining points 𝑋_1 … 𝑋_{k−1}, 𝑋_{k+1} … have no influence on it.
Using Bayes' theorem, 𝑝(𝑍|𝑋) = 𝑝(𝑍)𝑝(𝑋|𝑍) / 𝑝(𝑋), together with this conditional independence, we can compute the expectation of the latent variable as
shown below. Furthermore, we can compute the other parameters iteratively by maximizing this expectation of the complete-data log likelihood!
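Since z_nk is binary, its posterior expectation is exactly the responsibility, so the quantity maximized in the M-step is (standard form):

$$\mathbb{E}[z_{nk}] = \gamma(z_{nk}), \qquad \mathbb{E}_{Z}\big[\ln p(X, Z \mid \mu, \Sigma, \pi)\big] = \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk})\,\big\{\ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\big\}.$$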
Hard-assignment K-Means can also be derived in this manner.
Assume a Gaussian for each component with a shared covariance 𝜖𝑰. Then,
if we let the variance term 𝜖 → 0, 𝛾(𝑧_nk) → 1 for the closest component and → 0 for all the others.
Thus, the assignment becomes a hard one:
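With shared covariances εI the responsibilities take the form below, and as ε → 0 the component with the smallest ||x_n − μ_k||² dominates, recovering the K-Means assignment r_nk:

$$\gamma(z_{nk}) = \frac{\pi_k \exp\{-\lVert x_n - \mu_k\rVert^2 / 2\varepsilon\}}{\sum_{j} \pi_j \exp\{-\lVert x_n - \mu_j\rVert^2 / 2\varepsilon\}} \;\longrightarrow\; r_{nk} \in \{0, 1\} \quad (\varepsilon \to 0).$$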
Chapter 9.3. An Alternative View of EM
15
Bernoulli distribution
We can apply latent variables and the EM algorithm to the Bernoulli distribution as well.
This is also called 'latent class analysis', and it provides a foundation for hidden Markov models over discrete variables.
Previously, we considered a single set of Bernoulli probabilities: 𝒙 is a D-dimensional binary vector, e.g. 𝒙 = (1, 0, 0, 0, …, 0).
Here, the probability can be defined as
I think it's important to capture the general idea of the mixture of Bernoullis.
Here, we are going to introduce a latent variable into the Bernoulli model!
Only one variable survives, but it is expressed as a linear sum of the values in a specific column!
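Concretely, the mixture of Bernoullis over a D-dimensional binary vector x is (standard form):

$$p(x \mid \mu, \pi) = \sum_{k=1}^{K} \pi_k \prod_{i=1}^{D} \mu_{ki}^{x_i}\,(1 - \mu_{ki})^{1 - x_i}.$$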
Chapter 9.3. An Alternative View of EM
16
Bernoulli distribution
As you can see from the covariance term, the mixture of Bernoullis can capture the relationship between variables!
That is, the correlations among the components of 𝑿_1 = (𝑿_11, 𝑿_12, 𝑿_13, … , 𝑿_1p) can be captured by this model! Here too we consider the complete data.
(Figure: 'Previous case' vs. 'Now'.)
Chapter 9.3. An Alternative View of EM
17
Bernoulli distribution
Here again we define the expectation of the latent variable (the responsibilities).
By using it, we can again derive estimates for the parameters.
The parameters can be estimated as shown below.
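The resulting EM updates mirror the Gaussian case (standard results; the slide equations are images):

$$\gamma(z_{nk}) = \frac{\pi_k\, p(x_n \mid \mu_k)}{\sum_{j} \pi_j\, p(x_n \mid \mu_j)}, \qquad N_k = \sum_{n} \gamma(z_{nk}), \qquad \mu_k = \frac{1}{N_k}\sum_{n} \gamma(z_{nk})\, x_n, \qquad \pi_k = \frac{N_k}{N}.$$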
Before moving on, let’s get this straight!
What we did in this section 9.3 was to derive the estimates as if we had 'complete data'.
Here, the E-step, which finds the expectation, amounts to computing
Chapter 9.3. An Alternative View of EM
18
EM for Bayesian linear regression
We can apply the EM algorithm to Bayesian linear regression. We use it for the evidence approximation, where we optimize the hyperparameters.
We marginalize out the parameter 𝒘 to obtain the marginal likelihood (evidence). Thus, we treat the parameter 𝒘 as the latent variable.
Here, M denotes the number of parameters (basis functions) in the regression.
The EM re-estimation equations are not identical to the direct evidence-maximization equations, but they converge to the same solution!
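As a concrete example, treating w as latent gives an M-step update for α of the following form (quoted from memory of PRML section 9.3.4, so treat it as a sketch rather than a definitive statement); a similar re-estimation equation exists for β:

$$\alpha^{\text{new}} = \frac{M}{\mathbb{E}[w^{\top} w]} = \frac{M}{m_N^{\top} m_N + \operatorname{Tr}(S_N)},$$

where $m_N$ and $S_N$ are the posterior mean and covariance of w.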
Chapter 9.4. The EM Algorithm in General
19
General idea of EM
What exactly is EM?
EM is a general technique for finding maximum likelihood solutions for probabilistic models having latent variables!
Please focus on our goal.
Our goal is to maximize the likelihood function that is given by
Here, let 𝑞(𝑍) be any distribution over 𝑍. Regardless of the choice of 𝑞(𝑍), the following decomposition holds.
Now, from this equation, we are going to study why the EM algorithm increases the likelihood.
Please keep in mind that 𝑍 is a latent variable and 𝜃 is the parameter we are trying to estimate.
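Written out, the decomposition is (standard form; the slide shows it as an image):

$$\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p),$$
$$\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}, \qquad \mathrm{KL}(q \,\|\, p) = -\sum_{Z} q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)}.$$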
Chapter 9.4. The EM Algorithm in General
20
1st : KL-Divergence and E-Step
As you all know, the KL divergence is non-negative: 𝐾𝐿(𝑞‖𝑝) ≥ 0.
If we move ℒ to the left-hand side of the equation, we simply get ln 𝑝(𝑋|𝜃) − ℒ(𝑞, 𝜃) = 𝐾𝐿(𝑞‖𝑝) ≥ 0.
So ln 𝑝(𝑋|𝜃) ≥ ℒ(𝑞, 𝜃), which means ℒ(𝑞, 𝜃) is a lower bound on our objective function.
As we've seen in previous sections, we hold 𝜃^(old) fixed and find the expectation over 𝑍.
In the E-step, we maximize ℒ(𝑞, 𝜃^(old)) with respect to 𝑞(𝑍). Since ln 𝑝(𝑋|𝜃^(old)) is fixed, the only way to maximize ℒ(𝑞, 𝜃^(old)) is to make the KL term equal to zero.
That means 𝑞(𝑍) = 𝑝(𝑍|𝑋, 𝜃^(old)): we set 𝑞(𝑍) equal to the posterior over the latent variables!
(Figures: the general framework of EM, and the E-Step of EM.)
Chapter 9.4. The EM Algorithm in General
21
2nd : Finding new 𝜽 in M-Step
As you can see, in the M-Step we are interested in 𝜃. Substituting 𝑞(𝑍) = 𝑝(𝑍|𝑋, 𝜃^(old)) into ℒ gives 𝒬(𝜃, 𝜃^(old)) plus a constant, which is exactly the function we used in previous sections.
Maximizing this new function, we can derive a new 𝜽 that gives a larger likelihood! This is the M-Step.
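Explicitly, after the E-step the lower bound becomes (standard form):

$$\mathcal{L}(q, \theta) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta) - \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \ln p(Z \mid X, \theta^{\text{old}}) = Q(\theta, \theta^{\text{old}}) + \text{const},$$

so maximizing ℒ over θ is the same as maximizing Q(θ, θ^(old)).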
This figure explains the overall process nicely. We compute the blue curve (the current lower bound) and move to the 𝜃 value that maximizes it. Then the green curve forms a new lower bound, which is larger than the previous blue one!
Likewise, we sequentially update 𝜽 and the corresponding distribution until we reach the desired maximized value.
Chapter 9.4. The EM Algorithm in General
22
Related examples
For the particular case of an i.i.d. dataset, we can re-write 𝑝(𝑍|𝑋, 𝜃) as shown below.
This means the responsibility for each data point depends only on that point 𝑋_n;
the other points in the dataset have no influence on it,
so it is perfectly fine to compute only the related terms.
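For i.i.d. data the posterior over the latent variables factorizes (standard derivation):

$$p(Z \mid X, \theta) = \frac{p(X, Z \mid \theta)}{\sum_{Z} p(X, Z \mid \theta)} = \frac{\prod_{n} p(x_n, z_n \mid \theta)}{\prod_{n} \sum_{z_n} p(x_n, z_n \mid \theta)} = \prod_{n=1}^{N} p(z_n \mid x_n, \theta).$$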
The EM algorithm can also be applied to maximize a posterior distribution (MAP estimation)!
The EM algorithm works well for many optimization problems, but some tasks remain difficult. Possible workarounds include
Generalized Expectation Maximization (GEM),
a good restricted choice of 𝑞(𝑍), updating the corresponding parameters using only a single data point at a time, etc…
There are many extensions of expectation maximization!