RJMCMC in clustering
1. Clustering by mixture model
Pham The Thong
April 22, 2011
Pham The Thong ( ) Clustering by mixture model April 22, 2011 1 / 44
2. Outline
1 RJMCMC in clustering
   Clustering overview
   Reversible Jump MCMC
2 Richardson & Green (1997): On Bayesian Analysis of Mixtures with an Unknown Number of Components
   Overview
   Split/Merge and Birth/Death Mechanism
   Algorithm
   Result
3 Tadesse et al. (2005): Bayesian Variable Selection in Clustering High-Dimensional Data
   Overview
   Variable Selection
   RJMCMC Mechanism
   Result
   Weakness of the model
4. Clustering overview
Divide the observations into groups.
Predict the group of a new observation.
Model-based clustering: select a probabilistic model that underlies the observations and make statistical inferences based on that model. One popular choice is the mixture model.
5. Clustering via mixture model
Let X = (x_1, …, x_n) be independent p-dimensional observations from G populations:
f(x_i | w, θ) = Σ_{k=1}^{G} w_k f(x_i | θ_k)
f(x_i | θ_k) is the density of an observation x_i from the kth component.
w = (w_1, …, w_G)^T are the component weights.
θ = (θ_1, …, θ_G)^T are the component parameters.
Clustering is done via the allocation vector y = (y_1, …, y_n)^T: y_i = k if the ith observation x_i comes from component k.
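The mixture density above can be sketched directly in code. A minimal illustration with Gaussian components (the slides leave f(·|θ_k) generic; the function names are mine):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(x, w, mu, sigma):
    """f(x | w, theta) = sum_{k=1}^G w_k f(x | theta_k), Gaussian case."""
    return sum(wk * normal_pdf(x, mk, sk) for wk, mk, sk in zip(w, mu, sigma))

def allocate(x, w, mu, sigma):
    """MAP allocation: y = k maximizing w_k f(x | theta_k)."""
    resp = [wk * normal_pdf(x, mk, sk) for wk, mk, sk in zip(w, mu, sigma)]
    return resp.index(max(resp))
```

With well-separated components, `allocate` assigns each point to the component whose weighted density dominates at that point.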
6. Some approaches
Model selection: compare model-selection criteria of fixed-G models for various values of G to choose the best G. Inference on a fixed-G model is often done via the EM algorithm or a Gibbs sampler.
Nonparametric method: use a Dirichlet process.
Trans-dimensional Markov chain Monte Carlo (MCMC): allow G to change during inference by combining a Gibbs sampler with MCMC moves that can change the dimension of the model. Reversible jump MCMC (RJMCMC) is one possible scheme.
8. Overview
First developed in Green (1995).
Its applications range well beyond mixture model analysis.
Its power in mixture model analysis was first demonstrated in Richardson & Green (1997), who considered only the 1-dimensional case.
Applied to the multidimensional setting in Tadesse et al. (2005).
9. Some advantages of clustering by RJMCMC
Avoids the task of model selection.
Provides a coherent Bayesian framework: the cluster number G is not treated as a special parameter.
Can provide useful summaries of the data that are difficult to obtain by other methods.
10. General ideas of RJMCMC I
Simulate a Markov chain that converges to the full posterior distribution p(G, y, w, θ | X).
A hybrid sampler consisting of a Gibbs sampler (the base) and jump moves (the extension).
The Gibbs sampler samples (y, w, θ); the jump moves sample the cluster number G.
The jump moves come in pairs: Split/Merge and Birth/Death.
11. General ideas of RJMCMC II
Split move: split one component into two components.
Merge move: combine two components into one component.
Birth move: create an empty component.
Death move: delete an empty component.
At each iteration, propose a Split (Birth) move with some fixed probability b_k, and with probability 1 − b_k propose a Merge (Death) move.
In one proposal, calculate all the changes to the model as if the move were made.
12. General ideas of RJMCMC III
Calculate the acceptance probability A, which is the product of three terms:
the ratio of the posterior of the new model to that of the old model;
the ratio of the probability of the reverse move (from the new model back to the old model) to that of the forward move (from the old model to the new model);
the Jacobian arising from the change of dimension.
To ensure convergence to the desired distribution, the move is actually carried out only with probability min(1, A).
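The three-term accept/reject step is conveniently computed in log space. A generic sketch, not the authors' code; the argument names are mine, and the caller is assumed to supply the log-posterior, log-proposal, and log-Jacobian terms:

```python
import math
import random

def rjmcmc_accept(log_post_new, log_post_old,
                  log_q_reverse, log_q_forward,
                  log_jacobian, rng=random.random):
    """Accept a trans-dimensional proposal with probability min(1, A), where
    log A = (posterior ratio) + (reverse/forward proposal ratio)
          + log |Jacobian| of the dimension-changing map."""
    log_A = (log_post_new - log_post_old) \
          + (log_q_reverse - log_q_forward) + log_jacobian
    # u < A  <=>  log u < log A; tiny constant guards against log(0)
    return math.log(rng() + 1e-300) < log_A
```

If the proposal is rejected, the chain simply keeps the current model for this iteration.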
14. Richardson & Green (1997): Overview
1-dimensional data.
Goals:
Clustering the data.
Estimating the component parameters.
Estimating the distribution of the data.
Predicting the group of new data.
Demonstrated on three real datasets: Enzyme, Acidity, and Galaxy.
16. Split/Merge Mechanism
In a Split move, select one component (w_{j*}, μ_{j*}, σ_{j*}) to split into two components (w_{j1}, μ_{j1}, σ_{j1}) and (w_{j2}, μ_{j2}, σ_{j2}).
In a Merge move, select two components (w_{j1}, μ_{j1}, σ_{j1}) and (w_{j2}, μ_{j2}, σ_{j2}) to merge into one new component (w_{j*}, μ_{j*}, σ_{j*}).
The zeroth, first, and second moments of the new component are set equal to those of the combination of the two old components.
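For univariate Gaussian components, the moment-matching merge can be written down directly. A sketch under the moment conditions stated above (the split move, which must invert this map with extra random degrees of freedom, is omitted):

```python
import math

def merge_components(w1, mu1, s1, w2, mu2, s2):
    """Merge (w1, mu1, s1) and (w2, mu2, s2) into (w, mu, s) by matching
    the zeroth moment (weight), first moment (mean), and second moment."""
    w = w1 + w2                                  # zeroth moment
    mu = (w1 * mu1 + w2 * mu2) / w               # first moment
    # second moment: w (mu^2 + s^2) = w1 (mu1^2 + s1^2) + w2 (mu2^2 + s2^2)
    var = (w1 * (mu1**2 + s1**2) + w2 * (mu2**2 + s2**2)) / w - mu**2
    return w, mu, math.sqrt(var)
```

Merging two identical components recovers the same mean and standard deviation with the weights summed, as expected.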
17. Birth/Death Mechanism
Birth move:
Generate (w_{j*}, μ_{j*}, σ_{j*}) from some distributions.
Rescale the weights.
Death move:
Delete a randomly chosen empty component.
Rescale the weights.
19. Algorithm
One iteration contains:
Gibbs sampler:
   Updating the weights w
   Updating the parameters μ, σ
   Updating the allocation y
Split/Merge move
Birth/Death move
21. Post simulation
By processing the raw output of the simulation, one can
cluster the data by selecting the allocation vector y with the highest frequency;
estimate the component parameters by their posterior means;
estimate the distribution of the data;
predict the group of new data.
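Selecting the most frequent allocation vector from the sampler output is a one-liner with a frequency table. A minimal sketch (the function name is mine):

```python
from collections import Counter

def map_allocation(samples):
    """Given sampled allocation vectors (one per MCMC iteration),
    return the vector with the highest posterior frequency."""
    counts = Counter(tuple(y) for y in samples)
    best, _ = counts.most_common(1)[0]
    return list(best)
```

In practice this is applied after discarding burn-in iterations; label switching between components must also be handled before counting.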
22. The three datasets
Enzyme data: enzymatic activity of one enzyme in the blood of 245 unrelated people. The interest is in identifying subgroups of slow or fast activity as a marker of genetic polymorphism in the general population (i.e., to some extent, people in the same subgroup may have a similar genetic structure although they are unrelated).
Acidity data: acidity levels of 155 lakes in Wisconsin.
Galaxy data: velocities of 82 galaxies moving away from our own galaxy.
23-24. [Result figures for the three datasets.]
26. Tadesse et al. (2005): Overview
High-dimensional data.
Goals:
Variable selection.
Clustering the data.
Predicting the group of new data.
Applied to microarray data.
28. Concept
Perhaps not all variables are useful for clustering.
By discarding non-discriminating (irrelevant) variables and clustering only on discriminating (relevant) variables, we may improve clustering accuracy.
We can think of variable selection as one way to generalize the basic approach of "clustering on the full set of variables" to "clustering on a subset of variables".
29. The model of Tadesse et al. I
Introduce γ = (γ_1, …, γ_p): γ_j = 1 if the jth variable is a discriminating variable and 0 if it is not.
Use (γ) and (γ^c) to index the discriminating and non-discriminating variables, respectively.
Three assumptions:
The set of discriminating variables and the set of non-discriminating variables are independent.
Restricted to (γ^c), the data X_{(γ^c)} have a normal distribution (hence are unsuitable for clustering).
Restricted to (γ), the data X_{(γ)} have a mixture distribution of G normal components (hence are suitable for clustering).
30. The model of Tadesse et al. II
(η_{(γ^c)}, Ω_{(γ^c)}): mean and covariance of the non-discriminating variables.
(μ_{k(γ)}, Σ_{k(γ)}): mean and covariance of the kth component C_k.
The three assumptions can be written as
p(X | G, γ, w, y, μ, Σ, η, Ω) = ∏_{i=1}^{n} N(x_{i(γ^c)}; η_{(γ^c)}, Ω_{(γ^c)}) × ∏_{k=1}^{G} ∏_{x_i ∈ C_k} N(x_{i(γ)}; μ_{k(γ)}, Σ_{k(γ)})
31. Searching for γ
The problem of variable selection is recast as a search for the most probable binary vector γ.
Use a Metropolis search (of which simulated annealing is one type).
At each step, randomly choose one of two transition moves, flip one bit of γ or swap two bits of γ, and accept the move with probability min(1, p(γ_new | X, y, w, G) / p(γ_old | X, y, w, G)).
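The flip/swap search can be sketched generically. Here log_post stands in for log p(γ | X, y, w, G); the real model's marginalized likelihood is not reproduced, and the function and parameter names are mine:

```python
import math
import random

def metropolis_search(log_post, p, n_iter=2000, seed=0):
    """Metropolis search for a high-probability binary vector gamma of
    length p, using symmetric flip-one-bit and swap-two-bits proposals."""
    rng = random.Random(seed)
    gamma = [rng.randint(0, 1) for _ in range(p)]
    cur = log_post(gamma)
    best, best_val = gamma[:], cur
    for _ in range(n_iter):
        prop = gamma[:]
        # swap is impossible when gamma is all 0s or all 1s; fall back to flip
        if rng.random() < 0.5 or sum(prop) in (0, p):
            prop[rng.randrange(p)] ^= 1          # flip one bit
        else:
            ones = [j for j, g in enumerate(prop) if g == 1]
            zeros = [j for j, g in enumerate(prop) if g == 0]
            prop[rng.choice(ones)] = 0           # swap two bits:
            prop[rng.choice(zeros)] = 1          # one 1 -> 0, one 0 -> 1
        new = log_post(prop)
        # accept with probability min(1, p(gamma_new|.) / p(gamma_old|.))
        if math.log(rng.random() + 1e-300) < new - cur:
            gamma, cur = prop, new
            if cur > best_val:
                best, best_val = gamma[:], cur
    return best, best_val
```

Both proposal types are symmetric, so no proposal-ratio correction is needed in the acceptance probability.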
33. Difficulties in high dimensions
Unlike the 1-dimensional case, there is no obvious way to split a covariance matrix into two covariance matrices. Even if this could be done [4], the Jacobian may not have a closed form.
The number of model parameters grows rapidly, on the order of p^2. The chain may converge very slowly.
34. Approach of Tadesse et al.
Integrate out the mean vectors and covariance matrices to obtain a marginalized posterior involving only G, w, γ, and y.
Despite being quite tedious, the math follows a standard framework: define conjugate priors for the means and covariance matrices, then carry out the integration.
Only the component weights need to be split or merged in the Split/Merge move. The Birth/Death moves are the same as in the 1-dimensional case.
35. Algorithm
One iteration contains:
Metropolis search for γ
Gibbs sampler:
   Updating the weights w
   Updating the allocation y
Split/Merge move
Birth/Death move
37. Post simulation
Since the means and covariances are integrated out, there is no estimate of the component parameters.
Variable selection:
Method 1: select the vector γ with the highest frequency.
Method 2: select all variables j whose marginal posterior p(γ_j = 1 | X, G) exceeds some threshold a: p(γ_j = 1 | X, G) ≥ a.
Clustering and group prediction can be done in the same way as in the univariate case.
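Method 2 reduces to computing marginal inclusion frequencies from the sampled γ vectors. A minimal sketch (function names and the default threshold are mine):

```python
def inclusion_probabilities(gamma_samples):
    """Estimate p(gamma_j = 1 | X, G) by the frequency of gamma_j = 1
    across the sampled gamma vectors."""
    n = len(gamma_samples)
    p = len(gamma_samples[0])
    return [sum(g[j] for g in gamma_samples) / n for j in range(p)]

def select_variables(gamma_samples, threshold=0.5):
    """Keep every variable whose marginal inclusion probability
    meets the threshold a."""
    probs = inclusion_probabilities(gamma_samples)
    return [j for j, pr in enumerate(probs) if pr >= threshold]
```

Unlike Method 1, this marginal criterion does not require any single vector γ to dominate the sampled configurations.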
38. Microarray data
14 samples (the samples come from tissues).
The variables are genes; there are 762 variables.
By clustering the samples into subgroups, one may find out which genes are relevant to each subgroup.
39-40. [Result figures for the microarray data.]
42. Weakness of the model [5]
The independence assumption can lead to a case in which an irrelevant variable is wrongly identified as discriminating because it is correlated with some discriminating variables.
It is not known whether one can relax this assumption while still being able to perform an RJMCMC-based full Bayesian analysis.
43. References
[1] P. J. Green (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82(4), 711-732.
[2] S. Richardson and P. J. Green (1997), On Bayesian Analysis of Mixtures with an Unknown Number of Components, J. R. Statist. Soc. B 59(4), 731-792.
[3] M. G. Tadesse, N. Sha, and M. Vannucci (2005), Bayesian Variable Selection in Clustering High-Dimensional Data, Journal of the American Statistical Association 100(470), 602-617.
[4] P. Dellaportas and I. Papageorgiou (2006), Multivariate mixtures of normals with unknown number of components, Statistics and Computing 16(1), 57-68.
[5] C. Maugis et al. (2009), Variable Selection for Clustering with Gaussian Mixture Models, Biometrics 65, 701-709.
44. Thank you for your attention