                   Clustering by mixture model

                            Pham The Thong


                             April 22, 2011




Pham The Thong (        )    Clustering by mixture model   April 22, 2011   1 / 44
Outline
 1   RJMCMC in clustering
       Clustering overview
       Reversible Jump MCMC
 2   Richardson & Green (1997): On Bayesian Analysis of Mixtures with an
     Unknown Number of Components
       Overview
       Split/Merge and Birth/Death Mechanism
       Algorithm
       Result
 3   Tadesse et al. (2005): Bayesian Variable Selection in Clustering
     High-Dimensional Data
       Overview
       Variable Selection
       RJMCMC Mechanism
       Result
       Weakness of the model

RJMCMC in clustering: Clustering overview






Clustering overview



          Divide the observations into groups.
          Predict the group of a new observation.
          Model-based clustering: select a probabilistic model
          that underlies the observations and make
          statistical inferences based on that model. One
          popular model is the mixture model.







  Clustering via mixture model
  Let X = (x_1, ..., x_n) be independent p-dimensional
  observations from G populations:

  $$f(x_i \mid w, \theta) = \sum_{k=1}^{G} w_k \, f(x_i \mid \theta_k)$$

  f(x_i | θ_k) is the density of an observation x_i from the kth
  component.
  w = (w_1, ..., w_G)^T are the component weights.
  θ = (θ_1, ..., θ_G)^T are the component parameters.
  Clustering is done via the allocation vector
  y = (y_1, ..., y_n)^T: y_i = k if the ith observation x_i comes
  from component k.
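
  The mixture density and the allocation probabilities can be sketched in a few lines. This is a minimal illustration, assuming 1-dimensional normal components with θ_k = (μ_k, σ_k); the function names are hypothetical and not taken from the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, params):
    """f(x | w, theta) = sum_k w_k f(x | theta_k), with theta_k = (mu_k, sigma_k)."""
    return sum(w * normal_pdf(x, mu, s) for w, (mu, s) in zip(weights, params))

def allocation_probs(x, weights, params):
    """P(y = k | x, w, theta) for each component k, by Bayes' rule."""
    joint = [w * normal_pdf(x, mu, s) for w, (mu, s) in zip(weights, params)]
    total = sum(joint)
    return [j / total for j in joint]

# Assumed toy values: two components with weights 0.3 and 0.7.
weights = [0.3, 0.7]
params = [(0.0, 1.0), (5.0, 1.0)]
probs = allocation_probs(0.2, weights, params)
```

  An observation near 0 is allocated to the first component with probability close to 1, even though that component has the smaller weight.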



Some approaches

          Model selection: compare a model selection
          criterion across fixed-G models for various values of G
          and choose the best G. Inference on a fixed-G model is
          often done via the EM algorithm or a Gibbs sampler.
          Nonparametric method: use a Dirichlet process.
          Trans-dimensional Markov chain Monte Carlo
          (MCMC): allow G to change during the
          inference process by combining a Gibbs sampler with
          MCMC moves that can change the dimension of the
          model. Reversible jump MCMC (RJMCMC) is one
          possible scheme.
RJMCMC in clustering: Reversible Jump MCMC






Overview


          First developed in Green (1995).
          Its applications range well beyond mixture model
          analysis.
          Its power for mixture model analysis was first demonstrated in
          Richardson & Green (1997), who considered only the
          1-dimensional case.
          Applied to the multidimensional setting in Tadesse et al.
          (2005).





Some advantages of clustering by RJMCMC



          Avoids the separate task of model selection.
          Provides a coherent Bayesian framework: the cluster
          number G is not treated as a special parameter.
          Can provide useful summaries of the data that are
          difficult to obtain by other methods.







General ideas of RJMCMC I


          Simulate a Markov chain that converges to the
          full posterior distribution p(G, y, w, θ | X).
          The hybrid sampler consists of a Gibbs sampler (the base)
          and jump moves (the extension).
          The Gibbs sampler samples (y, w, θ). The jump moves
          sample the cluster number G.
          The jump moves come in pairs: Split/Merge and
          Birth/Death.





General ideas of RJMCMC II
          Split move: split one component into two
          components.
          Merge move: combine two components into one
          component.
          Birth move: create an empty component.
          Death move: delete an empty component.
          At each iteration, propose a Split (Birth)
          move with some fixed probability b_k, and with
          probability 1 − b_k propose a Merge (Death)
          move.
          For each proposal, calculate all the changes to the
          model as if the move had been made.
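
  The choice between the two moves of a pair can be sketched as follows. The slides only say that b_k is fixed; the schedule below (always propose an increase when there is one component, never when a hypothetical upper bound k_max is reached, 0.5 otherwise) is an assumption for illustration, as are the function names.

```python
import random

def birth_probability(k, k_max):
    """Assumed schedule for b_k given the current number of components k."""
    if k <= 1:
        return 1.0   # cannot merge or delete below one component
    if k >= k_max:
        return 0.0   # cannot split or add above the assumed bound
    return 0.5

def choose_move(k, k_max, pair=("split", "merge"), rng=random):
    """Return the dimension-increasing move with probability b_k, else its reverse."""
    b_k = birth_probability(k, k_max)
    return pair[0] if rng.random() < b_k else pair[1]
```

  The same function serves both pairs, for example `choose_move(3, 10, pair=("birth", "death"))`.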



General ideas of RJMCMC III

          Calculate the acceptance ratio A, which is the
          product of three terms:
                   the ratio of the posterior of the new model to that of the
                   old model;
                   the ratio of the probability of proposing the reverse move
                   (from the new model back to the old model) to that of
                   proposing the forward move (from the old model to the new model);
                   the Jacobian arising from the change of dimension.
          To ensure convergence to the desired distribution,
          carry out the move only with probability
          min(1, A).
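
  The accept/reject step can be sketched in log space, which avoids overflow when the posterior values are very small. This is a generic sketch: the five inputs are assumed to be computed elsewhere for the specific move, and the function names are hypothetical.

```python
import math
import random

def log_acceptance_ratio(log_post_new, log_post_old,
                         log_q_reverse, log_q_forward, log_abs_jacobian):
    """log A = log posterior ratio + log proposal ratio + log |Jacobian|."""
    return ((log_post_new - log_post_old)
            + (log_q_reverse - log_q_forward)
            + log_abs_jacobian)

def accept_move(log_a, rng=random):
    """Return True with probability min(1, A), where A = exp(log_a)."""
    if log_a >= 0.0:
        return True
    return rng.random() < math.exp(log_a)
```

  A rejected proposal leaves the current model, including G, unchanged for that iteration.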

Richardson & Green (1997): Overview






Overview


          1-dimensional data.
          Goals:
                   Clustering the data.
                   Estimating the component parameters.
                   Estimating the distribution of the data.
                   Predicting the group of new data.
          Demonstrated on three real datasets: Enzyme, Acid,
          and Galaxy.



Richardson & Green (1997): Split/Merge and Birth/Death Mechanism






Split/Merge Mechanism

          In a Split move, select one component (w_{j*}, μ_{j*}, σ_{j*})
          and split it into two components (w_{j1}, μ_{j1}, σ_{j1}) and
          (w_{j2}, μ_{j2}, σ_{j2}).
          In a Merge move, select two components (w_{j1}, μ_{j1}, σ_{j1})
          and (w_{j2}, μ_{j2}, σ_{j2}) and merge them into one new component
          (w_{j*}, μ_{j*}, σ_{j*}).
          The parameters are matched by equating the zeroth, first, and
          second moments of the new component to those of the combination
          of the two old components.
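
  The moment matching can be sketched as below. The merge follows directly from equating the three moments. The split is one construction that satisfies the same equations, driven by three auxiliary numbers u1, u2, u3 in (0, 1); how these are drawn is not stated on the slide, so any sampling choice is an assumption.

```python
import math

def merge(c1, c2):
    """Merge two components (w, mu, sigma) by matching moments 0, 1 and 2."""
    w1, mu1, s1 = c1
    w2, mu2, s2 = c2
    w = w1 + w2                                  # zeroth moment
    mu = (w1 * mu1 + w2 * mu2) / w               # first moment
    second = (w1 * (mu1**2 + s1**2) + w2 * (mu2**2 + s2**2)) / w
    return w, mu, math.sqrt(second - mu**2)      # second moment

def split(c, u1, u2, u3):
    """Split one component into two so that merge() recovers the original."""
    w, mu, s = c
    w1, w2 = w * u1, w * (1.0 - u1)
    mu1 = mu - u2 * s * math.sqrt(w2 / w1)
    mu2 = mu + u2 * s * math.sqrt(w1 / w2)
    var1 = u3 * (1.0 - u2**2) * s**2 * w / w1
    var2 = (1.0 - u3) * (1.0 - u2**2) * s**2 * w / w2
    return (w1, mu1, math.sqrt(var1)), (w2, mu2, math.sqrt(var2))
```

  Because split and merge are exact inverses given (u1, u2, u3), the pair forms a reversible move; the Jacobian of this transformation enters the acceptance ratio. In a sampler, u1, u2, u3 could be drawn with, for example, `random.betavariate`, which is an illustrative choice here.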





Birth/Death Mechanism



          Birth move
                   Generate w_{j*}, μ_{j*}, σ_{j*} from some distributions.
                   Rescale the weights.
          Death move
                   Delete a randomly chosen empty component.
                   Rescale the weights.
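
  The weight bookkeeping for the two moves can be sketched as follows. Only the weights are handled; drawing μ_{j*} and σ_{j*} for the new component is left out because the slide does not specify the distributions. The function names and the `counts` argument (number of observations allocated to each component) are assumptions.

```python
import random

def birth(weights, w_new):
    """Add an empty component with weight w_new; rescale the others to sum to 1."""
    return [w * (1.0 - w_new) for w in weights] + [w_new]

def death(weights, counts, rng=random):
    """Delete a randomly chosen empty component and rescale the remaining weights.

    Returns (new_weights, deleted_index), or None if no component is empty.
    """
    empty = [k for k, n_k in enumerate(counts) if n_k == 0]
    if not empty:
        return None
    j = rng.choice(empty)
    scale = 1.0 - weights[j]
    new_weights = [w / scale for k, w in enumerate(weights) if k != j]
    return new_weights, j
```

  A death move applied right after a birth move returns the original weights, which is what makes the pair reversible.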




Richardson & Green (1997): Algorithm







  One iteration consists of:
          Gibbs sampler:
                   Updating the weights w
                   Updating the parameters µ, σ
                   Updating the allocation y
          Split/Merge move
          Birth/Death move
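
  Two of the Gibbs updates can be sketched with the standard library alone. This assumes a symmetric Dirichlet prior with parameter `delta` on the weights and normal components; both are assumptions for illustration, and the update of µ and σ is omitted because it depends on priors not given on the slide.

```python
import math
import random

def sample_weights(counts, delta=1.0, rng=random):
    """Draw w ~ Dirichlet(delta + n_1, ..., delta + n_G) via normalized Gamma draws."""
    g = [rng.gammavariate(delta + n_k, 1.0) for n_k in counts]
    total = sum(g)
    return [v / total for v in g]

def sample_allocation(x, weights, mus, sigmas, rng=random):
    """Draw each y_i with P(y_i = k) proportional to w_k N(x_i; mu_k, sigma_k^2)."""
    y = []
    for xi in x:
        logp = [math.log(w) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
                for w, m, s in zip(weights, mus, sigmas)]
        top = max(logp)                      # subtract the max for stability
        p = [math.exp(v - top) for v in logp]
        y.append(rng.choices(range(len(p)), weights=p)[0])
    return y
```

  The counts passed to `sample_weights` are the numbers of observations currently allocated to each component, so the two updates feed each other from one sweep to the next.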




Richardson & Green (1997): Result






Post simulation


  By processing the raw output of the simulation,
  one can
      cluster the data by selecting the allocation vector y
      that has the highest frequency;
      estimate the component parameters by their posterior
      means;
      estimate the distribution of the data;
      predict the group of new data.
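
  These summaries can be sketched as simple counts and averages over the stored draws. The layout of a draw (a dict with keys "G", "y" and "mu") is hypothetical, and the posterior mean is taken conditionally on a given G, assuming the components are labeled consistently across draws (for example, ordered by mean).

```python
from collections import Counter

def posterior_of_G(draws):
    """Relative frequency of each visited number of components."""
    counts = Counter(d["G"] for d in draws)
    return {g: c / len(draws) for g, c in counts.items()}

def most_frequent_allocation(draws):
    """Allocation vector y that appears most often among the draws."""
    return Counter(tuple(d["y"]) for d in draws).most_common(1)[0][0]

def posterior_mean_mu(draws, G):
    """Average of the component means over the draws that have exactly G components."""
    kept = [d["mu"] for d in draws if d["G"] == G]
    return [sum(col) / len(kept) for col in zip(*kept)]

# Assumed toy output of a sampler run on three observations.
draws = [
    {"G": 2, "y": (0, 0, 1), "mu": (0.0, 4.0)},
    {"G": 2, "y": (0, 0, 1), "mu": (1.0, 6.0)},
    {"G": 3, "y": (0, 1, 2), "mu": (0.0, 2.0, 5.0)},
    {"G": 2, "y": (0, 1, 1), "mu": (0.5, 5.0)},
]
```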





The three datasets

          Enzyme data: enzymatic activity of one enzyme in
          the blood of 245 unrelated people. The interest is in
          identifying subgroups of slow or fast activity as a
          marker of genetic polymorphism in the general
          population (i.e., to some extent, people in the same
          subgroup may have similar genetic structure
          although they are unrelated).
          Acid data: acidity level of 155 lakes in Wisconsin.
          Galaxy data: velocities of 82 galaxies receding from
          our galaxy.

Tadesse et al. (2005): Overview






Overview



          High-dimensional data.
          Goals:
                   Variable selection.
                   Clustering the data.
                   Predicting the group of new data.
          Applied to microarray data.




Tadesse et al. (2005): Variable Selection






Concept


          Perhaps not all variables are useful for clustering.
          By throwing away non-discriminating variables
          (irrelevant variables) and clustering only on
          discriminating variables (relevant variables) we may
          improve clustering accuracy.
          We can think of variable selection as one way to
          generalize the basic approach “clustering by the full
          set of variables” to “clustering by a subset of
          variables”.





The model of Tadesse et al. I
  Introduce γ = (γ_1, ..., γ_p): γ_j = 1 if the jth variable is
  a discriminating variable and 0 if it is not.
  Use (γ) and (γ^c) to index the discriminating variables and
  the non-discriminating variables, respectively.
  Three assumptions:
       The set of discriminating variables and the set of
       non-discriminating variables are independent.
       Restricted to (γ^c), the data X_(γ^c) follow a single
       normal distribution (hence are unsuitable for clustering).
       Restricted to (γ), the data X_(γ) follow a mixture
       of G normal components (hence are
       suitable for clustering).



The model of Tadesse et al. II
  (η_(γ^c), Ω_(γ^c)): mean and covariance of the
  non-discriminating variables.
  (μ_k(γ), Σ_k(γ)): mean and covariance of the kth
  component C_k.
  The three assumptions can be written as

  $$p(X \mid G, \gamma, w, y, \mu, \Sigma, \eta, \Omega) = \prod_{i=1}^{n} N\left(x_{i(\gamma^c)};\, \eta_{(\gamma^c)}, \Omega_{(\gamma^c)}\right) \prod_{k=1}^{G} \prod_{x_i \in C_k} N\left(x_{i(\gamma)};\, \mu_{k(\gamma)}, \Sigma_{k(\gamma)}\right)$$





Searching for γ

          The problem of variable selection is recast as a
          problem of searching for the most probable binary
          vector γ.
          Use a Metropolis search (of which simulated
          annealing is one type).
          At each step, randomly choose one of the following
          two transition moves: flip one bit or swap two bits
          of γ, and accept the move with probability

          $$\min\left\{1, \frac{p(\gamma^{new} \mid X, y, w, G)}{p(\gamma^{old} \mid X, y, w, G)}\right\}$$
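
  One step of this search can be sketched as follows. Several details are assumptions: the two moves are chosen with equal probability, "swap" is read as exchanging a randomly chosen 1 with a randomly chosen 0, and `log_post` is a hypothetical user-supplied function returning log p(γ | X, y, w, G) up to a constant.

```python
import math
import random

def propose(gamma, rng=random):
    """Return a candidate: flip one bit, or swap a 1 with a 0."""
    new = list(gamma)
    if rng.random() < 0.5:
        j = rng.randrange(len(new))
        new[j] = 1 - new[j]
    else:
        ones = [j for j, g in enumerate(new) if g == 1]
        zeros = [j for j, g in enumerate(new) if g == 0]
        if ones and zeros:                  # otherwise no swap is possible
            i, j = rng.choice(ones), rng.choice(zeros)
            new[i], new[j] = 0, 1
    return new

def metropolis_step(gamma, log_post, rng=random):
    """Accept the candidate with probability min(1, posterior ratio)."""
    cand = propose(gamma, rng)
    log_ratio = log_post(cand) - log_post(gamma)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return cand
    return gamma
```

  Both moves are symmetric (the reverse move is proposed with the same probability), which is why the acceptance probability reduces to the posterior ratio alone.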



Tadesse et al. (2005): RJMCMC Mechanism






Difficulties in high dimension



          Unlike the 1-dimensional case, there is no obvious way
          to split a covariance matrix into two covariance
          matrices. Even if this could be done [4], the Jacobian
          may not have a closed form.
          The number of model parameters grows
          on the order of p^2, so the chain may converge very slowly.
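
  The p^2 growth is easy to make concrete by counting free parameters. This sketch assumes G normal components, each with a full mean vector and an unrestricted symmetric covariance matrix, plus G − 1 free weights; the function name is hypothetical.

```python
def n_free_parameters(G, p):
    """Free parameters of a G-component, p-dimensional normal mixture."""
    mean = p                      # one mean per variable
    cov = p * (p + 1) // 2        # symmetric p x p covariance matrix
    return G * (mean + cov) + (G - 1)
```

  In one dimension with three components this gives 8 parameters; with p = 762 and only two components it already gives 582931.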







Approach of Tadesse et al.

          Integrate out the mean vectors and the covariance
          matrices to obtain a marginalized posterior in which
          only G, w, γ, and y are involved.
          Although quite tedious, the math follows a
          standard framework: define conjugate priors for the
          means and covariance matrices and then carry out the
          integration.
          Only the weights of the components need to be split or
          merged in the Split/Merge move. The Birth/Death
          moves are the same as in the 1-dimensional case.




Algorithm


  One iteration consists of:
          Metropolis search for γ
          Gibbs sampler:
                   Updating the weights w
                   Updating the allocation y
          Split/Merge move
          Birth/Death move



Tadesse et al. (2005): Result






Post simulation


          Since the means and covariances are integrated out,
          there are no estimates of the component parameters.
          Variable selection:
                   Method 1: select the vector γ that has the highest
                   frequency.
                   Method 2: select all variables j whose marginal posterior
                   probability p(γ_j = 1 | X, G) is at least some threshold a:
                   p(γ_j = 1 | X, G) ≥ a.
          Clustering and group prediction can be done in the
          same way as in the univariate case.
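
  Both selection methods reduce to counting over the sampled γ vectors. The sketch below assumes the draws are stored as a list of 0/1 tuples (for a fixed G) and estimates p(γ_j = 1 | X, G) by the fraction of draws with γ_j = 1; the function names and the default threshold are hypothetical.

```python
from collections import Counter

def most_frequent_gamma(gammas):
    """Method 1: the γ vector visited most often."""
    return Counter(tuple(g) for g in gammas).most_common(1)[0][0]

def marginal_inclusion(gammas):
    """Estimated p(γ_j = 1 | X, G) for each variable j."""
    n = len(gammas)
    return [sum(col) / n for col in zip(*gammas)]

def select_by_threshold(gammas, a=0.5):
    """Method 2: indices of the variables whose inclusion probability is at least a."""
    return [j for j, pj in enumerate(marginal_inclusion(gammas)) if pj >= a]

# Assumed toy draws over three variables.
gammas = [(1, 0, 1), (1, 0, 0), (1, 0, 1), (0, 1, 1)]
```

  The two methods need not agree: Method 1 picks a single joint configuration, while Method 2 judges each variable on its own marginal frequency.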





Microarray data



          14 samples (the samples come from tissues).
          The variables are genes; there are 762 variables.
          By clustering the samples into subgroups, one may
          find out which genes are relevant to each subgroup.




Tadesse et al. (2005): Weakness of the model






Weakness of the model [5]


          The independence assumption often leads to
          cases in which an irrelevant variable is wrongly
          identified as a discriminating one because it is
          related to some discriminating variables.
          It is not known whether one can relax this
          assumption while still being able to perform an
          RJMCMC-based full Bayesian analysis.






References
  [1] P. J. Green (1995), Reversible jump Markov chain Monte Carlo
  computation and Bayesian model determination, Biometrika
  82(4), 711-732.
  [2] S. Richardson and P. J. Green (1997), On Bayesian Analysis of
  Mixtures with an Unknown Number of Components, J. R. Statist.
  Soc. B 59(4), 731-792.
  [3] M. G. Tadesse, N. Sha, and M. Vannucci (2005), Bayesian Variable
  Selection in Clustering High-Dimensional Data, Journal of the
  American Statistical Association 100(470), 602-617.
  [4] P. Dellaportas and I. Papageorgiou (2006), Multivariate
  mixtures of normals with unknown number of components, Statistics
  and Computing 16(1), 57-68.
  [5] Maugis et al. (2009), Variable Selection for Clustering with
  Gaussian Mixture Models, Biometrics 65, 701-709.





                   Thank you for your attention




EiB Seminar from Esteban Vegas, Ph.D.
 
Two-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential EvolutionTwo-Stage Eagle Strategy with Differential Evolution
Two-Stage Eagle Strategy with Differential Evolution
 
Xmeasures - Accuracy evaluation of overlapping and multi-resolution clusterin...
Xmeasures - Accuracy evaluation of overlapping and multi-resolution clusterin...Xmeasures - Accuracy evaluation of overlapping and multi-resolution clusterin...
Xmeasures - Accuracy evaluation of overlapping and multi-resolution clusterin...
 
AN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENE...
AN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENE...AN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENE...
AN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENE...
 
Analyzing the impact of genetic parameters on gene grouping genetic algorithm...
Analyzing the impact of genetic parameters on gene grouping genetic algorithm...Analyzing the impact of genetic parameters on gene grouping genetic algorithm...
Analyzing the impact of genetic parameters on gene grouping genetic algorithm...
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
 
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL...
 
Trends in Computer Science and Information Technology
Trends in Computer Science and Information TechnologyTrends in Computer Science and Information Technology
Trends in Computer Science and Information Technology
 
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
Particle Swarm Optimization for Nano-Particles Extraction from Supporting Mat...
 
Particle filter
Particle filterParticle filter
Particle filter
 
An Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal ClustersAn Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal Clusters
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemEnhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
 
A Genetic Algorithm on Optimization Test Functions
A Genetic Algorithm on Optimization Test FunctionsA Genetic Algorithm on Optimization Test Functions
A Genetic Algorithm on Optimization Test Functions
 
Relevance Vector Machines for Earthquake Response Spectra
Relevance Vector Machines for Earthquake Response Spectra Relevance Vector Machines for Earthquake Response Spectra
Relevance Vector Machines for Earthquake Response Spectra
 
Relevance Vector Machines for Earthquake Response Spectra
Relevance Vector Machines for Earthquake Response Spectra Relevance Vector Machines for Earthquake Response Spectra
Relevance Vector Machines for Earthquake Response Spectra
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
BIOMERTICAL TECHNIQUE FOR STABILITY ANALYSIS SHIV SHANKAR LONIYA 03.pptx
BIOMERTICAL TECHNIQUE FOR STABILITY ANALYSIS SHIV SHANKAR LONIYA 03.pptxBIOMERTICAL TECHNIQUE FOR STABILITY ANALYSIS SHIV SHANKAR LONIYA 03.pptx
BIOMERTICAL TECHNIQUE FOR STABILITY ANALYSIS SHIV SHANKAR LONIYA 03.pptx
 
'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis'ACCOST' for differential HiC analysis
'ACCOST' for differential HiC analysis
 
Sota
SotaSota
Sota
 

Dernier

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Dernier (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

RJMCMC in clustering

  • 1. Clustering by mixture model. Pham The Thong, April 22, 2011.
  • 2. Outline
      1. RJMCMC in clustering: Clustering overview; Reversible Jump MCMC.
      2. Richardson & Green (1997), "On Bayesian Analysis of Mixtures with an Unknown Number of Components": Overview; Split/Merge and Birth/Death Mechanism; Algorithm; Result.
      3. Tadesse et al. (2005), "Bayesian Variable Selection in Clustering High-Dimensional Data": Overview; Variable Selection; RJMCMC Mechanism; Result; Weakness of the model.
  • 4. Clustering overview
      Divide the observations into groups.
      Predict the group of a new observation.
      Model-based clustering: choose a probabilistic model assumed to underlie the observations and base statistical inference on that model. One popular choice is the mixture model.
  • 5. Clustering via mixture model
      Let $X = (x_1, \dots, x_n)$ be independent $p$-dimensional observations from $G$ populations, with density
      $$f(x_i \mid w, \theta) = \sum_{k=1}^{G} w_k \, f(x_i \mid \theta_k).$$
      $f(x_i \mid \theta_k)$ is the density of an observation $x_i$ from the $k$th component.
      $w = (w_1, \dots, w_G)^T$ are the component weights.
      $\theta = (\theta_1, \dots, \theta_G)^T$ are the component parameters.
      Clustering is done via the allocation vector $y = (y_1, \dots, y_n)^T$: $y_i = k$ if the $i$th observation $x_i$ comes from component $k$.
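The mixture density above can be sketched in a few lines. This is a minimal illustration, assuming 1-dimensional normal components so that $\theta_k = (\mu_k, \sigma_k)$. The function names are hypothetical, and the allocation probabilities use the standard relation $P(y_i = k \mid x_i, w, \theta) \propto w_k f(x_i \mid \theta_k)$, which the slide does not state explicitly.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density(x, weights, mus, sigmas):
    """f(x | w, theta) = sum_k w_k f(x | theta_k) for normal components."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def allocation_probs(x, weights, mus, sigmas):
    """P(y = k | x, w, theta), proportional to w_k f(x | theta_k)."""
    terms = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(terms)
    return [t / total for t in terms]

# Two components centred at 0 and 5: a point at 5 belongs to the second one
# with high probability.
probs = allocation_probs(5.0, [0.3, 0.7], [0.0, 5.0], [1.0, 1.0])
```

Assigning each observation to the component with the largest probability gives a hard clustering; sampling from these probabilities gives one draw of the allocation vector $y$.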
  • 6. Some approaches
      Model selection: fit fixed-G models for several values of G and compare them with a model selection criterion to choose the best G. Inference for a fixed-G model is often done via the EM algorithm or a Gibbs sampler.
      Nonparametric methods: use a Dirichlet process.
      Trans-dimensional Markov chain Monte Carlo (MCMC): allow G to change during inference by combining a Gibbs sampler with MCMC moves that change the dimension of the model. Reversible jump MCMC (RJMCMC) is one such scheme.
  • 8. Reversible Jump MCMC: Overview
      First developed in Green (1995).
      Its applications range well beyond mixture model analysis.
      Its power for mixture model analysis was first demonstrated in Richardson & Green (1997), who considered only the 1-dimensional case.
      Applied to the multidimensional setting in Tadesse et al. (2005).
  • 9. Some advantages of clustering by RJMCMC
      Avoids a separate model selection step.
      Provides a coherent Bayesian framework.
      The number of clusters G is not treated as a special parameter.
      Can provide useful summaries of the data that are difficult to obtain by other methods.
  • 10. General ideas of RJMCMC I
      Simulate a Markov chain that converges to the full posterior distribution $p(G, y, w, \theta \mid X)$.
      The hybrid sampler consists of a Gibbs sampler (the base) and jump moves (the extension).
      The Gibbs sampler samples $(y, w, \theta)$.
      The jump moves sample the number of clusters $G$.
      The jump moves come in pairs: Split/Merge and Birth/Death.
  • 11. General ideas of RJMCMC II
      Split move: split one component into two components.
      Merge move: combine two components into one component.
      Birth move: create an empty component.
      Death move: delete an empty component.
      At each iteration, propose a Split (Birth) move with some fixed probability $b_k$, and a Merge (Death) move with probability $1 - b_k$.
      For each proposal, compute all the changes to the model as if the move had been made.
  • 12. General ideas of RJMCMC III
      Calculate the acceptance ratio A, which is the product of three terms:
        the ratio of the posterior of the new model to that of the old model;
        the ratio of the probability of proposing the reverse move (new model back to old model) to that of proposing the forward move (old model to new model);
        the Jacobian arising from the change of dimension.
      To ensure convergence to the desired distribution, carry out the move only with probability min(1, A).
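The accept/reject step can be sketched as follows. This is a minimal illustration, assuming the three terms are supplied as logarithms, which avoids numerical underflow when posteriors are very small. The function and argument names are hypothetical, and computing each term for a concrete move is model-specific and not shown.

```python
import math
import random

def acceptance_probability(log_post_new, log_post_old,
                           log_q_reverse, log_q_forward, log_abs_jacobian):
    """min(1, A), with log A the sum of the three log terms from the slide."""
    log_a = ((log_post_new - log_post_old)      # posterior ratio
             + (log_q_reverse - log_q_forward)  # proposal ratio
             + log_abs_jacobian)                # Jacobian of the dimension change
    return 1.0 if log_a >= 0.0 else math.exp(log_a)

def accept_move(prob, rng=random):
    """Carry out the move with probability prob."""
    return rng.random() < prob

# A proposal that lowers the log posterior by 2, with a symmetric proposal
# and unit Jacobian, is accepted with probability exp(-2).
p = acceptance_probability(-12.0, -10.0, math.log(0.5), math.log(0.5), 0.0)
```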
  • 14. Richardson & Green (1997): Overview
      1-dimensional data.
      Goals: clustering the data; estimating the component parameters; estimating the distribution of the data; predicting the group of new data.
      Demonstrated on three real datasets: Enzyme, Acid, and Galaxy.
  • 16. Richardson & Green (1997): Split/Merge Mechanism
      In a Split move, select one component $(w_{j^*}, \mu_{j^*}, \sigma_{j^*})$ and split it into two components $(w_{j_1}, \mu_{j_1}, \sigma_{j_1})$ and $(w_{j_2}, \mu_{j_2}, \sigma_{j_2})$.
      In a Merge move, select two components $(w_{j_1}, \mu_{j_1}, \sigma_{j_1})$ and $(w_{j_2}, \mu_{j_2}, \sigma_{j_2})$ and merge them into one new component $(w_{j^*}, \mu_{j^*}, \sigma_{j^*})$.
      In both moves, the zeroth, first, and second moments of the single component are set equal to those of the combination of the two components.
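The moment-matching idea can be sketched as below. The merge follows directly from equating the zeroth, first, and second moments. The split parameterization (auxiliary variables u1, u2, u3, drawn from Beta(2,2), Beta(2,2), and Beta(1,1)) is the one usually attributed to Richardson & Green (1997). It is not given on the slide, so treat it, and all function names, as assumptions.

```python
import math
import random

def merge(c1, c2):
    """Merge two (w, mu, sigma) components, matching moments 0, 1 and 2."""
    w1, mu1, s1 = c1
    w2, mu2, s2 = c2
    w = w1 + w2                                   # zeroth moment
    mu = (w1 * mu1 + w2 * mu2) / w                # first moment
    second = (w1 * (mu1 ** 2 + s1 ** 2) + w2 * (mu2 ** 2 + s2 ** 2)) / w
    return w, mu, math.sqrt(second - mu ** 2)     # second moment

def split(c, u1, u2, u3):
    """Split (w, mu, sigma) into two components with the same three moments."""
    w, mu, s = c
    w1, w2 = w * u1, w * (1.0 - u1)
    mu1 = mu - u2 * s * math.sqrt(w2 / w1)
    mu2 = mu + u2 * s * math.sqrt(w1 / w2)
    var1 = u3 * (1.0 - u2 ** 2) * s ** 2 * w / w1
    var2 = (1.0 - u3) * (1.0 - u2 ** 2) * s ** 2 * w / w2
    return (w1, mu1, math.sqrt(var1)), (w2, mu2, math.sqrt(var2))

def propose_split(c, rng=random):
    """Draw the auxiliary variables (assumed Beta laws) and split."""
    u1, u2, u3 = rng.betavariate(2, 2), rng.betavariate(2, 2), rng.betavariate(1, 1)
    return split(c, u1, u2, u3)

# Splitting and then merging recovers the original component.
left, right = split((0.6, 1.0, 2.0), 0.4, 0.5, 0.3)
recovered = merge(left, right)
```

Because merge is the exact inverse of split for any valid (u1, u2, u3), the two moves form a reversible pair, which is what the acceptance ratio on slide 12 requires.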
  • 17. Richardson & Green (1997): Birth/Death Mechanism
      Birth move: generate $w_{j^*}, \mu_{j^*}, \sigma_{j^*}$ from some distributions, then rescale the weights.
      Death move: delete a randomly chosen empty component, then rescale the weights.
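The weight rescaling in the two moves can be sketched as follows. This is a minimal illustration that handles only the weights. The new component's weight is passed in rather than drawn, since the slide leaves the generating distributions unspecified, and the function names are hypothetical.

```python
import random

def birth(weights, new_weight):
    """Add an empty component and shrink the old weights so the total stays 1."""
    scaled = [w * (1.0 - new_weight) for w in weights]
    return scaled + [new_weight]

def death(weights, counts, rng=random):
    """Delete a randomly chosen empty component and renormalize the rest.

    counts[j] is the number of observations allocated to component j.
    Returns None when no component is empty, so no death can be proposed.
    """
    empty = [j for j, n in enumerate(counts) if n == 0]
    if not empty:
        return None
    j = rng.choice(empty)
    kept = [w for i, w in enumerate(weights) if i != j]
    total = sum(kept)
    return j, [w / total for w in kept]

grown = birth([0.5, 0.5], 0.2)            # three components, weights sum to 1
removed, shrunk = death(grown, [3, 2, 0])  # only the new component is empty
```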
  • 19. Richardson & Green (1997): Algorithm
      One iteration consists of:
        Gibbs sampler: update the weights $w$; update the parameters $\mu, \sigma$; update the allocation $y$.
        Split/Merge move.
        Birth/Death move.
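Two of the Gibbs updates can be sketched as below. This is a minimal illustration, assuming a symmetric Dirichlet prior on the weights with parameter `delta` (a hypothetical name, not taken from the slide) and 1-dimensional normal components. The update of $\mu, \sigma$ depends on the chosen priors and is omitted.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def sample_weights(counts, delta, rng):
    """Draw w | y from Dirichlet(delta + n_1, ..., delta + n_G).

    A Dirichlet draw is obtained by normalizing independent Gamma variates.
    """
    draws = [rng.gammavariate(delta + n, 1.0) for n in counts]
    total = sum(draws)
    return [d / total for d in draws]

def sample_allocations(xs, weights, mus, sigmas, rng):
    """Draw each y_i with P(y_i = k) proportional to w_k f(x_i | theta_k)."""
    labels = range(len(weights))
    ys = []
    for x in xs:
        probs = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        ys.append(rng.choices(labels, weights=probs)[0])
    return ys

rng = random.Random(0)
w = sample_weights([3, 2, 0], 1.0, rng)
y = sample_allocations([-10.0, 10.0], [0.5, 0.5], [-10.0, 10.0], [1.0, 1.0], rng)
```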
  • 21. Richardson & Green (1997): Post simulation
      By processing the raw output of the simulation, one can:
        cluster the data by selecting the allocation vector $y$ with the highest frequency;
        estimate the component parameters by their posterior means;
        estimate the distribution of the data;
        predict the group of new data.
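The first of these summaries can be sketched as below, assuming the sampler output is stored as a list of (G, y) pairs. The storage format and function names are hypothetical. Relabeling each allocation by order of first appearance is an added convention, not something from the slides; it makes two draws that describe the same partition under different labels count as equal.

```python
from collections import Counter

def canonical(y):
    """Relabel clusters 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    out = []
    for label in y:
        if label not in mapping:
            mapping[label] = len(mapping)
        out.append(mapping[label])
    return tuple(out)

def summarize(draws):
    """Return the posterior frequencies of G and the most frequent allocation."""
    n = len(draws)
    g_post = {g: c / n for g, c in Counter(g for g, _ in draws).items()}
    y_mode = Counter(canonical(y) for _, y in draws).most_common(1)[0][0]
    return g_post, y_mode

draws = [(2, [0, 0, 1]), (2, [1, 1, 0]), (3, [0, 1, 2]), (2, [0, 0, 1])]
g_post, y_mode = summarize(draws)
```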
  • 22. Richardson & Green (1997): The three datasets
      Enzyme data: enzymatic activity of one enzyme in the blood of 245 unrelated people. The interest is in identifying subgroups of slow or fast activity as a marker of genetic polymorphism in the general population (i.e., to some extent, people in the same subgroup may have a similar genetic structure although they are unrelated).
      Acid data: acidity level of 155 lakes in Wisconsin.
      Galaxy data: velocities of 82 galaxies diverging from our galaxy.
  • 26. Tadesse et al. (2005): Overview
      High-dimensional data.
      Goals: variable selection; clustering the data; predicting the group of new data.
      Applied to microarray data.
  • 28. Tadesse et al. (2005): Variable Selection, Concept
      Perhaps not all variables are useful for clustering.
      By discarding non-discriminating (irrelevant) variables and clustering only on discriminating (relevant) variables, we may improve clustering accuracy.
      Variable selection can be seen as generalizing the basic approach of "clustering on the full set of variables" to "clustering on a subset of variables".
  • 29. The model of Tadesse et al. I
      Introduce $\gamma = (\gamma_1, \dots, \gamma_p)$: $\gamma_j = 1$ if the $j$th variable is a discriminating variable and $0$ if it is not.
      Use $(\gamma)$ and $(\gamma^c)$ to index the discriminating and the non-discriminating variables.
      Three assumptions:
        The set of discriminating variables and the set of non-discriminating variables are independent.
        Restricted to $(\gamma^c)$, the data $X_{(\gamma^c)}$ follow a normal distribution (hence are unsuitable for clustering).
        Restricted to $(\gamma)$, the data $X_{(\gamma)}$ follow a mixture of $G$ normal components (hence are suitable for clustering).
  • 30. The model of Tadesse et al. II
      $(\eta_{(\gamma^c)}, \Omega_{(\gamma^c)})$: mean and covariance for the non-discriminating variables.
      $(\mu_{k(\gamma)}, \Sigma_{k(\gamma)})$: mean and covariance for the $k$th component $C_k$.
      The three assumptions can be written as
      $$p(X \mid G, \gamma, w, y, \mu, \Sigma, \eta, \Omega) = \prod_{i=1}^{n} N\left(x_{i(\gamma^c)};\, \eta_{(\gamma^c)}, \Omega_{(\gamma^c)}\right) \prod_{k=1}^{G} \prod_{x_i \in C_k} N\left(x_{i(\gamma)};\, \mu_{k(\gamma)}, \Sigma_{k(\gamma)}\right).$$
  • 31. Searching for γ
      The problem of variable selection is recast as a search for the most probable binary vector $\gamma$.
      Use a Metropolis search (of which simulated annealing is one type).
      At each step, randomly choose one of two transition moves, flipping one bit of $\gamma$ or swapping two bits of $\gamma$, and accept the move with probability
      $$\min\left\{1, \frac{p(\gamma^{new} \mid X, y, w, G)}{p(\gamma^{old} \mid X, y, w, G)}\right\}.$$
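One step of such a search can be sketched as below. This is a minimal illustration: `log_post` is a hypothetical user-supplied function returning $\log p(\gamma \mid X, y, w, G)$ up to a constant, and the toy target at the end is invented purely to exercise the code. The swap here exchanges one included and one excluded variable, and when no swap is possible the proposal leaves $\gamma$ unchanged so that the proposal stays symmetric. Both choices are assumptions, not details from the slide.

```python
import math
import random

def propose(gamma, rng):
    """Flip one bit, or swap a 1 with a 0, each with probability 1/2."""
    new = list(gamma)
    if rng.random() < 0.5:
        j = rng.randrange(len(new))
        new[j] = 1 - new[j]
        return new
    ones = [j for j, g in enumerate(new) if g == 1]
    zeros = [j for j, g in enumerate(new) if g == 0]
    if ones and zeros:  # otherwise no swap exists and gamma is left unchanged
        new[rng.choice(ones)] = 0
        new[rng.choice(zeros)] = 1
    return new

def metropolis_step(gamma, log_post, rng):
    """Accept the candidate with probability min(1, p(new) / p(old))."""
    cand = propose(gamma, rng)
    log_ratio = log_post(cand) - log_post(gamma)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return cand
    return list(gamma)

# Toy target (invented): variable 0 is useful, the others are not.
toy_log_post = lambda g: 5.0 * g[0] - 5.0 * sum(g[1:])
rng = random.Random(1)
gamma = [0, 0, 0, 0]
best = gamma
for _ in range(2000):
    gamma = metropolis_step(gamma, toy_log_post, rng)
    if toy_log_post(gamma) > toy_log_post(best):
        best = gamma
```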
  • 33. Tadesse et al. (2005): Difficulties in high dimension
      Unlike the 1-dimensional case, there is no obvious way to split one covariance matrix into two covariance matrices.
      Even if this could be done [4], the Jacobian may not have a closed form.
      The number of model parameters grows rapidly, on the order of $p^2$.
      The chain may converge very slowly.
  • 34. Approach of Tadesse et al.
      Integrate out the mean vector and the covariance matrix to obtain a marginalized posterior involving only $G$, $w$, $\gamma$, and $y$.
      Although quite tedious, the math follows a standard framework: define conjugate priors for the mean and covariance matrix, then integrate.
      The Split/Merge move then only needs to split or merge the weights of the components.
      The Birth/Death moves are the same as in the 1-dimensional case.
  • 35. Tadesse et al. (2005): Algorithm
      One iteration consists of:
        Metropolis search for $\gamma$.
        Gibbs sampler: update the weights $w$; update the allocation $y$.
        Split/Merge move.
        Birth/Death move.
  • 37. Tadesse et al. (2005): Post simulation
      Since the means and covariances are integrated out, the component parameters are not estimated.
      Variable selection:
        Method 1: select the vector $\gamma$ with the highest frequency.
        Method 2: select every variable $j$ whose marginal posterior probability reaches some threshold $a$: $p(\gamma_j = 1 \mid X, G) \ge a$.
      Clustering and group prediction can be done in the same way as in the univariate case.
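Both selection methods can be sketched from stored draws of $\gamma$. This is a minimal illustration, assuming the draws are kept as a list of 0/1 lists and have already been restricted to iterations with the value of $G$ of interest. The function name and storage format are hypothetical.

```python
from collections import Counter

def select_variables(gamma_draws, threshold):
    """Return (most frequent gamma, inclusion frequencies, selected indices)."""
    n = len(gamma_draws)
    p = len(gamma_draws[0])
    # Method 1: the binary vector visited most often.
    mode = Counter(tuple(g) for g in gamma_draws).most_common(1)[0][0]
    # Method 2: marginal inclusion frequency of each variable.
    inclusion = [sum(g[j] for g in gamma_draws) / n for j in range(p)]
    selected = [j for j, freq in enumerate(inclusion) if freq >= threshold]
    return mode, inclusion, selected

gamma_draws = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [1, 1, 1]]
mode, inclusion, selected = select_variables(gamma_draws, 0.5)
```

Method 2 tends to be more stable in high dimension, where any single vector $\gamma$ may be visited only a handful of times.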
  • 38. Tadesse et al. (2005): Result. Microarray data: 14 samples (the samples come from tissues). The variables are genes; there are 762 variables. By clustering the samples into subgroups, one may find out which genes are relevant to each subgroup.
  • 42. Tadesse et al. (2005): Weakness of the model [5]. The independence assumption can often lead to an irrelevant variable being wrongly identified as a discriminating one because it is related to some discriminating variables. It is not known whether one can relax this assumption while still being able to perform an RJMCMC-based full Bayesian analysis.
  • 43. References
    [1] P. J. Green (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82(4), 711-732.
    [2] S. Richardson and P. J. Green (1997), On Bayesian Analysis of Mixtures with an Unknown Number of Components, J. R. Statist. Soc. B 59(4), 731-792.
    [3] M. G. Tadesse, N. Sha, and M. Vannucci (2005), Bayesian Variable Selection in Clustering High-Dimensional Data, Journal of the American Statistical Association 100(470), 602-617.
    [4] P. Dellaportas and I. Papageorgiou (2006), Multivariate mixtures of normals with unknown number of components, Statistics and Computing 16(1), 57-68.
    [5] Maugis et al. (2009), Variable Selection for Clustering with Gaussian Mixture Models, Biometrics 65, 701-709.
  • 44. Thank you for your attention