Bayesian Core: A Practical Approach to Computational Bayesian Statistics

   Mixture models







       5    Mixture models
              Mixture models
              MCMC approaches
              Label switching
              MCMC for variable dimension models




Missing variable models
       Complexity of a model may originate from the fact that some piece
       of information is missing


       Example
       Arnason–Schwarz model with missing zones
       Probit model with missing normal variate


       Generic representation

                     f(x|θ) = ∫_Z g(x, z|θ) dz



Mixture models
       Models of mixtures of distributions:

                                        x ∼ fj with probability pj ,

       for j = 1, 2, . . . , k, with overall density

                                         p1 f1 (x) + · · · + pk fk (x) .

        Usual case: parameterised components

                     ∑_{i=1}^{k} pi f(x|θi)

       where weights pi's are distinguished from other parameters

Motivations


               Dataset made of several latent/missing/unobserved
               strata/subpopulations. Mixture structure due to the missing
               origin/allocation of each observation to a specific
               subpopulation/stratum. Inference on either the allocations
               (clustering) or on the parameters (θi , pi ) or on the number of
               groups
               Semiparametric perspective where mixtures are basis
               approximations of unknown distributions




License
       Dataset derived from license plate image
       Grey levels concentrated on 256 values later jittered



       [Histogram of the jittered License dataset, values ranging over (−4, 4)]
Likelihood

       For a sample of independent random variables (x1 , · · · , xn ),
       likelihood

                     ∏_{i=1}^{n} { p1 f1(xi) + · · · + pk fk(xi) } .

       Expanding this product involves

                     k^n

       elementary terms: prohibitive to compute in large samples.
       But likelihood still computable [pointwise] in O(kn) time.
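       A minimal sketch of this O(kn) pointwise evaluation, using a Gaussian mixture as
       illustration (the function name and arguments are ours, not the book's):

```python
import numpy as np
from scipy.special import logsumexp

def mixture_loglik(x, weights, means, sds):
    """O(kn) pointwise evaluation of a k-component normal mixture log-likelihood."""
    weights, means, sds = map(np.asarray, (weights, means, sds))
    x = np.asarray(x, dtype=float)[:, None]             # n x 1
    log_terms = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * sds ** 2)
                 - 0.5 * (x - means) ** 2 / sds ** 2)   # n x k matrix of log p_j f_j(x_i)
    return logsumexp(log_terms, axis=1).sum()           # sum_i log sum_j p_j f_j(x_i)
```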


Normal mean benchmark
       Normal mixture

                                    p N (µ1 , 1) + (1 − p) N (µ2 , 1)

       with only unknown means (2-D representation possible)

       Identifiability

       Parameters µ1 and µ2 identifiable: µ1 cannot be confused with µ2
       when p is different from 0.5.

       Presence of a spurious mode, understood by letting p go to 0.5

       [Posterior surface over (µ1, µ2), both axes from −1 to 4, showing the main and the spurious mode]




Bayesian Inference

       For any prior π (θ, p), posterior distribution of (θ, p) available up
       to a multiplicative constant
                                                                         
                     π(θ, p|x) ∝ [ ∏_{i=1}^{n} ∑_{j=1}^{k} pj f(xi|θj) ] π(θ, p) ,

       at a cost of order O(kn)


       Difficulty
       Despite this, derivation of posterior characteristics like posterior
       expectations only possible in an exponential time of order O(k^n)!


Missing variable representation
       Associate to each xi a missing/latent variable zi that indicates its
       component:
                             zi |p ∼ Mk (p1 , . . . , pk )
       and
                                               xi |zi , θ ∼ f (·|θzi ) .
       Completed likelihood

                     ℓ(θ, p|x, z) = ∏_{i=1}^{n} p_{zi} f(xi|θ_{zi}) ,

       and

                     π(θ, p|x, z) ∝ [ ∏_{i=1}^{n} p_{zi} f(xi|θ_{zi}) ] π(θ, p) ,

       where z = (z1 , . . . , zn ).
Partition sets
       Denote by Z = {1, . . . , k}^n the set of the k^n possible vectors z.
       Z decomposed into a partition of sets

                     Z = ∪_{j=1}^{r} Zj

       For a given allocation size vector (n1 , . . . , nk ), where
       n1 + . . . + nk = n, partition sets

                     Zj = { z : ∑_{i=1}^{n} I_{zi=1} = n1 , . . . , ∑_{i=1}^{n} I_{zi=k} = nk } ,

       for all allocations with the given allocation size (n1 , . . . , nk ) and
       where labels j = j(n1 , . . . , nk ) defined by lexicographical ordering
       on the (n1 , . . . , nk )'s.

Posterior closed form representations
                     π(θ, p|x) = ∑_{i=1}^{r} ∑_{z∈Zi} ω(z) π(θ, p|x, z) ,

       where ω(z) represents marginal posterior probability of the
       allocation z conditional on x [derived by integrating out the
       parameters θ and p]

       Bayes estimator of (θ, p)

                     ∑_{i=1}^{r} ∑_{z∈Zi} ω(z) Eπ[θ, p|x, z] .

       Too costly: 2^n terms
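       To make this cost concrete, here is a toy brute-force sketch (ours, not code from the
       book): for a two-component normal-mean mixture with known weights and conjugate
       N(δ, 1/λ) priors on the means, it enumerates the k^n allocations, computes each ω(z) by
       integrating the means out, and assembles the Bayes estimates of the means; already
       k^n = 256 terms for n = 8.

```python
import itertools
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 4), rng.normal(2.5, 1, 4)])   # toy data, n = 8
n, k, p, delta, lam = len(x), 2, np.array([0.5, 0.5]), 0.0, 1.0

post_mean, total = np.zeros(k), 0.0
for z in itertools.product(range(k), repeat=n):                     # k**n terms
    z = np.array(z)
    w = np.prod(p[z])                                               # prod_i p_{z_i}
    for j in range(k):                                              # times the marginal of group j
        xj = x[z == j]
        if xj.size:
            cov = np.eye(xj.size) + np.ones((xj.size, xj.size)) / lam
            w *= multivariate_normal.pdf(xj, mean=np.full(xj.size, delta), cov=cov)
    nj = np.bincount(z, minlength=k)
    sj = np.bincount(z, weights=x, minlength=k)
    post_mean += w * (lam * delta + sj) / (lam + nj)                # omega(z) E[mu | x, z], unnormalised
    total += w
print(post_mean / total)                                            # Bayes estimates of (mu_1, mu_2)
```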
General Gibbs sampling for mixture models

       Take advantage of the missing data structure:
       Algorithm
               Initialization: choose p(0) and θ (0) arbitrarily
               Step t. For t = 1, . . .
                     1   Generate zi^(t) (i = 1, . . . , n) from (j = 1, . . . , k)

                           P( zi^(t) = j | pj^(t−1), θj^(t−1), xi ) ∝ pj^(t−1) f( xi | θj^(t−1) )

                     2   Generate p^(t) from π(p|z^(t)),
                     3   Generate θ^(t) from π(θ|z^(t), x).




Exponential families

       When

                     f(x|θ) = h(x) exp( R(θ) · T(x) − ψ(θ) )

       simulation of both p and θ usually straightforward:
       Conjugate prior on θj given by

                                  πj (θ) ∝ exp(R(θ) · αj − βj ψ(θ)) ,

       where αj ∈ Rk and βj > 0 are hyperparameters and

                                              p ∼ D (γ1 , . . . , γk )

       [Dirichlet distribution]


Gibbs sampling for exponential family mixtures
       Algorithm
               Initialization. Choose p(0) and θ (0) ,
               Step t. For t = 1, . . .
                     1   Generate zi^(t) (i = 1, . . . , n, j = 1, . . . , k) from

                           P( zi^(t) = j | pj^(t−1), θj^(t−1), xi ) ∝ pj^(t−1) f( xi | θj^(t−1) )

                     2   Compute nj^(t) = ∑_{i=1}^{n} I_{zi^(t)=j} ,   sj^(t) = ∑_{i=1}^{n} I_{zi^(t)=j} t(xi)

                     3   Generate p^(t) from D(γ1 + n1 , . . . , γk + nk ),

                     4   Generate θj^(t) (j = 1, . . . , k) from

                           π(θj|z^(t), x) ∝ exp( R(θj) · (α + sj^(t)) − ψ(θj)(nj^(t) + β) ) .
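       A sketch of this sampler for one concrete exponential family, a Poisson mixture with
       conjugate Gamma(a, b) priors on the rates and a Dirichlet(γ, . . . , γ) prior on the
       weights (the Poisson choice, the defaults and the function signature are our
       illustration, not the slide's):

```python
import numpy as np

def gibbs_poisson_mixture(x, k, a=1.0, b=1.0, gamma=0.5, T=1000, seed=None):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    lam = rng.gamma(a, 1 / b, size=k)                      # initial rates
    p = np.full(k, 1.0 / k)
    draws = []
    for _ in range(T):
        # 1. allocations: P(z_i = j) proportional to p_j f(x_i | lambda_j)
        logw = np.log(p) + x[:, None] * np.log(lam) - lam  # Poisson log-kernel, n x k
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=wi) for wi in w])
        # 2. sufficient statistics n_j and s_j = sum of t(x_i) = x_i in group j
        nj = np.bincount(z, minlength=k)
        sj = np.bincount(z, weights=x, minlength=k)
        # 3. conjugate updates: Dirichlet for the weights, Gamma for the rates
        p = rng.dirichlet(gamma + nj)
        lam = rng.gamma(a + sj, 1 / (b + nj))
        draws.append((p.copy(), lam.copy()))
    return draws
```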

Normal mean example




       For a mixture of two normal distributions with unknown means,

                     p N(µ1, τ²) + (1 − p) N(µ2, σ²) ,

       and a normal prior N(δ, 1/λ) on both µ1 and µ2 ,




Normal mean example (cont’d)
       Algorithm
                Initialization. Choose µ1^(0) and µ2^(0),
                Step t. For t = 1, . . .

                     1   Generate zi^(t) (i = 1, . . . , n) from

                           P( zi^(t) = 1 ) = 1 − P( zi^(t) = 2 ) ∝ p exp{ −(xi − µ1^(t−1))² / 2 }

                     2   Compute nj^(t) = ∑_{i=1}^{n} I_{zi^(t)=j} and (sx)j^(t) = ∑_{i=1}^{n} I_{zi^(t)=j} xi

                     3   Generate µj^(t) (j = 1, 2) from N( (λδ + (sx)j^(t)) / (λ + nj^(t)) , 1/(λ + nj^(t)) ) .
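       A compact implementation sketch of this two-component Gibbs sampler (function name,
       defaults and initialisation are ours):

```python
import numpy as np

def gibbs_two_means(x, p=0.5, delta=0.0, lam=0.1, T=1000, seed=None):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = rng.normal(delta, 1.0, size=2)                    # arbitrary initialisation
    chain = np.empty((T, 2))
    for t in range(T):
        # 1. allocations: P(z_i = 1) proportional to p exp{-(x_i - mu1)^2 / 2}
        w1 = p * np.exp(-0.5 * (x - mu[0]) ** 2)
        w2 = (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)
        z = (rng.uniform(size=n) > w1 / (w1 + w2)).astype(int)   # 0 = comp 1, 1 = comp 2
        # 2. sufficient statistics n_j and (sx)_j
        nj = np.array([np.sum(z == 0), np.sum(z == 1)])
        sx = np.array([x[z == 0].sum(), x[z == 1].sum()])
        # 3. means from N((lam*delta + (sx)_j)/(lam + n_j), 1/(lam + n_j))
        mu = rng.normal((lam * delta + sx) / (lam + nj), 1.0 / np.sqrt(lam + nj))
        chain[t] = mu
    return chain
```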

Normal mean example (cont’d)




       [Gibbs sample paths in the (µ1, µ2) plane: (a) initialised at random, (b) initialised close to the lower mode]
License
       Consider k = 3 components, a D3(1/2, 1/2, 1/2) prior for the
       weights, a N(x̂, σ̂²/3) prior on the means µi and a Ga(10, σ̂²)
       prior on the precisions σi^{−2}, where x̂ and σ̂² are the empirical mean
       and variance of License
                                                           [Empirical Bayes]

       [Histogram of the License dataset]
Metropolis–Hastings alternative

       For the Gibbs sampler, completion of z increases the dimension of
       the simulation space and reduces the mobility of the parameter
       chain.

       Metropolis–Hastings algorithm available since posterior available in
       closed form, as long as q provides a correct exploration of the
       posterior surface, since

                     { π(θ′, p′|x) q(θ, p|θ′, p′) } / { π(θ, p|x) q(θ′, p′|θ, p) }  ∧ 1

       computable in O(kn) time


Random walk Metropolis–Hastings
       Proposal distribution for the new value

                     θj = θj^(t−1) + uj   where   uj ∼ N(0, ζ²)

       In mean mixture case, Gaussian random walk proposal is

                     µ1 ∼ N( µ1^(t−1), ζ² )   and   µ2 ∼ N( µ2^(t−1), ζ² )

       [Random walk Metropolis–Hastings sample in the (µ1, µ2) plane]




Random walk Metropolis–Hastings for means
       Algorithm
                Initialization:
                Choose µ1^(0) and µ2^(0)
                Iteration t (t ≥ 1):

                     1   Generate µ̃1 from N( µ1^(t−1), ζ² ),
                     2   Generate µ̃2 from N( µ2^(t−1), ζ² ),
                     3   Compute

                           r = π( µ̃1, µ̃2 |x) / π( µ1^(t−1), µ2^(t−1) |x)

                     4   Generate u ∼ U[0,1] : if u < r, then (µ1^(t), µ2^(t)) = (µ̃1, µ̃2)
                         else (µ1^(t), µ2^(t)) = (µ1^(t−1), µ2^(t−1)) .
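       A sketch of this sampler; the unnormalised log-posterior below assumes the same
       unit-variance mixture and N(δ, 1/λ) priors as above (defaults are illustrative):

```python
import numpy as np

def log_post(mu, x, p=0.5, delta=0.0, lam=0.1):
    """Unnormalised log-posterior of (mu1, mu2) for p N(mu1,1) + (1-p) N(mu2,1)."""
    lik = np.log(p * np.exp(-0.5 * (x - mu[0]) ** 2)
                 + (1 - p) * np.exp(-0.5 * (x - mu[1]) ** 2)).sum()
    return lik - 0.5 * lam * ((mu - delta) ** 2).sum()

def rw_mh_means(x, zeta=0.5, T=5000, seed=None):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    mu = rng.normal(size=2)
    chain = np.empty((T, 2))
    for t in range(T):
        prop = mu + zeta * rng.normal(size=2)              # N(mu^(t-1), zeta^2) proposals
        if np.log(rng.uniform()) < log_post(prop, x) - log_post(mu, x):
            mu = prop                                      # accept with probability r ∧ 1
        chain[t] = mu
    return chain
```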

Random walk extensions
       Difficulties with constrained parameters, like p such that
       ∑_{i=1}^{k} pi = 1.

       Resolution by overparameterisation

                     pj = wj / ∑_{l=1}^{k} wl ,      wj > 0 ,

       and proposed move on the wj's

                     log(wj) = log(wj^(t−1)) + uj   where   uj ∼ N(0, ζ²)




               Watch out for the Jacobian in the log transform
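                A sketch of one such move; to keep the overparameterised target proper, this
                sketch assumes (our choice, not the slide's) independent Gamma(γ, 1) priors on
                the wj, which induce a Dirichlet(γ, . . . , γ) prior on p, and the Jacobian of
                the log transform enters as ∑j log wj:

```python
import numpy as np

def rw_logweights_step(w, loglik_p, gamma=0.5, zeta=0.3, seed=None):
    """One random-walk move on eta_j = log w_j, with p_j = w_j / sum_l w_l."""
    rng = np.random.default_rng(seed)

    def log_target(wv):
        # Gamma(gamma, 1) log-priors on the w_j (up to a constant), the likelihood in p,
        # and the log-Jacobian sum_j log w_j of the map eta = log w
        return ((gamma - 1) * np.log(wv) - wv).sum() + loglik_p(wv / wv.sum()) + np.log(wv).sum()

    w_new = np.exp(np.log(w) + zeta * rng.normal(size=w.size))
    if np.log(rng.uniform()) < log_target(w_new) - log_target(w):
        return w_new
    return w
```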
Identifiability

       A mixture model is invariant under permutations of the indices of
       the components.
       E.g., mixtures
                          0.3N (0, 1) + 0.7N (2.3, 1)
       and
                                       0.7N (2.3, 1) + 0.3N (0, 1)
       are exactly the same!

       The component parameters θi are not identifiable
       marginally since they are exchangeable



Connected difficulties


           1    Number of modes of the likelihood of order O(k!):
                Maximization and even [MCMC] exploration of the
                posterior surface harder
           2    Under exchangeable priors on (θ, p) [prior invariant under
                permutation of the indices], all posterior marginals are
                identical:
                Posterior expectation of θ1 equal to posterior expectation
                of θ2 .




License
       Since Gibbs output does not produce exchangeability, the Gibbs
       sampler has not explored the whole parameter space: it lacks the
       energy to switch enough component allocations at once




       [Gibbs output for the License dataset: traces of µi, pi and σi against iterations, with the corresponding pairwise scatter plots]
Label switching paradox



       We should observe the exchangeability of the components [label
       switching] to conclude about convergence of the Gibbs sampler.

       If we observe it, then we do not know how to estimate the
       parameters.

       If we do not, then we are uncertain about the convergence!!!




Constraints
       Usual reply to lack of identifiability: impose constraints like
       µ1 ≤ . . . ≤ µk in the prior

       Mostly incompatible with the topology of the posterior surface:
       posterior expectations then depend on the choice of the
       constraints.


       Computational detail
       The constraint does not need to be imposed during the simulation
       but can instead be imposed after simulation, by reordering the
       MCMC output according to the constraint. This avoids possible
       negative effects on convergence.
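       A short sketch of this post-processing step, for the constraint µ1 ≤ . . . ≤ µk
       (assuming, as a convention of ours, one row per MCMC iteration and one column per
       component):

```python
import numpy as np

def reorder_by_means(mu_chain, p_chain):
    """Reorder each MCMC draw so that the means are increasing, permuting weights alike."""
    order = np.argsort(mu_chain, axis=1)                   # one permutation per iteration
    return (np.take_along_axis(mu_chain, order, axis=1),
            np.take_along_axis(p_chain, order, axis=1))
```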


Relabeling towards the mode
       Selection of one of the k! modal regions of the posterior once
       simulation is over, by computing the approximate MAP

                     (θ, p)^(i*)   with   i* = arg max_{i=1,...,M} π( (θ, p)^(i) |x )


       Pivotal Reordering
       At iteration i ∈ {1, . . . , M },
         1   Compute the optimal permutation

                     τi = arg min_{τ∈Sk} d( τ((θ^(i), p^(i))), (θ^(i*), p^(i*)) )

             where d(·, ·) distance in the parameter space.
         2   Set (θ^(i), p^(i)) = τi((θ^(i), p^(i))).
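       A brute-force sketch of this reordering (our array layout and a Euclidean distance d;
       scanning all k! permutations is only sensible for small k):

```python
import numpy as np
from itertools import permutations

def pivotal_reorder(theta_chain, pivot):
    """Permute each draw (one row per iteration, one column per component) towards the
    approximate MAP draw `pivot`, minimising the Euclidean distance over the k! permutations."""
    k = theta_chain.shape[1]
    perms = [list(s) for s in permutations(range(k))]
    out = np.empty_like(theta_chain)
    for i, draw in enumerate(theta_chain):
        best = min(perms, key=lambda s: np.sum((draw[s] - pivot) ** 2))
        out[i] = draw[best]
    return out
```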

Re-ban on improper priors
       Difficult to use improper priors in the setting of mixtures because
       independent improper priors,

                     π(θ) = ∏_{i=1}^{k} πi(θi) ,   with   ∫ πi(θi) dθi = ∞

       end up, for all n's, with the property

                     ∫ π(θ, p|x) dθ dp = ∞ .


       Reason
       There are (k − 1)^n terms among the k^n terms in the expansion
       that allocate no observation at all to the i-th component.
Tempering
       Facilitate exploration of π by flattening the target: simulate from
       πα(x) ∝ π(x)^α for α > 0 small enough


                Determine where the modal regions of π are (possibly with
                parallel versions using different α's)
                Recycle simulations from π(x)^α into simulations from π by
                importance sampling
                Simple modification of the Metropolis–Hastings algorithm,
                with new acceptance

                     { π(θ′, p′|x) / π(θ, p|x) }^α  ·  q(θ, p|θ′, p′) / q(θ′, p′|θ, p)   ∧ 1
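       A sketch of one tempered move for the common special case of a symmetric random walk
       proposal, so the q-ratio cancels and only the power α on the posterior ratio remains
       (interface and defaults are ours):

```python
import numpy as np

def tempered_rw_step(theta, log_post, alpha=0.3, zeta=0.5, seed=None):
    """One random-walk step targeting pi(theta|x)^alpha (flattened target)."""
    rng = np.random.default_rng(seed)
    prop = theta + zeta * rng.normal(size=np.shape(theta))
    # symmetric proposal: log-acceptance is alpha * (log pi(prop) - log pi(theta))
    if np.log(rng.uniform()) < alpha * (log_post(prop) - log_post(theta)):
        return prop
    return theta
```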


Tempering with the mean mixture

       [Tempered posterior surfaces of (µ1, µ2) for α = 1, 0.5 and 0.2]
MCMC for variable dimension models

           One of the things we do
           not know is
           the number of things we
           do not know
           —P. Green, 1996—


       Example
               the number of components in a mixture
               the number of covariates in a regression model
               the number of different capture probabilities in a
               capture-recapture model
               the number of lags in a time-series model
Variable dimension models
       Variable dimension model defined as a collection of models
       (k = 1, . . . , K),

                                       Mk = {f (·|θk ); θk ∈ Θk } ,

       associated with a collection of priors on the parameters of these
       models,
                                     πk (θk ) ,
       and a prior distribution on the indices of these models,

                                           {̺(k) , k = 1, . . . , K} .

       Global notation:
                                          π(Mk , θk ) = ̺(k) πk (θk )

Bayesian inference for variable dimension models
       Two perspectives:
         1   consider the variable dimension model as a whole and
             estimate quantities meaningful for the whole like predictives

                     ∑_k Pr(Mk|x1 , . . . , xn) ∫ fk(x|θk) πk(θk|x1 , . . . , xn) dθk .

             & quantities only meaningful for submodels (like moments of
             θk), computed from πk(θk|x1 , . . . , xn). [Usual setup]
         2   resort to testing by choosing the best submodel via

                     p(Mi|x) = pi ∫_{Θi} fi(x|θi) πi(θi) dθi  /  ∑_j pj ∫_{Θj} fj(x|θj) πj(θj) dθj



Green’s reversible jumps

       Computational burden in exploring [possibly infinite] complex
       parameter space: Green’s method set up a proper
       measure–theoretic framework for designing moves between
       models/spaces Mk /Θk of varying dimensions [no one-to-one
       correspondence]

       Create a reversible kernel K on H = ∪_k {k} × Θk such that

                     ∫_A ∫_B K(x, dy) π(x) dx = ∫_B ∫_A K(y, dx) π(y) dy

       for the invariant density π [x is of the form (k, θ(k) )] and for all
       sets A, B [un-detailed balance]

Green’s reversible kernel


       Since Markov kernel K necessarily of the form [either stay at the
       same value or move to one of the states]

                     K(x, B) = ∑_{m=1}^{∞} ∫_B ρm(x, y) qm(x, dy) + ω(x) IB(x)

       where qm(x, dy) transition measure to model Mm and ρm(x, y)
       corresponding acceptance probability, only need to consider
       proposals between two models, M1 and M2 , say.




Green’s reversibility constraint


       If transition kernels between those models are K1→2 (θ1 , dθ) and
       K2→1 (θ2 , dθ), formal use of the detailed balance condition

                          π(dθ1 ) K1→2 (θ1 , dθ) = π(dθ2 ) K2→1 (θ2 , dθ) ,




               To preserve stationarity, necessary symmetry between
               moves/proposals from M1 to M2 and from M2 to M1




Two-model transitions


       How to move from model M1 to M2 , with Markov chain being in
       state θ1 ∈ M1 [i.e. k = 1]?

       Most often M1 and M2 are of different dimensions, e.g.
                          dim(M2 ) > dim(M1 ).

       In that case, need to supplement both spaces Θk1 and Θk2 with
       adequate artificial spaces to create a one-to-one mapping between
       them, most often by augmenting the space of the smaller model.




Two-model completions

       E.g., move from θ2 ∈ Θ2 to Θ1 chosen to be a deterministic
       transform of θ2
                               θ1 = Ψ2→1 (θ2 ) ,
       Reverse proposal expressed as

                                             θ2 = Ψ1→2 (θ1 , v1→2 )

       where v1→2 r.v. of dimension dim(M2 ) − dim(M1 ), generated as

                                            v1→2 ∼ ϕ1→2 (v1→2 ) .




Two-model acceptance probability
       In this case, θ2 has density [under stationarity]

                     q1→2(θ2) = π1(θ1) ϕ1→2(v1→2) | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |^{−1} ,

       by the Jacobian rule.
       To make it π2(θ2) we thus need to accept this value with
       probability

                     α(θ1, v1→2) = 1 ∧ [ π(M2, θ2) / { π(M1, θ1) ϕ1→2(v1→2) } ] · | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) | .

                This is restricted to the case when only moves between M1
                and M2 are considered

Interpretation

       The representation puts us back in a fixed dimension setting:
               M1 × V1→2 and M2 in one-to-one relation.
                reversibility imposes that θ1 is derived as

                     (θ1, v1→2) = Ψ1→2^{−1}(θ2)

                appears like a regular Metropolis–Hastings move from the
                couple (θ1, v1→2) to θ2 when stationary distributions are
                π(M1, θ1) × ϕ1→2(v1→2) and π(M2, θ2), and when proposal
                distribution is deterministic



Pseudo-deterministic reasoning
       Consider the proposals

                     θ2 ∼ N( Ψ1→2(θ1, v1→2), ε )   and   Ψ1→2(θ1, v1→2) ∼ N(θ2, ε)

       Reciprocal proposal has density

                     exp{ −(θ2 − Ψ1→2(θ1, v1→2))² / 2ε } / √(2πε)  ×  | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |

       by the Jacobian rule.
       Thus Metropolis–Hastings acceptance probability is

                     1 ∧ [ π(M2, θ2) / { π(M1, θ1) ϕ1→2(v1→2) } ] · | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |

       Does not depend on ε: let ε go to 0

Generic reversible jump acceptance probability
       If several models are considered simultaneously, with probability
       ̟1→2 of choosing move to M2 while in M1 , as in

                     K(x, B) = ∑_{m=1}^{∞} ∫_B ρm(x, y) qm(x, dy) + ω(x) IB(x)

       acceptance probability of θ2 = Ψ1→2(θ1, v1→2) is

                     α(θ1, v1→2) = 1 ∧ [ π(M2, θ2) ̟2→1 / { π(M1, θ1) ̟1→2 ϕ1→2(v1→2) } ] · | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |

       while acceptance probability of θ1 with (θ1, v1→2) = Ψ1→2^{−1}(θ2) is

                     α(θ1, v1→2) = 1 ∧ [ π(M1, θ1) ̟1→2 ϕ1→2(v1→2) / { π(M2, θ2) ̟2→1 } ] · | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |^{−1}


Green’s sampler

       Algorithm
       Iteration t (t ≥ 1): if x^(t) = (m, θ^(m)),
         1   Select model Mn with probability πmn
         2   Generate umn ∼ ϕmn(u) and set (θ^(n), vnm) = Ψm→n(θ^(m), umn)
         3   Take x^(t+1) = (n, θ^(n)) with probability

                     min( [ π(n, θ^(n)) πnm ϕnm(vnm) / { π(m, θ^(m)) πmn ϕmn(umn) } ] · | ∂Ψm→n(θ^(m), umn) / ∂(θ^(m), umn) | , 1 )

             and take x^(t+1) = x^(t) otherwise.


Mixture of normal distributions



                                                                                           
                     Mk = { (pjk, µjk, σjk) ;  ∑_{j=1}^{k} pjk N(µjk, σjk²) }

       Restrict moves from Mk to adjacent models, like Mk+1 and
       Mk−1 , with probabilities πk(k+1) and πk(k−1) .




Mixture birth

       Take Ψk→k+1 as a birth step: i.e. add a new normal component in
       the mixture, by generating the parameters of the new component
       from the prior distribution

           (µk+1 , σk+1 ) ∼ π(µ, σ)                   and       pk+1 ∼ Be(a1 , a2 + . . . + ak )

       if (p1 , . . . , pk ) ∼ Mk (a1 , . . . , ak )

       Jacobian is (1 − pk+1)^{k−1}
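       As a sanity check of this factor (our derivation, taking as free weight coordinates
       (p1, . . . , pk−1) before the birth and ((1 − pk+1)p1, . . . , (1 − pk+1)pk−1, pk+1)
       after it):

```latex
\frac{\partial\bigl((1-p_{k+1})p_1,\dots,(1-p_{k+1})p_{k-1},\,p_{k+1}\bigr)}
     {\partial\bigl(p_1,\dots,p_{k-1},\,p_{k+1}\bigr)}
  = \begin{pmatrix}
      (1-p_{k+1})\,I_{k-1} & -(p_1,\dots,p_{k-1})^{\mathsf T} \\
      0 & 1
    \end{pmatrix},
\qquad
\lvert \det \rvert = (1-p_{k+1})^{k-1}.
```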

       Death step then derived from the reversibility constraint by
       removing one of the k components at random.


Mixture acceptance probability
       Birth acceptance probability
              min( [ π(k+1)k (k + 1)! ] / [ πk(k+1) (k + 1) k! ]  ·  π(k + 1, θk+1) / [ π(k, θk) (k + 1) ϕk(k+1)(uk(k+1)) ] , 1 )

                 = min( [ π(k+1)k ̺(k + 1) ℓk+1(θk+1) (1 − pk+1)^{k−1} ] / [ πk(k+1) ̺(k) ℓk(θk) ] , 1 ) ,

       where ℓk likelihood of the k component mixture model Mk and
       ̺(k) prior probability of model Mk .
       Combinatorial terms: there are (k + 1)! ways of defining a (k + 1)
       component mixture by adding one component, while, given a (k + 1)
       component mixture, there are (k + 1) choices for a component to die and
       then k! associated mixtures for the remaining components.

License




       [Reversible jump output for the License dataset: traces of k, µi, σi and pi against iterations]
More coordinated moves



       Use of local moves that preserve structure of the original model.

       Split move from Mk to Mk+1 : replaces a random component, say
       the jth, with two new components, say the jth and the (j + 1)th,
       that are centered at the earlier jth component. And opposite
       merge move obtained by joining two components together.




Splitting with moment preservation
       Split parameters for instance created under a moment preservation
       condition:

              pjk         = pj(k+1) + p(j+1)(k+1) ,
              pjk µjk     = pj(k+1) µj(k+1) + p(j+1)(k+1) µ(j+1)(k+1) ,
              pjk σjk²    = pj(k+1) σj(k+1)² + p(j+1)(k+1) σ(j+1)(k+1)² .

       Opposite merge move obtained by reversibility constraint

       [Diagram of split/merge moves between adjacent models M3, M4 and M5]




Splitting details


       Generate the auxiliary variable uk(k+1) as

              u1 , u3 ∼ U(0, 1) ,   u2 ∼ N(0, τ²)

       and take

              pj(k+1)  = u1 pjk ,           p(j+1)(k+1)  = (1 − u1) pjk ,
              µj(k+1)  = µjk + u2 ,         µ(j+1)(k+1)  = µjk − pj(k+1) u2 / ( pjk − pj(k+1) ) ,
              σj(k+1)² = u3 σjk² ,          σ(j+1)(k+1)² = [ ( pjk − pj(k+1) u3 ) / ( pjk − pj(k+1) ) ] σjk² .
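       A quick numerical check (ours, with made-up values) that these formulas do satisfy the
       three moment conditions of the previous slide:

```python
import numpy as np

rng = np.random.default_rng(2)
p, mu, v = 0.4, 1.3, 0.8                        # component (p_jk, mu_jk, sigma^2_jk)
u1, u3 = rng.uniform(size=2)
u2 = rng.normal(scale=0.1)

p1, p2 = u1 * p, (1 - u1) * p
mu1, mu2 = mu + u2, mu - p1 * u2 / (p - p1)
v1, v2 = u3 * v, (p - p1 * u3) / (p - p1) * v

print(np.isclose(p1 + p2, p),                   # weights preserved
      np.isclose(p1 * mu1 + p2 * mu2, p * mu),  # weighted means preserved
      np.isclose(p1 * v1 + p2 * v2, p * v))     # weighted variances preserved -> all True
```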




Jacobian


       Corresponding Jacobian

              det | u1    1 − u1   ·   ·                            ·      ·                                      |
                  | pjk   −pjk     ·   ·                            ·      ·                                      |
                  | 0     0        1   1                            ·      ·                                      |
                  | 0     0        1   −pj(k+1)/(pjk − pj(k+1))     ·      ·                                      |
                  | 0     0        0   0                            u3     (pjk − pj(k+1) u3)/(pjk − pj(k+1))     |
                  | 0     0        0   0                            σjk²   −pj(k+1) σjk²/(pjk − pj(k+1))          |

                 =  pjk σjk² / (1 − u1)²




Acceptance probability

       Corresponding split acceptance probability

              min( [ π(k+1)k ̺(k + 1) πk+1(θk+1) ℓk+1(θk+1) ] / [ πk(k+1) ̺(k) πk(θk) ℓk(θk) ]  ·  pjk σjk² / (1 − u1)² , 1 )

       where π(k+1)k and πk(k+1) denote split and merge probabilities
       when in models Mk and Mk+1

       Factorial terms vanish: for a split move there are k possible choices of
       the split component and then (k + 1)! possible orderings of the θk+1
       vector while, for a merge, there are (k + 1)k possible choices for the
       components to be merged and then k! ways of ordering the resulting θk .



Monte Carlo Statistical MethodsChristian Robert
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methodsChristian Robert
 
Yes III: Computational methods for model choice
Yes III: Computational methods for model choiceYes III: Computational methods for model choice
Yes III: Computational methods for model choiceChristian Robert
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical MethodsChristian Robert
 
Introduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methodsIntroduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methodsChristian Robert
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Christian Robert
 
Introducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with RIntroducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with RChristian Robert
 
Bayesian model choice in cosmology
Bayesian model choice in cosmologyBayesian model choice in cosmology
Bayesian model choice in cosmologyChristian Robert
 
Bayesian Data Analysis for Ecology
Bayesian Data Analysis for EcologyBayesian Data Analysis for Ecology
Bayesian Data Analysis for EcologyChristian Robert
 
NOISE: Statistiques exploratoires avec R
NOISE: Statistiques exploratoires avec RNOISE: Statistiques exploratoires avec R
NOISE: Statistiques exploratoires avec RChristian Robert
 

En vedette (17)

Bayesian Core: Chapter 7
Bayesian Core: Chapter 7Bayesian Core: Chapter 7
Bayesian Core: Chapter 7
 
Exploratory Statistics with R
Exploratory Statistics with RExploratory Statistics with R
Exploratory Statistics with R
 
45th SIS Meeting, Padova, Italy
45th SIS Meeting, Padova, Italy45th SIS Meeting, Padova, Italy
45th SIS Meeting, Padova, Italy
 
Course on Bayesian computational methods
Course on Bayesian computational methodsCourse on Bayesian computational methods
Course on Bayesian computational methods
 
RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
Yes III: Computational methods for model choice
Yes III: Computational methods for model choiceYes III: Computational methods for model choice
Yes III: Computational methods for model choice
 
Rimini discussion
Rimini discussionRimini discussion
Rimini discussion
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Introduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methodsIntroduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methods
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)
 
Introducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with RIntroducing Monte Carlo Methods with R
Introducing Monte Carlo Methods with R
 
Bayesian model choice in cosmology
Bayesian model choice in cosmologyBayesian model choice in cosmology
Bayesian model choice in cosmology
 
On Samples And Sampling
On Samples And SamplingOn Samples And Sampling
On Samples And Sampling
 
Bayesian Data Analysis for Ecology
Bayesian Data Analysis for EcologyBayesian Data Analysis for Ecology
Bayesian Data Analysis for Ecology
 
NOISE: Statistiques exploratoires avec R
NOISE: Statistiques exploratoires avec RNOISE: Statistiques exploratoires avec R
NOISE: Statistiques exploratoires avec R
 

Similaire à Bayesian Core: Chapter 6

San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010Christian Robert
 
4th joint Warwick Oxford Statistics Seminar
4th joint Warwick Oxford Statistics Seminar4th joint Warwick Oxford Statistics Seminar
4th joint Warwick Oxford Statistics SeminarChristian Robert
 
Cs221 probability theory
Cs221 probability theoryCs221 probability theory
Cs221 probability theorydarwinrlo
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...zukun
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagationDong Guo
 
Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choiceChristian Robert
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
Mcmc & lkd free I
Mcmc & lkd free IMcmc & lkd free I
Mcmc & lkd free IDeb Roy
 
Monash University short course, part I
Monash University short course, part IMonash University short course, part I
Monash University short course, part IChristian Robert
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessAlessandro Panella
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABCChristian Robert
 
Fitness inheritance in the Bayesian optimization algorithm
Fitness inheritance in the Bayesian optimization algorithmFitness inheritance in the Bayesian optimization algorithm
Fitness inheritance in the Bayesian optimization algorithmMartin Pelikan
 

Similaire à Bayesian Core: Chapter 6 (20)

BAYSM'14, Wien, Austria
BAYSM'14, Wien, AustriaBAYSM'14, Wien, Austria
BAYSM'14, Wien, Austria
 
Jsm09 talk
Jsm09 talkJsm09 talk
Jsm09 talk
 
San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010
 
4th joint Warwick Oxford Statistics Seminar
4th joint Warwick Oxford Statistics Seminar4th joint Warwick Oxford Statistics Seminar
4th joint Warwick Oxford Statistics Seminar
 
Cs221 probability theory
Cs221 probability theoryCs221 probability theory
Cs221 probability theory
 
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...
 
Savage-Dickey paradox
Savage-Dickey paradoxSavage-Dickey paradox
Savage-Dickey paradox
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagation
 
Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choice
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Bayesian Core: Chapter 8
Bayesian Core: Chapter 8Bayesian Core: Chapter 8
Bayesian Core: Chapter 8
 
FEC 512.03
FEC 512.03FEC 512.03
FEC 512.03
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
Bayes 6
Bayes 6Bayes 6
Bayes 6
 
Mcmc & lkd free I
Mcmc & lkd free IMcmc & lkd free I
Mcmc & lkd free I
 
Monash University short course, part I
Monash University short course, part IMonash University short course, part I
Monash University short course, part I
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABC
 
MaxEnt 2009 talk
MaxEnt 2009 talkMaxEnt 2009 talk
MaxEnt 2009 talk
 
Fitness inheritance in the Bayesian optimization algorithm
Fitness inheritance in the Bayesian optimization algorithmFitness inheritance in the Bayesian optimization algorithm
Fitness inheritance in the Bayesian optimization algorithm
 

Plus de Christian Robert

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceChristian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihoodChristian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsChristian Robert
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferenceChristian Robert
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018Christian Robert
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimationChristian Robert
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerChristian Robert
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsChristian Robert
 

Plus de Christian Robert (19)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like sampler
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified models
 

Dernier

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 

Dernier (20)

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 

Bayesian Core: Chapter 6

  • 6. Likelihood (Mixture models)
    For a sample of independent random variables (x_1, ..., x_n), the likelihood is
    $\prod_{i=1}^{n} \{p_1 f_1(x_i) + \cdots + p_k f_k(x_i)\}$.
    Expanding this product involves k^n elementary terms: prohibitive to compute for large samples.
    But the likelihood is still computable [pointwise] in O(kn) time.
    296 / 459
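    The O(kn) pointwise cost can be checked directly; below is a minimal NumPy/SciPy sketch (not from the slides) that evaluates the mixture log-likelihood by summing the k weighted component log-densities for each observation, with purely illustrative data and parameter values:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def mixture_loglik(x, p, mu, sigma):
    """Log-likelihood of a k-component normal mixture, computed in O(kn)."""
    # (n, k) array of log{ p_j f(x_i | mu_j, sigma_j) }
    comp = np.log(p) + norm.logpdf(x[:, None], loc=mu, scale=sigma)
    # inner sum over components on the log scale, then sum over observations
    return logsumexp(comp, axis=1).sum()

# toy check with arbitrary values (two components)
x = np.random.default_rng(0).normal(size=500)
print(mixture_loglik(x, p=np.array([0.3, 0.7]),
                     mu=np.array([0.0, 2.3]), sigma=np.array([1.0, 1.0])))
```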
  • 7. Normal mean benchmark (Mixture models)
    Normal mixture $p\,\mathcal{N}(\mu_1, 1) + (1-p)\,\mathcal{N}(\mu_2, 1)$ with only the means unknown (2-D representation possible).
    Identifiability: parameters µ1 and µ2 are identifiable: µ1 cannot be confused with µ2 when p is different from 0.5.
    Presence of a spurious mode, understood by letting p go to 0.5.
    [figure with axes µ1 and µ2]
    297 / 459
  • 8. Bayesian Inference (Mixture models)
    For any prior π(θ, p), the posterior distribution of (θ, p) is available up to a multiplicative constant:
    $\pi(\theta, p \mid x) \propto \left[\prod_{i=1}^{n} \sum_{j=1}^{k} p_j f(x_i \mid \theta_j)\right] \pi(\theta, p)$
    at a cost of order O(kn).
    Difficulty: despite this, derivation of posterior characteristics like posterior expectations is only possible in an exponential time of order O(k^n)!
    298 / 459
  • 9. Missing variable representation (Mixture models)
    Associate to each x_i a missing/latent variable z_i that indicates its component:
    $z_i \mid p \sim \mathcal{M}_k(p_1, \ldots, p_k)$ and $x_i \mid z_i, \theta \sim f(\cdot \mid \theta_{z_i})$.
    Completed likelihood
    $\ell(\theta, p \mid x, z) = \prod_{i=1}^{n} p_{z_i} f(x_i \mid \theta_{z_i})$,
    and
    $\pi(\theta, p \mid x, z) \propto \left[\prod_{i=1}^{n} p_{z_i} f(x_i \mid \theta_{z_i})\right] \pi(\theta, p)$,
    where z = (z_1, ..., z_n).
    299 / 459
  • 10. Partition sets (Mixture models)
    Denote by $\mathcal{Z} = \{1, \ldots, k\}^n$ the set of the k^n possible vectors z.
    $\mathcal{Z}$ is decomposed into a partition $\mathcal{Z} = \cup_{j=1}^{r} \mathcal{Z}_j$.
    For a given allocation size vector (n_1, ..., n_k), where n_1 + ... + n_k = n, the partition sets are
    $\mathcal{Z}_j = \left\{ z : \sum_{i=1}^{n} \mathbb{I}_{z_i = 1} = n_1, \ldots, \sum_{i=1}^{n} \mathbb{I}_{z_i = k} = n_k \right\}$,
    for all allocations with the given allocation size (n_1, ..., n_k), the labels j = j(n_1, ..., n_k) being defined by lexicographical ordering on the (n_1, ..., n_k)'s.
    300 / 459
  • 11. Posterior closed form representations (Mixture models)
    $\pi(\theta, p \mid x) = \sum_{i=1}^{r} \sum_{z \in \mathcal{Z}_i} \omega(z)\, \pi(\theta, p \mid x, z)$,
    where ω(z) represents the marginal posterior probability of the allocation z conditional on x [derived by integrating out the parameters θ and p].
    Bayes estimator of (θ, p):
    $\sum_{i=1}^{r} \sum_{z \in \mathcal{Z}_i} \omega(z)\, \mathbb{E}^{\pi}[\theta, p \mid x, z]$.
    Too costly: $2^n$ terms.
    301 / 459
  • 12. General Gibbs sampling for mixture models (MCMC approaches)
    Take advantage of the missing data structure:
    Algorithm
    Initialization: choose p^{(0)} and θ^{(0)} arbitrarily.
    Step t (t = 1, 2, ...):
    1. Generate z_i^{(t)} (i = 1, ..., n) from
       $P(z_i^{(t)} = j \mid p_j^{(t-1)}, \theta_j^{(t-1)}, x_i) \propto p_j^{(t-1)}\, f(x_i \mid \theta_j^{(t-1)})$,  j = 1, ..., k
    2. Generate p^{(t)} from π(p | z^{(t)}),
    3. Generate θ^{(t)} from π(θ | z^{(t)}, x).
    302 / 459
  • 13. Exponential families (MCMC approaches)
    When
    $f(x \mid \theta) = h(x) \exp\{R(\theta) \cdot T(x) - \psi(\theta)\}$,
    simulation of both p and θ is usually straightforward.
    Conjugate prior on θ_j given by
    $\pi_j(\theta) \propto \exp\{R(\theta) \cdot \alpha_j - \beta_j \psi(\theta)\}$,
    where $\alpha_j \in \mathbb{R}^k$ and $\beta_j > 0$ are hyperparameters, and
    $p \sim \mathcal{D}(\gamma_1, \ldots, \gamma_k)$  [Dirichlet distribution]
    303 / 459
  • 14. Gibbs sampling for exponential family mixtures (MCMC approaches)
    Algorithm
    Initialization: choose p^{(0)} and θ^{(0)}.
    Step t (t = 1, 2, ...):
    1. Generate z_i^{(t)} (i = 1, ..., n) from
       $P(z_i^{(t)} = j \mid p_j^{(t-1)}, \theta_j^{(t-1)}, x_i) \propto p_j^{(t-1)}\, f(x_i \mid \theta_j^{(t-1)})$,  j = 1, ..., k
    2. Compute $n_j^{(t)} = \sum_{i=1}^{n} \mathbb{I}_{z_i^{(t)} = j}$ and $s_j^{(t)} = \sum_{i=1}^{n} \mathbb{I}_{z_i^{(t)} = j}\, T(x_i)$
    3. Generate p^{(t)} from $\mathcal{D}(\gamma_1 + n_1^{(t)}, \ldots, \gamma_k + n_k^{(t)})$,
    4. Generate θ_j^{(t)} (j = 1, ..., k) from
       $\pi(\theta_j \mid z^{(t)}, x) \propto \exp\{R(\theta_j) \cdot (\alpha_j + s_j^{(t)}) - \psi(\theta_j)(n_j^{(t)} + \beta_j)\}$.
    304 / 459
  • 15. Normal mean example (MCMC approaches)
    For a mixture of two normal distributions with unknown means,
    $p\,\mathcal{N}(\mu_1, \tau^2) + (1-p)\,\mathcal{N}(\mu_2, \sigma^2)$,
    and a normal prior $\mathcal{N}(\delta, 1/\lambda)$ on µ1 and µ2,
    305 / 459
  • 16. Normal mean example (cont'd) (MCMC approaches)
    Algorithm
    Initialization: choose µ1^{(0)} and µ2^{(0)}.
    Step t (t = 1, 2, ...):
    1. Generate z_i^{(t)} (i = 1, ..., n) from
       $P(z_i^{(t)} = 1) = 1 - P(z_i^{(t)} = 2) \propto p \exp\left\{-\tfrac{1}{2}\big(x_i - \mu_1^{(t-1)}\big)^2\right\}$
    2. Compute $n_j^{(t)} = \sum_{i=1}^{n} \mathbb{I}_{z_i^{(t)} = j}$ and $(s_j^x)^{(t)} = \sum_{i=1}^{n} \mathbb{I}_{z_i^{(t)} = j}\, x_i$
    3. Generate µ_j^{(t)} (j = 1, 2) from
       $\mathcal{N}\left(\dfrac{\lambda\delta + (s_j^x)^{(t)}}{\lambda + n_j^{(t)}},\ \dfrac{1}{\lambda + n_j^{(t)}}\right)$.
    306 / 459
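    A minimal Python sketch of this two-mean Gibbs sampler (known unit variances; the weight p, the prior hyperparameters λ and δ, and all numerical values are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def gibbs_two_means(x, p=0.7, lam=1.0, delta=0.0, n_iter=5000, seed=1):
    """Gibbs sampler for p N(mu1,1) + (1-p) N(mu2,1) with N(delta, 1/lam) priors."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=2)                       # arbitrary initialisation
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # 1. allocations: P(z_i = 1) proportional to p exp{-(x_i - mu1)^2 / 2}
        lw1 = np.log(p) - 0.5 * (x - mu[0]) ** 2
        lw2 = np.log(1 - p) - 0.5 * (x - mu[1]) ** 2
        prob1 = 1.0 / (1.0 + np.exp(lw2 - lw1))
        z = (rng.uniform(size=x.size) > prob1).astype(int)  # 0 = comp. 1, 1 = comp. 2
        # 2.-3. sufficient statistics, then means from their normal full conditionals
        for j in (0, 1):
            nj = np.sum(z == j)
            sj = np.sum(x[z == j])
            mu[j] = rng.normal((lam * delta + sj) / (lam + nj),
                               1.0 / np.sqrt(lam + nj))
        draws[t] = mu
    return draws
```

    With data simulated from, say, 0.7 N(0,1) + 0.3 N(2.5,1), the two chains (draws[:,0], draws[:,1]) typically settle around one of the two symmetric modes, as in the figures on the next slide.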
  • 17. Normal mean example (cont'd) (MCMC approaches)
    [figure: two panels of Gibbs output over (µ1, µ2) — (a) initialised at random, (b) initialised close to the lower mode]
    307 / 459
  • 18. License (MCMC approaches)
    Consider k = 3 components, a $\mathcal{D}_3(1/2, 1/2, 1/2)$ prior for the weights, a $\mathcal{N}(\bar{x}, \hat{\sigma}^2/3)$ prior on the means µ_i and a $\mathcal{G}a(10, \hat{\sigma}^2)$ prior on the precisions $\sigma_i^{-2}$, where $\bar{x}$ and $\hat{\sigma}^2$ are the empirical mean and variance of License  [Empirical Bayes]
    [figure: histogram of the License dataset]
    308 / 459
  • 19. Metropolis–Hastings alternative (MCMC approaches)
    For the Gibbs sampler, completion of z increases the dimension of the simulation space and reduces the mobility of the parameter chain.
    A Metropolis–Hastings algorithm is available since the posterior is known in closed form (up to a constant), as long as q provides a correct exploration of the posterior surface, since the acceptance ratio
    $\dfrac{\pi(\theta', p' \mid x)}{\pi(\theta, p \mid x)}\ \dfrac{q(\theta, p \mid \theta', p')}{q(\theta', p' \mid \theta, p)} \wedge 1$
    is computable in O(kn) time.
    309 / 459
  • 20. Random walk Metropolis–Hastings (MCMC approaches)
    Proposal distribution for the new value:
    $\tilde{\theta}_j = \theta_j^{(t-1)} + u_j$ where $u_j \sim \mathcal{N}(0, \zeta^2)$.
    In the mean mixture case, the Gaussian random walk proposal is
    $\tilde{\mu}_1 \sim \mathcal{N}(\mu_1^{(t-1)}, \zeta^2)$ and $\tilde{\mu}_2 \sim \mathcal{N}(\mu_2^{(t-1)}, \zeta^2)$
    [figure: random walk output over (µ1, µ2)]
    310 / 459
  • 21. Random walk Metropolis–Hastings for means (MCMC approaches)
    Algorithm
    Initialization: choose µ1^{(0)} and µ2^{(0)}.
    Iteration t (t ≥ 1):
    1. Generate $\tilde{\mu}_1$ from $\mathcal{N}(\mu_1^{(t-1)}, \zeta^2)$,
    2. Generate $\tilde{\mu}_2$ from $\mathcal{N}(\mu_2^{(t-1)}, \zeta^2)$,
    3. Compute
       $r = \pi(\tilde{\mu}_1, \tilde{\mu}_2 \mid x)\,\big/\,\pi(\mu_1^{(t-1)}, \mu_2^{(t-1)} \mid x)$
    4. Generate u ∼ U[0,1]: if u < r, then $(\mu_1^{(t)}, \mu_2^{(t)}) = (\tilde{\mu}_1, \tilde{\mu}_2)$, else $(\mu_1^{(t)}, \mu_2^{(t)}) = (\mu_1^{(t-1)}, \mu_2^{(t-1)})$.
    311 / 459
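    A corresponding random-walk Metropolis–Hastings sketch (the unnormalised log posterior log_post of (µ1, µ2) is assumed given, e.g. the mixture log-likelihood plus the normal prior; the scale ζ and all settings are illustrative):

```python
import numpy as np

def rw_mh_means(log_post, mu0, zeta=0.5, n_iter=10000, seed=2):
    """Random-walk Metropolis-Hastings on (mu1, mu2) with N(0, zeta^2) increments."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float)
    lp = log_post(mu)
    chain = np.empty((n_iter, mu.size))
    for t in range(n_iter):
        prop = mu + rng.normal(scale=zeta, size=mu.size)  # Gaussian random walk
        lp_prop = log_post(prop)
        # accept with probability r = pi(proposal | x) / pi(current | x), capped at 1
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = prop, lp_prop
        chain[t] = mu
    return chain
```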
  • 22. Random walk extensions (MCMC approaches)
    Difficulties with constrained parameters, like p such that $\sum_{i=1}^{k} p_i = 1$.
    Resolution by overparameterisation:
    $p_j = w_j \big/ \sum_{l=1}^{k} w_l$,  $w_j > 0$,
    and proposed move on the w_j's:
    $\log(\tilde{w}_j) = \log(w_j^{(t-1)}) + u_j$ where $u_j \sim \mathcal{N}(0, \zeta^2)$
    Watch out for the Jacobian in the log transform.
    312 / 459
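    One way to implement this move (a sketch; it assumes the target is written as a density in the w_j's, so that the log-normal random walk requires the Jacobian correction ∏_j w̃_j / ∏_j w_j in the Metropolis–Hastings ratio):

```python
import numpy as np

def propose_log_weights(w, zeta=0.3, rng=None):
    """Random-walk proposal on log(w_j).

    Returns the proposed w, the induced simplex weights p, and the
    log-Jacobian term sum_j log(w_new_j) - sum_j log(w_j) to be added
    to the Metropolis-Hastings log acceptance ratio."""
    rng = rng or np.random.default_rng()
    w_new = np.exp(np.log(w) + rng.normal(scale=zeta, size=w.size))
    p_new = w_new / w_new.sum()
    log_jacobian = np.log(w_new).sum() - np.log(w).sum()
    return w_new, p_new, log_jacobian
```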
  • 23. Identifiability (Label switching)
    A mixture model is invariant under permutations of the indices of the components.
    E.g., the mixtures $0.3\,\mathcal{N}(0,1) + 0.7\,\mathcal{N}(2.3,1)$ and $0.7\,\mathcal{N}(2.3,1) + 0.3\,\mathcal{N}(0,1)$ are exactly the same!
    Hence the component parameters θ_i are not identifiable marginally, since they are exchangeable.
    313 / 459
  • 24. Connected difficulties (Label switching)
    1. Number of modes of the likelihood of order O(k!): maximization and even [MCMC] exploration of the posterior surface become harder.
    2. Under exchangeable priors on (θ, p) [priors invariant under permutation of the indices], all posterior marginals are identical: the posterior expectation of θ1 is equal to the posterior expectation of θ2.
    314 / 459
  • 25. License (Label switching)
    Since the Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks the energy to switch simultaneously enough component allocations at once.
    [figure: raw Gibbs output for the License dataset — traces and pairwise plots of µ_i, p_i and σ_i]
    315 / 459
  • 26. Label switching paradox (Label switching)
    We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler.
    If we observe it, then we do not know how to estimate the parameters.
    If we do not, then we are uncertain about the convergence!!!
    316 / 459
  • 27. Constraints (Label switching)
    Usual reply to the lack of identifiability: impose constraints like µ1 ≤ ... ≤ µk in the prior.
    Mostly incompatible with the topology of the posterior surface: posterior expectations then depend on the choice of the constraints.
    Computational detail: the constraint does not need to be imposed during the simulation but can instead be imposed after simulation, by reordering the MCMC output according to the constraint. This avoids possible negative effects on convergence.
    317 / 459
  • 28. Relabeling towards the mode (Label switching)
    Selection of one of the k! modal regions of the posterior once simulation is over, by computing the approximate MAP
    $(\theta, p)^{(i^*)}$ with $i^* = \arg\max_{i=1,\ldots,M} \pi\big((\theta, p)^{(i)} \mid x\big)$
    Pivotal Reordering: at iteration i ∈ {1, ..., M},
    1. Compute the optimal permutation
       $\tau_i = \arg\min_{\tau \in \mathfrak{S}_k} d\big(\tau(\theta^{(i)}, p^{(i)}),\ (\theta^{(i^*)}, p^{(i^*)})\big)$,
       where d(·, ·) is a distance in the parameter space.
    2. Set $(\theta^{(i)}, p^{(i)}) = \tau_i\big((\theta^{(i)}, p^{(i)})\big)$.
    318 / 459
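    A brute-force sketch of pivotal reordering (enumerating all k! permutations, only practical for small k; theta and p are (M, k) arrays of simulated component parameters and weights, and i_star is the index of the approximate MAP draw computed beforehand — all names are placeholders):

```python
import numpy as np
from itertools import permutations

def pivotal_reorder(theta, p, i_star):
    """Relabel each draw by the permutation closest (Euclidean distance)
    to the approximate MAP draw (theta[i_star], p[i_star])."""
    M, k = theta.shape
    pivot = np.concatenate([theta[i_star], p[i_star]])
    for i in range(M):
        best, best_d = None, np.inf
        for perm in permutations(range(k)):
            cand = np.concatenate([theta[i, perm], p[i, perm]])
            d = np.sum((cand - pivot) ** 2)
            if d < best_d:
                best, best_d = perm, d
        theta[i], p[i] = theta[i, best], p[i, best]
    return theta, p
```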
  • 29. Re-ban on improper priors (Label switching)
    Difficult to use improper priors in the setting of mixtures because independent improper priors,
    $\pi(\theta) = \prod_{i=1}^{k} \pi_i(\theta_i)$, with $\int \pi_i(\theta_i)\, d\theta_i = \infty$,
    end up, for all n's, with the property
    $\int \pi(\theta, p \mid x)\, d\theta\, dp = \infty$.
    Reason: there are (k − 1)^n terms among the k^n terms in the expansion that allocate no observation at all to the i-th component.
    319 / 459
  • 30. Tempering (Label switching)
    Facilitate exploration of π by flattening the target: simulate from
    $\pi_\alpha(x) \propto \pi(x)^\alpha$  for α > 0 small enough (α < 1 flattens the modes).
    Determine where the modal regions of π are (possibly with parallel versions using different α's).
    Recycle simulations from π(x)^α into simulations from π by importance sampling.
    Simple modification of the Metropolis–Hastings algorithm, with new acceptance
    $\left[\dfrac{\pi(\theta', p' \mid x)}{\pi(\theta, p \mid x)}\ \dfrac{q(\theta, p \mid \theta', p')}{q(\theta', p' \mid \theta, p)}\right]^{\alpha} \wedge 1$
    320 / 459
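    A sketch of the tempered version of the random-walk sampler above (for a symmetric proposal, targeting π^α simply multiplies the log acceptance ratio by α; the value of α and the other settings are illustrative):

```python
import numpy as np

def tempered_rw_mh(log_post, x0, alpha=0.2, zeta=1.0, n_iter=10000, seed=4):
    """Random-walk Metropolis-Hastings targeting pi(x)^alpha (flattened target)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + rng.normal(scale=zeta, size=x.size)
        lp_prop = log_post(prop)
        # symmetric proposal: acceptance ratio raised to the power alpha (log scale)
        if np.log(rng.uniform()) < alpha * (lp_prop - lp):
            x, lp = prop, lp_prop
        chain[t] = x
    return chain
```

    Draws from π^α can then be recycled towards π by importance sampling, with weights proportional to π(x)^{1−α}.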
  • 31. Tempering with the mean mixture (Label switching)
    [figure: posterior surfaces over (µ1, µ2) for α = 1, 0.5 and 0.2]
    321 / 459
  • 32. MCMC for variable dimension models
    "One of the things we do not know is the number of things we do not know" — P. Green, 1996
    Examples:
    - the number of components in a mixture
    - the number of covariates in a regression model
    - the number of different capture probabilities in a capture–recapture model
    - the number of lags in a time-series model
    322 / 459
  • 33. Variable dimension models (MCMC for variable dimension models)
    A variable dimension model is defined as a collection of models (k = 1, ..., K),
    $\mathfrak{M}_k = \{f(\cdot \mid \theta_k);\ \theta_k \in \Theta_k\}$,
    associated with a collection of priors on the parameters of these models,
    $\pi_k(\theta_k)$,
    and a prior distribution on the indices of these models,
    $\{\varrho(k),\ k = 1, \ldots, K\}$.
    Global notation: $\pi(\mathfrak{M}_k, \theta_k) = \varrho(k)\, \pi_k(\theta_k)$
    323 / 459
  • 34. Bayesian inference for variable dimension models (MCMC for variable dimension models)
    Two perspectives:
    1. Consider the variable dimension model as a whole and estimate quantities meaningful for the whole, like the predictive
       $\sum_{k} \Pr(\mathfrak{M}_k \mid x_1, \ldots, x_n) \int f_k(x \mid \theta_k)\, \pi_k(\theta_k \mid x_1, \ldots, x_n)\, d\theta_k$,
       as well as quantities only meaningful for submodels (like moments of θ_k), computed from $\pi_k(\theta_k \mid x_1, \ldots, x_n)$.  [Usual setup]
    2. Resort to testing by choosing the best submodel via
       $p(\mathfrak{M}_i \mid x) = \dfrac{p_i \int_{\Theta_i} f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j}$
    324 / 459
  • 35. Green's reversible jumps (MCMC for variable dimension models)
    Computational burden in exploring a [possibly infinite] complex parameter space: Green's method set up a proper measure-theoretic framework for designing moves between models/spaces $\mathfrak{M}_k / \Theta_k$ of varying dimensions [no one-to-one correspondence].
    Create a reversible kernel K on $\mathcal{H} = \bigcup_k \{k\} \times \Theta_k$ such that
    $\int_A \int_B K(x, dy)\, \pi(x)\, dx = \int_B \int_A K(y, dx)\, \pi(y)\, dy$
    for the invariant density π [x is of the form (k, θ^{(k)})] and for all sets A, B  [un-detailed balance]
    325 / 459
  • 36. Green's reversible kernel (MCMC for variable dimension models)
    Since the Markov kernel K is necessarily of the form [either stay at the same value or move to one of the other states]
    $K(x, B) = \sum_{m=1}^{\infty} \int_B \rho_m(x, y)\, q_m(x, dy) + \omega(x)\, \mathbb{I}_B(x)$,
    where q_m(x, dy) is a transition measure to model $\mathfrak{M}_m$ and ρ_m(x, y) the corresponding acceptance probability, one only needs to consider proposals between two models, $\mathfrak{M}_1$ and $\mathfrak{M}_2$, say.
    326 / 459
  • 37. Green's reversibility constraint (MCMC for variable dimension models)
    If the transition kernels between those models are $K_{1\to2}(\theta_1, d\theta)$ and $K_{2\to1}(\theta_2, d\theta)$, formal use of the detailed balance condition gives
    $\pi(d\theta_1)\, K_{1\to2}(\theta_1, d\theta) = \pi(d\theta_2)\, K_{2\to1}(\theta_2, d\theta)$.
    To preserve stationarity, a symmetry is necessary between moves/proposals from $\mathfrak{M}_1$ to $\mathfrak{M}_2$ and from $\mathfrak{M}_2$ to $\mathfrak{M}_1$.
    327 / 459
  • 38. Two-model transitions (MCMC for variable dimension models)
    How to move from model $\mathfrak{M}_1$ to $\mathfrak{M}_2$, with the Markov chain in state $\theta_1 \in \mathfrak{M}_1$ [i.e. k = 1]?
    Most often $\mathfrak{M}_1$ and $\mathfrak{M}_2$ are of different dimensions, e.g. $\dim(\mathfrak{M}_2) > \dim(\mathfrak{M}_1)$. In that case, one needs to supplement both spaces $\Theta_{k_1}$ and $\Theta_{k_2}$ with adequate artificial spaces to create a one-to-one mapping between them, most often by augmenting the space of the smaller model.
    328 / 459
  • 39. Two-model completions (MCMC for variable dimension models)
    E.g., the move from $\theta_2 \in \Theta_2$ to $\Theta_1$ is chosen to be a deterministic transform of θ_2:
    $\theta_1 = \Psi_{2\to1}(\theta_2)$.
    The reverse proposal is expressed as
    $\theta_2 = \Psi_{1\to2}(\theta_1, v_{1\to2})$
    where $v_{1\to2}$ is an r.v. of dimension $\dim(\mathfrak{M}_2) - \dim(\mathfrak{M}_1)$, generated as
    $v_{1\to2} \sim \varphi_{1\to2}(v_{1\to2})$.
    329 / 459
  • 40. Two-model acceptance probability (MCMC for variable dimension models)
    In this case, θ_2 has density [under stationarity]
    $q_{1\to2}(\theta_2) = \pi_1(\theta_1)\, \varphi_{1\to2}(v_{1\to2}) \left| \dfrac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|^{-1}$,
    by the Jacobian rule. To make it π_2(θ_2) we thus need to accept this value with probability
    $\alpha(\theta_1, v_{1\to2}) = 1 \wedge \dfrac{\pi(\mathfrak{M}_2, \theta_2)}{\pi(\mathfrak{M}_1, \theta_1)\, \varphi_{1\to2}(v_{1\to2})} \left| \dfrac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|$.
    This is restricted to the case when only moves between $\mathfrak{M}_1$ and $\mathfrak{M}_2$ are considered.
    330 / 459
  • 41. Interpretation (MCMC for variable dimension models)
    The representation puts us back in a fixed dimension setting:
    - $\mathfrak{M}_1 \times V_{1\to2}$ and $\mathfrak{M}_2$ are in one-to-one relation;
    - reversibility imposes that θ_1 is derived as $(\theta_1, v_{1\to2}) = \Psi_{1\to2}^{-1}(\theta_2)$;
    - this appears like a regular Metropolis–Hastings move from the couple $(\theta_1, v_{1\to2})$ to θ_2 when the stationary distributions are $\pi(\mathfrak{M}_1, \theta_1) \times \varphi_{1\to2}(v_{1\to2})$ and $\pi(\mathfrak{M}_2, \theta_2)$, and when the proposal distribution is deterministic.
    331 / 459
  • 42. Pseudo-deterministic reasoning (MCMC for variable dimension models)
    Consider the proposals
    $\theta_2 \sim \mathcal{N}(\Psi_{1\to2}(\theta_1, v_{1\to2}), \varepsilon)$ and $\Psi_{1\to2}(\theta_1, v_{1\to2}) \sim \mathcal{N}(\theta_2, \varepsilon)$.
    The reciprocal proposal has density
    $\dfrac{\exp\{-(\theta_2 - \Psi_{1\to2}(\theta_1, v_{1\to2}))^2 / 2\varepsilon\}}{\sqrt{2\pi\varepsilon}} \times \left| \dfrac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|$
    by the Jacobian rule. Thus the Metropolis–Hastings acceptance probability is
    $1 \wedge \dfrac{\pi(\mathfrak{M}_2, \theta_2)}{\pi(\mathfrak{M}_1, \theta_1)\, \varphi_{1\to2}(v_{1\to2})} \left| \dfrac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|$.
    This does not depend on ε: let ε go to 0.
    332 / 459
  • 43. Generic reversible jump acceptance probability (MCMC for variable dimension models)
    If several models are considered simultaneously, with probability $\varpi_{1\to2}$ of choosing a move to $\mathfrak{M}_2$ while in $\mathfrak{M}_1$, as in
    $K(x, B) = \sum_{m=1}^{\infty} \int_B \rho_m(x, y)\, q_m(x, dy) + \omega(x)\, \mathbb{I}_B(x)$,
    the acceptance probability of $\theta_2 = \Psi_{1\to2}(\theta_1, v_{1\to2})$ is
    $\alpha(\theta_1, v_{1\to2}) = 1 \wedge \dfrac{\pi(\mathfrak{M}_2, \theta_2)\, \varpi_{2\to1}}{\pi(\mathfrak{M}_1, \theta_1)\, \varpi_{1\to2}\, \varphi_{1\to2}(v_{1\to2})} \left| \dfrac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|$,
    while the acceptance probability of θ_1, with $(\theta_1, v_{1\to2}) = \Psi_{1\to2}^{-1}(\theta_2)$, is
    $1 \wedge \dfrac{\pi(\mathfrak{M}_1, \theta_1)\, \varpi_{1\to2}\, \varphi_{1\to2}(v_{1\to2})}{\pi(\mathfrak{M}_2, \theta_2)\, \varpi_{2\to1}} \left| \dfrac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|^{-1}$.
    333 / 459
  • 44. Green's sampler (MCMC for variable dimension models)
    Algorithm
    Iteration t (t ≥ 1): if $x^{(t)} = (m, \theta^{(m)})$,
    1. Select model $\mathfrak{M}_n$ with probability $\pi_{mn}$
    2. Generate $u_{mn} \sim \varphi_{mn}(u)$ and set $(\theta^{(n)}, v_{nm}) = \Psi_{m\to n}(\theta^{(m)}, u_{mn})$
    3. Take $x^{(t+1)} = (n, \theta^{(n)})$ with probability
       $\min\left\{ \dfrac{\pi(n, \theta^{(n)})\, \pi_{nm}\, \varphi_{nm}(v_{nm})}{\pi(m, \theta^{(m)})\, \pi_{mn}\, \varphi_{mn}(u_{mn})} \left| \dfrac{\partial \Psi_{m\to n}(\theta^{(m)}, u_{mn})}{\partial(\theta^{(m)}, u_{mn})} \right|,\ 1 \right\}$
       and take $x^{(t+1)} = x^{(t)}$ otherwise.
    334 / 459
  • 45. Mixture of normal distributions (MCMC for variable dimension models)
    $\mathfrak{M}_k = \left\{ (p_{jk}, \mu_{jk}, \sigma_{jk});\ \sum_{j=1}^{k} p_{jk}\, \mathcal{N}(\mu_{jk}, \sigma_{jk}^2) \right\}$
    Restrict moves from $\mathfrak{M}_k$ to adjacent models, like $\mathfrak{M}_{k+1}$ and $\mathfrak{M}_{k-1}$, with probabilities $\pi_{k(k+1)}$ and $\pi_{k(k-1)}$.
    335 / 459
  • 46. Mixture birth (MCMC for variable dimension models)
    Take $\Psi_{k\to k+1}$ as a birth step: i.e. add a new normal component to the mixture, by generating the parameters of the new component from the prior distribution,
    $(\mu_{k+1}, \sigma_{k+1}) \sim \pi(\mu, \sigma)$ and $p_{k+1} \sim \mathcal{B}e(a_1, a_2 + \cdots + a_k)$
    if $(p_1, \ldots, p_k) \sim \mathcal{D}_k(a_1, \ldots, a_k)$.
    The Jacobian is $(1 - p_{k+1})^{k-1}$.
    The death step is then derived from the reversibility constraint, by removing one of the k + 1 components at random.
    336 / 459
  • 47. Mixture acceptance probability (MCMC for variable dimension models)
    Birth acceptance probability
    $\min\left\{ \dfrac{\pi_{(k+1)k}}{\pi_{k(k+1)}}\ \dfrac{(k+1)!}{(k+1)\,k!}\ \dfrac{\pi(k+1, \theta_{k+1})}{\pi(k, \theta_k)\,(k+1)\,\varphi_{k(k+1)}(u_{k(k+1)})},\ 1 \right\}$
    $= \min\left\{ \dfrac{\pi_{(k+1)k}}{\pi_{k(k+1)}}\ \dfrac{\varrho(k+1)}{\varrho(k)}\ \dfrac{\ell_{k+1}(\theta_{k+1})\,(1 - p_{k+1})^{k-1}}{\ell_k(\theta_k)},\ 1 \right\}$,
    where $\ell_k$ is the likelihood of the k-component mixture model $\mathfrak{M}_k$ and ϱ(k) the prior probability of model $\mathfrak{M}_k$.
    Combinatorial terms: there are (k + 1)! ways of defining a (k + 1)-component mixture by adding one component, while, given a (k + 1)-component mixture, there are (k + 1) choices for a component to die and then k! associated mixtures for the remaining components.
    337 / 459
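    A heavily simplified sketch of the birth half of this move for a normal mixture, following the simplified form of the acceptance probability above (equal up/down move probabilities are assumed; sample_prior_component, log_lik and log_rho are placeholder functions for the component prior, the mixture log-likelihood and the prior on k, not objects defined in the slides):

```python
import numpy as np

def birth_move(x, p, mu, sigma, a, sample_prior_component, log_lik, log_rho, rng):
    """One reversible-jump birth proposal for a k-component normal mixture (sketch)."""
    k = p.size
    mu_new, sigma_new = sample_prior_component(rng)       # new component from its prior
    p_new = rng.beta(a[0], a[1:].sum())                   # new weight ~ Be(a1, a2+...+ak)
    p_prop = np.append(p * (1.0 - p_new), p_new)          # rescale the old weights
    mu_prop = np.append(mu, mu_new)
    sigma_prop = np.append(sigma, sigma_new)
    # simplified acceptance: prior-on-k ratio x likelihood ratio x Jacobian (1-p_new)^(k-1)
    log_acc = (log_rho(k + 1) - log_rho(k)
               + log_lik(x, p_prop, mu_prop, sigma_prop) - log_lik(x, p, mu, sigma)
               + (k - 1) * np.log(1.0 - p_new))
    if np.log(rng.uniform()) < log_acc:
        return p_prop, mu_prop, sigma_prop
    return p, mu, sigma
```

    The matching death move would pick one of the k + 1 components at random, delete it, renormalise the weights, and use the reciprocal of the same ratio.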
  • 48. License (MCMC for variable dimension models)
    [figure: reversible jump MCMC output for the License dataset — traces of k, µ_i, σ_i and p_i against iterations]
    338 / 459
  • 49. More coordinated moves (MCMC for variable dimension models)
    Use of local moves that preserve the structure of the original model.
    Split move from $\mathfrak{M}_k$ to $\mathfrak{M}_{k+1}$: replaces a random component, say the jth, with two new components, say the jth and the (j + 1)th, that are centered at the earlier jth component.
    The opposite merge move is obtained by joining two components together.
    339 / 459
  • 50. Splitting with moment preservation (MCMC for variable dimension models)
    Split parameters for instance created under a moment preservation condition:
    $p_{jk} = p_{j(k+1)} + p_{(j+1)(k+1)}$,
    $p_{jk}\, \mu_{jk} = p_{j(k+1)}\, \mu_{j(k+1)} + p_{(j+1)(k+1)}\, \mu_{(j+1)(k+1)}$,
    $p_{jk}\, \sigma_{jk}^2 = p_{j(k+1)}\, \sigma_{j(k+1)}^2 + p_{(j+1)(k+1)}\, \sigma_{(j+1)(k+1)}^2$.
    The opposite merge move is obtained by the reversibility constraint.
    [diagram: split/merge moves between models M3, M4 and M5]
    340 / 459
  • 51. Splitting details (MCMC for variable dimension models)
    Generate the auxiliary variable $u_{k(k+1)}$ as $u_1, u_3 \sim \mathcal{U}(0, 1)$, $u_2 \sim \mathcal{N}(0, \tau^2)$ and take
    $p_{j(k+1)} = u_1\, p_{jk}$,   $p_{(j+1)(k+1)} = (1 - u_1)\, p_{jk}$,
    $\mu_{j(k+1)} = \mu_{jk} + u_2$,   $\mu_{(j+1)(k+1)} = \mu_{jk} - \dfrac{p_{j(k+1)}\, u_2}{p_{jk} - p_{j(k+1)}}$,
    $\sigma_{j(k+1)}^2 = u_3\, \sigma_{jk}^2$,   $\sigma_{(j+1)(k+1)}^2 = \dfrac{p_{jk} - p_{j(k+1)}\, u_3}{p_{jk} - p_{j(k+1)}}\, \sigma_{jk}^2$.
    341 / 459
  • 52. Jacobian (MCMC for variable dimension models)
    The corresponding Jacobian of the transform $(p_{jk}, \mu_{jk}, \sigma_{jk}^2, u_1, u_2, u_3) \mapsto (p_{j(k+1)}, p_{(j+1)(k+1)}, \mu_{j(k+1)}, \mu_{(j+1)(k+1)}, \sigma_{j(k+1)}^2, \sigma_{(j+1)(k+1)}^2)$ has absolute determinant
    $\dfrac{p_{jk}}{(1 - u_1)^2}\, \sigma_{jk}^2$.
    342 / 459
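    These formulas are easy to check numerically; the following self-contained sketch (arbitrary test values) verifies the three moment-preservation identities of slide 51 and compares a finite-difference Jacobian determinant with p_jk σ_jk² / (1 − u1)²:

```python
import numpy as np

def split_component(pjk, mujk, s2jk, u1, u2, u3):
    """Moment-preserving split of one component into two (formulas of slide 51)."""
    p1, p2 = u1 * pjk, (1.0 - u1) * pjk
    mu1 = mujk + u2
    mu2 = mujk - p1 * u2 / (pjk - p1)
    s1 = u3 * s2jk
    s2 = (pjk - p1 * u3) / (pjk - p1) * s2jk
    return np.array([p1, p2, mu1, mu2, s1, s2])

pjk, mujk, s2jk, u1, u2, u3 = 0.4, 1.0, 2.0, 0.3, 0.5, 0.7
out = split_component(pjk, mujk, s2jk, u1, u2, u3)
p1, p2, mu1, mu2, s1, s2 = out
assert np.isclose(p1 + p2, pjk)                       # weights preserved
assert np.isclose(p1 * mu1 + p2 * mu2, pjk * mujk)    # first moment preserved
assert np.isclose(p1 * s1 + p2 * s2, pjk * s2jk)      # second moment preserved

# finite-difference Jacobian of (pjk, mujk, s2jk, u1, u2, u3) -> new parameters
base = np.array([pjk, mujk, s2jk, u1, u2, u3])
eps = 1e-6
J = np.empty((6, 6))
for j in range(6):
    bumped = base.copy()
    bumped[j] += eps
    J[:, j] = (split_component(*bumped) - out) / eps
print(abs(np.linalg.det(J)), pjk * s2jk / (1.0 - u1) ** 2)   # both close to 1.63
```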
  • 53. Acceptance probability (MCMC for variable dimension models)
    Corresponding split acceptance probability
    $\min\left\{ \dfrac{\pi_{(k+1)k}}{\pi_{k(k+1)}}\ \dfrac{\varrho(k+1)}{\varrho(k)}\ \dfrac{\pi_{k+1}(\theta_{k+1})\, \ell_{k+1}(\theta_{k+1})}{\pi_k(\theta_k)\, \ell_k(\theta_k)}\ \dfrac{p_{jk}\, \sigma_{jk}^2}{(1 - u_1)^2},\ 1 \right\}$,
    where $\pi_{(k+1)k}$ and $\pi_{k(k+1)}$ denote the split and merge probabilities when in models $\mathfrak{M}_k$ and $\mathfrak{M}_{k+1}$.
    The factorial terms vanish: for a split move there are k possible choices of the split component and then (k + 1)! possible orderings of the θ_{k+1} vector, while, for a merge, there are (k + 1)k possible choices for the components to be merged and then k! ways of ordering the resulting θ_k.
    343 / 459