On some computational methods for Bayesian model choice




             On some computational methods for Bayesian
                          model choice

                                            Christian P. Robert

                               CREST-INSEE and Université Paris Dauphine
                               http://www.ceremade.dauphine.fr/~xian


                       YES III workshop, Eindhoven,
                              October 7, 2009
          Joint works with J.-M. Marin, M. Beaumont, N. Chopin,
                       J.-M. Cornuet, and D. Wraith
On some computational methods for Bayesian model choice




Outline


      1    Introduction

      2    Importance sampling solutions

      3    Cross-model solutions

      4    Nested sampling

      5    ABC model choice
On some computational methods for Bayesian model choice
  Introduction
     Bayes tests



Construction of Bayes tests

      Definition (Test)
      Given an hypothesis H0 : θ ∈ Θ0 on the parameter θ ∈ Θ of a
      statistical model, a test is a statistical procedure that takes its
      values in {0, 1}.

      Theorem (Bayes test)
      The Bayes estimator associated with π and with the 0 − 1 loss is

                         δ^π (x) = 1   if π(θ ∈ Θ0 | x) > π(θ ∉ Θ0 | x),
                                   0   otherwise
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Bayes factor

      Definition (Bayes factors)
      For testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0 , under the prior

                       π(Θ0 ) π0 (θ) + π(Θ0^c ) π1 (θ) ,

      the central quantity is

              B01 = [ π(Θ0 |x) / π(Θ0^c |x) ] / [ π(Θ0 ) / π(Θ0^c ) ]
                  = ∫_{Θ0} f (x|θ) π0 (θ) dθ / ∫_{Θ0^c} f (x|θ) π1 (θ) dθ


                                                                            [Jeffreys, 1939]
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Self-contained concept

      Outside decision-theoretic environment:
                 eliminates impact of π(Θ0 ) but depends on the choice of
                 (π0 , π1 )
                 Bayesian/marginal equivalent to the likelihood ratio
                  Jeffreys’ scale of evidence:
                      if log10 (B10^π ) is between 0 and 0.5, the evidence against H0 is weak,
                      if log10 (B10^π ) is between 0.5 and 1, it is substantial,
                      if log10 (B10^π ) is between 1 and 2, it is strong, and
                      if log10 (B10^π ) is above 2, it is decisive
                 Requires the computation of the marginal/evidence under
                 both hypotheses/models
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Model choice and model comparison



      Choice between models
      Several models available for the same observation

                                    Mi : x ∼ fi (x|θi ),   i∈I

      where I can be finite or infinite
      Replace hypotheses with models but keep marginal likelihoods and
      Bayes factors
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Bayesian model choice
      Probabilise the entire model/parameter space
          allocate probabilities pi to all models Mi
          define priors πi (θi ) for each parameter space Θi
          compute

                                          pi ∫_{Θi} fi (x|θi ) πi (θi ) dθi
                      π(Mi |x) =
                                    Σ_j pj ∫_{Θj} fj (x|θj ) πj (θj ) dθj

                  take largest π(Mi |x) to determine the “best” model,
                  or use the averaged predictive

                       Σ_j π(Mj |x) ∫_{Θj} fj (x′ |θj ) πj (θj |x) dθj
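
      A minimal R sketch of this computation (not from the original slides; the
      per-model log-evidences and prior weights are assumed given as vectors):

      R code sketch
      # posterior model probabilities pi(M_i | x) from log-evidences log m_i(x)
      # and prior weights p_i, computed on the log scale for stability
      post_model_prob <- function(log_evidence, prior_prob) {
        lw <- log(prior_prob) + log_evidence   # log p_i + log integral
        lw <- lw - max(lw)                     # guard against under/overflow
        exp(lw) / sum(exp(lw))
      }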
On some computational methods for Bayesian model choice
  Introduction
     Evidence



Evidence



      All these problems end up with a similar quantity, the evidence

                         Zk = ∫_{Θk} πk (θk ) Lk (θk ) dθk ,

      aka the marginal likelihood.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Bayes factor approximation

      When approximating the Bayes factor

                            ∫_{Θ0} f0 (x|θ0 ) π0 (θ0 ) dθ0
                   B01 =
                            ∫_{Θ1} f1 (x|θ1 ) π1 (θ1 ) dθ1

      use of importance functions ϖ0 and ϖ1 and

                   B̂01 = [ n0^{-1} Σ_{i=1}^{n0} f0 (x|θ0^i ) π0 (θ0^i ) / ϖ0 (θ0^i ) ]
                        / [ n1^{-1} Σ_{i=1}^{n1} f1 (x|θ1^i ) π1 (θ1^i ) / ϖ1 (θ1^i ) ]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Diabetes in Pima Indian women

      Example (R benchmark)
      “A population of women who were at least 21 years old, of Pima
      Indian heritage and living near Phoenix (AZ), was tested for
      diabetes according to WHO criteria. The data were collected by
      the US National Institute of Diabetes and Digestive and Kidney
      Diseases.”
      200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Probit modelling on Pima Indian women


      Probability of diabetes function of above variables

                             P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
      Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
      g-prior modelling:

                                         β ∼ N3 (0, n (X^T X)^{-1} )
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



MCMC 101 for probit models

      Use of either a random walk proposal

                                                    β′ = β + ε

       in a Metropolis–Hastings algorithm (since the likelihood is
       available)
       or of a Gibbs sampler that takes advantage of the missing/latent
       variable
                      z|y, x, β ∼ N (x^T β, 1) I_{z≥0}^y × I_{z≤0}^{1−y}

       (since β|y, X, z then has a closed-form normal distribution)
                                                  [Gibbs three times faster]
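
       As an illustration (not the code used in the talk, which relies on helpers
       such as hmprobit), a minimal R sketch of this latent-variable Gibbs sampler
       under the g-prior β ∼ N3 (0, n (X^T X)^{-1} ); the design matrix X and the
       response y are assumed given:

       R code sketch
       library(MASS)        # mvrnorm()
       library(truncnorm)   # rtruncnorm()
       gibbsprobit <- function(y, X, T=1e4) {
         n <- nrow(X); p <- ncol(X)
         # posterior covariance of beta given z under the g-prior:
         # (X'X + X'X/n)^{-1} = n/(n+1) (X'X)^{-1}
         Sigma <- solve(crossprod(X)) * n / (n + 1)
         beta  <- rep(0, p)
         draws <- matrix(NA, T, p)
         for (t in 1:T) {
           mu <- drop(X %*% beta)
           # z_i ~ N(x_i' beta, 1) truncated to z>=0 if y_i=1, to z<=0 otherwise
           z <- rtruncnorm(n, a=ifelse(y==1, 0, -Inf), b=ifelse(y==1, Inf, 0),
                           mean=mu, sd=1)
           # beta | y, X, z ~ N(Sigma X'z, Sigma)
           beta <- mvrnorm(1, Sigma %*% crossprod(X, z), Sigma)
           draws[t,] <- beta
         }
         draws
       }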
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Diabetes in Pima Indian women
       MCMC output for βmax = 1.5, β̂ = 1.15, k̂ = 40, and 20,000 simulations.

       [Figure: trace plots and histograms of the 20,000 MCMC simulations]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Importance sampling for the Pima Indian dataset


       Use of the importance function inspired by the MLE distribution

                                 β ∼ N (β̂, Σ̂)

       R Importance sampling code
       # rmvnorm() comes from the mvtnorm package; probitlpost() (log posterior of
       # the probit model) and dmvlnorm() (log density of the multivariate normal
       # proposal) are helper functions assumed to be defined elsewhere (not shown)
       library(mvtnorm)
       model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
       model2=summary(glm(y~-1+X2,family=binomial(link="probit")))  # full design X2
       is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
       is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
       bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],
            sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)-
            dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Diabetes in Pima Indian women
       Comparison of the variability of the Bayes factor approximations,
       based on 100 replicas of 20,000 simulations each, when sampling from
       the prior (Monte Carlo) and from the above importance sampler

       [Boxplots of the Bayes factor approximations: Monte Carlo vs.
       importance sampling]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Bridge sampling


      Special case:
      If
                                            π1 (θ1 |x) ∝ π̃1 (θ1 |x)
                                            π2 (θ2 |x) ∝ π̃2 (θ2 |x)
       live on the same space (Θ1 = Θ2 ), then

               B12 ≈ (1/n) Σ_{i=1}^n  π̃1 (θi |x) / π̃2 (θi |x) ,        θi ∼ π2 (θ|x)

                            [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
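
       A minimal R sketch of this estimator (an illustration with hypothetical
       names: lpi1t and lpi2t return the log unnormalised posteriors and theta2
       holds the draws from π2 (θ|x)):

       R code sketch
       bridge12 <- function(theta2, lpi1t, lpi2t) {
         # B12 estimate: average of pi1_tilde/pi2_tilde over draws from pi2
         mean(exp(lpi1t(theta2) - lpi2t(theta2)))
       }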
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Bridge sampling variance



      The bridge sampling estimator does poorly if
               var(B̂12 ) / B12^2  ≈  (1/n) E[ ( (π1 (θ) − π2 (θ)) / π2 (θ) )^2 ]

       is large, i.e. if π1 and π2 have little overlap...
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



(Further) bridge sampling

       In addition

                        ∫ π̃1 (θ|x) α(θ) π2 (θ|x) dθ
                B12 =                                              ∀ α(·)
                        ∫ π̃2 (θ|x) α(θ) π1 (θ|x) dθ

                        (1/n2) Σ_{i=1}^{n2} π̃1 (θ2i |x) α(θ2i )
                    ≈                                                  θji ∼ πj (θ|x)
                        (1/n1) Σ_{i=1}^{n1} π̃2 (θ1i |x) α(θ1i )
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



An infamous example

       When
                                                   1
                                 α(θ) =
                                          π̃1 (θ|x) π̃2 (θ|x)

       the bridge estimator reduces to the harmonic mean approximation

                        (1/n1) Σ_{i=1}^{n1} 1/π̃1 (θ1i |x)
               B21 =                                               θji ∼ πj (θ|x)
                        (1/n2) Σ_{i=1}^{n2} 1/π̃2 (θ2i |x)
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



The original harmonic mean estimator



       When θkt ∼ πk (θ|x),

                              (1/T) Σ_{t=1}^T  1 / L(θkt |x)

       is an unbiased estimator of 1/mk (x)
                                                                      [Newton & Raftery, 1994]

      Highly dangerous: Most often leads to an infinite variance!!!
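
       For illustration only (given the infinite-variance warning above), a minimal
       R sketch of this estimator; loglik returns log L(θ|x) at a posterior draw
       and thetas is a matrix of posterior draws (hypothetical names):

       R code sketch
       hm_log_evidence <- function(thetas, loglik) {
         ll <- apply(thetas, 1, loglik)
         m  <- max(-ll)
         # log of 1 / { (1/T) sum 1/L(theta^(t)|x) }, computed on the log scale;
         # beware: the variance of this estimator is typically infinite
         -(m + log(mean(exp(-ll - m))))
       }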
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



“The Worst Monte Carlo Method Ever”

       “The good news is that the Law of Large Numbers guarantees that
       this estimator is consistent, i.e., it will very likely be very close to the
       correct answer if you use a sufficiently large number of points from
       the posterior distribution.
       The bad news is that the number of points required for this
       estimator to get close to the right answer will often be greater
       than the number of atoms in the observable universe. The even
       worse news is that it’s easy for people to not realize this, and to
       naïvely accept estimates that are nowhere close to the correct
       value of the marginal likelihood.”
                                       [Radford Neal’s blog, Aug. 23, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Optimal bridge sampling

       The optimal choice of auxiliary function is

                                                  n1 + n2
                                α⋆ =
                                        n1 π1 (θ|x) + n2 π2 (θ|x)

       leading to

                       (1/n2) Σ_{i=1}^{n2} π̃1 (θ2i |x) / { n1 π1 (θ2i |x) + n2 π2 (θ2i |x) }
            B̂12 ≈
                       (1/n1) Σ_{i=1}^{n1} π̃2 (θ1i |x) / { n1 π1 (θ1i |x) + n2 π2 (θ1i |x) }

                                                                                     Back later!
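
       A minimal R sketch of the resulting fixed-point iteration (the normalised
       π1 , π2 appearing in α⋆ involve the unknown constant, hence the iterative
       update); lp1t and lp2t return log unnormalised posteriors, theta1 and
       theta2 are draws from π1 and π2 (hypothetical names):

       R code sketch
       bridge_optimal <- function(theta1, theta2, lp1t, lp2t, niter=20) {
         n1 <- nrow(theta1); n2 <- nrow(theta2)
         # ratios pi1_tilde/pi2_tilde at both samples
         l1 <- exp(lp1t(theta1) - lp2t(theta1))
         l2 <- exp(lp1t(theta2) - lp2t(theta2))
         r  <- 1                               # starting value for B12
         for (k in 1:niter) {                  # Meng & Wong fixed-point iteration
           num <- mean(l2 / (n1 * l2 / r + n2))
           den <- mean(1  / (n1 * l1 / r + n2))
           r   <- num / den
         }
         r
       }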
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Optimal bridge sampling (2)



      Reason:

        Var(B̂12 ) / B12^2  ≈  (1/(n1 n2)) [ ∫ π1 (θ) π2 (θ) { n1 π1 (θ) + n2 π2 (θ) } α(θ)^2 dθ
                                             / ( ∫ π1 (θ) π2 (θ) α(θ) dθ )^2  −  1 ]

       (by the δ method)
       Drag: dependence on the unknown normalising constants, solved
       iteratively
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Extension to varying dimensions


       When π1 and π2 live on different dimensions, for instance
       θ2 = (θ1 , ψ), introduce a pseudo-posterior density ω(ψ|θ1 , x) to
       augment π1 (θ1 |x) into the joint distribution
       π1 (θ1 |x) × ω(ψ|θ1 , x) on θ2 , so that

                       ∫∫ π̃1 (θ1 |x) ω(ψ|θ1 , x) α(θ1 , ψ) π2 (θ1 , ψ|x) dθ1 dψ
               B12 =                                                                   ,
                       ∫∫ π̃2 (θ1 , ψ|x) α(θ1 , ψ) π1 (θ1 |x) ω(ψ|θ1 , x) dθ1 dψ
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Illustration for the Pima Indian dataset

      Use of the MLE induced conditional of β3 given (β1 , β2 ) as a
      pseudo-posterior and mixture of both MLE approximations on β3
      in bridge sampling estimate
       R Bridge sampling code
       # ginv() comes from MASS and dmvnorm() from mvtnorm; hmprobit() (posterior
       # simulator), probitlpost() (log posterior) and meanw() (conditional MLE mean
       # of beta3 given beta1, beta2) are helpers assumed defined elsewhere (not shown)
       library(MASS); library(mvtnorm)
       cova=model2$cov.unscaled
       expecta=model2$coeff[,1]
       covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]

       probit1=hmprobit(Niter,y,X1)      # posterior draws of (beta1,beta2) under model 1
       probit2=hmprobit(Niter,y,X2)      # posterior draws of (beta1,beta2,beta3) under model 2
       pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))   # pseudo-posterior draws of beta3
       probit1p=cbind(probit1,pseudo)

       bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),
            sqrt(covw),log=T))/ (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],
            cova[3,3])))/ mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+dnorm(pseudo,
            expecta[3],cova[3,3])))
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Diabetes in Pima Indian women (cont’d)
       Comparison of the variability of the Bayes factor approximations,
       based on 100 replicas of 20,000 simulations each, when sampling from
       the prior (MC) and using the above bridge sampler and the above
       importance sampler

       [Boxplots of the Bayes factor approximations: MC, Bridge, IS]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Ratio importance sampling


      Another identity:
                                 Eϕ [ π̃1 (θ)/ϕ(θ) ]
                        B12 =
                                 Eϕ [ π̃2 (θ)/ϕ(θ) ]

       for any density ϕ with sufficiently large support
                                                     [Torrie & Valleau, 1977]
       Use of a single sample θ1 , . . . , θn from ϕ:

                                 Σ_{i=1}^n π̃1 (θi )/ϕ(θi )
                        B̂12 =
                                 Σ_{i=1}^n π̃2 (θi )/ϕ(θi )
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Ratio importance sampling (2)


       Approximate variance:

               var(B̂12 ) / B12^2  ≈  (1/n) Eϕ [ (π1 (θ) − π2 (θ))^2 / ϕ(θ)^2 ]

       Optimal choice:

                                            | π1 (θ) − π2 (θ) |
                          ϕ∗ (θ) =
                                          ∫ | π1 (η) − π2 (η) | dη

                                                              [Chen, Shao & Ibrahim, 2000]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Improving upon bridge sampler

      Theorem 5.5.3: The asymptotic variance of the optimal ratio
      importance sampling estimator is smaller than the asymptotic
      variance of the optimal bridge sampling estimator
                                          [Chen, Shao, & Ibrahim, 2000]
       Does not require the normalising constant

                               ∫ | π1 (η) − π2 (η) | dη

       but a simulation from

                            ϕ∗ (θ) ∝ | π1 (θ) − π2 (θ) | .
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Varying dimensions



Generalisation to point null situations

       When
                              ∫_{Θ1} π̃1 (θ1 ) dθ1
                      B12 =
                              ∫_{Θ2} π̃2 (θ2 ) dθ2

       and Θ2 = Θ1 × Ψ, we get θ2 = (θ1 , ψ) and

                                        π̃1 (θ1 ) ω(ψ|θ1 )
                      B12 = Eπ2
                                          π̃2 (θ1 , ψ)

       holds for any conditional density ω(ψ|θ1 ).
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Varying dimensions



X-dimen’al bridge sampling

       Generalisation of the previous identity:
       For any α,
                              Eπ2 [ π̃1 (θ1 ) ω(ψ|θ1 ) α(θ1 , ψ) ]
                      B12 =
                              Eπ1 ×ω [ π̃2 (θ1 , ψ) α(θ1 , ψ) ]

       and, for any density ϕ,

                              Eϕ [ π̃1 (θ1 ) ω(ψ|θ1 ) / ϕ(θ1 , ψ) ]
                      B12 =
                              Eϕ [ π̃2 (θ1 , ψ) / ϕ(θ1 , ψ) ]

                                            [Chen, Shao & Ibrahim, 2000]
       Optimal choice: ω(ψ|θ1 ) = π2 (ψ|θ1 )
                                                          [Theorem 5.8.2]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Zk from a posterior sample


      Use of the [harmonic mean] identity

          E^{πk} [ ϕ(θk ) / { πk (θk ) Lk (θk ) } | x ]
                = ∫ [ ϕ(θk ) / { πk (θk ) Lk (θk ) } ] · [ πk (θk ) Lk (θk ) / Zk ] dθk  =  1/Zk

       no matter what the proposal ϕ(·) is.
                           [Gelfand & Dey, 1994; Bartolucci et al., 2006]
       Direct exploitation of the MCMC output
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling


       Harmonic mean: constraint opposed to the usual importance sampling
       constraints: ϕ(θ) must have lighter (rather than fatter) tails than
       πk (θk ) Lk (θk ) for the approximation

               Ẑ1k  =  1 / [ (1/T) Σ_{t=1}^T  ϕ(θk^{(t)}) / { πk (θk^{(t)}) Lk (θk^{(t)}) } ]

       to have a finite variance.
       E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling (cont’d)



       Compare Ẑ1k with a standard importance sampling approximation

               Ẑ2k  =  (1/T) Σ_{t=1}^T  πk (θk^{(t)}) Lk (θk^{(t)}) / ϕ(θk^{(t)})

       where the θk^{(t)}’s are generated from the density ϕ(·) (with fatter
       tails, like t distributions)
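
       Minimal R sketches of the two approximations, for illustration; logpost
       returns log{πk (θ)Lk (θ)}, lphi and rphi the log density and simulator of ϕ,
       and thetas the posterior draws (all hypothetical names):

       R code sketch
       Z1k <- function(thetas, logpost, lphi) {       # harmonic-mean / Gelfand-Dey form
         lw <- lphi(thetas) - logpost(thetas)
         m  <- max(lw)
         1 / (exp(m) * mean(exp(lw - m)))
       }
       Z2k <- function(T, rphi, logpost, lphi) {      # standard importance sampling
         thetas <- rphi(T)
         lw <- logpost(thetas) - lphi(thetas)
         m  <- max(lw)
         exp(m) * mean(exp(lw - m))
       }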
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



HPD indicator as ϕ
       Use the convex hull of the MCMC simulations corresponding to the
       10% HPD region (easily derived!) and ϕ as the indicator of an
       ε-neighbourhood of those simulations:

                       ϕ(θ) = (1/T) Σ_t  I_{ d(θ, θ^{(t)}) ≤ ε }
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Diabetes in Pima Indian women (cont’d)
       Comparison of the variability of the Bayes factor approximations,
       based on 100 replicas of 20,000 simulations each, for the above
       harmonic mean sampler and importance sampler

       [Boxplots of the Bayes factor approximations: harmonic mean vs.
       importance sampling, both concentrated between 3.102 and 3.116]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Mixtures to bridge



Approximating Zk using a mixture representation



                                                                              Bridge sampling redux

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                                  ϕk (θk ) ∝ ω1 πk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω1 is not a probability weight
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Mixtures to bridge



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
           1   Take δ (t) = 1 with probability

               ω1 πk (θk^{(t−1)}) Lk (θk^{(t−1)}) / { ω1 πk (θk^{(t−1)}) Lk (θk^{(t−1)}) + ϕ(θk^{(t−1)}) }

               and δ (t) = 2 otherwise;
           2   If δ (t) = 1, generate θk^{(t)} ∼ MCMC(θk^{(t−1)}, θk^{(t)}) where
               MCMC(θk , θk′ ) denotes an arbitrary MCMC kernel associated
               with the posterior πk (θk |x) ∝ πk (θk ) Lk (θk );
           3   If δ (t) = 2, generate θk^{(t)} ∼ ϕ(θk ) independently
               (see the R sketch below)
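
       A minimal R sketch of this two-component sampler (an illustration with
       hypothetical names: logpost returns log{πk (θ)Lk (θ)}, lphi and rphi the log
       density and simulator of ϕ, and mcmcstep one kernel move targeting πk (θ|x)):

       R code sketch
       mixsampler <- function(T, theta0, omega1, logpost, lphi, rphi, mcmcstep) {
         theta <- theta0
         out   <- vector("list", T); delta <- integer(T)
         for (t in 1:T) {
           # probability of the "posterior" component at the current value
           p1 <- 1 / (1 + exp(lphi(theta) - log(omega1) - logpost(theta)))
           if (runif(1) < p1) {                # delta = 1: move with the MCMC kernel
             delta[t] <- 1; theta <- mcmcstep(theta)
           } else {                            # delta = 2: independent draw from phi
             delta[t] <- 2; theta <- rphi(1)
           }
           out[[t]] <- theta
         }
         list(theta=out, delta=delta)
       }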
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Mixtures to bridge



Evidence approximation by mixtures
       Rao-Blackwellised estimate

            ξ̂ = (1/T) Σ_{t=1}^T  ω1 πk (θk^{(t)}) Lk (θk^{(t)}) / { ω1 πk (θk^{(t)}) Lk (θk^{(t)}) + ϕ(θk^{(t)}) } ,

       converges to ω1 Zk / { ω1 Zk + 1 }
       Deduce Ẑ3k from ω1 Ẑ3k / { ω1 Ẑ3k + 1 } = ξ̂, i.e.

                      Σ_{t=1}^T  ω1 πk (θk^{(t)}) Lk (θk^{(t)}) / { ω1 πk (θk^{(t)}) Lk (θk^{(t)}) + ϕ(θk^{(t)}) }
            Ẑ3k =
                      ω1 Σ_{t=1}^T  ϕ(θk^{(t)}) / { ω1 πk (θk^{(t)}) Lk (θk^{(t)}) + ϕ(θk^{(t)}) }

                                                                                     [Bridge sampler]
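
       A minimal R sketch of the corresponding evidence estimate from the sampler
       output; logpost and lphi are as in the sketch above and thetas is the list
       of simulated θ^(t) (hypothetical names):

       R code sketch
       Z3k <- function(thetas, omega1, logpost, lphi) {
         lnum <- log(omega1) + sapply(thetas, logpost)   # log omega1 pi_k L_k
         lden <- sapply(thetas, lphi)                    # log phi
         a <- 1 / (1 + exp(lden - lnum))  # omega1 pi L / (omega1 pi L + phi)
         sum(a) / (omega1 * sum(1 - a))   # solves omega1 Z/(omega1 Z + 1) = xi_hat
       }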
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Chib’s representation


      Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
      θk ∼ πk (θk ),
                                           fk (x|θk ) πk (θk )
                          Zk = mk (x) =
                                              πk (θk |x)

       Use of an approximation to the posterior

                                           fk (x|θk∗ ) πk (θk∗ )
                          Ẑk = m̂k (x) =                           .
                                              π̂k (θk∗ |x)
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Case of latent variables



      For missing variable z as in mixture models, natural Rao-Blackwell
      estimate
               π̂k (θk∗ |x) = (1/T) Σ_{t=1}^T  πk (θk∗ |x, zk^{(t)}) ,

       where the zk^{(t)}’s are Gibbs sampled latent variables
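
       A minimal R sketch of the resulting Chib approximation to log Zk ; loglik,
       logprior and logpostcond (log πk (θ∗ |x, z)) are model-specific functions and
       zdraws holds the Gibbs latent draws (all hypothetical names):

       R code sketch
       chib_log_evidence <- function(thetastar, zdraws, loglik, logprior, logpostcond) {
         lp <- sapply(zdraws, function(z) logpostcond(thetastar, z))
         m  <- max(lp)
         lord <- m + log(mean(exp(lp - m)))   # log of (1/T) sum pi_k(theta*|x,z^(t))
         loglik(thetastar) + logprior(thetastar) - lord
       }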
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Label switching


      A mixture model [special case of missing variable model] is
      invariant under permutations of the indices of the components.
      E.g., mixtures
                          0.3N (0, 1) + 0.7N (2.3, 1)
      and
                                       0.7N (2.3, 1) + 0.3N (0, 1)
      are exactly the same!
       ⇒ The component parameters θi are not identifiable
       marginally since they are exchangeable
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Connected difficulties


           1   Number of modes of the likelihood of order O(k!):
               ⇒ maximization and even [MCMC] exploration of the
               posterior surface harder
           2   Under exchangeable priors on (θ, p) [prior invariant under
               permutation of the indices], all posterior marginals are
               identical:
               ⇒ posterior expectation of θ1 equal to posterior expectation
               of θ2
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



License


       Since the Gibbs output does not produce exchangeability, the Gibbs
       sampler has not explored the whole parameter space: it lacks the
       energy to switch enough component allocations simultaneously

       [Figure: Gibbs output for a mixture model — traces of µi , pi and σi over
       500 iterations and pairwise scatter plots, showing no label switching]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Label switching paradox




      We should observe the exchangeability of the components [label
      switching] to conclude about convergence of the Gibbs sampler.
      If we observe it, then we do not know how to estimate the
      parameters.
      If we do not, then we are uncertain about the convergence!!!
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Compensation for label switching
      For mixture models, zk^(t) usually fails to visit all configurations in a
      balanced way, despite the symmetry predicted by the theory

             πk(θk|x) = πk(σ(θk)|x) = (1/k!) Σ_{σ∈Sk} πk(σ(θk)|x)

      for all σ's in Sk, the set of all permutations of {1, . . . , k}.
      Consequence for the numerical approximation: a bias of order k!
      Recover the theoretical symmetry by using the symmetrised estimate

             πk*(θk*|x) = (1/(T k!)) Σ_{σ∈Sk} Σ_{t=1}^{T} πk(σ(θk*)|x, zk^(t)) .

                                                    [Berkhof, Mechelen, & Gelman, 2003]
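
      A minimal R sketch of this symmetrisation, assuming a (hypothetical)
      helper cond.dens(theta, z) returning πk(θ|x, z) for one simulated
      allocation vector z, with theta.star a k × d matrix of component
      parameters and zs a list of T Gibbs allocations:

      library(combinat)                          # for permn()
      sym.chib <- function(theta.star, zs, cond.dens, k) {
        perms <- combinat::permn(k)              # all k! permutations of 1:k
        vals <- sapply(perms, function(sig) {
          theta.sig <- theta.star[sig, , drop = FALSE]   # relabel components
          mean(sapply(zs, function(z) cond.dens(theta.sig, z)))
        })
        mean(vals)                               # (1/(T k!)) double sum
      }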
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Galaxy dataset

      n = 82 galaxies as a mixture of k normal distributions with both
      mean and variance unknown.
                                                          [Roeder, 1992]
      [Figure: histogram (relative frequencies) of the rescaled galaxy data,
      ranging over about (−2, 3), with the estimated average mixture density
      overlaid]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Galaxy dataset (k)
      Using only the original estimate, with θk* as the MAP estimator,

                       log m̂k(x) = −105.1396

      for k = 3 (based on 10³ simulations), while introducing the
      permutations leads to

                       log m̂k(x) = −103.3479

      Note that
                       −105.1396 + log(3!) = −103.3479

        k              2         3         4         5         6         7         8
        log m̂k(x)   −115.68   −103.35   −102.66   −101.93   −102.88   −105.48   −108.44

      Estimates of the log marginal likelihoods by the symmetrised Chib's
      approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100
      permutations selected at random in Sk).
                                [Lee, Marin, Mengersen & Robert, 2008]
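
      A one-line R check of the log(k!) offset between the plain and
      symmetrised estimates for k = 3:

      -105.1396 + log(factorial(3))   # -103.3478, matching -103.3479 up to rounding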
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Case of the probit model

      For the completion by z,

             π̂(θ|x) = (1/T) Σ_t π(θ|x, z^(t))

      is a simple average of normal densities
      R Bridge sampling code
      gibbs1=gibbsprobit(Niter,y,X1)
      gibbs2=gibbsprobit(Niter,y,X2)
      bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
              sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
            mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
              sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
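      The helpers gibbsprobit() and probitlpost() are presumably those of the
      Bayesian Core companion code; dmvlnorm() can be assumed to be a thin
      log-density wrapper such as

      library(mvtnorm)
      # assumed definition: log-density of a multivariate normal distribution
      dmvlnorm <- function(x, mean, sigma)
        dmvnorm(x, mean = mean, sigma = sigma, log = TRUE)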
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations, based
      on 100 replicas of 20,000 simulations each, for Chib's method and the
      importance sampler above

      [Figure: boxplots of the 100 Bayes factor approximations for Chib's
      method and importance sampling (vertical scale 0.0240–0.0255)]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



The Savage–Dickey ratio
      Special representation of the Bayes factor used for simulation
      Given a test H0 : θ = θ0 in a model f (x|θ, ψ) with a nuisance
      parameter ψ, under priors π0 (ψ) and π1 (θ, ψ) such that

                                             π1 (ψ|θ0 ) = π0 (ψ)

      then
                       B01 = π1(θ0|x) / π1(θ0) ,

      with the obvious notations

             π1(θ) = ∫ π1(θ, ψ) dψ ,     π1(θ|x) = ∫ π1(θ, ψ|x) dψ ,

                                        [Dickey, 1971; Verdinelli & Wasserman, 1995]
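
      As a toy illustration (not from the original slides), in the case
      x ∼ N(θ, 1), θ ∼ N(0, τ²) and H0 : θ = 0 with no nuisance parameter,
      the Savage–Dickey ratio can be checked in R against the exact Bayes
      factor f(x|θ0)/m1(x):

      x <- 1.2; tau <- 2; theta0 <- 0
      post.var  <- 1 / (1 + 1 / tau^2)       # posterior variance of theta
      post.mean <- post.var * x              # posterior mean of theta
      sd.ratio <- dnorm(theta0, post.mean, sqrt(post.var)) / dnorm(theta0, 0, tau)
      exact    <- dnorm(x, theta0, 1) / dnorm(x, 0, sqrt(1 + tau^2))
      c(sd.ratio, exact)                     # equal up to floating point error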
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



Measure-theoretic difficulty

      The representation depends on the choice of versions of conditional
      densities:

      B01 = ∫ π0(ψ) f(x|θ0, ψ) dψ / ∫ π1(θ, ψ) f(x|θ, ψ) dψ dθ
                                                   [by definition]
          = ∫ π1(ψ|θ0) f(x|θ0, ψ) dψ π1(θ0) / [∫ π1(θ, ψ) f(x|θ, ψ) dψ dθ π1(θ0)]
                                                   [specific version of π1(ψ|θ0)]
          = ∫ π1(θ0, ψ) f(x|θ0, ψ) dψ / [m1(x) π1(θ0)]
                                                   [specific version of π1(θ0, ψ)]
          = π1(θ0|x) / π1(θ0)

                       ⇒ Dickey's (1971) condition is not a condition
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



Similar measure-theoretic difficulty


      Verdinelli–Wasserman extension:

             B01 = [π1(θ0|x) / π1(θ0)] × E^{π1(ψ|x,θ0)}[ π0(ψ) / π1(ψ|θ0) ]

      depends on similar choices of versions
      Monte Carlo implementation relies on continuous versions of all
      densities without making mention of it
                                          [Chen, Shao & Ibrahim, 2000]
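
      A hedged Monte Carlo sketch of the correction factor, assuming draws
      psi from π1(ψ|x, θ0) (e.g. a Gibbs run with θ held at θ0) and
      hypothetical vectorised helpers dprior0() and dprior1cond() evaluating
      π0(ψ) and π1(ψ|θ0):

      vw.correction <- function(psi, dprior0, dprior1cond)
        mean(dprior0(psi) / dprior1cond(psi))
      # B01.hat <- (pi1.theta0.x / pi1.theta0) * vw.correction(psi, dprior0, dprior1cond)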
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



Computational implementation
      Starting from the (new) prior

             π̃1(θ, ψ) = π1(θ) π0(ψ)

      define the associated posterior

             π̃1(θ, ψ|x) = π0(ψ) π1(θ) f(x|θ, ψ) / m̃1(x)

      and impose

             π̃1(θ0|x) / π1(θ0) = ∫ π0(ψ) f(x|θ0, ψ) dψ / m̃1(x)

      to hold.
      Then
             B01 = [π̃1(θ0|x) / π1(θ0)] × [m̃1(x) / m1(x)]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



First ratio

      If (θ^(1), ψ^(1)), . . . , (θ^(T), ψ^(T)) ∼ π̃1(θ, ψ|x), then

             (1/T) Σ_t π̃1(θ0|x, ψ^(t))

      converges to π̃1(θ0|x) (if the right version is used in θ0).
      When π̃1(θ0|x, ψ) is unavailable, replace it with

             (1/T) Σ_{t=1}^{T} π̃1(θ0|x, z^(t), ψ^(t))
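
      A sketch of this Rao–Blackwellised numerator in R, assuming the Gibbs
      output under π̃1 is stored as lists psi.sims and z.sims and that a
      hypothetical helper cond.dens.theta(theta0, psi, z) evaluates the full
      conditional π̃1(θ0|x, z, ψ):

      rb.num <- function(theta0, psi.sims, z.sims, cond.dens.theta)
        mean(mapply(function(psi, z) cond.dens.theta(theta0, psi, z),
                    psi.sims, z.sims))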
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



Bridge revival (1)

      Since m̃1(x)/m1(x) is unknown, apparent failure!
      Use of the identity

      E^{π̃1(θ,ψ|x)}[ π1(θ, ψ) f(x|θ, ψ) / {π0(ψ) π1(θ) f(x|θ, ψ)} ]
             = E^{π̃1(θ,ψ|x)}[ π1(ψ|θ) / π0(ψ) ] = m1(x)/m̃1(x)

      to (biasedly) estimate m̃1(x)/m1(x) by

             T / Σ_{t=1}^{T} [ π1(ψ^(t)|θ^(t)) / π0(ψ^(t)) ]

      based on the same sample from π̃1.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



Bridge revival (2)
      Alternative identity

      E^{π1(θ,ψ|x)}[ π0(ψ) π1(θ) f(x|θ, ψ) / {π1(θ, ψ) f(x|θ, ψ)} ]
             = E^{π1(θ,ψ|x)}[ π0(ψ) / π1(ψ|θ) ] = m̃1(x)/m1(x)

      suggests using a second sample (θ̄^(1), ψ̄^(1), z^(1)), . . . ,
      (θ̄^(T), ψ̄^(T), z^(T)) ∼ π1(θ, ψ|x) and

             (1/T) Σ_{t=1}^{T} π0(ψ̄^(t)) / π1(ψ̄^(t)|θ̄^(t))

      Resulting estimate:

      B01 = [ (1/T) Σ_t π̃1(θ0|x, z^(t), ψ^(t)) / π1(θ0) ]
              × [ (1/T) Σ_{t=1}^{T} π0(ψ̄^(t)) / π1(ψ̄^(t)|θ̄^(t)) ]
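
      Putting the two pieces together, a sketch of the resulting estimate in
      R, reusing rb.num() from above and assuming hypothetical helpers
      dprior0() and dprior1cond(psi, theta) for π0(ψ) and π1(ψ|θ), with
      psi.bar and theta.bar the draws from π1(θ, ψ|x) and pi1.theta0 the
      prior density π1(θ0):

      corr <- mean(mapply(function(p, th) dprior0(p) / dprior1cond(p, th),
                          psi.bar, theta.bar))
      B01.hat <- rb.num(theta0, psi.tilde, z.tilde, cond.dens.theta) /
                 pi1.theta0 * corr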
On some computational methods for Bayesian model choice
  Importance sampling solutions
     The Savage–Dickey ratio



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations, based
      on 100 replicas of 20,000 simulations each, for the importance, Chib's,
      Savage–Dickey and bridge samplers above

      [Figure: boxplots of the 100 Bayes factor approximations for the IS,
      Chib, Savage–Dickey and bridge solutions (vertical scale 2.8–3.4)]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Variable selection



Bayesian variable selection

      Example of a regression setting: one dependent random variable y
      and a set {x1 , . . . , xk } of k explanatory variables.
      Question: Are all xi ’s involved in the regression?
      Assumption: every subset {i1 , . . . , iq } of q (0 ≤ q ≤ k)
      explanatory variables, {1n , xi1 , . . . , xiq }, is a proper set of
      explanatory variables for the regression of y [intercept included in
      every corresponding model]

      Computational issue
                                  2^k models in competition...

                                                   [Marin & Robert, Bayesian Core, 2007]
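
      To fix ideas on the size of this model space, a small R illustration
      enumerating all 2^k inclusion patterns as binary vectors (k = 10
      already gives 1,024 competing models):

      k <- 10
      gamma <- as.matrix(expand.grid(rep(list(0:1), k)))  # one row per model
      nrow(gamma)                                         # 2^k = 1024
      # each row flags which of x1, ..., xk enter the regression
      # (the intercept being included in every model)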
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Reversible jump


      Idea: Set up a proper measure–theoretic framework for designing
      moves between models Mk
                                                         [Green, 1995]
      Create a reversible kernel K on H = ∪k {k} × Θk such that

             ∫_A ∫_B K(x, dy) π(x) dx = ∫_B ∫_A K(y, dx) π(y) dy

      for the invariant density π [x is of the form (k, θ(k))]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Local moves
      For a move between two models, M1 and M2 , the Markov chain
      being in state θ1 ∈ M1 , denote by K1→2 (θ1 , dθ) and K2→1 (θ2 , dθ)
      the corresponding kernels, under the detailed balance condition

                          π(dθ1 ) K1→2 (θ1 , dθ) = π(dθ2 ) K2→1 (θ2 , dθ) ,

      and take, wlog, dim(M2 ) > dim(M1 ).
      Proposal expressed as

                                           θ2 = Ψ1→2 (θ1 , v1→2 )

      where v1→2 is a random variable of dimension
      dim(M2 ) − dim(M1 ), generated as

                                          v1→2 ∼ ϕ1→2 (v1→2 ) .
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Local moves (2)
      In this case, q1→2(θ1, dθ2) has density

             ϕ1→2(v1→2) |∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2)|^{−1} ,

      by the Jacobian rule.

      If π1→2 is the probability of choosing a move to M2 while in M1, the
      acceptance probability reduces to

      α(θ1, v1→2) = 1 ∧ [ π(M2, θ2) π2→1 / {π(M1, θ1) π1→2 ϕ1→2(v1→2)} ]
                          × |∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2)| .

      If several models are considered simultaneously, with probability π1→2
      of choosing a move to M2 while in M1, as in

             K(x, B) = Σ_{m=1}^{∞} ∫ ρm(x, y) qm(x, dy) + ω(x) I_B(x)
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Generic reversible jump acceptance probability


      Acceptance probability of θ2 = Ψ1→2(θ1, v1→2) is

      α(θ1, v1→2) = 1 ∧ [ π(M2, θ2) π2→1 / {π(M1, θ1) π1→2 ϕ1→2(v1→2)} ]
                          × |∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2)|

      while acceptance probability of θ1 with (θ1, v1→2) = Ψ1→2^{−1}(θ2) is

      α(θ1, v1→2) = 1 ∧ [ π(M1, θ1) π1→2 ϕ1→2(v1→2) / {π(M2, θ2) π2→1} ]
                          × |∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2)|^{−1}

                            ⇒ Difficult calibration
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Green’s sampler

      Algorithm
      Iteration t (t ≥ 1): if x(t) = (m, θ(m) ),
          1   Select model Mn with probability πmn
          2   Generate umn ∼ ϕmn (u) and set
              (θ(n) , vnm ) = Ψm→n (θ(m) , umn )
          3   Take x(t+1) = (n, θ(n) ) with probability

                   min{ [π(n, θ(n)) πnm ϕnm(vnm) / {π(m, θ(m)) πmn ϕmn(umn)}]
                        × |∂Ψm→n(θ(m), umn) / ∂(θ(m), umn)| , 1 }

              and take x(t+1) = x(t) otherwise.
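
      A self-contained toy R sketch of such a sampler (an illustration, not
      Green's general algorithm): M1: y_i ∼ N(0, 1) with no parameter versus
      M2: y_i ∼ N(µ, 1) with µ ∼ N(0, τ²), equal prior model weights, the
      dimension-matching proposal v ∼ N(0, 1) and Ψ the identity (Jacobian 1):

      set.seed(1)
      y <- rnorm(50, mean = 0.3); tau <- 2
      loglik <- function(mu) sum(dnorm(y, mu, 1, log = TRUE))
      niter <- 1e4; model <- 1; mu <- NA; kstore <- integer(niter)
      for (t in 1:niter) {
        if (model == 1) {                              # propose a jump M1 -> M2
          v <- rnorm(1)
          logalpha <- loglik(v) + dnorm(v, 0, tau, log = TRUE) -
                      loglik(0) - dnorm(v, log = TRUE)
          if (log(runif(1)) < logalpha) { model <- 2; mu <- v }
        } else {                                       # propose a jump M2 -> M1
          logalpha <- loglik(0) + dnorm(mu, log = TRUE) -
                      loglik(mu) - dnorm(mu, 0, tau, log = TRUE)
          if (log(runif(1)) < logalpha) { model <- 1; mu <- NA }
          # (a within-model move on mu would be added in practice)
        }
        kstore[t] <- model
      }
      mean(kstore == 2)          # Monte Carlo estimate of P(M2 | y)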
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Interpretation


      The representation puts us back in a fixed dimension setting:
              M1 × V1→2 and M2 in one-to-one relation.
              reversibility imposes that θ1 is derived as

                                             (θ1, v1→2) = Ψ1→2^{−1}(θ2)

              appears like a regular Metropolis–Hastings move from the
              couple (θ1 , v1→2 ) to θ2 when stationary distributions are
              π(M1 , θ1 ) × ϕ1→2 (v1→2 ) and π(M2 , θ2 ), and when proposal
              distribution is deterministic
On some computational methods for Bayesian model choice
  Cross-model solutions
     Saturation schemes



Alternative
      Saturation of the parameter space H = ∪k {k} × Θk by creating
           θ = (θ1, . . . , θD)
           a model index M
           pseudo-priors πj(θj|M = k) for j ≠ k
                                                  [Carlin & Chib, 1995]
      Validation by

             P(M = k|x) = ∫ P(M = k|x, θ) π(θ|x) dθ = Zk

      where the (marginal) posterior is [not πk !]

             π(θ|x) = Σ_{k=1}^{D} π(θ, M = k|x)
                    = Σ_{k=1}^{D} pk Zk πk(θk|x) Π_{j≠k} πj(θj|M = k) .
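
      A toy R sketch of the resulting Gibbs sampler (illustration only, with
      assumptions spelled out here): M1: y_i ∼ Poisson(λ), λ ∼ Gamma(2, 1),
      versus M2: y_i ∼ Geometric(p), p ∼ Beta(2, 2), equal prior model
      weights, and the pseudo-priors taken equal to the priors (a valid, if
      possibly inefficient, choice under which all prior terms cancel in the
      model-index update):

      set.seed(1)
      y <- rpois(30, 2); n <- length(y); s <- sum(y)
      niter <- 1e4; M <- 1; kstore <- integer(niter)
      for (t in 1:niter) {
        # parameter blocks: posterior draw in the current model,
        # pseudo-prior draw in the other one
        lambda <- if (M == 1) rgamma(1, 2 + s, 1 + n) else rgamma(1, 2, 1)
        p      <- if (M == 2) rbeta(1, 2 + n, 2 + s)  else rbeta(1, 2, 2)
        # model index update: ratio of the two likelihoods only
        l1 <- sum(dpois(y, lambda, log = TRUE))
        l2 <- sum(dgeom(y, p, log = TRUE))
        M  <- sample(1:2, 1, prob = exp(c(l1, l2) - max(l1, l2)))
        kstore[t] <- M
      }
      mean(kstore == 1)          # estimate of P(M = 1 | y) = Z1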
CISEA 2019: ABC consistency and convergenceChristian Robert
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsChristian Robert
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 

Plus de Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 

Dernier

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 

Dernier (20)

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 

Yes III: Computational methods for model choice

  • 7. Model choice and model comparison
    Choice between models: several models are available for the same observation,
        Mi : x ∼ fi(x|θi),   i ∈ I,
    where I can be finite or infinite.
    Replace hypotheses with models, but keep marginal likelihoods and Bayes factors.
  • 9. Bayesian model choice
    Probabilise the entire model/parameter space:
    - allocate probabilities pi to all models Mi
    - define priors πi(θi) for each parameter space Θi
    - compute
        π(Mi|x) = pi ∫Θi fi(x|θi) πi(θi) dθi / Σj pj ∫Θj fj(x|θj) πj(θj) dθj
    - take the largest π(Mi|x) to determine the “best” model, or use the averaged predictive
        Σj π(Mj|x) ∫Θj fj(x′|θj) πj(θj|x) dθj
  • 13. Evidence
    All these problems end up with a similar quantity, the evidence
        Zk = ∫Θk πk(θk) Lk(θk) dθk ,
    aka the marginal likelihood.
  • 14. Bayes factor approximation
    When approximating the Bayes factor
        B01 = ∫Θ0 f0(x|θ0) π0(θ0) dθ0 / ∫Θ1 f1(x|θ1) π1(θ1) dθ1 ,
    use of importance functions ϕ0 and ϕ1 and
        B̂01 = [ (1/n0) Σ_{i=1}^{n0} f0(x|θ0i) π0(θ0i)/ϕ0(θ0i) ] / [ (1/n1) Σ_{i=1}^{n1} f1(x|θ1i) π1(θ1i)/ϕ1(θ1i) ]
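    A minimal R sketch of this two-sample importance-sampling estimator, on a toy normal-mean comparison where both marginal likelihoods are available in closed form (the toy model and all names below are illustrative, not taken from the slides):

      set.seed(1)
      n <- 20; x <- rnorm(n, 0.3)                    # data, x_i ~ N(theta, 1)
      lik <- function(th) sapply(th, function(t) prod(dnorm(x, t, 1)))
      prior0 <- function(th) dnorm(th, 0, 1)         # theta-prior under M0
      prior1 <- function(th) dnorm(th, 0, 3)         # theta-prior under M1
      phi_r <- function(N) rnorm(N, mean(x), 0.5)    # importance function (common to both models)
      phi_d <- function(th) dnorm(th, mean(x), 0.5)
      th0 <- phi_r(1e4); th1 <- phi_r(1e4)
      B01 <- mean(lik(th0) * prior0(th0) / phi_d(th0)) /
             mean(lik(th1) * prior1(th1) / phi_d(th1))
      # exact check from the conjugate normal marginals
      logm <- function(tau2) { v <- 1/(n + 1/tau2); m <- v*sum(x); -n/2*log(2*pi) + log(v/tau2)/2 + m^2/(2*v) - sum(x^2)/2 }
      c(B01, exp(logm(1) - logm(9)))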
  • 15. Diabetes in Pima Indian women — Example (R benchmark)
    “A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix (AZ), was tested for diabetes according to WHO criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases.”
    200 Pima Indian women with observed variables: plasma glucose concentration in an oral glucose tolerance test, diastolic blood pressure, diabetes pedigree function, presence/absence of diabetes.
  • 16. Probit modelling on Pima Indian women
    Probability of diabetes as a function of the above variables,
        P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3) .
    Test of H0 : β3 = 0 for the 200 observations of Pima.tr, based on a g-prior modelling:
        β ∼ N3(0, n (X^T X)^{-1})
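    The R snippets on later slides call a function probitlpost(beta, y, X) without displaying it; a minimal log-posterior consistent with this probit likelihood and g-prior (g = n) might look as follows — an assumed reconstruction for readability, not the deck’s actual code:

      probitlpost <- function(beta, y, X) {
        # log-posterior (up to a constant) of the probit coefficients under the g-prior, g = n
        beta <- matrix(beta, ncol = ncol(X))
        apply(beta, 1, function(b) {
          p <- pnorm(X %*% b)
          sum(dbinom(y, 1, p, log = TRUE)) +
            mvtnorm::dmvnorm(b, sigma = nrow(X) * solve(crossprod(X)), log = TRUE)
        })
      }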
  • 18. MCMC 101 for probit models
    Use of either a random walk proposal
        β′ = β + ε
    in a Metropolis–Hastings algorithm (since the likelihood is available), or of a Gibbs sampler that takes advantage of the missing/latent variable
        z | y, x, β ∼ N(x^T β, 1) truncated to z ≥ 0 when y = 1 and to z ≤ 0 when y = 0
    (since β | y, X, z is distributed as a standard normal).
    [Gibbs three times faster]
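    A minimal R sketch of this latent-variable Gibbs sampler under the g-prior of the previous slide (g = n); function and variable names are illustrative, and the slides’ own gibbsprobit is presumably similar but not identical:

      library(mvtnorm)
      gibbsprobit_sketch <- function(Niter, y, X) {
        n <- nrow(X); p <- ncol(X)
        Sigma <- n/(n + 1) * solve(crossprod(X))     # covariance of beta | z under the g-prior
        beta <- rep(0, p); draws <- matrix(NA, Niter, p)
        for (t in 1:Niter) {
          m <- as.vector(X %*% beta)
          lo <- pnorm(-m); u <- runif(n)
          # inverse-cdf draws of z_i ~ N(m_i, 1), truncated to (0, Inf) if y_i = 1 and (-Inf, 0) if y_i = 0
          z <- m + qnorm(ifelse(y == 1, lo + u*(1 - lo), u*lo))
          beta <- as.vector(rmvnorm(1, as.vector(Sigma %*% crossprod(X, z)), Sigma))
          draws[t, ] <- beta
        }
        draws
      }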
  • 21. Diabetes in Pima Indian women
    [Figure: MCMC output for βmax = 1.5, β̂ = 1.15, k = 40, and 20,000 simulations — trace plots and histograms omitted.]
  • 22. Importance sampling for the Pima Indian dataset
    Use of the importance function inspired by the MLE estimate distribution,
        β ∼ N(β̂, Σ̂) .
    R importance sampling code (the definition of model2, analogous to model1, is added here for completeness):
      model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
      model2=summary(glm(y~-1+X2,family=binomial(link="probit")))
      is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
      is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
      bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],
        sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)-
        dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))
  • 24. Diabetes in Pima Indian women
    Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, for a simulation from the prior and for the above importance sampler. [Boxplots omitted.]
  • 25. Bridge sampling
    Special case: if
        π̃1(θ1|x) ∝ π1(θ1|x)   and   π̃2(θ2|x) ∝ π2(θ2|x)
    live on the same space (Θ1 = Θ2), then
        B̂12 ≈ (1/n) Σ_{i=1}^{n} π̃1(θi|x)/π̃2(θi|x) ,   θi ∼ π2(θ|x)
    [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
  • 26. Bridge sampling variance
    The bridge sampling estimator does poorly if
        var(B̂12)/B12² ≈ (1/n) E[ ( (π1(θ) − π2(θ)) / π2(θ) )² ]
    is large, i.e. if π1 and π2 have little overlap...
  • 28. (Further) bridge sampling
    In addition,
        B12 = ∫ π̃1(θ|x) α(θ) π2(θ|x) dθ / ∫ π̃2(θ|x) α(θ) π1(θ|x) dθ   for any α(·),
            ≈ [ (1/n2) Σ_{i=1}^{n2} π̃1(θ2i|x) α(θ2i) ] / [ (1/n1) Σ_{i=1}^{n1} π̃2(θ1i|x) α(θ1i) ] ,   θji ∼ πj(θ|x)
  • 29. An infamous example
    When
        α(θ) = 1 / ( π̃1(θ|x) π̃2(θ|x) ) ,
    the bridge estimator becomes a harmonic mean approximation to the Bayes factor:
        B̂21 = [ (1/n1) Σ_{i=1}^{n1} 1/π̃1(θ1i|x) ] / [ (1/n2) Σ_{i=1}^{n2} 1/π̃2(θ2i|x) ] ,   θji ∼ πj(θ|x)
  • 30. The original harmonic mean estimator
    When θkt ∼ πk(θ|x),
        (1/T) Σ_{t=1}^{T} 1/L(θkt|x)
    is an unbiased estimator of 1/mk(x).
    [Newton & Raftery, 1994]
    Highly dangerous: most often leads to an infinite variance!!!
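    A quick R illustration of the danger on a conjugate normal example with a known marginal likelihood (toy setup and names are illustrative); the estimate is driven by the rare posterior draws with smallest likelihood, and rerunning the last lines shows its wild variability:

      set.seed(6)
      n <- 50; tau2 <- 100; x <- rnorm(n, 2)         # x_i ~ N(theta, 1), theta ~ N(0, tau2)
      v <- 1/(n + 1/tau2); m <- v*sum(x)             # posterior is N(m, v)
      loglik <- function(th) sapply(th, function(t) sum(dnorm(x, t, 1, log = TRUE)))
      theta <- rnorm(1e4, m, sqrt(v))                # exact posterior draws stand in for MCMC output
      logZ_hm <- -log(mean(exp(-loglik(theta))))     # harmonic mean estimate of log m(x)
      logZ_true <- -n/2*log(2*pi) + log(v/tau2)/2 + m^2/(2*v) - sum(x^2)/2
      c(logZ_hm, logZ_true)                          # rerun: the estimate jumps around, the truth does not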
  • 32. “The Worst Monte Carlo Method Ever”
    “The good news is that the Law of Large Numbers guarantees that this estimator is consistent, i.e., it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood.”
    [Radford Neal’s blog, Aug. 23, 2008]
  • 34. Optimal bridge sampling
    The optimal choice of auxiliary function is
        α* = (n1 + n2) / ( n1 π1(θ|x) + n2 π2(θ|x) ) ,
    leading to
        B̂12 ≈ [ (1/n2) Σ_{i=1}^{n2} π̃1(θ2i|x) / (n1 π1(θ2i|x) + n2 π2(θ2i|x)) ]
             / [ (1/n1) Σ_{i=1}^{n1} π̃2(θ1i|x) / (n1 π1(θ1i|x) + n2 π2(θ1i|x)) ]
    Back later!
  • 35. Optimal bridge sampling (2)
    Reason:
        Var(B̂12)/B12² ≈ (1/(n1 n2)) [ ∫ π1(θ) π2(θ) (n1 π1(θ) + n2 π2(θ)) α(θ)² dθ / ( ∫ π1(θ) π2(θ) α(θ) dθ )² − 1 ]
    (by the δ method).
    Drag: dependence on the unknown normalising constants, solved iteratively.
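    A compact R sketch of the resulting iterative scheme (Meng–Wong type), written in terms of the ratios l = π̃1/π̃2 evaluated at the two samples so that the unknown constants only enter through the current value of the estimate; names and the toy check are illustrative:

      bridge_optimal <- function(l1, l2, niter = 100) {
        # l1: pi~1/pi~2 at draws from pi1(.|x); l2: the same ratio at draws from pi2(.|x)
        n1 <- length(l1); n2 <- length(l2); r <- 1   # r is the running estimate of B12
        for (t in 1:niter) {
          r <- mean(l2 / (n1*l2 + n2*r)) / mean(1 / (n1*l1 + n2*r))
        }
        r
      }
      # toy check with known constants: pi~1 = 2 N(0,1), pi~2 = 5 N(1,4)  =>  B12 = 2/5
      th1 <- rnorm(1e4); th2 <- rnorm(1e4, 1, 2)
      l <- function(th) 2*dnorm(th, 0, 1) / (5*dnorm(th, 1, 2))
      bridge_optimal(l(th1), l(th2))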
  • 37. Extension to varying dimensions
    When π1 and π2 live on different spaces, for instance θ2 = (θ1, ψ), introduce a pseudo-posterior density ω(ψ|θ1, x) to augment π1(θ1|x) into the joint distribution
        π1(θ1|x) × ω(ψ|θ1, x)
    on θ2, so that
        B12 = ∫ π̃1(θ1|x) ω(ψ|θ1, x) α(θ1, ψ) π2(θ1, ψ|x) dθ1 dψ / ∫ π̃2(θ1, ψ|x) α(θ1, ψ) π1(θ1|x) ω(ψ|θ1, x) dθ1 dψ
  • 39. Illustration for the Pima Indian dataset
    Use of the MLE-induced conditional of β3 given (β1, β2) as a pseudo-posterior, and a mixture of both MLE approximations on β3 in the bridge sampling estimate.
    R bridge sampling code:
      cova=model2$cov.unscaled
      expecta=model2$coeff[,1]
      covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]
      probit1=hmprobit(Niter,y,X1)
      probit2=hmprobit(Niter,y,X2)
      pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
      probit1p=cbind(probit1,pseudo)
      bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),
        sqrt(covw),log=T))/(dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],
        cova[3,3])))/
        mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+dnorm(pseudo,
        expecta[3],cova[3,3])))
  • 41. Diabetes in Pima Indian women (cont’d)
    Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, for a simulation from the prior, the above bridge sampler and the above importance sampler. [Boxplots omitted.]
  • 42. Ratio importance sampling
    Another identity:
        B12 = Eϕ[ π̃1(θ)/ϕ(θ) ] / Eϕ[ π̃2(θ)/ϕ(θ) ]
    for any density ϕ with sufficiently large support.
    [Torrie & Valleau, 1977]
    Use of a single sample θ1, . . . , θn from ϕ:
        B̂12 = Σ_{i=1}^{n} π̃1(θi)/ϕ(θi) / Σ_{i=1}^{n} π̃2(θi)/ϕ(θi)
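    A runnable R sketch of this single-sample estimator, using two toy unnormalised “posteriors” whose constants are known so the answer can be checked (all densities and names are illustrative):

      set.seed(2)
      p1un <- function(th) 2*dnorm(th, 0, 1)         # unnormalised density 1, Z1 = 2
      p2un <- function(th) 5*dnorm(th, 1, 2)         # unnormalised density 2, Z2 = 5
      phi_d <- function(th) dt(th - 0.5, df = 3)     # heavy-tailed phi covering both
      theta <- 0.5 + rt(1e5, df = 3)                 # single sample from phi
      mean(p1un(theta)/phi_d(theta)) / mean(p2un(theta)/phi_d(theta))   # should be close to 2/5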
  • 44. Ratio importance sampling (2)
    Approximate variance:
        var(B̂12)/B12² ≈ (1/n) Eϕ[ (π1(θ) − π2(θ))² / ϕ(θ)² ]
    Optimal choice:
        ϕ*(θ) = |π1(θ) − π2(θ)| / ∫ |π1(η) − π2(η)| dη
    [Chen, Shao & Ibrahim, 2000]
  • 46. Improving upon the bridge sampler
    Theorem 5.5.3: the asymptotic variance of the optimal ratio importance sampling estimator is smaller than the asymptotic variance of the optimal bridge sampling estimator.
    [Chen, Shao & Ibrahim, 2000]
    It does not require the normalising constant ∫ |π1(η) − π2(η)| dη, but it does require a simulation from ϕ*(θ) ∝ |π1(θ) − π2(θ)|.
  • 48. Generalisation to point null situations
    When
        B12 = ∫Θ1 π̃1(θ1) dθ1 / ∫Θ2 π̃2(θ2) dθ2
    and Θ2 = Θ1 × Ψ, we get θ2 = (θ1, ψ) and
        B12 = Eπ2[ π̃1(θ1) ω(ψ|θ1) / π̃2(θ1, ψ) ]
    holds for any conditional density ω(ψ|θ1).
  • 49. X-dimensional bridge sampling
    Generalisation of the previous identity: for any α,
        B12 = Eπ2[ π̃1(θ1) ω(ψ|θ1) α(θ1, ψ) ] / Eπ1×ω[ π̃2(θ1, ψ) α(θ1, ψ) ]
    and, for any density ϕ,
        B12 = Eϕ[ π̃1(θ1) ω(ψ|θ1) / ϕ(θ1, ψ) ] / Eϕ[ π̃2(θ1, ψ) / ϕ(θ1, ψ) ]
    [Chen, Shao & Ibrahim, 2000]
    Optimal choice: ω(ψ|θ1) = π2(ψ|θ1) [Theorem 5.8.2]
  • 51. Approximating Zk from a posterior sample
    Use of the [harmonic mean] identity
        Eπk[ ϕ(θk) / (πk(θk) Lk(θk)) | x ] = ∫ [ ϕ(θk) / (πk(θk) Lk(θk)) ] [ πk(θk) Lk(θk) / Zk ] dθk = 1/Zk ,
    no matter what the proposal ϕ(·) is.
    [Gelfand & Dey, 1994; Bartolucci et al., 2006]
    Direct exploitation of the MCMC output.
  • 53. Comparison with regular importance sampling
    Harmonic mean: the constraint is opposed to the usual importance sampling constraints — ϕ(θ) must have lighter (rather than fatter) tails than πk(θk)Lk(θk) for the approximation
        Ẑ1k = 1 / [ (1/T) Σ_{t=1}^{T} ϕ(θk^(t)) / (πk(θk^(t)) Lk(θk^(t))) ]
    to have a finite variance.
    E.g., use finite-support kernels (like Epanechnikov’s kernel) for ϕ.
  • 55. Comparison with regular importance sampling (cont’d)
    Compare Ẑ1k with a standard importance sampling approximation
        Ẑ2k = (1/T) Σ_{t=1}^{T} πk(θk^(t)) Lk(θk^(t)) / ϕ(θk^(t)) ,
    where the θk^(t)’s are generated from the density ϕ(·) (with fatter tails, like t’s).
  • 56. HPD indicator as ϕ
    Use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and ϕ as the indicator
        ϕ(θ) = (1/T) Σ_t I{ d(θ, θ^(t)) ≤ ε }
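    A runnable R sketch of this stabilised harmonic-mean identity on a toy conjugate model, with ϕ taken as a normalised indicator of a small ball around the posterior mode standing in for the HPD-based indicator (model, radius and names are illustrative):

      set.seed(3)
      n <- 30; x <- rnorm(n, 1)                      # x_i ~ N(theta, 1), theta ~ N(0, 100)
      lik <- function(th) sapply(th, function(t) prod(dnorm(x, t, 1)))
      prior <- function(th) dnorm(th, 0, 10)
      v <- 1/(n + 1/100); m <- v*sum(x)              # conjugate posterior N(m, v)
      theta <- rnorm(1e5, m, sqrt(v))                # stands in for the MCMC output
      eps <- quantile(abs(theta - m), 0.1)           # ball holding 10% of the draws
      phi <- function(th) (abs(th - m) <= eps) / (2*eps)   # normalised indicator density
      Zhat <- 1 / mean(phi(theta) / (prior(theta)*lik(theta)))
      c(log(Zhat), -n/2*log(2*pi) + log(v/100)/2 + m^2/(2*v) - sum(x^2)/2)   # vs exact log m(x)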
  • 58. Diabetes in Pima Indian women (cont’d)
    Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, for the above harmonic mean sampler and importance samplers. [Boxplots omitted; both approximations lie around 3.10–3.12.]
  • 59. Approximating Zk using a mixture representation
    Bridge sampling redux: design a specific mixture for simulation [importance sampling] purposes, with density
        ϕ̃k(θk) ∝ ω1 πk(θk) Lk(θk) + ϕ(θk) ,
    where ϕ(·) is arbitrary (but normalised).
    Note: ω1 is not a probability weight.
  • 61. Approximating Zk using a mixture representation (cont’d)
    Corresponding MCMC (= Gibbs) sampler. At iteration t:
    1. Take δ^(t) = 1 with probability
         ω1 πk(θk^(t−1)) Lk(θk^(t−1)) / [ ω1 πk(θk^(t−1)) Lk(θk^(t−1)) + ϕ(θk^(t−1)) ]
       and δ^(t) = 2 otherwise;
    2. If δ^(t) = 1, generate θk^(t) ∼ MCMC(θk^(t−1), θk), where MCMC(θk, θk′) denotes an arbitrary MCMC kernel associated with the posterior πk(θk|x) ∝ πk(θk) Lk(θk);
    3. If δ^(t) = 2, generate θk^(t) ∼ ϕ(θk) independently.
  • 64. Evidence approximation by mixtures
    Rao–Blackwellised estimate
        ξ̂ = (1/T) Σ_{t=1}^{T} ω1 πk(θk^(t)) Lk(θk^(t)) / [ ω1 πk(θk^(t)) Lk(θk^(t)) + ϕ(θk^(t)) ] ,
    which converges to ω1 Zk / (ω1 Zk + 1).
    Deduce Ẑ3k from ω1 Ẑ3k/(ω1 Ẑ3k + 1) = ξ̂, i.e.
        Ẑ3k = [ Σ_t ω1 πk(θk^(t)) Lk(θk^(t)) / (ω1 πk(θk^(t)) Lk(θk^(t)) + ϕ(θk^(t))) ]
             / [ Σ_t ϕ(θk^(t)) / (ω1 πk(θk^(t)) Lk(θk^(t)) + ϕ(θk^(t))) ]
    [Bridge sampler]
  • 66. Chib’s representation
    Direct application of Bayes’ theorem: given x ∼ fk(x|θk) and θk ∼ πk(θk),
        Zk = mk(x) = fk(x|θk) πk(θk) / πk(θk|x) .
    Use of an approximation to the posterior:
        Ẑk = m̂k(x) = fk(x|θk*) πk(θk*) / π̂k(θk*|x) .
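    In R the identity is a one-liner once the three ingredients can be evaluated at some θ*; a toy conjugate-normal check (illustrative names, with the exact posterior density standing in for the Rao–Blackwell estimate discussed next):

      set.seed(7)
      n <- 40; tau2 <- 25; x <- rnorm(n, 0.5)        # x_i ~ N(theta, 1), theta ~ N(0, tau2)
      v <- 1/(n + 1/tau2); m <- v*sum(x)             # posterior N(m, v)
      theta_star <- m                                # any high-posterior-density point works
      logZ_chib <- sum(dnorm(x, theta_star, 1, log = TRUE)) +
                   dnorm(theta_star, 0, sqrt(tau2), log = TRUE) -
                   dnorm(theta_star, m, sqrt(v), log = TRUE)
      logZ_true <- -n/2*log(2*pi) + log(v/tau2)/2 + m^2/(2*v) - sum(x^2)/2
      c(logZ_chib, logZ_true)                        # equal up to rounding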
  • 68. Case of latent variables
    For a missing-variable representation z, as in mixture models, natural Rao–Blackwell estimate
        π̂k(θk*|x) = (1/T) Σ_{t=1}^{T} πk(θk*|x, zk^(t)) ,
    where the zk^(t)’s are Gibbs-sampled latent variables.
  • 69. Label switching
    A mixture model [a special case of missing-variable model] is invariant under permutations of the indices of its components.
    E.g., the mixtures 0.3 N(0, 1) + 0.7 N(2.3, 1) and 0.7 N(2.3, 1) + 0.3 N(0, 1) are exactly the same!
    ⇒ The component parameters θi are not identifiable marginally, since they are exchangeable.
  • 71. Connected difficulties
    1. The number of modes of the likelihood is of order O(k!):
       ⇒ maximization, and even [MCMC] exploration, of the posterior surface is harder.
    2. Under exchangeable priors on (θ, p) [priors invariant under permutation of the indices], all posterior marginals are identical:
       ⇒ the posterior expectation of θ1 equals the posterior expectation of θ2.
  • 73. License
    Since the Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks the energy to switch enough component allocations simultaneously.
    [Figure: trace plots and scatterplots of (pi, µi, σi) over 500 Gibbs iterations, showing no label switching.]
  • 74. Label switching paradox
    We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler. If we observe it, then we do not know how to estimate the parameters. If we do not, then we are uncertain about the convergence!
  • 77. Compensation for label switching
    For mixture models, zk^(t) usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory,
        πk(θk|x) = πk(σ(θk)|x) = (1/k!) Σ_{σ∈Sk} πk(σ(θk)|x)
    for all σ’s in Sk, the set of all permutations of {1, . . . , k}.
    Consequences on the numerical approximation, biased by an order k!
    Recover the theoretical symmetry by using
        π̂k(θk*|x) = (1/(T k!)) Σ_{σ∈Sk} Σ_{t=1}^{T} πk(σ(θk*)|x, zk^(t)) .
    [Berkhof, Mechelen & Gelman, 2003]
  • 79. Galaxy dataset
    n = 82 galaxies as a mixture of k normal distributions with both mean and variance unknown. [Roeder, 1992]
    [Figure: histogram of the (standardised) galaxy data with the average estimated density overlaid.]
  • 80. Galaxy dataset (cont’d)
    Using only the original estimate, with θk* the MAP estimator,
        log(m̂k(x)) = −105.1396
    for k = 3 (based on 10³ simulations), while introducing the permutations leads to
        log(m̂k(x)) = −103.3479 .
    Note that −105.1396 + log(3!) = −103.3479.
      k        2        3        4        5        6        7        8
      m̂k(x)  −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44
    Estimates of the marginal likelihoods by the symmetrised Chib’s approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100 permutations selected at random in Sk).
    [Lee, Marin, Mengersen & Robert, 2008]
  • 83. Case of the probit model
    For the completion by z,
        π̂(θ|x) = (1/T) Σ_t π(θ|x, z^(t))
    is a simple average of normal densities.
    R Chib’s approximation code:
      gibbs1=gibbsprobit(Niter,y,X1)
      gibbs2=gibbsprobit(Niter,y,X2)
      bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
        sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
        mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
        sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
  • 85. Diabetes in Pima Indian women (cont’d)
    Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, for the above Chib’s and importance samplers. [Boxplots omitted; values around 0.024–0.026.]
  • 86. The Savage–Dickey ratio
    A special representation of the Bayes factor used for simulation.
    Given a test H0 : θ = θ0 in a model f(x|θ, ψ) with a nuisance parameter ψ, under priors π0(ψ) and π1(θ, ψ) such that
        π1(ψ|θ0) = π0(ψ) ,
    then
        B01 = π1(θ0|x) / π1(θ0) ,
    with the obvious notations
        π1(θ) = ∫ π1(θ, ψ) dψ ,   π1(θ|x) = ∫ π1(θ, ψ|x) dψ .
    [Dickey, 1971; Verdinelli & Wasserman, 1995]
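    A minimal R check of the representation on a toy point-null normal test with no nuisance parameter, so that the posterior density at θ0 is exact (setup and names are illustrative):

      set.seed(4)
      n <- 25; tau2 <- 4; x <- rnorm(n, 0.2)         # x_i ~ N(theta, 1); H0: theta = 0, H1: theta ~ N(0, tau2)
      v1 <- 1/(n + 1/tau2); m1 <- v1*sum(x)          # posterior of theta under H1: N(m1, v1)
      B01_sd <- dnorm(0, m1, sqrt(v1)) / dnorm(0, 0, sqrt(tau2))   # Savage-Dickey ratio
      B01_exact <- sqrt(tau2/v1) * exp(-m1^2/(2*v1))               # closed-form m0(x)/m1(x)
      c(B01_sd, B01_exact)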
  • 87. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio Measure-theoretic difficulty The representation depends on the choice of versions of conditional densities: π0 (ψ)f (x|θ0 , ψ) dψ B01 = [by definition] π1 (θ, ψ)f (x|θ, ψ) dψdθ π1 (ψ|θ0 )f (x|θ0 , ψ) dψ π1 (θ0 ) = [specific version of π1 (ψ|θ0 )] π1 (θ, ψ)f (x|θ, ψ) dψdθ π1 (θ0 ) π1 (θ0 , ψ)f (x|θ0 , ψ) dψ = [specific version of π1 (θ0 , ψ)] m1 (x)π1 (θ0 ) π1 (θ0 |x) = π1 (θ0 ) c Dickey’s (1971) condition is not a condition
  • 88. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio Similar measure-theoretic difficulty Verdinelli-Wasserman extension: π1 (θ0 |x) π1 (ψ|x,θ0 ,x) π0 (ψ) B01 = E π1 (θ0 ) π1 (ψ|θ0 ) depends on similar choices of versions Monte Carlo implementation relies on continuous versions of all densities without making mention of it [Chen, Shao & Ibrahim, 2000]
  • 89. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio Similar measure-theoretic difficulty Verdinelli-Wasserman extension: π1 (θ0 |x) π1 (ψ|x,θ0 ,x) π0 (ψ) B01 = E π1 (θ0 ) π1 (ψ|θ0 ) depends on similar choices of versions Monte Carlo implementation relies on continuous versions of all densities without making mention of it [Chen, Shao & Ibrahim, 2000]
  • 90. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio Computational implementation Starting from the (new) prior π1 (θ, ψ) = π1 (θ)π0 (ψ) ˜ define the associated posterior π1 (θ, ψ|x) = π0 (ψ)π1 (θ)f (x|θ, ψ) m1 (x) ˜ ˜ and impose π1 (θ0 |x) ˜ π0 (ψ)f (x|θ0 , ψ) dψ = π0 (θ0 ) m1 (x) ˜ to hold. Then π1 (θ0 |x) m1 (x) ˜ ˜ B01 = π1 (θ0 ) m1 (x)
  • 91. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio Computational implementation Starting from the (new) prior π1 (θ, ψ) = π1 (θ)π0 (ψ) ˜ define the associated posterior π1 (θ, ψ|x) = π0 (ψ)π1 (θ)f (x|θ, ψ) m1 (x) ˜ ˜ and impose π1 (θ0 |x) ˜ π0 (ψ)f (x|θ0 , ψ) dψ = π0 (θ0 ) m1 (x) ˜ to hold. Then π1 (θ0 |x) m1 (x) ˜ ˜ B01 = π1 (θ0 ) m1 (x)
  • 92. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio First ratio If (θ(1) , ψ (1) ), . . . , (θ(T ) , ψ (T ) ) ∼ π (θ, ψ|x), then ˜ 1 π1 (θ0 |x, ψ (t) ) ˜ T t converges to π1 (θ0 |x) (if the right version is used in θ0 ). ˜ When π1 (θ0 |x, ψ unavailable, replace with ˜ T 1 π1 (θ0 |x, z (t) , ψ (t) ) ˜ T t=1
  • 93. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio First ratio If (θ(1) , ψ (1) ), . . . , (θ(T ) , ψ (T ) ) ∼ π (θ, ψ|x), then ˜ 1 π1 (θ0 |x, ψ (t) ) ˜ T t converges to π1 (θ0 |x) (if the right version is used in θ0 ). ˜ When π1 (θ0 |x, ψ unavailable, replace with ˜ T 1 π1 (θ0 |x, z (t) , ψ (t) ) ˜ T t=1
  • 94. On some computational methods for Bayesian model choice Importance sampling solutions The Savage–Dickey ratio Bridge revival (1) Since m1 (x)/m1 (x) is unknown, apparent failure! ˜ Use of the identity π1 (θ, ψ)f (x|θ, ψ) π1 (ψ|θ) m1 (x) Eπ1 (θ,ψ|x) ˜ = Eπ1 (θ,ψ|x) ˜ = π0 (ψ)π1 (θ)f (x|θ, ψ) π0 (ψ) m1 (x) ˜ to (biasedly) estimate m1 (x)/m1 (x) by ˜ T π1 (ψ (t) |θ(t) ) T t=1 π0 (ψ (t) ) based on the same sample from π1 . ˜
• 96. Importance sampling solutions / The Savage–Dickey ratio
Bridge revival (2)
Alternative identity
    E^{π1(θ,ψ|x)} [ π0(ψ) π1(θ) f(x|θ, ψ) / { π1(θ, ψ) f(x|θ, ψ) } ] = E^{π1(θ,ψ|x)} [ π0(ψ) / π1(ψ|θ) ] = m̃1(x) / m1(x)
suggests using a second sample (θ̄(1), ψ̄(1), z̄(1)), . . . , (θ̄(T), ψ̄(T), z̄(T)) ∼ π1(θ, ψ|x) and
    (1/T) Σ_{t=1}^T π0(ψ̄(t)) / π1(ψ̄(t)|θ̄(t))
Resulting estimate:
    B̂01 = { (1/T) Σ_t π̃1(θ0|x, z(t), ψ(t)) / π1(θ0) } × { (1/T) Σ_{t=1}^T π0(ψ̄(t)) / π1(ψ̄(t)|θ̄(t)) }
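Putting the two pieces together, the resulting two-sample estimator can be assembled as in the generic sketch below (the function and argument names are illustrative, not from the slides):

```python
import numpy as np

def savage_dickey_bridge(theta0,
                         cond_density_tilde,  # (theta0, z, psi) -> pi~_1(theta0 | x, z, psi)
                         prior1_theta,        # theta -> pi_1(theta)
                         prior0_psi,          # psi -> pi_0(psi)
                         cond1_psi,           # (psi, theta) -> pi_1(psi | theta)
                         tilde_sample,        # draws (z, psi) from pi~_1(theta, psi | x)
                         post1_sample):       # draws (theta, psi) from pi_1(theta, psi | x)
    # first ratio: Rao-Blackwell estimate of pi~_1(theta0 | x) / pi_1(theta0)
    first = np.mean([cond_density_tilde(theta0, z, psi) for z, psi in tilde_sample])
    first /= prior1_theta(theta0)
    # bridge correction: estimate of m~_1(x) / m_1(x) from the pi_1 posterior sample
    corr = np.mean([prior0_psi(psi) / cond1_psi(psi, theta) for theta, psi in post1_sample])
    return first * corr
```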
• 98. Importance sampling solutions / The Savage–Dickey ratio
Diabetes in Pima Indian women (cont'd)
[Figure: boxplots comparing the variation of the Bayes factor approximations over 100 replicas of 20,000 simulations each, for the importance sampling (IS), Chib's, Savage–Dickey and bridge estimators; values range roughly from 2.8 to 3.4]
• 99. Cross-model solutions / Variable selection
Bayesian variable selection
Example of a regression setting: one dependent random variable y and a set {x1, . . . , xk} of k explanatory variables.
Question: Are all the xi's involved in the regression?
Assumption: every subset {i1, . . . , iq} of q (0 ≤ q ≤ k) explanatory variables, {1n, xi1, . . . , xiq}, is a proper set of explanatory variables for the regression of y [intercept included in every corresponding model]
Computational issue: 2^k models in competition...
[Marin & Robert, Bayesian Core, 2007]
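Each competing model can be indexed by a binary inclusion vector γ ∈ {0, 1}^k, which makes the combinatorial explosion concrete (a toy enumeration, not from the slides):

```python
from itertools import product

k = 10                                     # illustrative number of candidate covariates
models = list(product((0, 1), repeat=k))   # one inclusion vector gamma per model
print(len(models))                         # 2**k = 1024 competing models
print(models[5])                           # e.g. (0, 0, 0, 0, 0, 0, 0, 1, 0, 1)
```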
• 103. Cross-model solutions / Reversible jump
Reversible jump
Idea: set up a proper measure-theoretic framework for designing moves between models Mk
[Green, 1995]
Create a reversible kernel K on H = ∪k {k} × Θk such that
    ∫_A ∫_B K(x, dy) π(x) dx = ∫_B ∫_A K(y, dx) π(y) dy
for the invariant density π [x is of the form (k, θ(k))]
• 105. Cross-model solutions / Reversible jump
Local moves
For a move between two models, M1 and M2, the Markov chain being in state θ1 ∈ M1, denote by K1→2(θ1, dθ) and K2→1(θ2, dθ) the corresponding kernels, under the detailed balance condition
    π(dθ1) K1→2(θ1, dθ) = π(dθ2) K2→1(θ2, dθ) ,
and take, wlog, dim(M2) > dim(M1).
Proposal expressed as
    θ2 = Ψ1→2(θ1, v1→2)
where v1→2 is a random variable of dimension dim(M2) − dim(M1), generated as
    v1→2 ∼ ϕ1→2(v1→2) .
• 107. Cross-model solutions / Reversible jump
Local moves (2)
In this case, q1→2(θ1, dθ2) has density
    ϕ1→2(v1→2) | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |^(-1) ,
by the Jacobian rule.
Reverse importance link
If several models are considered simultaneously, with probability π1→2 of choosing the move to M2 while in M1, as in the overall kernel
    K(x, B) = Σ_{m=1}^∞ ∫_B ρm(x, y) qm(x, dy) + ω(x) I_B(x) ,
the acceptance probability reduces to
    α(θ1, v1→2) = 1 ∧ { π(M2, θ2) π2→1 / [ π(M1, θ1) π1→2 ϕ1→2(v1→2) ] } × | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) | .
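As a concrete textbook-style illustration of the Jacobian rule (not an example from the slides): for a move from a one-dimensional θ1 to a two-dimensional θ2 through the split Ψ1→2(θ1, v) = (θ1 − v, θ1 + v),

\[
\left| \frac{\partial \Psi_{1\to 2}(\theta_1, v)}{\partial(\theta_1, v)} \right|
 \;=\; \left| \det \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \right| \;=\; 2 ,
\]

so the acceptance ratio carries an extra factor of 2 compared with a naive density ratio.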
• 110. Cross-model solutions / Reversible jump
Generic reversible jump acceptance probability
Acceptance probability of θ2 = Ψ1→2(θ1, v1→2) is
    α(θ1, v1→2) = 1 ∧ { π(M2, θ2) π2→1 / [ π(M1, θ1) π1→2 ϕ1→2(v1→2) ] } × | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |
while the acceptance probability of θ1, with (θ1, v1→2) = Ψ1→2^(-1)(θ2), is
    α(θ1, v1→2) = 1 ∧ { π(M1, θ1) π1→2 ϕ1→2(v1→2) / [ π(M2, θ2) π2→1 ] } × | ∂Ψ1→2(θ1, v1→2) / ∂(θ1, v1→2) |^(-1)
Difficult calibration!
• 113. Cross-model solutions / Reversible jump
Green's sampler
Algorithm
Iteration t (t ≥ 1): if x(t) = (m, θ(m)),
1. Select model Mn with probability πmn
2. Generate umn ∼ ϕmn(u) and set (θ(n), vnm) = Ψm→n(θ(m), umn)
3. Take x(t+1) = (n, θ(n)) with probability
    min{ [ π(n, θ(n)) πnm ϕnm(vnm) / ( π(m, θ(m)) πmn ϕmn(umn) ) ] × | ∂Ψm→n(θ(m), umn) / ∂(θ(m), umn) | , 1 }
and take x(t+1) = x(t) otherwise.
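The self-contained Python sketch below runs this sampler on a deliberately trivial two-model target of my own construction (it is not an example from the slides): M1 carries a single N(0, 1) component and M2 two independent N(0, 1) components, with prior model weights (0.3, 0.7); the dimension-matching map just appends or drops the extra coordinate, so the Jacobian equals 1 and the chain should visit the two models with frequencies close to the prior weights.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.3, 0.7])        # prior weights of the two toy models

def log_target(k, theta):
    # log pi(M_k, theta), up to the common normalising constant:
    # model k has len(theta) independent N(0, 1) components
    return np.log(p[k]) - 0.5 * np.sum(theta**2) - 0.5 * len(theta) * np.log(2 * np.pi)

def log_phi(v):                 # N(0, 1) density of the dimension-matching variable
    return -0.5 * v**2 - 0.5 * np.log(2 * np.pi)

T = 100_000
k, theta = 0, np.array([0.0])
visits = np.zeros(2)

for t in range(T):
    if k == 0:
        # move 1 -> 2: keep theta, append v ~ N(0, 1); Psi(theta, v) = (theta, v), Jacobian = 1
        v = rng.normal()
        theta_prop = np.append(theta, v)
        log_alpha = log_target(1, theta_prop) - log_target(0, theta) - log_phi(v)
        if np.log(rng.uniform()) < log_alpha:
            k, theta = 1, theta_prop
    else:
        # move 2 -> 1: drop the last component (the exact reverse of the move above)
        v = theta[-1]
        theta_prop = theta[:-1]
        log_alpha = log_target(0, theta_prop) + log_phi(v) - log_target(1, theta)
        if np.log(rng.uniform()) < log_alpha:
            k, theta = 0, theta_prop
    # within-model random-walk Metropolis step (fixed dimension)
    prop = theta + 0.5 * rng.normal(size=theta.shape)
    if np.log(rng.uniform()) < log_target(k, prop) - log_target(k, theta):
        theta = prop
    visits[k] += 1

print("estimated model probabilities:", visits / T)   # close to p = (0.3, 0.7)
```

Both move probabilities equal 1 here (the other model is always proposed), so the πmn terms cancel from the acceptance ratio.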
• 114. Cross-model solutions / Reversible jump
Interpretation
The representation puts us back in a fixed-dimension setting:
M1 × V1→2 and M2 are in one-to-one relation;
reversibility imposes that θ1 is derived as (θ1, v1→2) = Ψ1→2^(-1)(θ2);
the move appears like a regular Metropolis–Hastings move from the couple (θ1, v1→2) to θ2 when the stationary distributions are π(M1, θ1) × ϕ1→2(v1→2) and π(M2, θ2), and when the proposal distribution is deterministic.
• 116. Cross-model solutions / Saturation schemes
Alternative
Saturation of the parameter space H = ∪k {k} × Θk by creating
θ = (θ1, . . . , θD), a model index M, and pseudo-priors πj(θj|M = k) for j ≠ k
[Carlin & Chib, 1995]
Validation by
    P(M = k|x) = ∫ P(M = k|x, θ) π(θ|x) dθ = Zk
where the (marginal) posterior is [not πk!]
    π(θ|x) = Σ_{k=1}^D P(θ, M = k|x) = Σ_{k=1}^D pk Zk πk(θk|x) Π_{j≠k} πj(θj|M = k) .
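For completeness, the Gibbs sampler implied by this saturation (as described in Carlin & Chib, 1995, written here with fk and πk denoting the likelihood and prior of model Mk, notation not used on the slide) updates each component and the model index in turn:

\[
\theta_j \mid x, M=k \;\sim\;
\begin{cases}
\pi_k(\theta_k \mid x) \;\propto\; f_k(x\mid\theta_k)\,\pi_k(\theta_k) & j = k,\\[2pt]
\pi_j(\theta_j \mid M=k) & j \neq k,
\end{cases}
\qquad
P(M=k \mid x, \theta) \;\propto\; p_k\, f_k(x\mid\theta_k)\,\pi_k(\theta_k)\,\prod_{j\neq k} \pi_j(\theta_j\mid M=k).
\]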