On some computational methods for Bayesian model choice




             On some computational methods for Bayesian
                          model choice

                                        Christian P. Robert

                               CREST-INSEE and Université Paris Dauphine
                                   http://xianblog.wordpress.com


                    JSM 2009, Washington D.C., August 2, 2009
                     Joint works with M. Beaumont, N. Chopin,
                     J.-M. Cornuet, J.-M. Marin and D. Wraith
On some computational methods for Bayesian model choice




Outline


      1    Evidence

      2    Importance sampling solutions

      3    Nested sampling

      4    ABC model choice
On some computational methods for Bayesian model choice
  Evidence




Model choice and model comparison



      Choice between models
      Several models available for the same observation vector x

                                    Mi : x ∼ fi (x|θi ),   i∈I

      where the family I can be finite or infinite
On some computational methods for Bayesian model choice
  Evidence




Bayesian model choice
      Probabilise the entire model/parameter space
          allocate probabilities pi to all models Mi
          define priors πi (θi ) for each parameter space Θi
          compute

                    \pi(M_i \mid x) = \frac{p_i \int_{\Theta_i} f_i(x|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x|\theta_j)\,\pi_j(\theta_j)\,d\theta_j}


              take largest π(Mi |x) to determine “best” model,
              or use averaged predictive

                    \sum_j \pi(M_j|x) \int_{\Theta_j} f_j(x'|\theta_j)\,\pi_j(\theta_j|x)\,d\theta_j
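      A minimal Python sketch of the first display above (the two-model prior weights and evidences are invented for illustration):

```python
import numpy as np

def posterior_model_probs(prior_weights, evidences):
    """pi(M_i | x) = p_i Z_i / sum_j p_j Z_j, with Z_i the evidence of model M_i."""
    w = np.asarray(prior_weights, float) * np.asarray(evidences, float)
    return w / w.sum()

# hypothetical numbers: two models, equal prior weights, evidences 0.8 and 0.2
print(posterior_model_probs([0.5, 0.5], [0.8, 0.2]))   # -> [0.8 0.2]
```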
On some computational methods for Bayesian model choice
  Evidence




Bayes factor



      Definition (Bayes factors)
      Central quantity

                    B_{ij}(x) = \frac{\int_{\Theta_i} f_i(x|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\int_{\Theta_j} f_j(x|\theta_j)\,\pi_j(\theta_j)\,d\theta_j} = \frac{m_i(x)}{m_j(x)}

                                                                                 [Jeffreys, 1939]
On some computational methods for Bayesian model choice
  Evidence




Evidence




      All these problems end up with a similar quantity, the evidence

                    Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,d\theta_k ,  \qquad k = 1, \ldots,

      aka the marginal likelihood.
On some computational methods for Bayesian model choice
  Importance sampling solutions




Importance sampling revisited

      1    Evidence

      2    Importance sampling solutions
             Bridge sampling
             Harmonic means
             One bridge back

      3    Nested sampling

      4    ABC model choice
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Bayes factor approximation

      When approximating the Bayes factor

                    B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,d\theta_1}

      use of importance functions ϕ0 and ϕ1 and

                    \hat B_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)\big/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)\big/\varphi_1(\theta_1^i)}
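      A Python sketch of this importance sampling approximation on a toy conjugate setup where B01 is available in closed form (the models, importance functions and observation below are all invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = 1.5                                   # single hypothetical observation

# two hypothetical models: x ~ N(theta, 1) with theta ~ N(0, 1) (M0) or theta ~ N(0, 10) (M1)
priors = [stats.norm(0, 1), stats.norm(0, np.sqrt(10))]
lik = lambda th: stats.norm(th, 1).pdf(x)

# importance functions phi_0, phi_1: Student t's roughly centred on each posterior mean
phis = [stats.t(df=3, loc=x / 2, scale=1), stats.t(df=3, loc=10 * x / 11, scale=1)]

def evidence_is(prior, phi, n=100_000):
    th = phi.rvs(n, random_state=rng)     # theta_i ~ phi
    return np.mean(lik(th) * prior.pdf(th) / phi.pdf(th))

B01_hat = evidence_is(priors[0], phis[0]) / evidence_is(priors[1], phis[1])
# closed-form check: the marginal of x under each model is N(0, 1 + tau^2)
B01_exact = stats.norm(0, np.sqrt(2)).pdf(x) / stats.norm(0, np.sqrt(11)).pdf(x)
print(B01_hat, B01_exact)
```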
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Bridge sampling identity

      Generic representation

                    B_{12} = \frac{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,d\theta}{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,d\theta} \qquad \forall\,\alpha(\cdot)

                    \approx \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})} \qquad \theta_{ji} \sim \pi_j(\theta|x)

                           [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Optimal bridge sampling

      The optimal choice of auxiliary function is
                    \alpha^\star = \frac{n_1 + n_2}{n_1\,\pi_1(\theta|x) + n_2\,\pi_2(\theta|x)}

      leading to

                    B_{12} \approx \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1\,\pi_1(\theta_{1i}|x) + n_2\,\pi_2(\theta_{1i}|x)}}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1\,\pi_1(\theta_{2i}|x) + n_2\,\pi_2(\theta_{2i}|x)}}

                                                                                    Back later!
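      A Python sketch of the resulting iterative bridge sampling scheme on a toy pair of unnormalised densities with a known constant ratio (all names and numbers below are invented; the iteration is needed because α⋆ involves the unknown normalised posteriors):

```python
import numpy as np

rng = np.random.default_rng(1)

# two hypothetical unnormalised posteriors with known constants, so B12 = c1/c2 = 0.5 exactly
q1 = lambda th: np.exp(-0.5 * th**2)                 # c1 = sqrt(2 pi)
q2 = lambda th: 2.0 * np.exp(-0.5 * (th - 0.5)**2)   # c2 = 2 sqrt(2 pi)

n1 = n2 = 50_000
th1 = rng.normal(0.0, 1.0, n1)                       # draws from the density behind q1
th2 = rng.normal(0.5, 1.0, n2)                       # draws from the density behind q2
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

l1, l2 = q1(th1) / q2(th1), q1(th2) / q2(th2)
r = 1.0                                              # running estimate of B12
for _ in range(50):                                  # iterate: alpha* depends on the unknown constants
    num = np.mean(l2 / (s1 * l2 + s2 * r))
    den = np.mean(1.0 / (s1 * l1 + s2 * r))
    r = num / den
print(r)                                             # should be close to 0.5
```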
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Optimal bridge sampling (2)



      Reason:

                    \frac{\mathrm{Var}(B_{12})}{B_{12}^2} \approx \frac{1}{n_1 n_2}\left[\frac{\int \pi_1(\theta)\,\pi_2(\theta)\,[n_1\,\pi_1(\theta) + n_2\,\pi_2(\theta)]\,\alpha(\theta)^2\,d\theta}{\left(\int \pi_1(\theta)\,\pi_2(\theta)\,\alpha(\theta)\,d\theta\right)^2} - 1\right]

      (by the δ method)
      Drag: Dependence on the unknown normalising constants solved
      iteratively
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Zk from a posterior sample



      Use of the [harmonic mean] identity

                    \mathbb{E}^{\pi_k}\!\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\right|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,d\theta_k = \frac{1}{Z_k}

      no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



RJ-MCMC interpretation


      If ϕ is the density of an alternative to model πk (θk )Lk (θk |x), the
      acceptance probability for a move from model πk (θk )Lk (θk |x) to
      model ϕ(θk ) is based upon

                    \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k|x)}

      (under a proper choice of prior weight)
      Unbiased estimator of the odds ratio
                                   [Bartolucci, Scaccia, and Mira, 2006]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



An infamous example

      When
                    \varphi(\theta_k) = \tilde\pi_k(\theta_k) ,

      the harmonic mean approximation to B21 is

                    \hat B_{21} = \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} 1\big/L_1(\theta_{1i}|x)}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} 1\big/L_2(\theta_{2i}|x)} \qquad \theta_{ji} \sim \pi_j(\theta|x)

                         [Newton & Raftery, 1994; Raftery et al., 2008]
      Infamous: Most often leads to an infinite variance!!!
                                             [Radford Neal’s blog, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



“The Worst Monte Carlo Method Ever”

      “The good news is that the Law of Large Numbers guarantees that
      this estimator is consistent ie, it will very likely be very close to the
      correct answer if you use a sufficiently large number of points from
      the posterior distribution.
      The bad news is that the number of points required for this
      estimator to get close to the right answer will often be greater
      than the number of atoms in the observable universe. The even
      worse news is that it’s easy for people to not realize this, and to
      naively accept estimates that are nowhere close to the correct
      value of the marginal likelihood.”
                                       [Radford Neal’s blog, Aug. 23, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling


      Harmonic mean: constraint opposite to the usual importance sampling
      requirement: ϕ(θ) must have lighter (rather than fatter) tails than
      πk (θk )Lk (θk ) for the approximation

                    \hat Z_{1k} = 1\bigg/\left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}\right\}

      to have a finite variance.
      E.g., use finite support kernels for ϕ
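      A Python sketch of the estimator Ẑ1k with a finite-support ϕ, on a toy conjugate normal model where the evidence is known in closed form (the exact posterior draws below stand in for MCMC output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = 1.0                                          # hypothetical observation

# conjugate toy model: theta ~ N(0,1), x|theta ~ N(theta,1)  =>  Z = N(x; 0, 2)
prior = stats.norm(0, 1)
lik = lambda th: stats.norm(th, 1).pdf(x)
post = stats.norm(x / 2, np.sqrt(0.5))           # exact posterior, standing in for MCMC output
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

theta = post.rvs(100_000, random_state=rng)

# phi with finite support, hence lighter tails than pi_k L_k: uniform on a central posterior interval
lo, hi = np.quantile(theta, [0.10, 0.90])
phi = lambda th: ((th >= lo) & (th <= hi)) / (hi - lo)

inv_Z = np.mean(phi(theta) / (prior.pdf(theta) * lik(theta)))
print(1.0 / inv_Z, Z_exact)
```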
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



HPD supports



      In practice, possibility to use the HPD region defined by an MCMC
      sample as
                    H_\alpha = \{\theta;\ \tilde\pi(\theta|x) \ge \hat q_\alpha[\tilde\pi(\cdot|x)]\}

      where q̂α[π̃(·|x)] is the empirical α-quantile of the observed
      π̃(θ(i)|x)'s
      Construction of ϕ as uniformly supported by Hα or by an ellipsoid
      approximation whose volume can be computed
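      A Python sketch of this construction (the 2D Gaussian sample below is a made-up stand-in for MCMC output, and hpd_ellipsoid is a hypothetical helper name):

```python
import numpy as np
from math import pi, gamma

def hpd_ellipsoid(sample, log_post, alpha=0.10):
    """Keep the points whose (unnormalised) log-posterior exceeds its empirical
    alpha-quantile -- the MCMC image of H_alpha -- then enclose them in an ellipsoid."""
    keep = sample[log_post >= np.quantile(log_post, alpha)]
    m = keep.mean(axis=0)
    S = np.cov(keep, rowvar=False)
    d2 = np.einsum('ij,jk,ik->i', keep - m, np.linalg.inv(S), keep - m)
    r2 = d2.max()                          # Mahalanobis radius enclosing every kept point
    d = sample.shape[1]
    volume = pi ** (d / 2) / gamma(d / 2 + 1) * np.sqrt(np.linalg.det(r2 * S))
    return m, r2 * S, volume               # centre, shape matrix, Lebesgue volume

# hypothetical 2D Gaussian "MCMC output", with its log-density standing in for log pi-tilde
rng = np.random.default_rng(3)
draws = rng.normal(size=(10_000, 2))
logp = -0.5 * (draws ** 2).sum(axis=1)
centre, shape, vol = hpd_ellipsoid(draws, logp)
print(centre, vol)                         # vol is what a uniform phi on the ellipsoid needs
```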
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



HPD supports: illustration
      Gibbs sample of 10³ parameters (θ, σ²) produced for the normal
      model, x1 , . . . , xn ∼ N (θ, σ²) with x̄ = 0, s² = 1 and n = 10,
      under Jeffreys’ prior, leads to a straightforward approximation to
      the 10% HPD region, replaced by an ellipsoid.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



HPD supports: illustration
      Gibbs sample of 10³ parameters (θ, σ²) produced for the normal
      model, x1 , . . . , xn ∼ N (θ, σ²) with x̄ = 0, s² = 1 and n = 10,
      under Jeffreys’ prior, leads to a straightforward approximation to
      the 50% HPD region, replaced by an ellipsoid.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling (cont’d)



      Compare Z1k with a standard importance sampling approximation
                    \hat Z_{2k} = \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\varphi(\theta_k^{(t)})}

      where the θk^(t)'s are generated from the density ϕ(·) (with fatter
      tails, like Student's t densities)
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Approximating Zk using a mixture representation



                                                                            Bridge sampling recall

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                                  ϕk (θk ) ∝ ωπk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω is not a probability weight
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
          1   Take δ^(t) = 1 with probability

                    \frac{\omega\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)})}{\omega\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) + \varphi(\theta_k^{(t-1)})}

              and δ^(t) = 2 otherwise;
          2   If δ^(t) = 1, generate θk^(t) ∼ MCMC(θk^(t−1), θk) where
              MCMC(θk, θk′) denotes an arbitrary MCMC kernel associated
              with the posterior πk (θk |x) ∝ πk (θk )Lk (θk );
          3   If δ^(t) = 2, generate θk^(t) ∼ ϕ(θk ) independently
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Evidence approximation by mixtures
      Rao-Blackwellised estimate
                    \hat\xi = \frac{1}{T}\sum_{t=1}^{T} \frac{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})}

      converges to ωZk /{ωZk + 1}
      Deduce Ẑ3k from ωẐ3k /{ωẐ3k + 1} = ξ̂, i.e.

                    \hat Z_{3k} = \frac{1}{\omega}\;\frac{\sum_{t=1}^{T} \omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \big/ \big\{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\big\}}{\sum_{t=1}^{T} \varphi(\theta_k^{(t)}) \big/ \big\{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\big\}}

                                                                            [Bridge sampler redux]
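      A Python sketch of the whole scheme on a toy conjugate normal model: the two-component Gibbs sampler of the previous slide followed by Ẑ3k; to keep the sketch short, the MCMC move of step 2 is replaced by an exact posterior draw, which the conjugate toy model allows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x, T, omega = 1.0, 20_000, 0.5

# conjugate toy model: theta ~ N(0,1), x|theta ~ N(theta,1) => Z = N(x; 0, 2), posterior N(x/2, 1/2)
prior = stats.norm(0, 1)
lik = lambda th: stats.norm(th, 1).pdf(x)
post = stats.norm(x / 2, np.sqrt(0.5))
phi = stats.norm(x / 2, 1.5)                       # arbitrary but normalised instrumental density
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

target = lambda th: omega * prior.pdf(th) * lik(th)   # omega * pi_k(theta) * L_k(theta)

# two-component Gibbs sampler on the mixture  omega * pi_k * L_k + phi
theta, draws = post.rvs(random_state=rng), np.empty(T)
for t in range(T):
    if rng.uniform() < target(theta) / (target(theta) + phi.pdf(theta)):   # delta = 1
        theta = post.rvs(random_state=rng)         # exact posterior draw instead of an MCMC move
    else:                                          # delta = 2
        theta = phi.rvs(random_state=rng)
    draws[t] = theta

num = target(draws) / (target(draws) + phi.pdf(draws))
den = phi.pdf(draws) / (target(draws) + phi.pdf(draws))
Z3_hat = num.sum() / (omega * den.sum())           # xi_hat / {omega (1 - xi_hat)}
print(Z3_hat, Z_exact)
```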
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Mixture illustration
      For the normal example, use of the same uniform ϕ over the ellipse
      approximating the same HPD region Hα and ω equal to one-tenth
      of the true evidence. Gibbs samples of size 10⁴ and 100 Monte
      Carlo replicas
On some computational methods for Bayesian model choice
  Nested sampling




Nested sampling

      1    Evidence

      2    Importance sampling solutions

      3    Nested sampling
             Purpose
             Implementation
             Error rates
             Constraints

      4    ABC model choice
On some computational methods for Bayesian model choice
  Nested sampling
     Purpose



Nested sampling: Goal


      Skilling’s (2007) technique using the one-dimensional
      representation:
                    Z = \mathbb{E}^{\pi}[L(\theta)] = \int_0^1 \varphi(x)\,dx

      with
                    \varphi^{-1}(l) = P^{\pi}(L(\theta) > l).

      Note: ϕ(·) is intractable in most cases.
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: First approximation

      Approximate Z by a Riemann sum:
                    Z = \sum_{i=1}^{N} (x_{i-1} - x_i)\,\varphi(x_i)

      where the xi ’s are either:
              deterministic:            xi = e−i/N
              or random:

                                 x0 = 1,          xi+1 = ti xi ,   ti ∼ Be(N, 1)

              so that E[log xi ] = −i/N .
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Alternative representation

      Another representation is
                    Z = \sum_{i=0}^{N-1} \{\varphi(x_{i+1}) - \varphi(x_i)\}\,x_i

      which is a special case of

                    Z = \sum_{i=0}^{N-1} \{L(\theta^{(i+1)}) - L(\theta^{(i)})\}\,\pi(\{\theta;\ L(\theta) > L(\theta^{(i)})\})

      where · · · L(θ(i+1) ) < L(θ(i) ) · · ·
                                          [Lebesgue version of Riemann’s sum]
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Second approximation
      Estimate (intractable) ϕ(xi ) by ϕi :

      Nested sampling
      Start with N values θ1 , . . . , θN sampled from π
      At iteration i,
          1    Take ϕi = L(θk ), where θk is the point with smallest
               likelihood in the pool of θi ’s
          2    Replace θk with a sample from the prior constrained to
               L(θ) > ϕi : the current N points are sampled from prior
               constrained to L(θ) > ϕi .

      Note that

                    \pi(\{\theta;\ L(\theta) > L(\theta^{(i+1)})\}) \,\big/\, \pi(\{\theta;\ L(\theta) > L(\theta^{(i)})\}) \approx 1 - \frac{1}{N}
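      A Python sketch of the full nested sampling loop on a toy problem where the constrained prior can be simulated exactly (prior U(0, 1), likelihood exp(−θ²/2s²)), so no MCMC step is needed; in realistic problems step 2 is the hard part:

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(5)
s, N = 0.1, 500
L = lambda th: np.exp(-th**2 / (2 * s**2))        # likelihood; the prior is U(0, 1)
Z_exact = s * sqrt(pi / 2) * erf(1 / (s * sqrt(2)))

theta = rng.uniform(0, 1, N)                      # N points from the prior
Z_hat, x_prev = 0.0, 1.0
for i in range(1, 8 * N + 1):                     # fixed budget as a crude stopping rule
    k = int(np.argmin(L(theta)))
    phi_i = L(theta[k])                           # hat-phi_i: smallest likelihood in the pool
    x_i = np.exp(-i / N)                          # deterministic prior-volume schedule
    Z_hat += (x_prev - x_i) * phi_i
    x_prev = x_i
    # replace theta_k by a draw from the prior constrained to L(theta) > phi_i;
    # L is decreasing on [0, 1], so that region is simply [0, theta_k)
    theta[k] = rng.uniform(0, theta[k])
print(Z_hat, Z_exact)
```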
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Third approximation


      Iterate the above steps until a given stopping iteration j is
      reached: e.g.,
              observe very small changes in the approximation Z;
              reach the maximal value of L(θ) when the likelihood is
              bounded and its maximum is known;
               truncate the integral Z at level ε, i.e. replace

                    \int_0^1 \varphi(x)\,dx \quad\text{with}\quad \int_{\epsilon}^1 \varphi(x)\,dx
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



Approximation error


      Error = \hat Z - Z
            = \sum_{i=1}^{j} (x_{i-1} - x_i)\,\hat\varphi_i - \int_0^1 \varphi(x)\,dx

            = -\int_0^{\epsilon} \varphi(x)\,dx                                                       (Truncation Error)

            + \left[\sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi(x_i) - \int_{\epsilon}^1 \varphi(x)\,dx\right]          (Quadrature Error)

            + \sum_{i=1}^{j} (x_{i-1} - x_i)\,\{\hat\varphi_i - \varphi(x_i)\}                                      (Stochastic Error)



                                                                     [Dominated by Monte Carlo]
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



A CLT for the Stochastic Error


      The (dominating) stochastic error is OP (N −1/2 ):

                    N^{1/2}\,\{\hat Z - Z\} \xrightarrow{\;D\;} \mathcal{N}(0, V)

      with
                    V = -\int\!\!\int_{s,t\in[\epsilon,1]} s\,\varphi'(s)\,t\,\varphi'(t)\,\log(s\vee t)\,ds\,dt.

                                                          [Proof based on Donsker’s theorem]
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



What of log Z?

      If the interest lies within log Z, Slutsky’s transform of the CLT:

                    N^{1/2}\,\{\log \hat Z - \log Z\} \xrightarrow{\;D\;} \mathcal{N}\!\left(0, V/Z^2\right)

      Note: The number of simulated points equals the number of
      iterations j, and is a multiple of N : if one stops at the first iteration j
      such that e^{−j/N} < ε, then:

                    j = N \lceil -\log\epsilon \rceil .
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Sampling from constr’d priors



      Exact simulation from the constrained prior is intractable in most
      cases

      Skilling (2007) proposes to use MCMC but this introduces a bias
      (stopping rule)
      If implementable, then slice sampler can be devised at the same
      cost [or less]
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration
      Case of a banana-shaped target obtained by twisting a 2D normal:
      x2 = x2 + βx1² − 100β
                                   [Haario, Saksman, Tamminen, 1999]




                                       β = .03, Σ = diag(100, 1)
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration (2)
      Use of nested sampling with N = 1000, 50 MCMC steps with size
      0.1, compared with a population Monte Carlo (PMC) based on 10
      iterations with 5000 points per iteration and final sample of 50000
      points, using nine Student’s t components with 9 df
                 [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]




                                            Evidence estimation
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration (2)
      Use of nested sampling with N = 1000, 50 MCMC steps with size
      0.1, compared with a population Monte Carlo (PMC) based on 10
      iterations with 5000 points per iteration and final sample of 50000
      points, using nine Student’s t components with 9 df
                 [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]




                                              E[X1 ] estimation
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration (2)
      Use of nested sampling with N = 1000, 50 MCMC steps with size
      0.1, compared with a population Monte Carlo (PMC) based on 10
      iterations with 5000 points per iteration and final sample of 50000
      points, using nine Student’s t components with 9 df
                 [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]




                                              E[X2 ] estimation
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



Approximate Bayesian Computation

      Bayesian setting: target is π(θ)f (x|θ)
      When likelihood f (x|θ) not in closed form, likelihood-free rejection
      technique:
      ABC algorithm
      For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly
      simulating
                            θ′ ∼ π(θ) ,  x ∼ f (x|θ′) ,
      until the auxiliary variable x is equal to the observed value, x = y.

                                                          [Pritchard et al., 1999]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



A as approximative


      When y is a continuous random variable, equality x = y is replaced
      with a tolerance condition,

                                                      ρ(x, y) ≤ ε

      where ρ is a distance between summary statistics
      Output distributed from

                             π(θ) Pθ {ρ(x, y) < ε} ∝ π(θ | ρ(x, y) < ε)
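      A Python sketch of the tolerance version of the ABC algorithm on a toy normal model with a tractable posterior, using ρ(x, y) = |x − y| (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(6)
y, eps, n = 1.0, 0.1, 200_000

# hypothetical toy model for checking: theta ~ N(0,1), x|theta ~ N(theta,1)
theta = rng.normal(0, 1, n)             # theta' ~ pi(theta)
x = rng.normal(theta, 1)                # x ~ f(x | theta')
accepted = theta[np.abs(x - y) <= eps]  # keep theta' whenever rho(x, y) <= eps
print(accepted.mean(), y / 2)           # ABC posterior mean vs exact posterior mean y/2
print(len(accepted) / n)                # acceptance rate, driven by eps
```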
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



Dynamic area
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC improvements

      Simulating from the prior is often poor in efficiency
      Either modify the proposal distribution on θ to increase the density
      of x’s within the vicinity of y...
           [Marjoram et al, 2003; Bortot et al., 2007; Sisson et al., 2007]

      ...or view the problem as conditional density estimation and
      develop techniques that allow for a larger ε...
                                                  [Beaumont et al., 2002]

      ...or even include ε in the inferential framework
                                                                   [Ratmann et al., 2009]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-MCMC


      Markov chain (θ(t) ) created via the transition function

                    \theta^{(t+1)} =
                    \begin{cases}
                      \theta' \sim K(\theta'|\theta^{(t)}) & \text{if } x \sim f(x|\theta') \text{ is such that } x = y\\[4pt]
                       & \text{and } u \sim \mathcal{U}(0,1) \le \dfrac{\pi(\theta')\,K(\theta^{(t)}|\theta')}{\pi(\theta^{(t)})\,K(\theta'|\theta^{(t)})}\,,\\[8pt]
                      \theta^{(t)} & \text{otherwise,}
                    \end{cases}

      has the posterior π(θ|y) as stationary distribution
                                                    [Marjoram et al, 2003]
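      A Python sketch of this ABC-MCMC kernel on the same kind of toy normal model, with a symmetric random walk K so that the kernel terms cancel in the acceptance ratio (tolerance version, all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(7)
y, eps, T, tau = 1.0, 0.1, 50_000, 1.0

log_prior = lambda th: -0.5 * th**2          # theta ~ N(0,1), up to a constant
theta, chain = 0.0, np.empty(T)
for t in range(T):
    prop = theta + tau * rng.normal()        # K(.|theta): symmetric random walk
    x = prop + rng.normal()                  # x ~ f(x | theta')
    # accept only if the pseudo-data falls in the eps-ball around y AND the MH ratio allows it
    # (K symmetric, so only the prior ratio remains)
    if abs(x - y) <= eps and np.log(rng.uniform()) <= log_prior(prop) - log_prior(theta):
        theta = prop
    chain[t] = theta
print(chain[T // 10:].mean(), y / 2)         # compare with the exact posterior mean y/2
```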
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-PRC

      Another sequential version producing a sequence of Markov
      transition kernels Kt and of samples (θ1^(t), . . . , θN^(t)) (1 ≤ t ≤ T )

      ABC-PRC Algorithm
          1   Pick θ⋆ at random among the previous θi^(t−1)’s, with
              probabilities ωi^(t−1) (1 ≤ i ≤ N ).
          2   Generate
                            θi^(t) ∼ Kt (θ|θ⋆) ,   x ∼ f (x|θi^(t)) ,
          3   Check that ρ(x, y) < ε, otherwise start again.

                                                                 [Sisson et al., 2007]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-PRC weight

      Probability ωi^(t) computed as

                    \omega_i^{(t)} \propto \pi(\theta_i^{(t)})\,L_{t-1}(\theta^\star|\theta_i^{(t)})\,\big\{\pi(\theta^\star)\,K_t(\theta_i^{(t)}|\theta^\star)\big\}^{-1} ,

      where Lt−1 is an arbitrary transition kernel.
      In case
                    L_{t-1}(\theta^\star|\theta) = K_t(\theta|\theta^\star) ,
      all weights are equal under a uniform prior.
      Inspired by Del Moral et al. (2006), who use backward kernels
      Lt−1 in SMC to achieve unbiasedness, with a more recent (2008)
      ABC-SMC version
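      A Python sketch of the ABC-PRC scheme on a toy normal model, with the choice Lt−1(θ⋆|θ) = Kt(θ|θ⋆) above and a symmetric Gaussian Kt, so the weight reduces to the prior ratio π(θi^(t))/π(θ⋆); any systematic gap between the weighted mean and the exact posterior mean y/2 illustrates the bias discussed on the next slide (all numbers invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
y, N, tau = 1.0, 2000, 0.5
eps_seq = [1.0, 0.5, 0.25, 0.1]                      # decreasing tolerances

prior = stats.norm(0, 1)                             # theta ~ N(0,1), x|theta ~ N(theta,1)

def abc_draw(eps):
    """Plain rejection step used to build the initial population."""
    while True:
        th = prior.rvs(random_state=rng)
        if abs(rng.normal(th, 1) - y) <= eps:
            return th

theta = np.array([abc_draw(eps_seq[0]) for _ in range(N)])
w = np.full(N, 1.0 / N)

for eps in eps_seq[1:]:
    new_theta, new_w = np.empty(N), np.empty(N)
    for i in range(N):
        while True:                                  # steps 1-3 of the ABC-PRC algorithm
            star = rng.choice(theta, p=w)            # pick theta* among the previous particles
            prop = rng.normal(star, tau)             # theta_i^(t) ~ K_t(.|theta*), Gaussian walk
            if abs(rng.normal(prop, 1) - y) <= eps:  # rho(x, y) < eps
                break
        new_theta[i] = prop
        # weight with L_{t-1}(theta'|theta) = K_t(theta|theta'); K_t symmetric => prior ratio
        new_w[i] = prior.pdf(prop) / prior.pdf(star)
    theta, w = new_theta, new_w / new_w.sum()

print(np.sum(w * theta), y / 2)                      # weighted mean vs exact posterior mean y/2
```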
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-PRC bias

                    Lack of unbiasedness of the method
      Joint density of the accepted pair (θ(t−1) , θ(t) ) proportional to

                                             π(θ(t−1) |y)Kt (θ(t) |θ(t−1) )f (y|θ(t) ) ,

      For an arbitrary function h(θ), E[ωt h(θ(t) )] proportional to
            \int\!\!\int h(\theta^{(t)})\,
              \frac{\pi(\theta^{(t)})\,L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\,K_t(\theta^{(t)}|\theta^{(t-1)})}\,
              \pi(\theta^{(t-1)}|y)\,K_t(\theta^{(t)}|\theta^{(t-1)})\,f(y|\theta^{(t)})\,
              d\theta^{(t-1)}\,d\theta^{(t)}

            \propto \int\!\!\int h(\theta^{(t)})\,
              \frac{\pi(\theta^{(t)})\,L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\,K_t(\theta^{(t)}|\theta^{(t-1)})}\,
              \pi(\theta^{(t-1)})\,f(y|\theta^{(t-1)})\,
              K_t(\theta^{(t)}|\theta^{(t-1)})\,f(y|\theta^{(t)})\,d\theta^{(t-1)}\,d\theta^{(t)}

            \propto \int h(\theta^{(t)})\,\pi(\theta^{(t)}|y)
              \left\{\int L_{t-1}(\theta^{(t-1)}|\theta^{(t)})\,f(y|\theta^{(t-1)})\,d\theta^{(t-1)}\right\} d\theta^{(t)} .
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



A mixture example (0)

      Toy model of Sisson et al. (2007): if

              θ ∼ U(−10, 10) ,               x|θ ∼ 0.5 N (θ, 1) + 0.5 N (θ, 1/100) ,

      then the posterior distribution associated with y = 0 is the normal
      mixture
                   θ|y = 0 ∼ 0.5 N (0, 1) + 0.5 N (0, 1/100)
      restricted to [−10, 10].
      Furthermore, true target available as

      π(θ | |x| < ε) ∝ Φ(ε − θ) − Φ(−ε − θ) + Φ(10(ε − θ)) − Φ(−10(ε + θ)) .
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



A mixture example (1)

      [Figure: two rows of five density plots of θ over (−3, 3)]

                      Comparison of τ = 0.15 and τ = 1/0.15 in K_t
On some computational methods for Bayesian model choice
  ABC model choice
     ABC-PMC



A PMC version
      Use of the same kernel idea as ABC-PRC but with IS correction
      Generate a sample at iteration t by

            π̂_t(θ^(t)) ∝ Σ_{j=1}^N ω_j^(t−1) K_t(θ^(t) | θ_j^(t−1))

      modulo acceptance of the associated x_t, and use an importance
      weight associated with an accepted simulation θ_i^(t)

            ω_i^(t) ∝ π(θ_i^(t)) / π̂_t(θ_i^(t)) .

                                          c Still likelihood free
                                                               [Beaumont et al., 2009]
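
      A minimal sketch of this ABC-PMC scheme on the toy mixture, assuming a Gaussian kernel K_t whose variance is twice the weighted empirical variance of the previous sample and an arbitrary decreasing tolerance schedule; names, sizes and tuning values are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
y, N = 0.0, 1000                     # observation and population size
eps = [2.0, 0.5, 0.1]                # decreasing tolerances (illustrative)

def sim_x(theta):
    """One draw of x | theta from 0.5 N(theta, 1) + 0.5 N(theta, 1/100)."""
    scale = 1.0 if rng.random() < 0.5 else 0.1
    return rng.normal(theta, scale)

# t = 0: plain ABC rejection from the U(-10, 10) prior
theta = np.empty(N)
for i in range(N):
    while True:
        th = rng.uniform(-10, 10)
        if abs(sim_x(th) - y) < eps[0]:
            theta[i] = th
            break
w = np.full(N, 1.0 / N)

for t in range(1, len(eps)):
    tau2 = 2 * np.cov(theta, aweights=w)        # kernel variance for K_t
    new_theta = np.empty(N)
    for i in range(N):
        while True:
            j = rng.choice(N, p=w)              # propose from pi_hat_t
            th = rng.normal(theta[j], np.sqrt(tau2))
            if abs(th) <= 10 and abs(sim_x(th) - y) < eps[t]:
                new_theta[i] = th
                break
    # importance weight: prior / pi_hat_t (the uniform prior is constant on its support)
    kern = np.exp(-(new_theta[:, None] - theta[None, :]) ** 2 / (2 * tau2))
    w = 1.0 / (kern @ w)
    w /= w.sum()
    theta = new_theta
```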
On some computational methods for Bayesian model choice
  ABC model choice
     ABC-PMC



A mixture example (2)
      Recovery of the target, whether using a fixed standard deviation of
      τ = 0.15 or τ = 1/0.15.
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Gibbs random fields

      Gibbs distribution
      The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with
      the graph G if

            f(y) = (1/Z) exp{ − Σ_{c∈C} V_c(y_c) } ,

      where Z is the normalising constant, C is the set of cliques of G,
      and V_c is an arbitrary function called the potential;
      U(y) = Σ_{c∈C} V_c(y_c) is the energy function

       c Z is usually unavailable in closed form
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Potts model
      Potts model
      V_c(y) is of the form

            V_c(y) = θ S(y) = θ Σ_{l∼i} δ_{y_l = y_i}

      where l∼i denotes a neighbourhood structure

      In most realistic settings, the summation

            Z_θ = Σ_{x∈X} exp{θ^T S(x)}

      involves too many terms to be manageable and numerical
      approximations cannot always be trusted
                             [Cucala, Marin, CPR & Titterington, 2009]
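
      As an illustration of the quantity S(x) entering exp{θ S(x)}/Z_θ, the statistic can be computed directly on a small lattice; a sketch under the (assumed) choice of a 4-nearest-neighbour relation on a 2-D grid:

```python
import numpy as np

def potts_stat(x):
    """S(x) = number of neighbouring pairs with identical labels, for the
    4-neighbour (horizontal/vertical) relation on a 2-D lattice."""
    x = np.asarray(x)
    horiz = np.sum(x[:, 1:] == x[:, :-1])
    vert = np.sum(x[1:, :] == x[:-1, :])
    return int(horiz + vert)

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(10, 10))   # a binary field on a 10 x 10 grid
s = potts_stat(x)                       # enters the likelihood as exp{theta * S(x)} / Z_theta
```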
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Bayesian Model Choice



      Comparing a model with potential S_0 taking values in R^{p_0} versus a
      model with potential S_1 taking values in R^{p_1} can be done through
      the Bayes factor corresponding to the priors π_0 and π_1 on each
      parameter space

            B_{m0/m1}(x) = [∫ exp{θ_0^T S_0(x)}/Z_{θ_0,0} π_0(dθ_0)] / [∫ exp{θ_1^T S_1(x)}/Z_{θ_1,1} π_1(dθ_1)]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Neighbourhood relations



      Choice to be made between M neighbourhood relations

            i ∼_m i′        (0 ≤ m ≤ M − 1)

      with

            S_m(x) = Σ_{i ∼_m i′} I{x_i = x_{i′}}

      driven by the posterior probabilities of the models.
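
      On a lattice, the competing relations can for instance be the 4-neighbour and 8-neighbour systems, each with its own pair-count statistic S_m; a small sketch (the particular relations are only an illustration):

```python
import numpy as np

def s_vector(x):
    """(S_0, S_1): counts of identical neighbouring pairs under the
    4-neighbour relation (S_0) and the 8-neighbour relation (S_1 adds diagonals)."""
    x = np.asarray(x)
    s0 = np.sum(x[:, 1:] == x[:, :-1]) + np.sum(x[1:, :] == x[:-1, :])
    diag = np.sum(x[1:, 1:] == x[:-1, :-1]) + np.sum(x[1:, :-1] == x[:-1, 1:])
    return int(s0), int(s0 + diag)
```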
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Model index



      Formalisation via a model index M that appears as a new
      parameter with prior distribution π(M = m) and
      π(θ|M = m) = πm (θm )
      Computational target:

                P(M = m|x) ∝ ∫_{Θ_m} f_m(x|θ_m) π_m(θ_m) dθ_m  π(M = m) ,
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Sufficient statistics
      By definition, if S(x) is a sufficient statistic for the joint
      parameters (M, θ_0 , . . . , θ_{M−1}), then

            P(M = m|x) = P(M = m|S(x)) .

      Each model m has its own sufficient statistic S_m(·), and
      S(·) = (S_0(·), . . . , S_{M−1}(·)) is also sufficient.
      For Gibbs random fields,

            x|M = m ∼ f_m(x|θ_m) = f_m^1(x|S(x)) f_m^2(S(x)|θ_m)
                                 = [1/n(S(x))] f_m^2(S(x)|θ_m)

      where

            n(S(x)) = #{x̃ ∈ X : S(x̃) = S(x)}

       c S(x) is therefore also sufficient for the joint parameters
                                     [Specific to Gibbs random fields!]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



ABC model choice algorithm


      ABC-MC
              Generate m∗ from the prior π(M = m).
              Generate θ∗_{m∗} from the prior π_{m∗}(·).
              Generate x∗ from the model f_{m∗}(·|θ∗_{m∗}).
              Compute the distance ρ(S(x0), S(x∗)).
              Accept (θ∗_{m∗}, m∗) if ρ(S(x0), S(x∗)) < ε.

                                              [Cornuet, Grelaud, Marin & Robert, 2009]

      Note: when ε = 0 the algorithm is exact
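
      The ABC-MC steps translate directly into a generic rejection loop; a sketch where the uniform model prior, the Euclidean distance and the function signatures are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

def abc_mc(x0, priors, simulators, stats, n_sims, eps):
    """priors[m]() draws theta from pi_m, simulators[m](theta) draws a pseudo
    data set, stats(x) returns the summary vector S(x); a pair (m, theta) is
    kept whenever the distance between S(x0) and S(x*) falls below eps."""
    s0 = np.asarray(stats(x0), dtype=float)
    kept = []
    for _ in range(n_sims):
        m = rng.integers(len(priors))        # m* ~ pi(M = m), uniform over models here
        theta = priors[m]()                  # theta* ~ pi_m(.)
        x = simulators[m](theta)             # x* ~ f_m(. | theta*)
        dist = np.linalg.norm(np.asarray(stats(x), dtype=float) - s0)
        if dist < eps:
            kept.append((m, theta))
    return kept
```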
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



ABC approximation to the Bayes factor

      Frequency ratio:

            B̂F_{m0/m1}(x0) = [P̂(M = m0|x0) / P̂(M = m1|x0)] × [π(M = m1) / π(M = m0)]
                            = [#{m^{i∗} = m0} / #{m^{i∗} = m1}] × [π(M = m1) / π(M = m0)] ,

      replaced with

            B̂F_{m0/m1}(x0) = [(1 + #{m^{i∗} = m0}) / (1 + #{m^{i∗} = m1})] × [π(M = m1) / π(M = m0)]

      to avoid indeterminacy (also a Bayes estimate).
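
      Given the accepted indices returned by such a sampler, the corrected ratio is immediate; a sketch assuming equal prior model weights:

```python
def bf_hat(kept, m0=0, m1=1):
    """ABC estimate of B_{m0/m1} with the +1 correction, for pi(M=m0) = pi(M=m1)."""
    n0 = sum(1 for m, _ in kept if m == m0)
    n1 = sum(1 for m, _ in kept if m == m1)
    return (1 + n0) / (1 + n1)
```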
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Toy example

      iid Bernoulli model versus two-state first-order Markov chain, i.e.

            f_0(x|θ_0) = exp{ θ_0 Σ_{i=1}^n I{x_i = 1} } / {1 + exp(θ_0)}^n ,

      versus

            f_1(x|θ_1) = (1/2) exp{ θ_1 Σ_{i=2}^n I{x_i = x_{i−1}} } / {1 + exp(θ_1)}^{n−1} ,

      with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by “phase
      transition” boundaries).
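
      These two models can be encoded for the ABC-MC sketch given earlier, with S_0(x) = Σ I{x_i = 1} and S_1(x) = Σ I{x_i = x_{i−1}} as summaries; the sample length n and the sequential simulation are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

def sim_bernoulli(theta0):
    """Model 0: iid Bernoulli with P(x_i = 1) = exp(theta0) / (1 + exp(theta0))."""
    p = np.exp(theta0) / (1 + np.exp(theta0))
    return (rng.random(n) < p).astype(int)

def sim_markov(theta1):
    """Model 1: two-state chain, x_1 uniform, P(x_i = x_{i-1}) = exp(theta1)/(1+exp(theta1))."""
    p = np.exp(theta1) / (1 + np.exp(theta1))
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for i in range(1, n):
        x[i] = x[i - 1] if rng.random() < p else 1 - x[i - 1]
    return x

def stats(x):
    x = np.asarray(x)
    return np.sum(x == 1), np.sum(x[1:] == x[:-1])   # (S_0(x), S_1(x))

priors = [lambda: rng.uniform(-5, 5), lambda: rng.uniform(0, 6)]
simulators = [sim_bernoulli, sim_markov]
# e.g. kept = abc_mc(x0, priors, simulators, stats, n_sims=10**5, eps=1.0)
```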
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Toy example (2)




      [Figure: two scatterplots of the estimated log BF01 against the true log BF01]

      (left) Comparison of the true BF_{m0/m1}(x0) with its ABC estimate
      (in logs) over 2,000 simulations and 4 × 10^6 proposals from the
      prior. (right) Same when using a tolerance corresponding to the
      1% quantile on the distances.
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Protein folding




      Superposition of the native structure (grey) with the ST1
      structure (red), the ST2 structure (orange), the ST3 structure
      (green), and the DT structure (blue).
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Protein folding (2)


                                % seq. Id.      TM-score   FROST score
                  1i5nA (ST1)       32            0.86         75.3
                  1ls1A1 (ST2)       5            0.42          8.9
                  1jr8A (ST3)        4            0.24          8.9
                  1s7oA (DT)        10            0.08          7.8

      Characteristics of the dataset. % seq. Id.: percentage of identity
      with the query sequence. TM-score: similarity between predicted and
      native structure (uncertainty between 0.17 and 0.4). FROST score:
      quality of alignment of the query onto the candidate structure
      (uncertainty between 7 and 9).
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Protein folding (3)



                                        NS/ST1            NS/ST2   NS/ST3   NS/DT
               BF                         1.34             1.22     2.42     2.76
           P(M = NS|x0 )                 0.573             0.551    0.708    0.734

      Estimates of the Bayes factors between model NS and models
      ST1, ST2, ST3, and DT, and corresponding posterior
      probabilities of model NS based on an ABC-MC algorithm using
      1.2 × 10^6 simulations and a tolerance equal to the 1% quantile of
      the distances.

Contenu connexe

Tendances

Max Entropy
Max EntropyMax Entropy
Max Entropy
jianingy
 

Tendances (20)

Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]
 
BIRS 12w5105 meeting
BIRS 12w5105 meetingBIRS 12w5105 meeting
BIRS 12w5105 meeting
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation Problem
 
ABC in London, May 5, 2011
ABC in London, May 5, 2011ABC in London, May 5, 2011
ABC in London, May 5, 2011
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABC
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum Entropy
 
Max Entropy
Max EntropyMax Entropy
Max Entropy
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?
 
von Mises lecture, Berlin
von Mises lecture, Berlinvon Mises lecture, Berlin
von Mises lecture, Berlin
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimation
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forests
 
WSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsWSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in Statistics
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes Factors
 
Intractable likelihoods
Intractable likelihoodsIntractable likelihoods
Intractable likelihoods
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methods
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 

En vedette (8)

Computer Intensive Statistical Methods
Computer Intensive Statistical MethodsComputer Intensive Statistical Methods
Computer Intensive Statistical Methods
 
jrudd1_RDay
jrudd1_RDayjrudd1_RDay
jrudd1_RDay
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
Bootstrap: a bias minimization technique of an estimator
Bootstrap: a bias minimization technique of an estimatorBootstrap: a bias minimization technique of an estimator
Bootstrap: a bias minimization technique of an estimator
 
Reading Efron's 1979 paper on bootstrap
Reading Efron's 1979 paper on bootstrapReading Efron's 1979 paper on bootstrap
Reading Efron's 1979 paper on bootstrap
 
Aula 2 Teoria Da Amostragem Daniel
Aula 2 Teoria Da Amostragem DanielAula 2 Teoria Da Amostragem Daniel
Aula 2 Teoria Da Amostragem Daniel
 
A.b aula 4 amostragem
A.b aula 4 amostragemA.b aula 4 amostragem
A.b aula 4 amostragem
 
Uchim.org 6-klass-vilenkin
Uchim.org 6-klass-vilenkinUchim.org 6-klass-vilenkin
Uchim.org 6-klass-vilenkin
 

Similaire à Jsm09 talk

Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Alessandro Panella
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
zukun
 
original
originaloriginal
original
butest
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)
Christian Robert
 

Similaire à Jsm09 talk (20)

Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choice
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and data
 
Workshop on Bayesian Inference for Latent Gaussian Models with Applications
Workshop on Bayesian Inference for Latent Gaussian Models with ApplicationsWorkshop on Bayesian Inference for Latent Gaussian Models with Applications
Workshop on Bayesian Inference for Latent Gaussian Models with Applications
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
 
seminar at Princeton University
seminar at Princeton Universityseminar at Princeton University
seminar at Princeton University
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
 
Course on Bayesian computational methods
Course on Bayesian computational methodsCourse on Bayesian computational methods
Course on Bayesian computational methods
 
San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010
 
ABC in Varanasi
ABC in VaranasiABC in Varanasi
ABC in Varanasi
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Elementary Probability and Information Theory
Elementary Probability and Information TheoryElementary Probability and Information Theory
Elementary Probability and Information Theory
 
ABC-SysBio – Approximate Bayesian Computation in Python with GPU support
ABC-SysBio – Approximate Bayesian Computation in Python with GPU supportABC-SysBio – Approximate Bayesian Computation in Python with GPU support
ABC-SysBio – Approximate Bayesian Computation in Python with GPU support
 
NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
ABC and empirical likelihood
ABC and empirical likelihoodABC and empirical likelihood
ABC and empirical likelihood
 
Edinburgh, Bayes-250
Edinburgh, Bayes-250Edinburgh, Bayes-250
Edinburgh, Bayes-250
 
Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?
 
original
originaloriginal
original
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 

Plus de Christian Robert

Plus de Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 

Dernier

Dernier (20)

HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 

Jsm09 talk

  • 1. On some computational methods for Bayesian model choice On some computational methods for Bayesian model choice Christian P. Robert CREST-INSEE and Universit´ Paris Dauphine e http://xianblog.wordpress.com JSM 2009, Washington D.C., August 2, 2009 Joint works with M. Beaumont, N. Chopin, J.-M. Cornuet, J.-M. Marin and D. Wraith
  • 2. On some computational methods for Bayesian model choice Outline 1 Evidence 2 Importance sampling solutions 3 Nested sampling 4 ABC model choice
  • 3. On some computational methods for Bayesian model choice Evidence Model choice and model comparison Choice between models Several models available for the same observation vector x Mi : x ∼ fi (x|θi ), i∈I where the family I can be finite or infinite
  • 4. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 5. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 6. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 7. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 8. On some computational methods for Bayesian model choice Evidence Bayes factor Definition (Bayes factors) Central quantity Θi fi (x|θi )πi (θi )dθi mi (x) Bij (x) == = Θj fj (x|θj )πj (θj )dθj mj (x) [Jeffreys, 1939]
  • 9. On some computational methods for Bayesian model choice Evidence Evidence All these problems end up with a similar quantity, the evidence k = 1, . . . Zk = πk (θk )Lk (θk ) dθk , Θk aka the marginal likelihood.
  • 10. On some computational methods for Bayesian model choice Importance sampling solutions Importance sampling revisited 1 Evidence 2 Importance sampling solutions Bridge sampling Harmonic means One bridge back 3 Nested sampling 4 ABC model choice
  • 11. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Bayes factor approximation When approximating the Bayes factor f0 (x|θ0 )π0 (θ0 )dθ0 Θ0 B01 = f1 (x|θ1 )π1 (θ1 )dθ1 Θ1 use of importance functions 0 and 1 and n−1 0 n0 i i i=1 f0 (x|θ0 )π0 (θ0 )/ i 0 (θ0 ) B01 = n−1 1 n1 i i i=1 f1 (x|θ1 )π1 (θ1 )/ i 1 (θ1 )
  • 12. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Bridge sampling identity Generic representation π2 (θ|x)α(θ)π1 (θ|x)dθ ˜ B12 = ∀ α(·) π1 (θ|x)α(θ)π2 (θ|x)dθ ˜ n1 1 π2 (θ1i |x)α(θ1i ) ˜ n1 i=1 ≈ n2 θji ∼ πj (θ|x) 1 π1 (θ2i |x)α(θ2i ) ˜ n2 i=1 [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
  • 13. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Optimal bridge sampling The optimal choice of auxiliary function is n1 + n2 α = n1 π1 (θ|x) + n2 π2 (θ|x) leading to n1 1 π2 (θ1i |x) ˜ n1 n1 π1 (θ1i |x) + n2 π2 (θ1i |x) i=1 B12 ≈ n2 1 π1 (θ2i |x) ˜ n2 n1 π1 (θ2i |x) + n2 π2 (θ2i |x) i=1 Back later!
  • 14. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Optimal bridge sampling (2) Reason: Var(B12 ) 1 π1 (θ)π2 (θ)[n1 π1 (θ) + n2 π2 (θ)]α(θ)2 dθ 2 ≈ 2 −1 B12 n1 n2 π1 (θ)π2 (θ)α(θ) dθ δ method Drag: Dependence on the unknown normalising constants solved iteratively
  • 15. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Optimal bridge sampling (2) Reason: Var(B12 ) 1 π1 (θ)π2 (θ)[n1 π1 (θ) + n2 π2 (θ)]α(θ)2 dθ 2 ≈ 2 −1 B12 n1 n2 π1 (θ)π2 (θ)α(θ) dθ δ method Drag: Dependence on the unknown normalising constants solved iteratively
  • 16. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Approximating Zk from a posterior sample Use of the [harmonic mean] identity ϕ(θk ) ϕ(θk ) πk (θk )Lk (θk ) 1 Eπk x = dθk = πk (θk )Lk (θk ) πk (θk )Lk (θk ) Zk Zk no matter what the proposal ϕ(·) is. [Gelfand & Dey, 1994; Bartolucci et al., 2006] Direct exploitation of the MCMC output
  • 17. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Approximating Zk from a posterior sample Use of the [harmonic mean] identity ϕ(θk ) ϕ(θk ) πk (θk )Lk (θk ) 1 Eπk x = dθk = πk (θk )Lk (θk ) πk (θk )Lk (θk ) Zk Zk no matter what the proposal ϕ(·) is. [Gelfand & Dey, 1994; Bartolucci et al., 2006] Direct exploitation of the MCMC output
  • 18. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means RJ-MCMC interpretation If ϕ is the density of an alternative to model pik (θk )Lk (θk |x), the acceptance probability for a move from model pik (θk )Lk (θk |x) to model ϕ(θk ) is based upon ϕ(θk ) πk (θk )Lk (θk |x) (under a proper choice of prior weight) Unbiased estimator of the odds ratio [Bartolucci, Scaccia, and Mira, 2006]
  • 19. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means An infamous example When ϕ(θk ) = πk (θk ) ˜ harmonic mean approximation to B12 is n1 1 1/L1 (θ1i |x) n1 i=1 B21 = n2 θji ∼ πj (θ|x) 1 1/L2 (θ2i |x) n2 i=1 [Newton & Raftery, 1994; Raftery et al., 2008] Infamous: Most often leads to an infinite variance!!! [Radford Neal’s blog, 2008]
  • 20. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means An infamous example When ϕ(θk ) = πk (θk ) ˜ harmonic mean approximation to B12 is n1 1 1/L1 (θ1i |x) n1 i=1 B21 = n2 θji ∼ πj (θ|x) 1 1/L2 (θ2i |x) n2 i=1 [Newton & Raftery, 1994; Raftery et al., 2008] Infamous: Most often leads to an infinite variance!!! [Radford Neal’s blog, 2008]
  • 21. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means “The Worst Monte Carlo Method Ever” “The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naively accept estimates that are nowhere close to the correct value of the marginal likelihood.” [Radford Neal’s blog, Aug. 23, 2008]
  • 22. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means “The Worst Monte Carlo Method Ever” “The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naively accept estimates that are nowhere close to the correct value of the marginal likelihood.” [Radford Neal’s blog, Aug. 23, 2008]
  • 23. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Comparison with regular importance sampling Harmonic mean: Constraint opposed to usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk (θk )Lk (θk ) for the approximation T (t) 1 ϕ(θk ) Z1k = 1 (t) (t) T πk (θk )Lk (θk ) t=1 to have a finite variance. E.g., use finite support kernels for ϕ
  • 24. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Comparison with regular importance sampling Harmonic mean: Constraint opposed to usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk (θk )Lk (θk ) for the approximation T (t) 1 ϕ(θk ) Z1k = 1 (t) (t) T πk (θk )Lk (θk ) t=1 to have a finite variance. E.g., use finite support kernels for ϕ
  • 25. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports In practice, possibility to use the HPD region defined by an MCMC sample as Hα = {θ; π (θ|x) ≥ qα [˜ (·|x)]} ˜ ˆ π where qα [˜ (·|x)] is the empirical α quantile of the observed ˆ π π (θi |x))’s ˜ c Construction of ϕ as uniformly supported by Hα or an ellipsoid approximation whose volume can be computed
  • 26. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports In practice, possibility to use the HPD region defined by an MCMC sample as Hα = {θ; π (θ|x) ≥ qα [˜ (·|x)]} ˜ ˆ π where qα [˜ (·|x)] is the empirical α quantile of the observed ˆ π π (θi |x))’s ˜ c Construction of ϕ as uniformly supported by Hα or an ellipsoid approximation whose volume can be computed
  • 27. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports: illustration Gibbs sample of 103 parameters (θ, σ 2 ) produced for the normal model, x1 , . . . , xn ∼ N (θ, σ 2 ) with x = 0, s2 = 1 and n = 10, under Jeffreys’ prior, leads to a straightforward approximation to the 10% HPD region, replaced by an ellipsoid.
  • 28. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports: illustration Gibbs sample of 103 parameters (θ, σ 2 ) produced for the normal model, x1 , . . . , xn ∼ N (θ, σ 2 ) with x = 0, s2 = 1 and n = 10, under Jeffreys’ prior, leads to a straightforward approximation to the 50% HPD region, replaced by an ellipsoid.
  • 29. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Comparison with regular importance sampling (cont’d) Compare Z1k with a standard importance sampling approximation T (t) (t) 1 πk (θk )Lk (θk ) Z2k = (t) T ϕ(θk ) t=1 (t) where the θk ’s are generated from the density ϕ(·) (with fatter tails like t’s)
  • 30. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Zk using a mixture representation Bridge sampling recall Design a specific mixture for simulation [importance sampling] purposes, with density ϕk (θk ) ∝ ωπk (θk )Lk (θk ) + ϕ(θk ) , where ϕ(·) is arbitrary (but normalised) Note: ω is not a probability weight
  • 31. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Zk using a mixture representation Bridge sampling recall Design a specific mixture for simulation [importance sampling] purposes, with density ϕk (θk ) ∝ ωπk (θk )Lk (θk ) + ϕ(θk ) , where ϕ(·) is arbitrary (but normalised) Note: ω is not a probability weight
  • 32. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Z using a mixture representation (cont’d) Corresponding MCMC (=Gibbs) sampler At iteration t 1 Take δ (t) = 1 with probability (t−1) (t−1) (t−1) (t−1) (t−1) ωπk (θk )Lk (θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) and δ (t) = 2 otherwise; (t) (t−1) 2 If δ (t) = 1, generate θk ∼ MCMC(θk , θk ) where MCMC(θk , θk ) denotes an arbitrary MCMC kernel associated with the posterior πk (θk |x) ∝ πk (θk )Lk (θk ); (t) 3 If δ (t) = 2, generate θk ∼ ϕ(θk ) independently
  • 33. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Z using a mixture representation (cont’d) Corresponding MCMC (=Gibbs) sampler At iteration t 1 Take δ (t) = 1 with probability (t−1) (t−1) (t−1) (t−1) (t−1) ωπk (θk )Lk (θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) and δ (t) = 2 otherwise; (t) (t−1) 2 If δ (t) = 1, generate θk ∼ MCMC(θk , θk ) where MCMC(θk , θk ) denotes an arbitrary MCMC kernel associated with the posterior πk (θk |x) ∝ πk (θk )Lk (θk ); (t) 3 If δ (t) = 2, generate θk ∼ ϕ(θk ) independently
  • 34. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Z using a mixture representation (cont’d) Corresponding MCMC (=Gibbs) sampler At iteration t 1 Take δ (t) = 1 with probability (t−1) (t−1) (t−1) (t−1) (t−1) ωπk (θk )Lk (θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) and δ (t) = 2 otherwise; (t) (t−1) 2 If δ (t) = 1, generate θk ∼ MCMC(θk , θk ) where MCMC(θk , θk ) denotes an arbitrary MCMC kernel associated with the posterior πk (θk |x) ∝ πk (θk )Lk (θk ); (t) 3 If δ (t) = 2, generate θk ∼ ϕ(θk ) independently
  • 35. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Evidence approximation by mixtures Rao-Blackwellised estimate T ˆ 1 ξ= (t) ωπk (θk )Lk (θk ) (t) (t) (t) ωπk (θk )Lk (θk ) + ϕ(θk ) , (t) T t=1 converges to ωZk /{ωZk + 1} 3k ˆ ˆ ˆ Deduce Zˆ from ω Z3k /{ω Z3k + 1} = ξ ie T (t) (t) (t) (t) (t) t=1 ωπk (θk )Lk (θk ) ωπ(θk )Lk (θk ) + ϕ(θk ) ˆ Z3k = T (t) (t) (t) (t) t=1 ϕ(θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) [Bridge sampler redux]
  • 36. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Evidence approximation by mixtures Rao-Blackwellised estimate T ˆ 1 ξ= (t) ωπk (θk )Lk (θk ) (t) (t) (t) ωπk (θk )Lk (θk ) + ϕ(θk ) , (t) T t=1 converges to ωZk /{ωZk + 1} 3k ˆ ˆ ˆ Deduce Zˆ from ω Z3k /{ω Z3k + 1} = ξ ie T (t) (t) (t) (t) (t) t=1 ωπk (θk )Lk (θk ) ωπ(θk )Lk (θk ) + ϕ(θk ) ˆ Z3k = T (t) (t) (t) (t) t=1 ϕ(θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) [Bridge sampler redux]
  • 37. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Mixture illustration For the normal example, use of the same uniform ϕ over the ellipse approximating the same HPD region Hα and ω equal to one-tenth of the true evidence. Gibbs samples of size 104
  • 38. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Mixture illustration For the normal example, use of the same uniform ϕ over the ellipse approximating the same HPD region Hα and ω equal to one-tenth of the true evidence. Gibbs samples of size 104 and 100 Monte Carlo replicas
  • 39. On some computational methods for Bayesian model choice Nested sampling Nested sampling 1 Evidence 2 Importance sampling solutions 3 Nested sampling Purpose Implementation Error rates Constraints 4 ABC model choice
  • 40. On some computational methods for Bayesian model choice Nested sampling Purpose Nested sampling: Goal Skilling’s (2007) technique using the one-dimensional representation: 1 Z = Eπ [L(θ)] = ϕ(x) dx 0 with ϕ−1 (l) = P π (L(θ) > l). Note; ϕ(·) is intractable in most cases.
  • 41. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: First approximation Approximate Z by a Riemann sum: N Z= (xi−1 − xi )ϕ(xi ) i=1 where the xi ’s are either: deterministic: xi = e−i/N or random: x0 = 0, xi+1 = ti xi , ti ∼ Be(N, 1) so that E[log xi ] = −i/N .
  • 42. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Alternative representation Another representation is N −1 Z= {ϕ(xi+1 ) − ϕ(xi )}xi i=0 which is a special case of N −1 Z= {L(θ(i+1) ) − L(θ(i) )}π({θ; L(θ) > L(θ(i) )}) i=0 where · · · L(θ(i+1) ) < L(θ(i) ) · · · [Lebesgue version of Riemann’s sum]
  • 43. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Alternative representation Another representation is N −1 Z= {ϕ(xi+1 ) − ϕ(xi )}xi i=0 which is a special case of N −1 Z= {L(θ(i+1) ) − L(θ(i) )}π({θ; L(θ) > L(θ(i) )}) i=0 where · · · L(θ(i+1) ) < L(θ(i) ) · · · [Lebesgue version of Riemann’s sum]
  • 44. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 45. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 46. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 47. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 48. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Third approximation Iterate the above steps until a given stopping iteration j is reached: e.g., observe very small changes in the approximation Z; reach the maximal value of L(θ) when the likelihood is bounded and its maximum is known; truncate the integral Z at level , i.e. replace 1 1 ϕ(x) dx with ϕ(x) dx 0
  • 49. On some computational methods for Bayesian model choice Nested sampling Error rates Approximation error Error = Z − Z j 1 = (xi−1 − xi )ϕi − ϕ(x) dx = − ϕ(x) dx i=1 0 0 j 1 + (xi−1 − xi )ϕ(xi ) − ϕ(x) dx (Quadrature Error) i=1 j + (xi−1 − xi ) {ϕi − ϕ(xi )} (Stochastic Error) i=1 [Dominated by Monte Carlo]
• 50. On some computational methods for Bayesian model choice Nested sampling Error rates A CLT for the Stochastic Error
The (dominating) stochastic error is $O_P(N^{-1/2})$:
$N^{1/2}\,(\hat Z - Z) \xrightarrow{\ D\ } \mathcal N(0, V)$
with
$V = -\int\!\!\int_{s,t\in[\varepsilon,1]} s\,\varphi'(s)\, t\,\varphi'(t)\, \log(s \vee t)\, ds\, dt.$
[Proof based on Donsker's theorem]
• 51. On some computational methods for Bayesian model choice Nested sampling Error rates What of log Z?
If the interest lies within $\log Z$, Slutsky's transform of the CLT:
$N^{1/2}\,(\log \hat Z - \log Z) \xrightarrow{\ D\ } \mathcal N\big(0, V/Z^2\big)$
Note: The number of simulated points equals the number of iterations $j$, and is a multiple of $N$: if one stops at the first iteration $j$ such that $e^{-j/N} < \varepsilon$, then $j = N \lceil -\log \varepsilon \rceil$.
• 54. On some computational methods for Bayesian model choice Nested sampling Constraints Sampling from constr'd priors
Exact simulation from the constrained prior is intractable in most cases.
Skilling (2007) proposes to use MCMC, but this introduces a bias (stopping rule).
If implementable, then a slice sampler can be devised at the same cost [or less]
• 57. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration
Case of a banana target made of a twisted 2D normal:
$x_2 = x_2 + \beta x_1^2 - 100\beta$
[Haario, Saksman, Tamminen, 1999]
$\beta = .03$, $\Sigma = \mathrm{diag}(100, 1)$
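For reference, a small Python sketch of this twisted-normal (banana) target under the parameterisation quoted on the slide ($\beta = 0.03$, $\Sigma = \mathrm{diag}(100,1)$); the function names are illustrative, and the sampler simply twists exact draws from the underlying Gaussian.

import numpy as np

beta = 0.03
var1, var2 = 100.0, 1.0                      # diagonal of Sigma

def log_banana(x1, x2):
    # evaluate N(0, diag(100, 1)) at (x1, x2 + beta*x1^2 - 100*beta)
    y2 = x2 + beta * x1**2 - 100.0 * beta
    return (-0.5 * (x1**2 / var1 + y2**2 / var2)
            - 0.5 * np.log((2 * np.pi) ** 2 * var1 * var2))

def sample_banana(n, rng=np.random.default_rng(1)):
    # exact draws: sample the Gaussian, then apply the inverse twist
    x1 = rng.normal(scale=np.sqrt(var1), size=n)
    x2 = rng.normal(scale=np.sqrt(var2), size=n) - beta * x1**2 + 100.0 * beta
    return np.column_stack([x1, x2])

Having exact draws and a density known in closed form is what makes this target convenient for checking evidence and moment estimates, as in the comparison reported on the next slides.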
• 58. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration (2)
Use of nested sampling with N = 1000 and 50 MCMC steps of size 0.1, compared with a population Monte Carlo (PMC) sampler based on 10 iterations with 5000 points per iteration and a final sample of 50000 points, using nine Student's t components with 9 df [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]. Evidence estimation.
• 59. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration (2)
Same comparison (nested sampling with N = 1000 and 50 MCMC steps of size 0.1 versus the PMC sampler above) [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]. $\mathbb E[X_1]$ estimation.
• 60. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration (2)
Same comparison [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]. $\mathbb E[X_2]$ estimation.
• 61. On some computational methods for Bayesian model choice ABC model choice ABC method(s) Approximate Bayesian Computation
Bayesian setting: target is $\pi(\theta) f(x|\theta)$
When the likelihood $f(x|\theta)$ is not available in closed form, a likelihood-free rejection technique:
ABC algorithm
For an observation $y \sim f(y|\theta)$, under the prior $\pi(\theta)$, keep jointly simulating
$\theta' \sim \pi(\theta)$, $x \sim f(x|\theta')$,
until the auxiliary variable $x$ is equal to the observed value, $x = y$.
[Pritchard et al., 1999]
• 64. On some computational methods for Bayesian model choice ABC model choice ABC method(s) A as approximative
When $y$ is a continuous random variable, the equality $x = y$ is replaced with a tolerance condition
$\rho(x, y) \le \varepsilon$
where $\rho$ is a distance between summary statistics.
Output distributed from
$\pi(\theta)\, P_\theta\{\rho(x, y) < \varepsilon\} \propto \pi(\theta \mid \rho(x, y) < \varepsilon)$
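A minimal sketch of this tolerance-based ABC rejection sampler, written generically and then applied to a hypothetical Gaussian example; none of the names or settings below come from the slides, they only illustrate the accept-if-within-$\varepsilon$ rule.

import numpy as np

rng = np.random.default_rng(2)

def abc_rejection(y_obs, prior_sim, data_sim, summary, dist, eps, n_sim=100_000):
    """Keep prior draws whose simulated summaries fall within eps of the observed one."""
    s_obs = summary(y_obs)
    kept = []
    for _ in range(n_sim):
        theta = prior_sim()                      # theta' ~ pi(theta)
        x = data_sim(theta)                      # x ~ f(x | theta')
        if dist(summary(x), s_obs) <= eps:       # rho(S(x), S(y)) <= eps
            kept.append(theta)
    return np.array(kept)                        # draws from pi(theta | rho(x, y) <= eps)

# Hypothetical example: theta ~ N(0, 10), x | theta an i.i.d. N(theta, 1) sample of
# size 50, summarised by its mean.
y_obs = rng.normal(loc=2.0, size=50)
post = abc_rejection(
    y_obs,
    prior_sim=lambda: rng.normal(scale=np.sqrt(10.0)),
    data_sim=lambda th: rng.normal(loc=th, size=50),
    summary=np.mean,
    dist=lambda a, b: abs(a - b),
    eps=0.1,
)
print(post.mean(), post.std())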
  • 66. On some computational methods for Bayesian model choice ABC model choice ABC method(s) Dynamic area
• 67. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC improvements
Simulating from the prior is often inefficient.
Either modify the proposal distribution on $\theta$ to increase the density of $x$'s within the vicinity of $y$...
[Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007]
...or view the problem as conditional density estimation and develop techniques that allow for larger $\varepsilon$...
[Beaumont et al., 2002]
...or even include $\varepsilon$ in the inferential framework
[Ratmann et al., 2009]
• 71. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-MCMC
Markov chain $(\theta^{(t)})$ created via the transition function
$\theta^{(t+1)} = \begin{cases} \theta' \sim K(\theta'|\theta^{(t)}) & \text{if } x \sim f(x|\theta') \text{ is such that } x = y \\[2pt] & \text{and } u \sim \mathcal U(0,1) \le \dfrac{\pi(\theta')\,K(\theta^{(t)}|\theta')}{\pi(\theta^{(t)})\,K(\theta'|\theta^{(t)})}, \\[4pt] \theta^{(t)} & \text{otherwise,} \end{cases}$
has the posterior $\pi(\theta|y)$ as stationary distribution
[Marjoram et al., 2003]
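Below is a hedged Python sketch of this ABC-MCMC transition, with the tolerance version $\rho(S(x), S(y)) \le \varepsilon$ standing in for the exact-match condition $x = y$ and a symmetric Gaussian random walk as the kernel $K$ (so the Metropolis-Hastings ratio reduces to the prior ratio); all names and defaults are illustrative.

import numpy as np

def abc_mcmc(s_obs, prior_logpdf, data_sim, summary, dist, eps,
             theta0, step=0.5, n_iter=20_000, rng=np.random.default_rng(3)):
    """ABC-MCMC with a symmetric random-walk kernel K(.|theta) = N(theta, step^2)."""
    chain = np.empty(n_iter)
    theta = theta0
    for t in range(n_iter):
        prop = theta + step * rng.normal()           # theta' ~ K(.|theta)
        x = data_sim(prop)                           # x ~ f(x|theta')
        close = dist(summary(x), s_obs) <= eps       # pseudo-data close to the data
        if close and np.log(rng.uniform()) <= prior_logpdf(prop) - prior_logpdf(theta):
            theta = prop                             # accept; otherwise keep theta^(t)
        chain[t] = theta
    return chain

With a non-symmetric kernel the log-acceptance threshold would also include log K(theta|theta') minus log K(theta'|theta), as in the expression above.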
• 73. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-PRC
Another sequential version producing a sequence of Markov transition kernels $K_t$ and of samples $(\theta_1^{(t)}, \ldots, \theta_N^{(t)})$ $(1 \le t \le T)$
ABC-PRC Algorithm
1. Pick a $\theta^\star$ at random among the previous $\theta_i^{(t-1)}$'s, with probabilities $\omega_i^{(t-1)}$ $(1 \le i \le N)$.
2. Generate $\theta_i^{(t)} \sim K_t(\theta|\theta^\star)$, $x \sim f(x|\theta_i^{(t)})$.
3. Check that $\rho(x, y) < \varepsilon$, otherwise start again.
[Sisson et al., 2007]
• 75. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-PRC weight
Probability $\omega_i^{(t)}$ computed as
$\omega_i^{(t)} \propto \pi(\theta_i^{(t)})\, L_{t-1}(\theta^\star|\theta_i^{(t)}) \,\big/\, \{\pi(\theta^\star)\, K_t(\theta_i^{(t)}|\theta^\star)\},$
where $L_{t-1}$ is an arbitrary transition kernel.
In case
$L_{t-1}(\theta'|\theta) = K_t(\theta|\theta'),$
all weights are equal under a uniform prior.
Inspired from Del Moral et al. (2006), who use backward kernels $L_{t-1}$ in SMC to achieve unbiasedness, with a recent (2008) ABC-SMC version
• 79. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-PRC bias
Lack of unbiasedness of the method
Joint density of the accepted pair $(\theta^{(t-1)}, \theta^{(t)})$ proportional to
$\pi(\theta^{(t-1)}|y)\, K_t(\theta^{(t)}|\theta^{(t-1)})\, f(y|\theta^{(t)}),$
so, for an arbitrary function $h(\theta)$, $\mathbb E[\omega_t h(\theta^{(t)})]$ is proportional to
$\int\!\!\int h(\theta^{(t)})\, \dfrac{\pi(\theta^{(t)})\, L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\, K_t(\theta^{(t)}|\theta^{(t-1)})}\; \pi(\theta^{(t-1)}|y)\, K_t(\theta^{(t)}|\theta^{(t-1)})\, f(y|\theta^{(t)})\, d\theta^{(t-1)}\, d\theta^{(t)}$
$\propto \int\!\!\int h(\theta^{(t)})\, \dfrac{\pi(\theta^{(t)})\, L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\, K_t(\theta^{(t)}|\theta^{(t-1)})}\; \pi(\theta^{(t-1)})\, f(y|\theta^{(t-1)})\, K_t(\theta^{(t)}|\theta^{(t-1)})\, f(y|\theta^{(t)})\, d\theta^{(t-1)}\, d\theta^{(t)}$
$\propto \int h(\theta^{(t)})\, \pi(\theta^{(t)}|y) \left\{ \int L_{t-1}(\theta^{(t-1)}|\theta^{(t)})\, f(y|\theta^{(t-1)})\, d\theta^{(t-1)} \right\} d\theta^{(t)},$
which differs from the posterior expectation of $h(\theta^{(t)})$ unless the inner integral is constant in $\theta^{(t)}$.
• 81. On some computational methods for Bayesian model choice ABC model choice ABC method(s) A mixture example (0)
Toy model of Sisson et al. (2007): if
$\theta \sim \mathcal U(-10, 10)$, $\quad x|\theta \sim 0.5\,\mathcal N(\theta, 1) + 0.5\,\mathcal N(\theta, 1/100),$
then the posterior distribution associated with $y = 0$ is the normal mixture
$\theta|y = 0 \sim 0.5\,\mathcal N(0, 1) + 0.5\,\mathcal N(0, 1/100)$
restricted to $[-10, 10]$.
Furthermore, the true ABC target is available as
$\pi(\theta \mid |x| < \varepsilon) \propto \Phi(\varepsilon - \theta) - \Phi(-\varepsilon - \theta) + \Phi(10(\varepsilon - \theta)) - \Phi(-10(\varepsilon + \theta)).$
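A small numerical sketch of this analytic benchmark (with $\varepsilon$ denoting the ABC tolerance, as reconstructed above): the unnormalised target is evaluated on a grid and normalised numerically on $[-10, 10]$, so that ABC output can be compared against it.

import numpy as np
from scipy.stats import norm

def abc_target(theta, eps):
    # unnormalised pi(theta | |x| < eps) for the Sisson et al. (2007) toy model
    return (norm.cdf(eps - theta) - norm.cdf(-eps - theta)
            + norm.cdf(10 * (eps - theta)) - norm.cdf(-10 * (eps + theta)))

grid = np.linspace(-10.0, 10.0, 2001)
dens = abc_target(grid, eps=0.1)
dens /= dens.sum() * (grid[1] - grid[0])     # numerical normalisation on [-10, 10]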
• 82. On some computational methods for Bayesian model choice ABC model choice ABC method(s) A mixture example (1)
[Figure: comparison of $\tau = 0.15$ and $\tau = 1/0.15$ in $K_t$]
• 83. On some computational methods for Bayesian model choice ABC model choice ABC-PMC A PMC version
Use of the same kernel idea as ABC-PRC, but with an IS correction.
Generate a sample at iteration $t$ by
$\hat\pi_t(\theta^{(t)}) \propto \sum_{j=1}^{N} \omega_j^{(t-1)}\, K_t(\theta^{(t)}|\theta_j^{(t-1)})$
modulo acceptance of the associated $x_t$, and use an importance weight associated with an accepted simulation $\theta_i^{(t)}$,
$\omega_i^{(t)} \propto \pi(\theta_i^{(t)}) \,\big/\, \hat\pi_t(\theta_i^{(t)}).$
Still likelihood-free
[Beaumont et al., 2009]
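A sketch of one way to implement this ABC-PMC scheme for a scalar $\theta$ with Gaussian kernels $K_t$ of fixed standard deviation $\tau$ (matching the mixture comparison below); the decreasing tolerance schedule, the particle number, and the assumption that prior_logpdf is vectorised are illustrative choices, not prescriptions from the slides.

import numpy as np
from scipy.stats import norm

def abc_pmc(s_obs, prior_logpdf, prior_sim, data_sim, summary, dist,
            eps_seq, n_part=1000, tau=0.15, rng=np.random.default_rng(4)):
    """ABC-PMC: resample, move with N(., tau^2), accept within eps_t, reweight."""
    # iteration 0: plain ABC rejection at the first (largest) tolerance
    theta = np.empty(n_part)
    for i in range(n_part):
        while True:
            th = prior_sim()
            if dist(summary(data_sim(th)), s_obs) <= eps_seq[0]:
                theta[i] = th
                break
    weight = np.full(n_part, 1.0 / n_part)
    for eps in eps_seq[1:]:
        new_theta = np.empty(n_part)
        for i in range(n_part):
            while True:
                th = rng.choice(theta, p=weight) + tau * rng.normal()   # resample + move
                if dist(summary(data_sim(th)), s_obs) <= eps:
                    new_theta[i] = th
                    break
        # IS correction: prior over the mixture proposal pi_t-hat
        mix = (weight[None, :]
               * norm.pdf(new_theta[:, None], loc=theta[None, :], scale=tau)).sum(axis=1)
        weight = np.exp(prior_logpdf(new_theta)) / mix
        weight /= weight.sum()
        theta = new_theta
    return theta, weight

On the toy mixture, calling this with a schedule such as eps_seq = (2.0, 1.0, 0.5, 0.1) and the $\mathcal U(-10, 10)$ prior can be compared with the recovery reported on the next slide.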
• 84. On some computational methods for Bayesian model choice ABC model choice ABC-PMC A mixture example (2)
Recovery of the target, whether using a fixed standard deviation of $\tau = 0.15$ or $\tau = 1/0.15$.
• 85. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Gibbs random fields
Gibbs distribution
The random variable $y = (y_1, \ldots, y_n)$ is a Gibbs random field associated with the graph $G$ if
$f(y) = \dfrac{1}{Z} \exp\Big\{-\sum_{c\in\mathcal C} V_c(y_c)\Big\},$
where $Z$ is the normalising constant, $\mathcal C$ is the set of cliques of $G$ and $V_c$ is any function, also called potential.
$U(y) = \sum_{c\in\mathcal C} V_c(y_c)$ is the energy function.
$Z$ is usually unavailable in closed form.
• 87. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Potts model
$V_c(y)$ is of the form
$V_c(y) = \theta S(y) = \theta \sum_{l\sim i} \delta_{y_l = y_i}$
where $l\sim i$ denotes a neighbourhood structure.
In most realistic settings, the summation
$Z_\theta = \sum_{x\in\mathcal X} \exp\{\theta^{\mathrm T} S(x)\}$
involves too many terms to be manageable, and numerical approximations cannot always be trusted.
[Cucala, Marin, CPR & Titterington, 2009]
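As a concrete illustration of the statistic $S(x)$ above (and of the competing neighbourhood relations used for model choice later in this section), here is a short sketch computing the number of identically-coloured neighbour pairs on a 2D grid, for the 4-neighbour and 8-neighbour relations; the grid size and colouring are arbitrary.

import numpy as np

def potts_stat_4n(x):
    # S(x) = sum over horizontal/vertical neighbour pairs of 1{x_l = x_i}
    return int((x[:, 1:] == x[:, :-1]).sum() + (x[1:, :] == x[:-1, :]).sum())

def potts_stat_8n(x):
    # same statistic when the two diagonal neighbours are added to the relation
    return (potts_stat_4n(x)
            + int((x[1:, 1:] == x[:-1, :-1]).sum())
            + int((x[1:, :-1] == x[:-1, 1:]).sum()))

rng = np.random.default_rng(5)
x = rng.integers(0, 2, size=(10, 10))        # a binary 10 x 10 configuration
print(potts_stat_4n(x), potts_stat_8n(x))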
• 89. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Bayesian Model Choice
Comparing a model with potential $S_0$ taking values in $\mathbb R^{p_0}$ versus a model with potential $S_1$ taking values in $\mathbb R^{p_1}$ can be done through the Bayes factor corresponding to the priors $\pi_0$ and $\pi_1$ on each parameter space,
$B_{m_0/m_1}(x) = \dfrac{\int \exp\{\theta_0^{\mathrm T} S_0(x)\}/Z_{\theta_0,0}\; \pi_0(d\theta_0)}{\int \exp\{\theta_1^{\mathrm T} S_1(x)\}/Z_{\theta_1,1}\; \pi_1(d\theta_1)}$
• 90. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Neighbourhood relations
Choice to be made between $M$ neighbourhood relations
$i \overset{m}{\sim} i'$ $\quad (0 \le m \le M-1)$
with
$S_m(x) = \sum_{i \overset{m}{\sim} i'} \mathbb I_{\{x_i = x_{i'}\}}$
driven by the posterior probabilities of the models.
• 91. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Model index
Formalisation via a model index $\mathcal M$ that appears as a new parameter, with prior distribution $\pi(\mathcal M = m)$ and
$\pi(\theta|\mathcal M = m) = \pi_m(\theta_m)$
Computational target:
$P(\mathcal M = m|x) \propto \int_{\Theta_m} f_m(x|\theta_m)\, \pi_m(\theta_m)\, d\theta_m\ \pi(\mathcal M = m),$
• 93. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Sufficient statistics
By definition, if $S(x)$ is a sufficient statistic for the joint parameters $(\mathcal M, \theta_0, \ldots, \theta_{M-1})$, then
$P(\mathcal M = m|x) = P(\mathcal M = m|S(x)).$
For each model $m$, its own sufficient statistic $S_m(\cdot)$, and $S(\cdot) = (S_0(\cdot), \ldots, S_{M-1}(\cdot))$ is also sufficient.
For Gibbs random fields,
$x|\mathcal M = m \sim f_m(x|\theta_m) = f_m^1(x|S(x))\, f_m^2(S(x)|\theta_m) = \dfrac{1}{n(S(x))}\, f_m^2(S(x)|\theta_m)$
where
$n(S(x)) = \#\{\tilde x\in\mathcal X : S(\tilde x) = S(x)\}$
$S(x)$ is therefore also sufficient for the joint parameters
[Specific to Gibbs random fields!]
• 96. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs ABC model choice algorithm
ABC-MC
1. Generate $m^*$ from the prior $\pi(\mathcal M = m)$.
2. Generate $\theta_{m^*}^*$ from the prior $\pi_{m^*}(\cdot)$.
3. Generate $x^*$ from the model $f_{m^*}(\cdot|\theta_{m^*}^*)$.
4. Compute the distance $\rho(S(x^0), S(x^*))$.
5. Accept $(\theta_{m^*}^*, m^*)$ if $\rho(S(x^0), S(x^*)) < \varepsilon$.
[Cornuet, Grelaud, Marin & Robert, 2009]
Note: When $\varepsilon = 0$ the algorithm is exact.
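A generic sketch of this ABC-MC sampler, together with the (1 + count) Bayes-factor estimate introduced on the next slide; the simulator and summary interfaces are assumptions made for illustration, not part of the slides.

import numpy as np

def abc_mc(s_obs, model_prior, prior_sims, data_sims, summary, dist,
           eps, n_sim=100_000, rng=np.random.default_rng(6)):
    """Draw m* ~ pi(M), theta* ~ pi_{m*}, x* ~ f_{m*}(.|theta*);
    keep m* whenever rho(S(x0), S(x*)) <= eps.  Returns accepted model indices."""
    kept = []
    for _ in range(n_sim):
        m = int(rng.choice(len(model_prior), p=model_prior))
        theta = prior_sims[m]()
        x = data_sims[m](theta)
        if dist(summary(x), s_obs) <= eps:
            kept.append(m)
    return np.array(kept)

def bf_01(kept, model_prior):
    # (1 + #{m* = 0}) / (1 + #{m* = 1}) times pi(M = 1) / pi(M = 0)
    return ((1 + (kept == 0).sum()) / (1 + (kept == 1).sum())
            * model_prior[1] / model_prior[0])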
• 97. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs ABC approximation to the Bayes factor
Frequency ratio:
$\widehat{BF}_{m_0/m_1}(x^0) = \dfrac{\hat P(\mathcal M = m_0|x^0)}{\hat P(\mathcal M = m_1|x^0)} \times \dfrac{\pi(\mathcal M = m_1)}{\pi(\mathcal M = m_0)} = \dfrac{\#\{m^{i*} = m_0\}}{\#\{m^{i*} = m_1\}} \times \dfrac{\pi(\mathcal M = m_1)}{\pi(\mathcal M = m_0)},$
replaced with
$\widehat{BF}_{m_0/m_1}(x^0) = \dfrac{1 + \#\{m^{i*} = m_0\}}{1 + \#\{m^{i*} = m_1\}} \times \dfrac{\pi(\mathcal M = m_1)}{\pi(\mathcal M = m_0)}$
to avoid indeterminacy (also a Bayes estimate).
• 99. On some computational methods for Bayesian model choice ABC model choice Illustrations Toy example
iid Bernoulli model versus two-state first-order Markov chain, i.e.
$f_0(x|\theta_0) = \exp\Big(\theta_0 \sum_{i=1}^{n} \mathbb I_{\{x_i = 1\}}\Big) \Big/ \{1 + \exp(\theta_0)\}^{n},$
versus
$f_1(x|\theta_1) = \dfrac{1}{2} \exp\Big(\theta_1 \sum_{i=2}^{n} \mathbb I_{\{x_i = x_{i-1}\}}\Big) \Big/ \{1 + \exp(\theta_1)\}^{n-1},$
with priors $\theta_0 \sim \mathcal U(-5, 5)$ and $\theta_1 \sim \mathcal U(0, 6)$ (inspired by "phase transition" boundaries).
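To connect this toy comparison with the ABC-MC sketch given earlier, here are simulators and the joint sufficient statistic $(S_0, S_1)$ for the two models; the sample size n = 100 is an arbitrary illustration, while the priors are the $\mathcal U(-5, 5)$ and $\mathcal U(0, 6)$ of the slide.

import numpy as np

rng = np.random.default_rng(7)
n = 100                                       # assumed sample size

def sim_bernoulli(theta0):
    # model 0: x_i i.i.d. Bernoulli with P(x_i = 1) = exp(theta0) / (1 + exp(theta0))
    p = np.exp(theta0) / (1 + np.exp(theta0))
    return rng.binomial(1, p, size=n)

def sim_markov(theta1):
    # model 1: x_1 ~ Bernoulli(1/2), then P(x_i = x_{i-1}) = exp(theta1) / (1 + exp(theta1))
    q = np.exp(theta1) / (1 + np.exp(theta1))
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(0, 2)
    stay = rng.uniform(size=n - 1) < q
    for i in range(1, n):
        x[i] = x[i - 1] if stay[i - 1] else 1 - x[i - 1]
    return x

def summary(x):
    # joint sufficient statistic (S0, S1) = (# of ones, # of persistences x_i = x_{i-1})
    return np.array([int(x.sum()), int((x[1:] == x[:-1]).sum())])

prior_sims = [lambda: rng.uniform(-5.0, 5.0), lambda: rng.uniform(0.0, 6.0)]
data_sims = [sim_bernoulli, sim_markov]
# these can be passed to the abc_mc(...) sketch above, with a Euclidean distance on (S0, S1)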
• 100. On some computational methods for Bayesian model choice ABC model choice Illustrations Toy example (2)
[Figure] (left) Comparison of the true $BF_{m_0/m_1}(x^0)$ with $\widehat{BF}_{m_0/m_1}(x^0)$ (in logs) over 2,000 simulations and $4\cdot10^6$ proposals from the prior. (right) Same when using a tolerance $\varepsilon$ corresponding to the 1% quantile on the distances.
• 101. On some computational methods for Bayesian model choice ABC model choice Illustrations Protein folding
Superposition of the native structure (grey) with the ST1 structure (red), the ST2 structure (orange), the ST3 structure (green), and the DT structure (blue).
• 102. On some computational methods for Bayesian model choice ABC model choice Illustrations Protein folding (2)
                 % seq. Id.   TM-score   FROST score
  1i5nA  (ST1)       32         0.86        75.3
  1ls1A1 (ST2)        5         0.42         8.9
  1jr8A  (ST3)        4         0.24         8.9
  1s7oA  (DT)        10         0.08         7.8
Characteristics of the dataset. % seq. Id.: percentage of identity with the query sequence. TM-score: similarity between predicted and native structure (uncertainty between 0.17 and 0.4). FROST score: quality of alignment of the query onto the candidate structure (uncertainty between 7 and 9).
• 103. On some computational methods for Bayesian model choice ABC model choice Illustrations Protein folding (3)
                      NS/ST1   NS/ST2   NS/ST3   NS/DT
  BF                   1.34     1.22     2.42     2.76
  P(M = NS|x^0)        0.573    0.551    0.708    0.734
Estimates of the Bayes factors between model NS and models ST1, ST2, ST3, and DT, and corresponding posterior probabilities of model NS, based on an ABC-MC algorithm using $1.2\cdot10^6$ simulations and a tolerance equal to the 1% quantile of the distances.