On some computational methods for Bayesian model choice




             On some computational methods for Bayesian
                          model choice

                                        Christian P. Robert

                               CREST-INSEE and Université Paris Dauphine
                                   http://xianblog.wordpress.com


                    JSM 2009, Washington D.C., August 2, 2009
                     Joint works with M. Beaumont, N. Chopin,
                     J.-M. Cornuet, J.-M. Marin and D. Wraith
On some computational methods for Bayesian model choice




Outline


      1    Evidence

      2    Importance sampling solutions

      3    Nested sampling

      4    ABC model choice
On some computational methods for Bayesian model choice
  Evidence




Model choice and model comparison



      Choice between models
      Several models available for the same observation vector x

                                    Mi : x ∼ fi (x|θi ),   i∈I

      where the family I can be finite or infinite
On some computational methods for Bayesian model choice
  Evidence




Bayesian model choice
      Probabilise the entire model/parameter space
          allocate probabilities pi to all models Mi
          define priors πi (θi ) for each parameter space Θi
          compute

                    \pi(M_i \mid x) = \frac{p_i \int_{\Theta_i} f_i(x|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x|\theta_j)\,\pi_j(\theta_j)\,d\theta_j}


              take largest π(Mi |x) to determine “best” model,
              or use averaged predictive

                    \sum_j \pi(M_j|x) \int_{\Theta_j} f_j(x'|\theta_j)\,\pi_j(\theta_j|x)\,d\theta_j
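      A minimal Python sketch of the first display above (the two-model prior weights and evidences are invented for illustration):

```python
import numpy as np

def posterior_model_probs(prior_weights, evidences):
    """pi(M_i | x) = p_i Z_i / sum_j p_j Z_j, with Z_i the evidence of model M_i."""
    w = np.asarray(prior_weights, float) * np.asarray(evidences, float)
    return w / w.sum()

# hypothetical numbers: two models, equal prior weights, evidences 0.8 and 0.2
print(posterior_model_probs([0.5, 0.5], [0.8, 0.2]))   # -> [0.8 0.2]
```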
On some computational methods for Bayesian model choice
  Evidence




Bayes factor



      Definition (Bayes factors)
      Central quantity

                    B_{ij}(x) = \frac{\int_{\Theta_i} f_i(x|\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\int_{\Theta_j} f_j(x|\theta_j)\,\pi_j(\theta_j)\,d\theta_j} = \frac{m_i(x)}{m_j(x)}

                                                                                 [Jeffreys, 1939]
On some computational methods for Bayesian model choice
  Evidence




Evidence




      All these problems end up with a similar quantity, the evidence

                    Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,d\theta_k ,  \qquad k = 1, \ldots,

      aka the marginal likelihood.
On some computational methods for Bayesian model choice
  Importance sampling solutions




Importance sampling revisited

      1    Evidence

      2    Importance sampling solutions
             Bridge sampling
             Harmonic means
             One bridge back

      3    Nested sampling

      4    ABC model choice
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Bayes factor approximation

      When approximating the Bayes factor

                    B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,d\theta_1}

      use of importance functions ϕ0 and ϕ1 and

                    \hat B_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)\big/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)\big/\varphi_1(\theta_1^i)}
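      A Python sketch of this importance sampling approximation on a toy conjugate setup where B01 is available in closed form (the models, importance functions and observation below are all invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = 1.5                                   # single hypothetical observation

# two hypothetical models: x ~ N(theta, 1) with theta ~ N(0, 1) (M0) or theta ~ N(0, 10) (M1)
priors = [stats.norm(0, 1), stats.norm(0, np.sqrt(10))]
lik = lambda th: stats.norm(th, 1).pdf(x)

# importance functions phi_0, phi_1: Student t's roughly centred on each posterior mean
phis = [stats.t(df=3, loc=x / 2, scale=1), stats.t(df=3, loc=10 * x / 11, scale=1)]

def evidence_is(prior, phi, n=100_000):
    th = phi.rvs(n, random_state=rng)     # theta_i ~ phi
    return np.mean(lik(th) * prior.pdf(th) / phi.pdf(th))

B01_hat = evidence_is(priors[0], phis[0]) / evidence_is(priors[1], phis[1])
# closed-form check: the marginal of x under each model is N(0, 1 + tau^2)
B01_exact = stats.norm(0, np.sqrt(2)).pdf(x) / stats.norm(0, np.sqrt(11)).pdf(x)
print(B01_hat, B01_exact)
```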
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Bridge sampling identity

      Generic representation

                    B_{12} = \frac{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,d\theta}{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,d\theta} \qquad \forall\,\alpha(\cdot)

                    \approx \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})} \qquad \theta_{ji} \sim \pi_j(\theta|x)

                           [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Optimal bridge sampling

      The optimal choice of auxiliary function is
                    \alpha^\star = \frac{n_1 + n_2}{n_1\,\pi_1(\theta|x) + n_2\,\pi_2(\theta|x)}

      leading to

                    B_{12} \approx \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1\,\pi_1(\theta_{1i}|x) + n_2\,\pi_2(\theta_{1i}|x)}}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1\,\pi_1(\theta_{2i}|x) + n_2\,\pi_2(\theta_{2i}|x)}}

                                                                                    Back later!
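      A Python sketch of the resulting iterative bridge sampling scheme on a toy pair of unnormalised densities with a known constant ratio (all names and numbers below are invented; the iteration is needed because α⋆ involves the unknown normalised posteriors):

```python
import numpy as np

rng = np.random.default_rng(1)

# two hypothetical unnormalised posteriors with known constants, so B12 = c1/c2 = 0.5 exactly
q1 = lambda th: np.exp(-0.5 * th**2)                 # c1 = sqrt(2 pi)
q2 = lambda th: 2.0 * np.exp(-0.5 * (th - 0.5)**2)   # c2 = 2 sqrt(2 pi)

n1 = n2 = 50_000
th1 = rng.normal(0.0, 1.0, n1)                       # draws from the density behind q1
th2 = rng.normal(0.5, 1.0, n2)                       # draws from the density behind q2
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

l1, l2 = q1(th1) / q2(th1), q1(th2) / q2(th2)
r = 1.0                                              # running estimate of B12
for _ in range(50):                                  # iterate: alpha* depends on the unknown constants
    num = np.mean(l2 / (s1 * l2 + s2 * r))
    den = np.mean(1.0 / (s1 * l1 + s2 * r))
    r = num / den
print(r)                                             # should be close to 0.5
```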
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Bridge sampling



Optimal bridge sampling (2)



      Reason:

                    \frac{\mathrm{Var}(B_{12})}{B_{12}^2} \approx \frac{1}{n_1 n_2}\left[\frac{\int \pi_1(\theta)\,\pi_2(\theta)\,[n_1\,\pi_1(\theta) + n_2\,\pi_2(\theta)]\,\alpha(\theta)^2\,d\theta}{\left(\int \pi_1(\theta)\,\pi_2(\theta)\,\alpha(\theta)\,d\theta\right)^2} - 1\right]

      (by the δ method)
      Drag: Dependence on the unknown normalising constants solved
      iteratively
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Zk from a posterior sample



      Use of the [harmonic mean] identity

                    \mathbb{E}^{\pi_k}\!\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\right|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,d\theta_k = \frac{1}{Z_k}

      no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



RJ-MCMC interpretation


      If ϕ is the density of an alternative to model πk (θk )Lk (θk |x), the
      acceptance probability for a move from model πk (θk )Lk (θk |x) to
      model ϕ(θk ) is based upon

                    \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k|x)}

      (under a proper choice of prior weight)
      Unbiased estimator of the odds ratio
                                   [Bartolucci, Scaccia, and Mira, 2006]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



An infamous example

      When
                    \varphi(\theta_k) = \tilde\pi_k(\theta_k) ,

      the harmonic mean approximation to B21 is

                    \hat B_{21} = \frac{\dfrac{1}{n_1}\sum_{i=1}^{n_1} 1\big/L_1(\theta_{1i}|x)}{\dfrac{1}{n_2}\sum_{i=1}^{n_2} 1\big/L_2(\theta_{2i}|x)} \qquad \theta_{ji} \sim \pi_j(\theta|x)

                         [Newton & Raftery, 1994; Raftery et al., 2008]
      Infamous: Most often leads to an infinite variance!!!
                                             [Radford Neal’s blog, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



“The Worst Monte Carlo Method Ever”

      “The good news is that the Law of Large Numbers guarantees that
      this estimator is consistent ie, it will very likely be very close to the
      correct answer if you use a sufficiently large number of points from
      the posterior distribution.
      The bad news is that the number of points required for this
      estimator to get close to the right answer will often be greater
      than the number of atoms in the observable universe. The even
      worse news is that it’s easy for people to not realize this, and to
      naively accept estimates that are nowhere close to the correct
      value of the marginal likelihood.”
                                       [Radford Neal’s blog, Aug. 23, 2008]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling


      Harmonic mean: constraint opposite to the usual importance sampling
      requirement: ϕ(θ) must have lighter (rather than fatter) tails than
      πk (θk )Lk (θk ) for the approximation

                    \hat Z_{1k} = 1\bigg/\left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}\right\}

      to have a finite variance.
      E.g., use finite support kernels for ϕ
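      A Python sketch of the estimator Ẑ1k with a finite-support ϕ, on a toy conjugate normal model where the evidence is known in closed form (the exact posterior draws below stand in for MCMC output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = 1.0                                          # hypothetical observation

# conjugate toy model: theta ~ N(0,1), x|theta ~ N(theta,1)  =>  Z = N(x; 0, 2)
prior = stats.norm(0, 1)
lik = lambda th: stats.norm(th, 1).pdf(x)
post = stats.norm(x / 2, np.sqrt(0.5))           # exact posterior, standing in for MCMC output
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

theta = post.rvs(100_000, random_state=rng)

# phi with finite support, hence lighter tails than pi_k L_k: uniform on a central posterior interval
lo, hi = np.quantile(theta, [0.10, 0.90])
phi = lambda th: ((th >= lo) & (th <= hi)) / (hi - lo)

inv_Z = np.mean(phi(theta) / (prior.pdf(theta) * lik(theta)))
print(1.0 / inv_Z, Z_exact)
```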
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



HPD supports



      In practice, possibility to use the HPD region defined by an MCMC
      sample as
                    H_\alpha = \{\theta;\ \tilde\pi(\theta|x) \ge \hat q_\alpha[\tilde\pi(\cdot|x)]\}

      where q̂α[π̃(·|x)] is the empirical α-quantile of the observed
      π̃(θ(i)|x)'s
      Construction of ϕ as uniformly supported by Hα or by an ellipsoid
      approximation whose volume can be computed
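      A Python sketch of this construction (the 2D Gaussian sample below is a made-up stand-in for MCMC output, and hpd_ellipsoid is a hypothetical helper name):

```python
import numpy as np
from math import pi, gamma

def hpd_ellipsoid(sample, log_post, alpha=0.10):
    """Keep the points whose (unnormalised) log-posterior exceeds its empirical
    alpha-quantile -- the MCMC image of H_alpha -- then enclose them in an ellipsoid."""
    keep = sample[log_post >= np.quantile(log_post, alpha)]
    m = keep.mean(axis=0)
    S = np.cov(keep, rowvar=False)
    d2 = np.einsum('ij,jk,ik->i', keep - m, np.linalg.inv(S), keep - m)
    r2 = d2.max()                          # Mahalanobis radius enclosing every kept point
    d = sample.shape[1]
    volume = pi ** (d / 2) / gamma(d / 2 + 1) * np.sqrt(np.linalg.det(r2 * S))
    return m, r2 * S, volume               # centre, shape matrix, Lebesgue volume

# hypothetical 2D Gaussian "MCMC output", with its log-density standing in for log pi-tilde
rng = np.random.default_rng(3)
draws = rng.normal(size=(10_000, 2))
logp = -0.5 * (draws ** 2).sum(axis=1)
centre, shape, vol = hpd_ellipsoid(draws, logp)
print(centre, vol)                         # vol is what a uniform phi on the ellipsoid needs
```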
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



HPD supports: illustration
      Gibbs sample of 10³ parameters (θ, σ²) produced for the normal
      model, x1 , . . . , xn ∼ N (θ, σ²) with x̄ = 0, s² = 1 and n = 10,
      under Jeffreys’ prior, leads to a straightforward approximation to
      the 10% HPD region, replaced by an ellipsoid.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



HPD supports: illustration
      Gibbs sample of 10³ parameters (θ, σ²) produced for the normal
      model, x1 , . . . , xn ∼ N (θ, σ²) with x̄ = 0, s² = 1 and n = 10,
      under Jeffreys’ prior, leads to a straightforward approximation to
      the 50% HPD region, replaced by an ellipsoid.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling (cont’d)



      Compare Z1k with a standard importance sampling approximation
                    \hat Z_{2k} = \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\varphi(\theta_k^{(t)})}

      where the θk^(t)'s are generated from the density ϕ(·) (with fatter
      tails, like Student's t densities)
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Approximating Zk using a mixture representation



                                                                            Bridge sampling recall

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                                  ϕk (θk ) ∝ ωπk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω is not a probability weight
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
          1   Take δ^(t) = 1 with probability

                    \frac{\omega\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)})}{\omega\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) + \varphi(\theta_k^{(t-1)})}

              and δ^(t) = 2 otherwise;
          2   If δ^(t) = 1, generate θk^(t) ∼ MCMC(θk^(t−1), θk) where
              MCMC(θk, θk′) denotes an arbitrary MCMC kernel associated
              with the posterior πk (θk |x) ∝ πk (θk )Lk (θk );
          3   If δ^(t) = 2, generate θk^(t) ∼ ϕ(θk ) independently
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Evidence approximation by mixtures
      Rao-Blackwellised estimate
                    \hat\xi = \frac{1}{T}\sum_{t=1}^{T} \frac{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})}

      converges to ωZk /{ωZk + 1}
      Deduce Ẑ3k from ωẐ3k /{ωẐ3k + 1} = ξ̂, i.e.

                    \hat Z_{3k} = \frac{1}{\omega}\;\frac{\sum_{t=1}^{T} \omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \big/ \big\{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\big\}}{\sum_{t=1}^{T} \varphi(\theta_k^{(t)}) \big/ \big\{\omega\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\big\}}

                                                                            [Bridge sampler redux]
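      A Python sketch of the whole scheme on a toy conjugate normal model: the two-component Gibbs sampler of the previous slide followed by Ẑ3k; to keep the sketch short, the MCMC move of step 2 is replaced by an exact posterior draw, which the conjugate toy model allows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x, T, omega = 1.0, 20_000, 0.5

# conjugate toy model: theta ~ N(0,1), x|theta ~ N(theta,1) => Z = N(x; 0, 2), posterior N(x/2, 1/2)
prior = stats.norm(0, 1)
lik = lambda th: stats.norm(th, 1).pdf(x)
post = stats.norm(x / 2, np.sqrt(0.5))
phi = stats.norm(x / 2, 1.5)                       # arbitrary but normalised instrumental density
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

target = lambda th: omega * prior.pdf(th) * lik(th)   # omega * pi_k(theta) * L_k(theta)

# two-component Gibbs sampler on the mixture  omega * pi_k * L_k + phi
theta, draws = post.rvs(random_state=rng), np.empty(T)
for t in range(T):
    if rng.uniform() < target(theta) / (target(theta) + phi.pdf(theta)):   # delta = 1
        theta = post.rvs(random_state=rng)         # exact posterior draw instead of an MCMC move
    else:                                          # delta = 2
        theta = phi.rvs(random_state=rng)
    draws[t] = theta

num = target(draws) / (target(draws) + phi.pdf(draws))
den = phi.pdf(draws) / (target(draws) + phi.pdf(draws))
Z3_hat = num.sum() / (omega * den.sum())           # xi_hat / {omega (1 - xi_hat)}
print(Z3_hat, Z_exact)
```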
On some computational methods for Bayesian model choice
  Importance sampling solutions
     One bridge back



Mixture illustration
      For the normal example, use of the same uniform ϕ over the ellipse
      approximating the same HPD region Hα and ω equal to one-tenth
      of the true evidence. Gibbs samples of size 10⁴ and 100 Monte
      Carlo replicas
On some computational methods for Bayesian model choice
  Nested sampling




Nested sampling

      1    Evidence

      2    Importance sampling solutions

      3    Nested sampling
             Purpose
             Implementation
             Error rates
             Constraints

      4    ABC model choice
On some computational methods for Bayesian model choice
  Nested sampling
     Purpose



Nested sampling: Goal


      Skilling’s (2007) technique using the one-dimensional
      representation:
                    Z = \mathbb{E}^{\pi}[L(\theta)] = \int_0^1 \varphi(x)\,dx

      with
                    \varphi^{-1}(l) = P^{\pi}(L(\theta) > l).

      Note: ϕ(·) is intractable in most cases.
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: First approximation

      Approximate Z by a Riemann sum:
                    Z = \sum_{i=1}^{N} (x_{i-1} - x_i)\,\varphi(x_i)

      where the xi ’s are either:
              deterministic:            xi = e−i/N
              or random:

                                 x0 = 1,          xi+1 = ti xi ,   ti ∼ Be(N, 1)

              so that E[log xi ] = −i/N .
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Alternative representation

      Another representation is
                    Z = \sum_{i=0}^{N-1} \{\varphi(x_{i+1}) - \varphi(x_i)\}\,x_i

      which is a special case of

                    Z = \sum_{i=0}^{N-1} \{L(\theta^{(i+1)}) - L(\theta^{(i)})\}\,\pi(\{\theta;\ L(\theta) > L(\theta^{(i)})\})

      where · · · L(θ(i+1) ) < L(θ(i) ) · · ·
                                          [Lebesgue version of Riemann’s sum]
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Second approximation
      Estimate (intractable) ϕ(xi ) by ϕi :

      Nested sampling
      Start with N values θ1 , . . . , θN sampled from π
      At iteration i,
          1    Take ϕi = L(θk ), where θk is the point with smallest
               likelihood in the pool of θi ’s
          2    Replace θk with a sample from the prior constrained to
               L(θ) > ϕi : the current N points are sampled from prior
               constrained to L(θ) > ϕi .

      Note that

                    \pi(\{\theta;\ L(\theta) > L(\theta^{(i+1)})\}) \,\big/\, \pi(\{\theta;\ L(\theta) > L(\theta^{(i)})\}) \approx 1 - \frac{1}{N}
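      A Python sketch of the full nested sampling loop on a toy problem where the constrained prior can be simulated exactly (prior U(0, 1), likelihood exp(−θ²/2s²)), so no MCMC step is needed; in realistic problems step 2 is the hard part:

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(5)
s, N = 0.1, 500
L = lambda th: np.exp(-th**2 / (2 * s**2))        # likelihood; the prior is U(0, 1)
Z_exact = s * sqrt(pi / 2) * erf(1 / (s * sqrt(2)))

theta = rng.uniform(0, 1, N)                      # N points from the prior
Z_hat, x_prev = 0.0, 1.0
for i in range(1, 8 * N + 1):                     # fixed budget as a crude stopping rule
    k = int(np.argmin(L(theta)))
    phi_i = L(theta[k])                           # hat-phi_i: smallest likelihood in the pool
    x_i = np.exp(-i / N)                          # deterministic prior-volume schedule
    Z_hat += (x_prev - x_i) * phi_i
    x_prev = x_i
    # replace theta_k by a draw from the prior constrained to L(theta) > phi_i;
    # L is decreasing on [0, 1], so that region is simply [0, theta_k)
    theta[k] = rng.uniform(0, theta[k])
print(Z_hat, Z_exact)
```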
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Third approximation


      Iterate the above steps until a given stopping iteration j is
      reached: e.g.,
              observe very small changes in the approximation Z;
              reach the maximal value of L(θ) when the likelihood is
              bounded and its maximum is known;
               truncate the integral Z at level ε, i.e. replace

                    \int_0^1 \varphi(x)\,dx \quad\text{with}\quad \int_{\epsilon}^1 \varphi(x)\,dx
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



Approximation error


      Error = \hat Z - Z
            = \sum_{i=1}^{j} (x_{i-1} - x_i)\,\hat\varphi_i - \int_0^1 \varphi(x)\,dx

            = -\int_0^{\epsilon} \varphi(x)\,dx                                                       (Truncation Error)

            + \left[\sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi(x_i) - \int_{\epsilon}^1 \varphi(x)\,dx\right]          (Quadrature Error)

            + \sum_{i=1}^{j} (x_{i-1} - x_i)\,\{\hat\varphi_i - \varphi(x_i)\}                                      (Stochastic Error)



                                                                     [Dominated by Monte Carlo]
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



A CLT for the Stochastic Error


      The (dominating) stochastic error is OP (N −1/2 ):

                    N^{1/2}\,\{\hat Z - Z\} \xrightarrow{\;D\;} \mathcal{N}(0, V)

      with
                    V = -\int\!\!\int_{s,t\in[\epsilon,1]} s\,\varphi'(s)\,t\,\varphi'(t)\,\log(s\vee t)\,ds\,dt.

                                                          [Proof based on Donsker’s theorem]
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



What of log Z?

      If the interest lies within log Z, Slutsky’s transform of the CLT:

                    N^{1/2}\,\{\log \hat Z - \log Z\} \xrightarrow{\;D\;} \mathcal{N}\!\left(0, V/Z^2\right)

      Note: The number of simulated points equals the number of
      iterations j, and is a multiple of N : if one stops at the first iteration j
      such that e^{−j/N} < ε, then:

                    j = N \lceil -\log\epsilon \rceil .
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Sampling from constr’d priors



      Exact simulation from the constrained prior is intractable in most
      cases

      Skilling (2007) proposes to use MCMC but this introduces a bias
      (stopping rule)
      If implementable, then slice sampler can be devised at the same
      cost [or less]
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration
      Case of a banana-shaped target obtained by twisting a 2D normal:
      x2 = x2 + βx1² − 100β
                                   [Haario, Saksman, Tamminen, 1999]




                                       β = .03, Σ = diag(100, 1)
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration (2)
      Use of nested sampling with N = 1000, 50 MCMC steps with size
      0.1, compared with a population Monte Carlo (PMC) based on 10
      iterations with 5000 points per iteration and final sample of 50000
      points, using nine Student’s t components with 9 df
                 [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]




                                            Evidence estimation
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration (2)
      Use of nested sampling with N = 1000, 50 MCMC steps with size
      0.1, compared with a population Monte Carlo (PMC) based on 10
      iterations with 5000 points per iteration and final sample of 50000
      points, using nine Student’s t components with 9 df
                 [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]




                                              E[X1 ] estimation
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Banana illustration (2)
      Use of nested sampling with N = 1000, 50 MCMC steps with size
      0.1, compared with a population Monte Carlo (PMC) based on 10
      iterations with 5000 points per iteration and final sample of 50000
      points, using nine Student’s t components with 9 df
                 [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]




                                              E[X2 ] estimation
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



Approximate Bayesian Computation

      Bayesian setting: target is π(θ)f (x|θ)
      When likelihood f (x|θ) not in closed form, likelihood-free rejection
      technique:
      ABC algorithm
      For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly
      simulating
                            θ′ ∼ π(θ) ,  x ∼ f (x|θ′) ,
      until the auxiliary variable x is equal to the observed value, x = y.

                                                          [Pritchard et al., 1999]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



A as approximative


      When y is a continuous random variable, equality x = y is replaced
      with a tolerance condition,

                                                      ρ(x, y) ≤ ε

      where ρ is a distance between summary statistics
      Output distributed from

                             π(θ) Pθ {ρ(x, y) < ε} ∝ π(θ | ρ(x, y) < ε)
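      A Python sketch of the tolerance version of the ABC algorithm on a toy normal model with a tractable posterior, using ρ(x, y) = |x − y| (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(6)
y, eps, n = 1.0, 0.1, 200_000

# hypothetical toy model for checking: theta ~ N(0,1), x|theta ~ N(theta,1)
theta = rng.normal(0, 1, n)             # theta' ~ pi(theta)
x = rng.normal(theta, 1)                # x ~ f(x | theta')
accepted = theta[np.abs(x - y) <= eps]  # keep theta' whenever rho(x, y) <= eps
print(accepted.mean(), y / 2)           # ABC posterior mean vs exact posterior mean y/2
print(len(accepted) / n)                # acceptance rate, driven by eps
```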
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



Dynamic area
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC improvements

      Simulating from the prior is often poor in efficiency
      Either modify the proposal distribution on θ to increase the density
      of x’s within the vicinity of y...
           [Marjoram et al, 2003; Bortot et al., 2007; Sisson et al., 2007]

      ...or view the problem as conditional density estimation and
      develop techniques that allow for a larger ε...
                                                  [Beaumont et al., 2002]

      ...or even include ε in the inferential framework
                                                                   [Ratmann et al., 2009]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-MCMC


      Markov chain (θ(t) ) created via the transition function

                    \theta^{(t+1)} =
                    \begin{cases}
                      \theta' \sim K(\theta'|\theta^{(t)}) & \text{if } x \sim f(x|\theta') \text{ is such that } x = y\\[4pt]
                       & \text{and } u \sim \mathcal{U}(0,1) \le \dfrac{\pi(\theta')\,K(\theta^{(t)}|\theta')}{\pi(\theta^{(t)})\,K(\theta'|\theta^{(t)})}\,,\\[8pt]
                      \theta^{(t)} & \text{otherwise,}
                    \end{cases}

      has the posterior π(θ|y) as stationary distribution
                                                    [Marjoram et al, 2003]
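      A Python sketch of this ABC-MCMC kernel on the same kind of toy normal model, with a symmetric random walk K so that the kernel terms cancel in the acceptance ratio (tolerance version, all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(7)
y, eps, T, tau = 1.0, 0.1, 50_000, 1.0

log_prior = lambda th: -0.5 * th**2          # theta ~ N(0,1), up to a constant
theta, chain = 0.0, np.empty(T)
for t in range(T):
    prop = theta + tau * rng.normal()        # K(.|theta): symmetric random walk
    x = prop + rng.normal()                  # x ~ f(x | theta')
    # accept only if the pseudo-data falls in the eps-ball around y AND the MH ratio allows it
    # (K symmetric, so only the prior ratio remains)
    if abs(x - y) <= eps and np.log(rng.uniform()) <= log_prior(prop) - log_prior(theta):
        theta = prop
    chain[t] = theta
print(chain[T // 10:].mean(), y / 2)         # compare with the exact posterior mean y/2
```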
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-PRC

      Another sequential version producing a sequence of Markov
      transition kernels Kt and of samples (θ1^(t), . . . , θN^(t)) (1 ≤ t ≤ T )

      ABC-PRC Algorithm
          1   Pick θ⋆ at random among the previous θi^(t−1)’s, with
              probabilities ωi^(t−1) (1 ≤ i ≤ N ).
          2   Generate
                            θi^(t) ∼ Kt (θ|θ⋆) ,   x ∼ f (x|θi^(t)) ,
          3   Check that ρ(x, y) < ε, otherwise start again.

                                                                 [Sisson et al., 2007]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-PRC weight

      Probability ωi^(t) computed as

                    \omega_i^{(t)} \propto \pi(\theta_i^{(t)})\,L_{t-1}(\theta^\star|\theta_i^{(t)})\,\big\{\pi(\theta^\star)\,K_t(\theta_i^{(t)}|\theta^\star)\big\}^{-1} ,

      where Lt−1 is an arbitrary transition kernel.
      In case
                    L_{t-1}(\theta^\star|\theta) = K_t(\theta|\theta^\star) ,
      all weights are equal under a uniform prior.
      Inspired by Del Moral et al. (2006), who use backward kernels
      Lt−1 in SMC to achieve unbiasedness, with a more recent (2008)
      ABC-SMC version
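      A Python sketch of the ABC-PRC scheme on a toy normal model, with the choice Lt−1(θ⋆|θ) = Kt(θ|θ⋆) above and a symmetric Gaussian Kt, so the weight reduces to the prior ratio π(θi^(t))/π(θ⋆); any systematic gap between the weighted mean and the exact posterior mean y/2 illustrates the bias discussed on the next slide (all numbers invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
y, N, tau = 1.0, 2000, 0.5
eps_seq = [1.0, 0.5, 0.25, 0.1]                      # decreasing tolerances

prior = stats.norm(0, 1)                             # theta ~ N(0,1), x|theta ~ N(theta,1)

def abc_draw(eps):
    """Plain rejection step used to build the initial population."""
    while True:
        th = prior.rvs(random_state=rng)
        if abs(rng.normal(th, 1) - y) <= eps:
            return th

theta = np.array([abc_draw(eps_seq[0]) for _ in range(N)])
w = np.full(N, 1.0 / N)

for eps in eps_seq[1:]:
    new_theta, new_w = np.empty(N), np.empty(N)
    for i in range(N):
        while True:                                  # steps 1-3 of the ABC-PRC algorithm
            star = rng.choice(theta, p=w)            # pick theta* among the previous particles
            prop = rng.normal(star, tau)             # theta_i^(t) ~ K_t(.|theta*), Gaussian walk
            if abs(rng.normal(prop, 1) - y) <= eps:  # rho(x, y) < eps
                break
        new_theta[i] = prop
        # weight with L_{t-1}(theta'|theta) = K_t(theta|theta'); K_t symmetric => prior ratio
        new_w[i] = prior.pdf(prop) / prior.pdf(star)
    theta, w = new_theta, new_w / new_w.sum()

print(np.sum(w * theta), y / 2)                      # weighted mean vs exact posterior mean y/2
```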
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



ABC-PRC bias

                    Lack of unbiasedness of the method
      Joint density of the accepted pair (θ(t−1) , θ(t) ) proportional to

                                             π(θ(t−1) |y)Kt (θ(t) |θ(t−1) )f (y|θ(t) ) ,

      For an arbitrary function h(θ), E[ωt h(θ(t) )] proportional to
            \int\!\!\int h(\theta^{(t)})\,
              \frac{\pi(\theta^{(t)})\,L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\,K_t(\theta^{(t)}|\theta^{(t-1)})}\,
              \pi(\theta^{(t-1)}|y)\,K_t(\theta^{(t)}|\theta^{(t-1)})\,f(y|\theta^{(t)})\,
              d\theta^{(t-1)}\,d\theta^{(t)}

            \propto \int\!\!\int h(\theta^{(t)})\,
              \frac{\pi(\theta^{(t)})\,L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\,K_t(\theta^{(t)}|\theta^{(t-1)})}\,
              \pi(\theta^{(t-1)})\,f(y|\theta^{(t-1)})\,
              K_t(\theta^{(t)}|\theta^{(t-1)})\,f(y|\theta^{(t)})\,d\theta^{(t-1)}\,d\theta^{(t)}

            \propto \int h(\theta^{(t)})\,\pi(\theta^{(t)}|y)
              \left\{\int L_{t-1}(\theta^{(t-1)}|\theta^{(t)})\,f(y|\theta^{(t-1)})\,d\theta^{(t-1)}\right\} d\theta^{(t)} .
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



A mixture example (0)

      Toy model of Sisson et al. (2007): if

              θ ∼ U(−10, 10) ,               x|θ ∼ 0.5 N (θ, 1) + 0.5 N (θ, 1/100) ,

      then the posterior distribution associated with y = 0 is the normal
      mixture
                   θ|y = 0 ∼ 0.5 N (0, 1) + 0.5 N (0, 1/100)
      restricted to [−10, 10].
      Furthermore, true target available as

      π(θ | |x| < ε) ∝ Φ(ε − θ) − Φ(−ε − θ) + Φ(10(ε − θ)) − Φ(−10(ε + θ)) .
On some computational methods for Bayesian model choice
  ABC model choice
     ABC method(s)



A mixture example (1)

      [Figure: two rows of five density plots of θ over (−3, 3)]

                      Comparison of τ = 0.15 and τ = 1/0.15 in K_t
On some computational methods for Bayesian model choice
  ABC model choice
     ABC-PMC



A PMC version
      Use of the same kernel idea as ABC-PRC but with IS correction
      Generate a sample at iteration t by

            π̂_t(θ^(t)) ∝ Σ_{j=1}^N ω_j^(t−1) K_t(θ^(t) | θ_j^(t−1))

      modulo acceptance of the associated x_t, and use an importance
      weight associated with an accepted simulation θ_i^(t)

            ω_i^(t) ∝ π(θ_i^(t)) / π̂_t(θ_i^(t)) .

                                          c Still likelihood free
                                                               [Beaumont et al., 2009]
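
      A minimal sketch of this ABC-PMC scheme on the toy mixture, assuming a Gaussian kernel K_t whose variance is twice the weighted empirical variance of the previous sample and an arbitrary decreasing tolerance schedule; names, sizes and tuning values are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
y, N = 0.0, 1000                     # observation and population size
eps = [2.0, 0.5, 0.1]                # decreasing tolerances (illustrative)

def sim_x(theta):
    """One draw of x | theta from 0.5 N(theta, 1) + 0.5 N(theta, 1/100)."""
    scale = 1.0 if rng.random() < 0.5 else 0.1
    return rng.normal(theta, scale)

# t = 0: plain ABC rejection from the U(-10, 10) prior
theta = np.empty(N)
for i in range(N):
    while True:
        th = rng.uniform(-10, 10)
        if abs(sim_x(th) - y) < eps[0]:
            theta[i] = th
            break
w = np.full(N, 1.0 / N)

for t in range(1, len(eps)):
    tau2 = 2 * np.cov(theta, aweights=w)        # kernel variance for K_t
    new_theta = np.empty(N)
    for i in range(N):
        while True:
            j = rng.choice(N, p=w)              # propose from pi_hat_t
            th = rng.normal(theta[j], np.sqrt(tau2))
            if abs(th) <= 10 and abs(sim_x(th) - y) < eps[t]:
                new_theta[i] = th
                break
    # importance weight: prior / pi_hat_t (the uniform prior is constant on its support)
    kern = np.exp(-(new_theta[:, None] - theta[None, :]) ** 2 / (2 * tau2))
    w = 1.0 / (kern @ w)
    w /= w.sum()
    theta = new_theta
```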
On some computational methods for Bayesian model choice
  ABC model choice
     ABC-PMC



A mixture example (2)
      Recovery of the target, whether using a fixed standard deviation of
      τ = 0.15 or τ = 1/0.15.
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Gibbs random fields

      Gibbs distribution
      The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with
      the graph G if

            f(y) = (1/Z) exp{ − Σ_{c∈C} V_c(y_c) } ,

      where Z is the normalising constant, C is the set of cliques of G,
      and V_c is an arbitrary function called the potential;
      U(y) = Σ_{c∈C} V_c(y_c) is the energy function

       c Z is usually unavailable in closed form
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Potts model
      Potts model
      V_c(y) is of the form

            V_c(y) = θ S(y) = θ Σ_{l∼i} δ_{y_l = y_i}

      where l∼i denotes a neighbourhood structure

      In most realistic settings, the summation

            Z_θ = Σ_{x∈X} exp{θ^T S(x)}

      involves too many terms to be manageable and numerical
      approximations cannot always be trusted
                             [Cucala, Marin, CPR & Titterington, 2009]
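
      As an illustration of the quantity S(x) entering exp{θ S(x)}/Z_θ, the statistic can be computed directly on a small lattice; a sketch under the (assumed) choice of a 4-nearest-neighbour relation on a 2-D grid:

```python
import numpy as np

def potts_stat(x):
    """S(x) = number of neighbouring pairs with identical labels, for the
    4-neighbour (horizontal/vertical) relation on a 2-D lattice."""
    x = np.asarray(x)
    horiz = np.sum(x[:, 1:] == x[:, :-1])
    vert = np.sum(x[1:, :] == x[:-1, :])
    return int(horiz + vert)

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(10, 10))   # a binary field on a 10 x 10 grid
s = potts_stat(x)                       # enters the likelihood as exp{theta * S(x)} / Z_theta
```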
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Bayesian Model Choice



      Comparing a model with potential S_0 taking values in R^{p_0} versus a
      model with potential S_1 taking values in R^{p_1} can be done through
      the Bayes factor corresponding to the priors π_0 and π_1 on each
      parameter space

            B_{m0/m1}(x) = [∫ exp{θ_0^T S_0(x)}/Z_{θ_0,0} π_0(dθ_0)] / [∫ exp{θ_1^T S_1(x)}/Z_{θ_1,1} π_1(dθ_1)]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Neighbourhood relations



      Choice to be made between M neighbourhood relations

            i ∼_m i′        (0 ≤ m ≤ M − 1)

      with

            S_m(x) = Σ_{i ∼_m i′} I{x_i = x_{i′}}

      driven by the posterior probabilities of the models.
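
      On a lattice, the competing relations can for instance be the 4-neighbour and 8-neighbour systems, each with its own pair-count statistic S_m; a small sketch (the particular relations are only an illustration):

```python
import numpy as np

def s_vector(x):
    """(S_0, S_1): counts of identical neighbouring pairs under the
    4-neighbour relation (S_0) and the 8-neighbour relation (S_1 adds diagonals)."""
    x = np.asarray(x)
    s0 = np.sum(x[:, 1:] == x[:, :-1]) + np.sum(x[1:, :] == x[:-1, :])
    diag = np.sum(x[1:, 1:] == x[:-1, :-1]) + np.sum(x[1:, :-1] == x[:-1, 1:])
    return int(s0), int(s0 + diag)
```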
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Model index



      Formalisation via a model index M that appears as a new
      parameter with prior distribution π(M = m) and
      π(θ|M = m) = πm (θm )
      Computational target:

                P(M = m|x) ∝ ∫_{Θ_m} f_m(x|θ_m) π_m(θ_m) dθ_m  π(M = m) ,
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



Sufficient statistics
      By definition, if S(x) is a sufficient statistic for the joint
      parameters (M, θ_0 , . . . , θ_{M−1}), then

            P(M = m|x) = P(M = m|S(x)) .

      Each model m has its own sufficient statistic S_m(·), and
      S(·) = (S_0(·), . . . , S_{M−1}(·)) is also sufficient.
      For Gibbs random fields,

            x|M = m ∼ f_m(x|θ_m) = f_m^1(x|S(x)) f_m^2(S(x)|θ_m)
                                 = [1/n(S(x))] f_m^2(S(x)|θ_m)

      where

            n(S(x)) = #{x̃ ∈ X : S(x̃) = S(x)}

       c S(x) is therefore also sufficient for the joint parameters
                                     [Specific to Gibbs random fields!]
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



ABC model choice algorithm


      ABC-MC
              Generate m∗ from the prior π(M = m).
              Generate θ∗_{m∗} from the prior π_{m∗}(·).
              Generate x∗ from the model f_{m∗}(·|θ∗_{m∗}).
              Compute the distance ρ(S(x0), S(x∗)).
              Accept (θ∗_{m∗}, m∗) if ρ(S(x0), S(x∗)) < ε.

                                              [Cornuet, Grelaud, Marin & Robert, 2009]

      Note: when ε = 0 the algorithm is exact
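
      The ABC-MC steps translate directly into a generic rejection loop; a sketch where the uniform model prior, the Euclidean distance and the function signatures are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

def abc_mc(x0, priors, simulators, stats, n_sims, eps):
    """priors[m]() draws theta from pi_m, simulators[m](theta) draws a pseudo
    data set, stats(x) returns the summary vector S(x); a pair (m, theta) is
    kept whenever the distance between S(x0) and S(x*) falls below eps."""
    s0 = np.asarray(stats(x0), dtype=float)
    kept = []
    for _ in range(n_sims):
        m = rng.integers(len(priors))        # m* ~ pi(M = m), uniform over models here
        theta = priors[m]()                  # theta* ~ pi_m(.)
        x = simulators[m](theta)             # x* ~ f_m(. | theta*)
        dist = np.linalg.norm(np.asarray(stats(x), dtype=float) - s0)
        if dist < eps:
            kept.append((m, theta))
    return kept
```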
On some computational methods for Bayesian model choice
  ABC model choice
     ABC for model choice in GRFs



ABC approximation to the Bayes factor

      Frequency ratio:

            B̂F_{m0/m1}(x0) = [P̂(M = m0|x0) / P̂(M = m1|x0)] × [π(M = m1) / π(M = m0)]
                            = [#{m^{i∗} = m0} / #{m^{i∗} = m1}] × [π(M = m1) / π(M = m0)] ,

      replaced with

            B̂F_{m0/m1}(x0) = [(1 + #{m^{i∗} = m0}) / (1 + #{m^{i∗} = m1})] × [π(M = m1) / π(M = m0)]

      to avoid indeterminacy (also a Bayes estimate).
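
      Given the accepted indices returned by such a sampler, the corrected ratio is immediate; a sketch assuming equal prior model weights:

```python
def bf_hat(kept, m0=0, m1=1):
    """ABC estimate of B_{m0/m1} with the +1 correction, for pi(M=m0) = pi(M=m1)."""
    n0 = sum(1 for m, _ in kept if m == m0)
    n1 = sum(1 for m, _ in kept if m == m1)
    return (1 + n0) / (1 + n1)
```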
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Toy example

      iid Bernoulli model versus two-state first-order Markov chain, i.e.

            f_0(x|θ_0) = exp{ θ_0 Σ_{i=1}^n I{x_i = 1} } / {1 + exp(θ_0)}^n ,

      versus

            f_1(x|θ_1) = (1/2) exp{ θ_1 Σ_{i=2}^n I{x_i = x_{i−1}} } / {1 + exp(θ_1)}^{n−1} ,

      with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by “phase
      transition” boundaries).
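
      These two models can be encoded for the ABC-MC sketch given earlier, with S_0(x) = Σ I{x_i = 1} and S_1(x) = Σ I{x_i = x_{i−1}} as summaries; the sample length n and the sequential simulation are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

def sim_bernoulli(theta0):
    """Model 0: iid Bernoulli with P(x_i = 1) = exp(theta0) / (1 + exp(theta0))."""
    p = np.exp(theta0) / (1 + np.exp(theta0))
    return (rng.random(n) < p).astype(int)

def sim_markov(theta1):
    """Model 1: two-state chain, x_1 uniform, P(x_i = x_{i-1}) = exp(theta1)/(1+exp(theta1))."""
    p = np.exp(theta1) / (1 + np.exp(theta1))
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for i in range(1, n):
        x[i] = x[i - 1] if rng.random() < p else 1 - x[i - 1]
    return x

def stats(x):
    x = np.asarray(x)
    return np.sum(x == 1), np.sum(x[1:] == x[:-1])   # (S_0(x), S_1(x))

priors = [lambda: rng.uniform(-5, 5), lambda: rng.uniform(0, 6)]
simulators = [sim_bernoulli, sim_markov]
# e.g. kept = abc_mc(x0, priors, simulators, stats, n_sims=10**5, eps=1.0)
```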
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Toy example (2)




      [Figure: two scatterplots of the estimated log BF01 against the true log BF01]

      (left) Comparison of the true BF_{m0/m1}(x0) with its ABC estimate
      (in logs) over 2,000 simulations and 4 × 10^6 proposals from the
      prior. (right) Same when using a tolerance corresponding to the
      1% quantile on the distances.
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Protein folding




      Superposition of the native structure (grey) with the ST1
      structure (red), the ST2 structure (orange), the ST3 structure
      (green), and the DT structure (blue).
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Protein folding (2)


                                % seq. Id.      TM-score   FROST score
                  1i5nA (ST1)       32            0.86         75.3
                  1ls1A1 (ST2)       5            0.42          8.9
                  1jr8A (ST3)        4            0.24          8.9
                  1s7oA (DT)        10            0.08          7.8

      Characteristics of the dataset. % seq. Id.: percentage of identity
      with the query sequence. TM-score: similarity between predicted and
      native structure (uncertainty between 0.17 and 0.4). FROST score:
      quality of alignment of the query onto the candidate structure
      (uncertainty between 7 and 9).
On some computational methods for Bayesian model choice
  ABC model choice
     Illustrations



Protein folding (3)



                                        NS/ST1            NS/ST2   NS/ST3   NS/DT
               BF                         1.34             1.22     2.42     2.76
           P(M = NS|x0 )                 0.573             0.551    0.708    0.734

      Estimates of the Bayes factors between model NS and models
      ST1, ST2, ST3, and DT, and corresponding posterior
      probabilities of model NS based on an ABC-MC algorithm using
      1.2 × 10^6 simulations and a tolerance equal to the 1% quantile of
      the distances.

Contenu connexe

Tendances

Max Entropy
Max EntropyMax Entropy
Max Entropy
jianingy
 

Tendances (20)

Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]
 
BIRS 12w5105 meeting
BIRS 12w5105 meetingBIRS 12w5105 meeting
BIRS 12w5105 meeting
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation Problem
 
ABC in London, May 5, 2011
ABC in London, May 5, 2011ABC in London, May 5, 2011
ABC in London, May 5, 2011
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABC
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum Entropy
 
Max Entropy
Max EntropyMax Entropy
Max Entropy
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?
 
von Mises lecture, Berlin
von Mises lecture, Berlinvon Mises lecture, Berlin
von Mises lecture, Berlin
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimation
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forests
 
WSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsWSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in Statistics
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes Factors
 
Intractable likelihoods
Intractable likelihoodsIntractable likelihoods
Intractable likelihoods
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methods
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 

En vedette (8)

Computer Intensive Statistical Methods
Computer Intensive Statistical MethodsComputer Intensive Statistical Methods
Computer Intensive Statistical Methods
 
jrudd1_RDay
jrudd1_RDayjrudd1_RDay
jrudd1_RDay
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
Bootstrap: a bias minimization technique of an estimator
Bootstrap: a bias minimization technique of an estimatorBootstrap: a bias minimization technique of an estimator
Bootstrap: a bias minimization technique of an estimator
 
Reading Efron's 1979 paper on bootstrap
Reading Efron's 1979 paper on bootstrapReading Efron's 1979 paper on bootstrap
Reading Efron's 1979 paper on bootstrap
 
Aula 2 Teoria Da Amostragem Daniel
Aula 2 Teoria Da Amostragem DanielAula 2 Teoria Da Amostragem Daniel
Aula 2 Teoria Da Amostragem Daniel
 
A.b aula 4 amostragem
A.b aula 4 amostragemA.b aula 4 amostragem
A.b aula 4 amostragem
 
Uchim.org 6-klass-vilenkin
Uchim.org 6-klass-vilenkinUchim.org 6-klass-vilenkin
Uchim.org 6-klass-vilenkin
 

Similaire à Jsm09 talk

Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Alessandro Panella
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
zukun
 
original
originaloriginal
original
butest
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)
Christian Robert
 

Similaire à Jsm09 talk (20)

Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choice
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and data
 
Workshop on Bayesian Inference for Latent Gaussian Models with Applications
Workshop on Bayesian Inference for Latent Gaussian Models with ApplicationsWorkshop on Bayesian Inference for Latent Gaussian Models with Applications
Workshop on Bayesian Inference for Latent Gaussian Models with Applications
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
 
seminar at Princeton University
seminar at Princeton Universityseminar at Princeton University
seminar at Princeton University
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
 
Course on Bayesian computational methods
Course on Bayesian computational methodsCourse on Bayesian computational methods
Course on Bayesian computational methods
 
San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010
 
ABC in Varanasi
ABC in VaranasiABC in Varanasi
ABC in Varanasi
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Elementary Probability and Information Theory
Elementary Probability and Information TheoryElementary Probability and Information Theory
Elementary Probability and Information Theory
 
ABC-SysBio – Approximate Bayesian Computation in Python with GPU support
ABC-SysBio – Approximate Bayesian Computation in Python with GPU supportABC-SysBio – Approximate Bayesian Computation in Python with GPU support
ABC-SysBio – Approximate Bayesian Computation in Python with GPU support
 
NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
ABC and empirical likelihood
ABC and empirical likelihoodABC and empirical likelihood
ABC and empirical likelihood
 
Edinburgh, Bayes-250
Edinburgh, Bayes-250Edinburgh, Bayes-250
Edinburgh, Bayes-250
 
Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?
 
original
originaloriginal
original
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 

Plus de Christian Robert

Plus de Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 

Dernier

Dernier (20)

HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 

Jsm09 talk

  • 1. On some computational methods for Bayesian model choice On some computational methods for Bayesian model choice Christian P. Robert CREST-INSEE and Universit´ Paris Dauphine e http://xianblog.wordpress.com JSM 2009, Washington D.C., August 2, 2009 Joint works with M. Beaumont, N. Chopin, J.-M. Cornuet, J.-M. Marin and D. Wraith
  • 2. On some computational methods for Bayesian model choice Outline 1 Evidence 2 Importance sampling solutions 3 Nested sampling 4 ABC model choice
  • 3. On some computational methods for Bayesian model choice Evidence Model choice and model comparison Choice between models Several models available for the same observation vector x Mi : x ∼ fi (x|θi ), i∈I where the family I can be finite or infinite
  • 4. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 5. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 6. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 7. On some computational methods for Bayesian model choice Evidence Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model, or use averaged predictive π(Mj |x) fj (x |θj )πj (θj |x)dθj j Θj
  • 8. On some computational methods for Bayesian model choice Evidence Bayes factor Definition (Bayes factors) Central quantity Θi fi (x|θi )πi (θi )dθi mi (x) Bij (x) == = Θj fj (x|θj )πj (θj )dθj mj (x) [Jeffreys, 1939]
  • 9. On some computational methods for Bayesian model choice Evidence Evidence All these problems end up with a similar quantity, the evidence k = 1, . . . Zk = πk (θk )Lk (θk ) dθk , Θk aka the marginal likelihood.
  • 10. On some computational methods for Bayesian model choice Importance sampling solutions Importance sampling revisited 1 Evidence 2 Importance sampling solutions Bridge sampling Harmonic means One bridge back 3 Nested sampling 4 ABC model choice
  • 11. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Bayes factor approximation When approximating the Bayes factor f0 (x|θ0 )π0 (θ0 )dθ0 Θ0 B01 = f1 (x|θ1 )π1 (θ1 )dθ1 Θ1 use of importance functions 0 and 1 and n−1 0 n0 i i i=1 f0 (x|θ0 )π0 (θ0 )/ i 0 (θ0 ) B01 = n−1 1 n1 i i i=1 f1 (x|θ1 )π1 (θ1 )/ i 1 (θ1 )
  • 12. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Bridge sampling identity Generic representation π2 (θ|x)α(θ)π1 (θ|x)dθ ˜ B12 = ∀ α(·) π1 (θ|x)α(θ)π2 (θ|x)dθ ˜ n1 1 π2 (θ1i |x)α(θ1i ) ˜ n1 i=1 ≈ n2 θji ∼ πj (θ|x) 1 π1 (θ2i |x)α(θ2i ) ˜ n2 i=1 [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
  • 13. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Optimal bridge sampling The optimal choice of auxiliary function is n1 + n2 α = n1 π1 (θ|x) + n2 π2 (θ|x) leading to n1 1 π2 (θ1i |x) ˜ n1 n1 π1 (θ1i |x) + n2 π2 (θ1i |x) i=1 B12 ≈ n2 1 π1 (θ2i |x) ˜ n2 n1 π1 (θ2i |x) + n2 π2 (θ2i |x) i=1 Back later!
  • 14. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Optimal bridge sampling (2) Reason: Var(B12 ) 1 π1 (θ)π2 (θ)[n1 π1 (θ) + n2 π2 (θ)]α(θ)2 dθ 2 ≈ 2 −1 B12 n1 n2 π1 (θ)π2 (θ)α(θ) dθ δ method Drag: Dependence on the unknown normalising constants solved iteratively
  • 15. On some computational methods for Bayesian model choice Importance sampling solutions Bridge sampling Optimal bridge sampling (2) Reason: Var(B12 ) 1 π1 (θ)π2 (θ)[n1 π1 (θ) + n2 π2 (θ)]α(θ)2 dθ 2 ≈ 2 −1 B12 n1 n2 π1 (θ)π2 (θ)α(θ) dθ δ method Drag: Dependence on the unknown normalising constants solved iteratively
  • 16. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Approximating Zk from a posterior sample Use of the [harmonic mean] identity ϕ(θk ) ϕ(θk ) πk (θk )Lk (θk ) 1 Eπk x = dθk = πk (θk )Lk (θk ) πk (θk )Lk (θk ) Zk Zk no matter what the proposal ϕ(·) is. [Gelfand & Dey, 1994; Bartolucci et al., 2006] Direct exploitation of the MCMC output
  • 17. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Approximating Zk from a posterior sample Use of the [harmonic mean] identity ϕ(θk ) ϕ(θk ) πk (θk )Lk (θk ) 1 Eπk x = dθk = πk (θk )Lk (θk ) πk (θk )Lk (θk ) Zk Zk no matter what the proposal ϕ(·) is. [Gelfand & Dey, 1994; Bartolucci et al., 2006] Direct exploitation of the MCMC output
  • 18. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means RJ-MCMC interpretation If ϕ is the density of an alternative to model pik (θk )Lk (θk |x), the acceptance probability for a move from model pik (θk )Lk (θk |x) to model ϕ(θk ) is based upon ϕ(θk ) πk (θk )Lk (θk |x) (under a proper choice of prior weight) Unbiased estimator of the odds ratio [Bartolucci, Scaccia, and Mira, 2006]
  • 19. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means An infamous example When ϕ(θk ) = πk (θk ) ˜ harmonic mean approximation to B12 is n1 1 1/L1 (θ1i |x) n1 i=1 B21 = n2 θji ∼ πj (θ|x) 1 1/L2 (θ2i |x) n2 i=1 [Newton & Raftery, 1994; Raftery et al., 2008] Infamous: Most often leads to an infinite variance!!! [Radford Neal’s blog, 2008]
  • 20. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means An infamous example When ϕ(θk ) = πk (θk ) ˜ harmonic mean approximation to B12 is n1 1 1/L1 (θ1i |x) n1 i=1 B21 = n2 θji ∼ πj (θ|x) 1 1/L2 (θ2i |x) n2 i=1 [Newton & Raftery, 1994; Raftery et al., 2008] Infamous: Most often leads to an infinite variance!!! [Radford Neal’s blog, 2008]
  • 21. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means “The Worst Monte Carlo Method Ever” “The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naively accept estimates that are nowhere close to the correct value of the marginal likelihood.” [Radford Neal’s blog, Aug. 23, 2008]
  • 22. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means “The Worst Monte Carlo Method Ever” “The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naively accept estimates that are nowhere close to the correct value of the marginal likelihood.” [Radford Neal’s blog, Aug. 23, 2008]
  • 23. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Comparison with regular importance sampling Harmonic mean: Constraint opposed to usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk (θk )Lk (θk ) for the approximation T (t) 1 ϕ(θk ) Z1k = 1 (t) (t) T πk (θk )Lk (θk ) t=1 to have a finite variance. E.g., use finite support kernels for ϕ
  • 24. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Comparison with regular importance sampling Harmonic mean: Constraint opposed to usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk (θk )Lk (θk ) for the approximation T (t) 1 ϕ(θk ) Z1k = 1 (t) (t) T πk (θk )Lk (θk ) t=1 to have a finite variance. E.g., use finite support kernels for ϕ
  • 25. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports In practice, possibility to use the HPD region defined by an MCMC sample as Hα = {θ; π (θ|x) ≥ qα [˜ (·|x)]} ˜ ˆ π where qα [˜ (·|x)] is the empirical α quantile of the observed ˆ π π (θi |x))’s ˜ c Construction of ϕ as uniformly supported by Hα or an ellipsoid approximation whose volume can be computed
  • 26. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports In practice, possibility to use the HPD region defined by an MCMC sample as Hα = {θ; π (θ|x) ≥ qα [˜ (·|x)]} ˜ ˆ π where qα [˜ (·|x)] is the empirical α quantile of the observed ˆ π π (θi |x))’s ˜ c Construction of ϕ as uniformly supported by Hα or an ellipsoid approximation whose volume can be computed
  • 27. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports: illustration Gibbs sample of 103 parameters (θ, σ 2 ) produced for the normal model, x1 , . . . , xn ∼ N (θ, σ 2 ) with x = 0, s2 = 1 and n = 10, under Jeffreys’ prior, leads to a straightforward approximation to the 10% HPD region, replaced by an ellipsoid.
  • 28. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means HPD supports: illustration Gibbs sample of 103 parameters (θ, σ 2 ) produced for the normal model, x1 , . . . , xn ∼ N (θ, σ 2 ) with x = 0, s2 = 1 and n = 10, under Jeffreys’ prior, leads to a straightforward approximation to the 50% HPD region, replaced by an ellipsoid.
  • 29. On some computational methods for Bayesian model choice Importance sampling solutions Harmonic means Comparison with regular importance sampling (cont’d) Compare Z1k with a standard importance sampling approximation T (t) (t) 1 πk (θk )Lk (θk ) Z2k = (t) T ϕ(θk ) t=1 (t) where the θk ’s are generated from the density ϕ(·) (with fatter tails like t’s)
  • 30. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Zk using a mixture representation Bridge sampling recall Design a specific mixture for simulation [importance sampling] purposes, with density ϕk (θk ) ∝ ωπk (θk )Lk (θk ) + ϕ(θk ) , where ϕ(·) is arbitrary (but normalised) Note: ω is not a probability weight
  • 31. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Zk using a mixture representation Bridge sampling recall Design a specific mixture for simulation [importance sampling] purposes, with density ϕk (θk ) ∝ ωπk (θk )Lk (θk ) + ϕ(θk ) , where ϕ(·) is arbitrary (but normalised) Note: ω is not a probability weight
  • 32. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Z using a mixture representation (cont’d) Corresponding MCMC (=Gibbs) sampler At iteration t 1 Take δ (t) = 1 with probability (t−1) (t−1) (t−1) (t−1) (t−1) ωπk (θk )Lk (θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) and δ (t) = 2 otherwise; (t) (t−1) 2 If δ (t) = 1, generate θk ∼ MCMC(θk , θk ) where MCMC(θk , θk ) denotes an arbitrary MCMC kernel associated with the posterior πk (θk |x) ∝ πk (θk )Lk (θk ); (t) 3 If δ (t) = 2, generate θk ∼ ϕ(θk ) independently
  • 33. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Z using a mixture representation (cont’d) Corresponding MCMC (=Gibbs) sampler At iteration t 1 Take δ (t) = 1 with probability (t−1) (t−1) (t−1) (t−1) (t−1) ωπk (θk )Lk (θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) and δ (t) = 2 otherwise; (t) (t−1) 2 If δ (t) = 1, generate θk ∼ MCMC(θk , θk ) where MCMC(θk , θk ) denotes an arbitrary MCMC kernel associated with the posterior πk (θk |x) ∝ πk (θk )Lk (θk ); (t) 3 If δ (t) = 2, generate θk ∼ ϕ(θk ) independently
  • 34. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Approximating Z using a mixture representation (cont’d) Corresponding MCMC (=Gibbs) sampler At iteration t 1 Take δ (t) = 1 with probability (t−1) (t−1) (t−1) (t−1) (t−1) ωπk (θk )Lk (θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) and δ (t) = 2 otherwise; (t) (t−1) 2 If δ (t) = 1, generate θk ∼ MCMC(θk , θk ) where MCMC(θk , θk ) denotes an arbitrary MCMC kernel associated with the posterior πk (θk |x) ∝ πk (θk )Lk (θk ); (t) 3 If δ (t) = 2, generate θk ∼ ϕ(θk ) independently
  • 35. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Evidence approximation by mixtures Rao-Blackwellised estimate T ˆ 1 ξ= (t) ωπk (θk )Lk (θk ) (t) (t) (t) ωπk (θk )Lk (θk ) + ϕ(θk ) , (t) T t=1 converges to ωZk /{ωZk + 1} 3k ˆ ˆ ˆ Deduce Zˆ from ω Z3k /{ω Z3k + 1} = ξ ie T (t) (t) (t) (t) (t) t=1 ωπk (θk )Lk (θk ) ωπ(θk )Lk (θk ) + ϕ(θk ) ˆ Z3k = T (t) (t) (t) (t) t=1 ϕ(θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) [Bridge sampler redux]
  • 36. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Evidence approximation by mixtures Rao-Blackwellised estimate T ˆ 1 ξ= (t) ωπk (θk )Lk (θk ) (t) (t) (t) ωπk (θk )Lk (θk ) + ϕ(θk ) , (t) T t=1 converges to ωZk /{ωZk + 1} 3k ˆ ˆ ˆ Deduce Zˆ from ω Z3k /{ω Z3k + 1} = ξ ie T (t) (t) (t) (t) (t) t=1 ωπk (θk )Lk (θk ) ωπ(θk )Lk (θk ) + ϕ(θk ) ˆ Z3k = T (t) (t) (t) (t) t=1 ϕ(θk ) ωπk (θk )Lk (θk ) + ϕ(θk ) [Bridge sampler redux]
  • 37. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Mixture illustration For the normal example, use of the same uniform ϕ over the ellipse approximating the same HPD region Hα and ω equal to one-tenth of the true evidence. Gibbs samples of size 104
  • 38. On some computational methods for Bayesian model choice Importance sampling solutions One bridge back Mixture illustration For the normal example, use of the same uniform ϕ over the ellipse approximating the same HPD region Hα and ω equal to one-tenth of the true evidence. Gibbs samples of size 104 and 100 Monte Carlo replicas
  • 39. On some computational methods for Bayesian model choice Nested sampling Nested sampling 1 Evidence 2 Importance sampling solutions 3 Nested sampling Purpose Implementation Error rates Constraints 4 ABC model choice
  • 40. On some computational methods for Bayesian model choice Nested sampling Purpose Nested sampling: Goal Skilling’s (2007) technique using the one-dimensional representation: 1 Z = Eπ [L(θ)] = ϕ(x) dx 0 with ϕ−1 (l) = P π (L(θ) > l). Note; ϕ(·) is intractable in most cases.
  • 41. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: First approximation Approximate Z by a Riemann sum: N Z= (xi−1 − xi )ϕ(xi ) i=1 where the xi ’s are either: deterministic: xi = e−i/N or random: x0 = 0, xi+1 = ti xi , ti ∼ Be(N, 1) so that E[log xi ] = −i/N .
  • 42. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Alternative representation Another representation is N −1 Z= {ϕ(xi+1 ) − ϕ(xi )}xi i=0 which is a special case of N −1 Z= {L(θ(i+1) ) − L(θ(i) )}π({θ; L(θ) > L(θ(i) )}) i=0 where · · · L(θ(i+1) ) < L(θ(i) ) · · · [Lebesgue version of Riemann’s sum]
  • 43. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Alternative representation Another representation is N −1 Z= {ϕ(xi+1 ) − ϕ(xi )}xi i=0 which is a special case of N −1 Z= {L(θ(i+1) ) − L(θ(i) )}π({θ; L(θ) > L(θ(i) )}) i=0 where · · · L(θ(i+1) ) < L(θ(i) ) · · · [Lebesgue version of Riemann’s sum]
  • 44. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 45. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 46. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 47. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Second approximation Estimate (intractable) ϕ(xi ) by ϕi : Nested sampling Start with N values θ1 , . . . , θN sampled from π At iteration i, 1 Take ϕi = L(θk ), where θk is the point with smallest likelihood in the pool of θi ’s 2 Replace θk with a sample from the prior constrained to L(θ) > ϕi : the current N points are sampled from prior constrained to L(θ) > ϕi . Note that 1 π({θ; L(θ) > L(θ(i+1) )}) π({θ; L(θ) > L(θ(i) )}) ≈ 1 − N
  • 48. On some computational methods for Bayesian model choice Nested sampling Implementation Nested sampling: Third approximation Iterate the above steps until a given stopping iteration j is reached: e.g., observe very small changes in the approximation Z; reach the maximal value of L(θ) when the likelihood is bounded and its maximum is known; truncate the integral Z at level , i.e. replace 1 1 ϕ(x) dx with ϕ(x) dx 0
  • 49. On some computational methods for Bayesian model choice Nested sampling Error rates Approximation error Error = Z − Z j 1 = (xi−1 − xi )ϕi − ϕ(x) dx = − ϕ(x) dx i=1 0 0 j 1 + (xi−1 − xi )ϕ(xi ) − ϕ(x) dx (Quadrature Error) i=1 j + (xi−1 − xi ) {ϕi − ϕ(xi )} (Stochastic Error) i=1 [Dominated by Monte Carlo]
• 50. On some computational methods for Bayesian model choice Nested sampling Error rates A CLT for the Stochastic Error
The (dominating) stochastic error is $O_P(N^{-1/2})$:
$N^{1/2}\,(\hat Z - Z) \xrightarrow{\ D\ } \mathcal N(0, V)$
with
$V = -\int\!\!\int_{s,t\in[\varepsilon,1]} s\,\varphi'(s)\, t\,\varphi'(t)\, \log(s \vee t)\, ds\, dt.$
[Proof based on Donsker's theorem]
• 51. On some computational methods for Bayesian model choice Nested sampling Error rates What of log Z?
If the interest lies within $\log Z$, Slutsky's transform of the CLT:
$N^{1/2}\,(\log \hat Z - \log Z) \xrightarrow{\ D\ } \mathcal N\big(0, V/Z^2\big)$
Note: The number of simulated points equals the number of iterations $j$, and is a multiple of $N$: if one stops at the first iteration $j$ such that $e^{-j/N} < \varepsilon$, then $j = N \lceil -\log \varepsilon \rceil$.
• 54. On some computational methods for Bayesian model choice Nested sampling Constraints Sampling from constr'd priors
Exact simulation from the constrained prior is intractable in most cases.
Skilling (2007) proposes to use MCMC, but this introduces a bias (stopping rule).
If implementable, then a slice sampler can be devised at the same cost [or less]
• 57. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration
Case of a banana target made of a twisted 2D normal:
$x_2 = x_2 + \beta x_1^2 - 100\beta$
[Haario, Saksman, Tamminen, 1999]
$\beta = .03$, $\Sigma = \mathrm{diag}(100, 1)$
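For reference, a small Python sketch of this twisted-normal (banana) target under the parameterisation quoted on the slide ($\beta = 0.03$, $\Sigma = \mathrm{diag}(100,1)$); the function names are illustrative, and the sampler simply twists exact draws from the underlying Gaussian.

import numpy as np

beta = 0.03
var1, var2 = 100.0, 1.0                      # diagonal of Sigma

def log_banana(x1, x2):
    # evaluate N(0, diag(100, 1)) at (x1, x2 + beta*x1^2 - 100*beta)
    y2 = x2 + beta * x1**2 - 100.0 * beta
    return (-0.5 * (x1**2 / var1 + y2**2 / var2)
            - 0.5 * np.log((2 * np.pi) ** 2 * var1 * var2))

def sample_banana(n, rng=np.random.default_rng(1)):
    # exact draws: sample the Gaussian, then apply the inverse twist
    x1 = rng.normal(scale=np.sqrt(var1), size=n)
    x2 = rng.normal(scale=np.sqrt(var2), size=n) - beta * x1**2 + 100.0 * beta
    return np.column_stack([x1, x2])

Having exact draws and a density known in closed form is what makes this target convenient for checking evidence and moment estimates, as in the comparison reported on the next slides.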
• 58. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration (2)
Use of nested sampling with N = 1000 and 50 MCMC steps of size 0.1, compared with a population Monte Carlo (PMC) sampler based on 10 iterations with 5000 points per iteration and a final sample of 50000 points, using nine Student's t components with 9 df [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]. Evidence estimation.
• 59. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration (2)
Same comparison (nested sampling with N = 1000 and 50 MCMC steps of size 0.1 versus the PMC sampler above) [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]. $\mathbb E[X_1]$ estimation.
• 60. On some computational methods for Bayesian model choice Nested sampling Constraints Banana illustration (2)
Same comparison [Wraith, Kilbinger, Benabed et al., 2009, Phys. Rev. D]. $\mathbb E[X_2]$ estimation.
• 61. On some computational methods for Bayesian model choice ABC model choice ABC method(s) Approximate Bayesian Computation
Bayesian setting: target is $\pi(\theta) f(x|\theta)$
When the likelihood $f(x|\theta)$ is not available in closed form, a likelihood-free rejection technique:
ABC algorithm
For an observation $y \sim f(y|\theta)$, under the prior $\pi(\theta)$, keep jointly simulating
$\theta' \sim \pi(\theta)$, $x \sim f(x|\theta')$,
until the auxiliary variable $x$ is equal to the observed value, $x = y$.
[Pritchard et al., 1999]
• 64. On some computational methods for Bayesian model choice ABC model choice ABC method(s) A as approximative
When $y$ is a continuous random variable, the equality $x = y$ is replaced with a tolerance condition
$\rho(x, y) \le \varepsilon$
where $\rho$ is a distance between summary statistics.
Output distributed from
$\pi(\theta)\, P_\theta\{\rho(x, y) < \varepsilon\} \propto \pi(\theta \mid \rho(x, y) < \varepsilon)$
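A minimal sketch of this tolerance-based ABC rejection sampler, written generically and then applied to a hypothetical Gaussian example; none of the names or settings below come from the slides, they only illustrate the accept-if-within-$\varepsilon$ rule.

import numpy as np

rng = np.random.default_rng(2)

def abc_rejection(y_obs, prior_sim, data_sim, summary, dist, eps, n_sim=100_000):
    """Keep prior draws whose simulated summaries fall within eps of the observed one."""
    s_obs = summary(y_obs)
    kept = []
    for _ in range(n_sim):
        theta = prior_sim()                      # theta' ~ pi(theta)
        x = data_sim(theta)                      # x ~ f(x | theta')
        if dist(summary(x), s_obs) <= eps:       # rho(S(x), S(y)) <= eps
            kept.append(theta)
    return np.array(kept)                        # draws from pi(theta | rho(x, y) <= eps)

# Hypothetical example: theta ~ N(0, 10), x | theta an i.i.d. N(theta, 1) sample of
# size 50, summarised by its mean.
y_obs = rng.normal(loc=2.0, size=50)
post = abc_rejection(
    y_obs,
    prior_sim=lambda: rng.normal(scale=np.sqrt(10.0)),
    data_sim=lambda th: rng.normal(loc=th, size=50),
    summary=np.mean,
    dist=lambda a, b: abs(a - b),
    eps=0.1,
)
print(post.mean(), post.std())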
  • 66. On some computational methods for Bayesian model choice ABC model choice ABC method(s) Dynamic area
• 67. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC improvements
Simulating from the prior is often inefficient.
Either modify the proposal distribution on $\theta$ to increase the density of $x$'s within the vicinity of $y$...
[Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007]
...or view the problem as conditional density estimation and develop techniques that allow for larger $\varepsilon$...
[Beaumont et al., 2002]
...or even include $\varepsilon$ in the inferential framework
[Ratmann et al., 2009]
• 71. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-MCMC
Markov chain $(\theta^{(t)})$ created via the transition function
$\theta^{(t+1)} = \begin{cases} \theta' \sim K(\theta'|\theta^{(t)}) & \text{if } x \sim f(x|\theta') \text{ is such that } x = y \\[2pt] & \text{and } u \sim \mathcal U(0,1) \le \dfrac{\pi(\theta')\,K(\theta^{(t)}|\theta')}{\pi(\theta^{(t)})\,K(\theta'|\theta^{(t)})}, \\[4pt] \theta^{(t)} & \text{otherwise,} \end{cases}$
has the posterior $\pi(\theta|y)$ as stationary distribution
[Marjoram et al., 2003]
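Below is a hedged Python sketch of this ABC-MCMC transition, with the tolerance version $\rho(S(x), S(y)) \le \varepsilon$ standing in for the exact-match condition $x = y$ and a symmetric Gaussian random walk as the kernel $K$ (so the Metropolis-Hastings ratio reduces to the prior ratio); all names and defaults are illustrative.

import numpy as np

def abc_mcmc(s_obs, prior_logpdf, data_sim, summary, dist, eps,
             theta0, step=0.5, n_iter=20_000, rng=np.random.default_rng(3)):
    """ABC-MCMC with a symmetric random-walk kernel K(.|theta) = N(theta, step^2)."""
    chain = np.empty(n_iter)
    theta = theta0
    for t in range(n_iter):
        prop = theta + step * rng.normal()           # theta' ~ K(.|theta)
        x = data_sim(prop)                           # x ~ f(x|theta')
        close = dist(summary(x), s_obs) <= eps       # pseudo-data close to the data
        if close and np.log(rng.uniform()) <= prior_logpdf(prop) - prior_logpdf(theta):
            theta = prop                             # accept; otherwise keep theta^(t)
        chain[t] = theta
    return chain

With a non-symmetric kernel the log-acceptance threshold would also include log K(theta|theta') minus log K(theta'|theta), as in the expression above.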
• 73. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-PRC
Another sequential version producing a sequence of Markov transition kernels $K_t$ and of samples $(\theta_1^{(t)}, \ldots, \theta_N^{(t)})$ $(1 \le t \le T)$
ABC-PRC Algorithm
1. Pick a $\theta^\star$ at random among the previous $\theta_i^{(t-1)}$'s, with probabilities $\omega_i^{(t-1)}$ $(1 \le i \le N)$.
2. Generate $\theta_i^{(t)} \sim K_t(\theta|\theta^\star)$, $x \sim f(x|\theta_i^{(t)})$.
3. Check that $\rho(x, y) < \varepsilon$, otherwise start again.
[Sisson et al., 2007]
• 75. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-PRC weight
Probability $\omega_i^{(t)}$ computed as
$\omega_i^{(t)} \propto \pi(\theta_i^{(t)})\, L_{t-1}(\theta^\star|\theta_i^{(t)}) \,\big/\, \{\pi(\theta^\star)\, K_t(\theta_i^{(t)}|\theta^\star)\},$
where $L_{t-1}$ is an arbitrary transition kernel.
In case
$L_{t-1}(\theta'|\theta) = K_t(\theta|\theta'),$
all weights are equal under a uniform prior.
Inspired from Del Moral et al. (2006), who use backward kernels $L_{t-1}$ in SMC to achieve unbiasedness, with a recent (2008) ABC-SMC version
• 79. On some computational methods for Bayesian model choice ABC model choice ABC method(s) ABC-PRC bias
Lack of unbiasedness of the method
Joint density of the accepted pair $(\theta^{(t-1)}, \theta^{(t)})$ proportional to
$\pi(\theta^{(t-1)}|y)\, K_t(\theta^{(t)}|\theta^{(t-1)})\, f(y|\theta^{(t)}),$
so, for an arbitrary function $h(\theta)$, $\mathbb E[\omega_t h(\theta^{(t)})]$ is proportional to
$\int\!\!\int h(\theta^{(t)})\, \dfrac{\pi(\theta^{(t)})\, L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\, K_t(\theta^{(t)}|\theta^{(t-1)})}\; \pi(\theta^{(t-1)}|y)\, K_t(\theta^{(t)}|\theta^{(t-1)})\, f(y|\theta^{(t)})\, d\theta^{(t-1)}\, d\theta^{(t)}$
$\propto \int\!\!\int h(\theta^{(t)})\, \dfrac{\pi(\theta^{(t)})\, L_{t-1}(\theta^{(t-1)}|\theta^{(t)})}{\pi(\theta^{(t-1)})\, K_t(\theta^{(t)}|\theta^{(t-1)})}\; \pi(\theta^{(t-1)})\, f(y|\theta^{(t-1)})\, K_t(\theta^{(t)}|\theta^{(t-1)})\, f(y|\theta^{(t)})\, d\theta^{(t-1)}\, d\theta^{(t)}$
$\propto \int h(\theta^{(t)})\, \pi(\theta^{(t)}|y) \left\{ \int L_{t-1}(\theta^{(t-1)}|\theta^{(t)})\, f(y|\theta^{(t-1)})\, d\theta^{(t-1)} \right\} d\theta^{(t)},$
which differs from the posterior expectation of $h(\theta^{(t)})$ unless the inner integral is constant in $\theta^{(t)}$.
• 81. On some computational methods for Bayesian model choice ABC model choice ABC method(s) A mixture example (0)
Toy model of Sisson et al. (2007): if
$\theta \sim \mathcal U(-10, 10)$, $\quad x|\theta \sim 0.5\,\mathcal N(\theta, 1) + 0.5\,\mathcal N(\theta, 1/100),$
then the posterior distribution associated with $y = 0$ is the normal mixture
$\theta|y = 0 \sim 0.5\,\mathcal N(0, 1) + 0.5\,\mathcal N(0, 1/100)$
restricted to $[-10, 10]$.
Furthermore, the true ABC target is available as
$\pi(\theta \mid |x| < \varepsilon) \propto \Phi(\varepsilon - \theta) - \Phi(-\varepsilon - \theta) + \Phi(10(\varepsilon - \theta)) - \Phi(-10(\varepsilon + \theta)).$
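A small numerical sketch of this analytic benchmark (with $\varepsilon$ denoting the ABC tolerance, as reconstructed above): the unnormalised target is evaluated on a grid and normalised numerically on $[-10, 10]$, so that ABC output can be compared against it.

import numpy as np
from scipy.stats import norm

def abc_target(theta, eps):
    # unnormalised pi(theta | |x| < eps) for the Sisson et al. (2007) toy model
    return (norm.cdf(eps - theta) - norm.cdf(-eps - theta)
            + norm.cdf(10 * (eps - theta)) - norm.cdf(-10 * (eps + theta)))

grid = np.linspace(-10.0, 10.0, 2001)
dens = abc_target(grid, eps=0.1)
dens /= dens.sum() * (grid[1] - grid[0])     # numerical normalisation on [-10, 10]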
• 82. On some computational methods for Bayesian model choice ABC model choice ABC method(s) A mixture example (1)
[Figure: comparison of $\tau = 0.15$ and $\tau = 1/0.15$ in $K_t$]
• 83. On some computational methods for Bayesian model choice ABC model choice ABC-PMC A PMC version
Use of the same kernel idea as ABC-PRC, but with an IS correction.
Generate a sample at iteration $t$ by
$\hat\pi_t(\theta^{(t)}) \propto \sum_{j=1}^{N} \omega_j^{(t-1)}\, K_t(\theta^{(t)}|\theta_j^{(t-1)})$
modulo acceptance of the associated $x_t$, and use an importance weight associated with an accepted simulation $\theta_i^{(t)}$,
$\omega_i^{(t)} \propto \pi(\theta_i^{(t)}) \,\big/\, \hat\pi_t(\theta_i^{(t)}).$
Still likelihood-free
[Beaumont et al., 2009]
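A sketch of one way to implement this ABC-PMC scheme for a scalar $\theta$ with Gaussian kernels $K_t$ of fixed standard deviation $\tau$ (matching the mixture comparison below); the decreasing tolerance schedule, the particle number, and the assumption that prior_logpdf is vectorised are illustrative choices, not prescriptions from the slides.

import numpy as np
from scipy.stats import norm

def abc_pmc(s_obs, prior_logpdf, prior_sim, data_sim, summary, dist,
            eps_seq, n_part=1000, tau=0.15, rng=np.random.default_rng(4)):
    """ABC-PMC: resample, move with N(., tau^2), accept within eps_t, reweight."""
    # iteration 0: plain ABC rejection at the first (largest) tolerance
    theta = np.empty(n_part)
    for i in range(n_part):
        while True:
            th = prior_sim()
            if dist(summary(data_sim(th)), s_obs) <= eps_seq[0]:
                theta[i] = th
                break
    weight = np.full(n_part, 1.0 / n_part)
    for eps in eps_seq[1:]:
        new_theta = np.empty(n_part)
        for i in range(n_part):
            while True:
                th = rng.choice(theta, p=weight) + tau * rng.normal()   # resample + move
                if dist(summary(data_sim(th)), s_obs) <= eps:
                    new_theta[i] = th
                    break
        # IS correction: prior over the mixture proposal pi_t-hat
        mix = (weight[None, :]
               * norm.pdf(new_theta[:, None], loc=theta[None, :], scale=tau)).sum(axis=1)
        weight = np.exp(prior_logpdf(new_theta)) / mix
        weight /= weight.sum()
        theta = new_theta
    return theta, weight

On the toy mixture, calling this with a schedule such as eps_seq = (2.0, 1.0, 0.5, 0.1) and the $\mathcal U(-10, 10)$ prior can be compared with the recovery reported on the next slide.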
• 84. On some computational methods for Bayesian model choice ABC model choice ABC-PMC A mixture example (2)
Recovery of the target, whether using a fixed standard deviation of $\tau = 0.15$ or $\tau = 1/0.15$.
• 85. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Gibbs random fields
Gibbs distribution
The random variable $y = (y_1, \ldots, y_n)$ is a Gibbs random field associated with the graph $G$ if
$f(y) = \dfrac{1}{Z} \exp\Big\{-\sum_{c\in\mathcal C} V_c(y_c)\Big\},$
where $Z$ is the normalising constant, $\mathcal C$ is the set of cliques of $G$ and $V_c$ is any function, also called potential.
$U(y) = \sum_{c\in\mathcal C} V_c(y_c)$ is the energy function.
$Z$ is usually unavailable in closed form.
• 87. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Potts model
$V_c(y)$ is of the form
$V_c(y) = \theta S(y) = \theta \sum_{l\sim i} \delta_{y_l = y_i}$
where $l\sim i$ denotes a neighbourhood structure.
In most realistic settings, the summation
$Z_\theta = \sum_{x\in\mathcal X} \exp\{\theta^{\mathrm T} S(x)\}$
involves too many terms to be manageable, and numerical approximations cannot always be trusted.
[Cucala, Marin, CPR & Titterington, 2009]
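As a concrete illustration of the statistic $S(x)$ above (and of the competing neighbourhood relations used for model choice later in this section), here is a short sketch computing the number of identically-coloured neighbour pairs on a 2D grid, for the 4-neighbour and 8-neighbour relations; the grid size and colouring are arbitrary.

import numpy as np

def potts_stat_4n(x):
    # S(x) = sum over horizontal/vertical neighbour pairs of 1{x_l = x_i}
    return int((x[:, 1:] == x[:, :-1]).sum() + (x[1:, :] == x[:-1, :]).sum())

def potts_stat_8n(x):
    # same statistic when the two diagonal neighbours are added to the relation
    return (potts_stat_4n(x)
            + int((x[1:, 1:] == x[:-1, :-1]).sum())
            + int((x[1:, :-1] == x[:-1, 1:]).sum()))

rng = np.random.default_rng(5)
x = rng.integers(0, 2, size=(10, 10))        # a binary 10 x 10 configuration
print(potts_stat_4n(x), potts_stat_8n(x))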
• 89. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Bayesian Model Choice
Comparing a model with potential $S_0$ taking values in $\mathbb R^{p_0}$ versus a model with potential $S_1$ taking values in $\mathbb R^{p_1}$ can be done through the Bayes factor corresponding to the priors $\pi_0$ and $\pi_1$ on each parameter space,
$B_{m_0/m_1}(x) = \dfrac{\int \exp\{\theta_0^{\mathrm T} S_0(x)\}/Z_{\theta_0,0}\; \pi_0(d\theta_0)}{\int \exp\{\theta_1^{\mathrm T} S_1(x)\}/Z_{\theta_1,1}\; \pi_1(d\theta_1)}$
• 90. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Neighbourhood relations
Choice to be made between $M$ neighbourhood relations
$i \overset{m}{\sim} i'$ $\quad (0 \le m \le M-1)$
with
$S_m(x) = \sum_{i \overset{m}{\sim} i'} \mathbb I_{\{x_i = x_{i'}\}}$
driven by the posterior probabilities of the models.
• 91. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Model index
Formalisation via a model index $\mathcal M$ that appears as a new parameter, with prior distribution $\pi(\mathcal M = m)$ and
$\pi(\theta|\mathcal M = m) = \pi_m(\theta_m)$
Computational target:
$P(\mathcal M = m|x) \propto \int_{\Theta_m} f_m(x|\theta_m)\, \pi_m(\theta_m)\, d\theta_m\ \pi(\mathcal M = m),$
• 93. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs Sufficient statistics
By definition, if $S(x)$ is a sufficient statistic for the joint parameters $(\mathcal M, \theta_0, \ldots, \theta_{M-1})$, then
$P(\mathcal M = m|x) = P(\mathcal M = m|S(x)).$
For each model $m$, its own sufficient statistic $S_m(\cdot)$, and $S(\cdot) = (S_0(\cdot), \ldots, S_{M-1}(\cdot))$ is also sufficient.
For Gibbs random fields,
$x|\mathcal M = m \sim f_m(x|\theta_m) = f_m^1(x|S(x))\, f_m^2(S(x)|\theta_m) = \dfrac{1}{n(S(x))}\, f_m^2(S(x)|\theta_m)$
where
$n(S(x)) = \#\{\tilde x\in\mathcal X : S(\tilde x) = S(x)\}$
$S(x)$ is therefore also sufficient for the joint parameters
[Specific to Gibbs random fields!]
• 96. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs ABC model choice algorithm
ABC-MC
1. Generate $m^*$ from the prior $\pi(\mathcal M = m)$.
2. Generate $\theta_{m^*}^*$ from the prior $\pi_{m^*}(\cdot)$.
3. Generate $x^*$ from the model $f_{m^*}(\cdot|\theta_{m^*}^*)$.
4. Compute the distance $\rho(S(x^0), S(x^*))$.
5. Accept $(\theta_{m^*}^*, m^*)$ if $\rho(S(x^0), S(x^*)) < \varepsilon$.
[Cornuet, Grelaud, Marin & Robert, 2009]
Note: When $\varepsilon = 0$ the algorithm is exact.
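A generic sketch of this ABC-MC sampler, together with the (1 + count) Bayes-factor estimate introduced on the next slide; the simulator and summary interfaces are assumptions made for illustration, not part of the slides.

import numpy as np

def abc_mc(s_obs, model_prior, prior_sims, data_sims, summary, dist,
           eps, n_sim=100_000, rng=np.random.default_rng(6)):
    """Draw m* ~ pi(M), theta* ~ pi_{m*}, x* ~ f_{m*}(.|theta*);
    keep m* whenever rho(S(x0), S(x*)) <= eps.  Returns accepted model indices."""
    kept = []
    for _ in range(n_sim):
        m = int(rng.choice(len(model_prior), p=model_prior))
        theta = prior_sims[m]()
        x = data_sims[m](theta)
        if dist(summary(x), s_obs) <= eps:
            kept.append(m)
    return np.array(kept)

def bf_01(kept, model_prior):
    # (1 + #{m* = 0}) / (1 + #{m* = 1}) times pi(M = 1) / pi(M = 0)
    return ((1 + (kept == 0).sum()) / (1 + (kept == 1).sum())
            * model_prior[1] / model_prior[0])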
• 97. On some computational methods for Bayesian model choice ABC model choice ABC for model choice in GRFs ABC approximation to the Bayes factor
Frequency ratio:
$\widehat{BF}_{m_0/m_1}(x^0) = \dfrac{\hat P(\mathcal M = m_0|x^0)}{\hat P(\mathcal M = m_1|x^0)} \times \dfrac{\pi(\mathcal M = m_1)}{\pi(\mathcal M = m_0)} = \dfrac{\#\{m^{i*} = m_0\}}{\#\{m^{i*} = m_1\}} \times \dfrac{\pi(\mathcal M = m_1)}{\pi(\mathcal M = m_0)},$
replaced with
$\widehat{BF}_{m_0/m_1}(x^0) = \dfrac{1 + \#\{m^{i*} = m_0\}}{1 + \#\{m^{i*} = m_1\}} \times \dfrac{\pi(\mathcal M = m_1)}{\pi(\mathcal M = m_0)}$
to avoid indeterminacy (also a Bayes estimate).
• 99. On some computational methods for Bayesian model choice ABC model choice Illustrations Toy example
iid Bernoulli model versus two-state first-order Markov chain, i.e.
$f_0(x|\theta_0) = \exp\Big(\theta_0 \sum_{i=1}^{n} \mathbb I_{\{x_i = 1\}}\Big) \Big/ \{1 + \exp(\theta_0)\}^{n},$
versus
$f_1(x|\theta_1) = \dfrac{1}{2} \exp\Big(\theta_1 \sum_{i=2}^{n} \mathbb I_{\{x_i = x_{i-1}\}}\Big) \Big/ \{1 + \exp(\theta_1)\}^{n-1},$
with priors $\theta_0 \sim \mathcal U(-5, 5)$ and $\theta_1 \sim \mathcal U(0, 6)$ (inspired by "phase transition" boundaries).
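To connect this toy comparison with the ABC-MC sketch given earlier, here are simulators and the joint sufficient statistic $(S_0, S_1)$ for the two models; the sample size n = 100 is an arbitrary illustration, while the priors are the $\mathcal U(-5, 5)$ and $\mathcal U(0, 6)$ of the slide.

import numpy as np

rng = np.random.default_rng(7)
n = 100                                       # assumed sample size

def sim_bernoulli(theta0):
    # model 0: x_i i.i.d. Bernoulli with P(x_i = 1) = exp(theta0) / (1 + exp(theta0))
    p = np.exp(theta0) / (1 + np.exp(theta0))
    return rng.binomial(1, p, size=n)

def sim_markov(theta1):
    # model 1: x_1 ~ Bernoulli(1/2), then P(x_i = x_{i-1}) = exp(theta1) / (1 + exp(theta1))
    q = np.exp(theta1) / (1 + np.exp(theta1))
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(0, 2)
    stay = rng.uniform(size=n - 1) < q
    for i in range(1, n):
        x[i] = x[i - 1] if stay[i - 1] else 1 - x[i - 1]
    return x

def summary(x):
    # joint sufficient statistic (S0, S1) = (# of ones, # of persistences x_i = x_{i-1})
    return np.array([int(x.sum()), int((x[1:] == x[:-1]).sum())])

prior_sims = [lambda: rng.uniform(-5.0, 5.0), lambda: rng.uniform(0.0, 6.0)]
data_sims = [sim_bernoulli, sim_markov]
# these can be passed to the abc_mc(...) sketch above, with a Euclidean distance on (S0, S1)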
• 100. On some computational methods for Bayesian model choice ABC model choice Illustrations Toy example (2)
[Figure] (left) Comparison of the true $BF_{m_0/m_1}(x^0)$ with $\widehat{BF}_{m_0/m_1}(x^0)$ (in logs) over 2,000 simulations and $4\cdot10^6$ proposals from the prior. (right) Same when using a tolerance $\varepsilon$ corresponding to the 1% quantile on the distances.
• 101. On some computational methods for Bayesian model choice ABC model choice Illustrations Protein folding
Superposition of the native structure (grey) with the ST1 structure (red), the ST2 structure (orange), the ST3 structure (green), and the DT structure (blue).
• 102. On some computational methods for Bayesian model choice ABC model choice Illustrations Protein folding (2)
                 % seq. Id.   TM-score   FROST score
  1i5nA  (ST1)       32         0.86        75.3
  1ls1A1 (ST2)        5         0.42         8.9
  1jr8A  (ST3)        4         0.24         8.9
  1s7oA  (DT)        10         0.08         7.8
Characteristics of the dataset. % seq. Id.: percentage of identity with the query sequence. TM-score: similarity between predicted and native structure (uncertainty between 0.17 and 0.4). FROST score: quality of alignment of the query onto the candidate structure (uncertainty between 7 and 9).
• 103. On some computational methods for Bayesian model choice ABC model choice Illustrations Protein folding (3)
                      NS/ST1   NS/ST2   NS/ST3   NS/DT
  BF                   1.34     1.22     2.42     2.76
  P(M = NS|x^0)        0.573    0.551    0.708    0.734
Estimates of the Bayes factors between model NS and models ST1, ST2, ST3, and DT, and corresponding posterior probabilities of model NS, based on an ABC-MC algorithm using $1.2\cdot10^6$ simulations and a tolerance equal to the 1% quantile of the distances.