
ABC-Gibbs

Talk on the ABC-Gibbs paper of Clarté, Robert, Ryder, and Stoehr



  1. 1. Component-wise approximate Bayesian computation via Gibbs-like steps. Christian P. Robert(1,2), joint work with Grégoire Clarté(1), Robin Ryder(1), Julien Stoehr(1). (1) Université Paris-Dauphine, (2) University of Warwick. Approximate Bayesian Computation @ Clermont.
  2. 2. ABC postdoc positions 2 post-doc positions open with the ABSint ANR research grant: focus on approximate Bayesian techniques like ABC, variational Bayes, PAC-Bayes, Bayesian non-parametrics, scalable MCMC, and related topics. A potential direction of research would be the derivation of new Bayesian tools for model checking in such complex environments. Terms: up to 24 months, no teaching duty attached, primarily located in Université Paris-Dauphine, with supported periods in Oxford (J. Rousseau) [barring no-deal Brexit!] and visits to Montpellier (J.-M. Marin). No hard deadline. If interested, send application to me: bayesianstatistics@gmail.com
  3. 3. Approximate Bayesian computation (ABC) ABC is a computational method which stemmed from population genetics models about 20 years ago to deal with intractable generative distributions. [Tavaré et al., 1997; Beaumont et al., 2002] Settings of interest: the likelihood function f(x | θ) does not admit a closed form as a function of θ and/or is computationally too costly. 1. Model relying on a latent process z ∈ Z: f(x | θ) = ∫_Z f(x, z | θ) µ(dz). 2. Model with intractable normalising constant: f(x | θ) = q(x | θ) / Z(θ), where Z(θ) = ∫_X q(x | θ) µ(dx).
  4. 4. Approximate Bayesian computation (ABC) Bayesian settings: the target is π(θ | xobs) ∝ π(θ) f(xobs | θ). Algorithm: Vanilla ABC. Input: observed dataset xobs, number of iterations N, threshold ε, summary statistic s. For i = 1, . . . , N: draw θi ∼ π(·) and xi ∼ f(· | θi). Return the θi such that d(s(xobs), s(xi)) ≤ ε. [Figure: accepted pairs (θi, s(xi)) fall within an ε-ball around s(xobs).]
  5. 5. Approximate Bayesian computation (ABC) Bayesian settings: the target is π(θ | xobs) ∝ π(θ) f(xobs | θ). Algorithm: Vanilla ABC. Input: observed dataset xobs, number of iterations N, threshold ε, summary statistic s. For i = 1, . . . , N: draw θi ∼ π(·) and xi ∼ f(· | θi). Return the θi such that d(s(xobs), s(xi)) ≤ ε. Output: distributed according to π(θ) Pθ{d(s(xobs), s(x)) < ε} ∝ π(θ | d(s(xobs), s(x)) < ε) =: πε(θ | s, xobs).
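The vanilla sampler above translates almost line for line into code. Below is a minimal sketch (not from the paper): `prior_sample`, `simulate` and `summary` are user-supplied placeholders, and instead of a fixed ε the threshold is taken as a small quantile of the simulated distances, a common practical calibration.

```python
import numpy as np

def vanilla_abc(x_obs, prior_sample, simulate, summary, n_sim=100_000, quantile=0.001):
    """Keep the draws theta_i whose simulated summaries fall closest to s(x_obs)."""
    s_obs = np.asarray(summary(x_obs), dtype=float)
    thetas, dists = [], []
    for _ in range(n_sim):
        theta = prior_sample()                          # theta_i ~ pi(.)
        x = simulate(theta)                             # x_i ~ f(. | theta_i)
        dists.append(np.linalg.norm(np.asarray(summary(x), dtype=float) - s_obs))
        thetas.append(theta)
    eps = np.quantile(dists, quantile)                  # threshold = small quantile of distances
    kept = [t for t, d in zip(thetas, dists) if d <= eps]
    return np.array(kept), eps
```

For a toy normal-mean model one could pass, e.g., `prior_sample=lambda: rng.uniform(-5, 5)`, `simulate=lambda t: rng.normal(t, 1.0, 50)` and `summary=np.mean` for some NumPy generator `rng`.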
  6. 6. Approximate Bayesian computation (ABC) Two particular situations: π∞(θ | s, xobs) ∝ π(θ) and π0(θ | s, xobs) ∝ π(θ | s(xobs)) = π(θ | xobs) [when s is sufficient]. Some difficulties raised by the vanilla version: Calibration of the threshold ε: from a regression or a k-nearest neighbour perspective. [Beaumont et al., 2002; Wilkinson, 2013; Biau et al., 2013] Selection of the summary statistic s: advances consider semi-automatic procedures using a pilot-run ABC or random forest methodology. [Fearnhead and Prangle, 2012; Prangle et al., 2014; Raynal et al., 2018] Simulating from the prior is often inefficient: solutions consist in modifying the proposal distribution on θ to increase the density of simulated x's within the vicinity of the observation. [Marjoram et al., 2003; Toni et al., 2008]
  7. 7. A first example: hierarchical moving average model. [Graphical model: α → µ1, . . . , µn; σ → σ1, . . . , σn; each xi depends on (µi, σi).] First parameter hierarchy: α = (α1, α2, α3) ∼ E(1)⊗3; independently for each i ∈ {1, . . . , n}, (βi,1, βi,2, βi,3) ∼ Dir(α1, α2, α3) and µi = (βi,1 − βi,2, 2(βi,1 + βi,2) − 1). Second parameter hierarchy: σ = (σ1, σ2) ∼ C+(1)⊗2; independently for each i ∈ {1, . . . , n}, σi ∼ IG(σ1, σ2). Model for xi: independently for each i ∈ {1, . . . , n}, xi ∼ MA2(µi, σi), i.e., for all j, xi,j = yj + µi,1 yj−1 + µi,2 yj−2, with yj ∼ N(0, σi²).
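A hedged sketch of a simulator for this hierarchical MA(2) model follows; the function name is mine, the shape/scale parameterisation of the inverse gamma is an assumption, and σi is used directly as the innovation standard deviation as on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hierarchical_ma2(n=5, T=100):
    """Draw one dataset of n MA(2) series of length T from the hierarchical model."""
    alpha = rng.exponential(1.0, size=3)                 # alpha ~ E(1)^{x3}
    sigma_hyper = np.abs(rng.standard_cauchy(size=2))    # sigma = (sigma1, sigma2) ~ C+(1)^{x2}
    mu = np.empty((n, 2))
    x = np.empty((n, T))
    for i in range(n):
        beta = rng.dirichlet(alpha)                      # (beta_i1, beta_i2, beta_i3) ~ Dir(alpha)
        mu[i] = (beta[0] - beta[1], 2.0 * (beta[0] + beta[1]) - 1.0)
        # sigma_i ~ IG(sigma1, sigma2): shape/scale parameterisation assumed here
        sigma_i = 1.0 / rng.gamma(sigma_hyper[0], 1.0 / sigma_hyper[1])
        y = rng.normal(0.0, sigma_i, size=T + 2)         # innovations y_j ~ N(0, sigma_i^2)
        x[i] = y[2:] + mu[i, 0] * y[1:-1] + mu[i, 1] * y[:-2]
    return alpha, mu, x
```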
  8. 8. A first example: toy dataset. Settings: n = 5 time series of length T = 100; hierarchical model with 13 parameters. Figure: ABC posterior distribution of µ1,1 along with the prior distribution (black line). Size of ABC reference table: N = 5.5 · 10^6. ABC posterior sample size: 1000.
  9. 9. A first example: toy dataset. Settings: n = 5 time series of length T = 100; hierarchical model with 13 parameters. Figure: ABC posterior distribution of µ1,1 along with the prior distribution (black line). Size of ABC reference table: N = 5.5 · 10^6. ABC posterior sample size: 1000. Not enough simulations to reach a decent threshold; not enough time to produce enough simulations.
  10. 10. The Gibbs Sampler. Our idea: combining ABC with the Gibbs sampler in order to improve its ability to efficiently explore Θ ⊂ R^n when the number n of parameters increases.
  11. 11. The Gibbs Sampler. Our idea: combining ABC with the Gibbs sampler in order to improve its ability to efficiently explore Θ ⊂ R^n when the number n of parameters increases. The Gibbs sampler produces a Markov chain with a target joint distribution π by alternately sampling from each of its conditionals. [Geman and Geman, 1984] Algorithm: Gibbs sampler. Input: observed dataset xobs, number of iterations N, starting point θ^(0) = (θ1^(0), . . . , θn^(0)). For i = 1, . . . , N and k = 1, . . . , n: θk^(i) ∼ π(· | θ1^(i), . . . , θ(k−1)^(i), θ(k+1)^(i−1), . . . , θn^(i−1), xobs). Return θ^(0), . . . , θ^(N).
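As a concrete, textbook illustration of the scheme (not taken from the slides): the two full conditionals of a standard bivariate normal with correlation ρ are Gaussian, so alternating draws from them is a valid Gibbs sampler.

```python
import numpy as np

def gibbs_bivariate_normal(n_iter=10_000, rho=0.8, seed=0):
    """Gibbs sampler for (theta_1, theta_2) ~ N(0, [[1, rho], [rho, 1]])."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_iter + 1, 2))                        # theta^(0) = (0, 0)
    sd = np.sqrt(1.0 - rho ** 2)
    for i in range(1, n_iter + 1):
        theta[i, 0] = rng.normal(rho * theta[i - 1, 1], sd)  # theta_1 | theta_2 ~ N(rho*theta_2, 1-rho^2)
        theta[i, 1] = rng.normal(rho * theta[i, 0], sd)      # theta_2 | theta_1 ~ N(rho*theta_1, 1-rho^2)
    return theta
```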
  12. 12. Component-wise ABC. Algorithm: Component-wise ABC. Input: observed dataset xobs, number of iterations N, starting point θ^(0) = (θ1^(0), . . . , θn^(0)), thresholds ε = (ε1, . . . , εn), statistics s1, . . . , sn. For i = 1, . . . , N and j = 1, . . . , n: θj^(i) ∼ πεj(· | xobs, sj, θ1^(i), . . . , θ(j−1)^(i), θ(j+1)^(i−1), . . . , θn^(i−1)). Return θ^(0), . . . , θ^(N).
  13. 13. Component-wise ABC. Algorithm: Component-wise ABC. Input: observed dataset xobs, number of iterations N, starting point θ^(0) = (θ1^(0), . . . , θn^(0)), thresholds ε = (ε1, . . . , εn), statistics s1, . . . , sn. For i = 1, . . . , N and j = 1, . . . , n: θj^(i) ∼ πεj(· | xobs, sj, θ1^(i), . . . , θ(j−1)^(i), θ(j+1)^(i−1), . . . , θn^(i−1)). Return θ^(0), . . . , θ^(N). Questions: Is there a limiting distribution νε^∞ to the algorithm? What is the nature of this limiting distribution?
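A minimal sketch of the component-wise ABC loop above, with every name a placeholder: each θj is refreshed by drawing candidates from its prior conditional given the other components, simulating a dataset for each, and keeping the candidate whose j-th summary is closest to the observed one (an adaptive choice of εj; the exact algorithm instead samples from πεj).

```python
import numpy as np

def abc_gibbs(x_obs, theta0, sample_conditional_prior, simulate, summaries, n_iter=1000, n_eps=100):
    """Component-wise ABC: cycle over components, each updated by one ABC step."""
    theta = np.array(theta0, dtype=float)
    chain = [theta.copy()]
    for _ in range(n_iter):
        for j in range(len(theta)):
            s_obs = np.asarray(summaries[j](x_obs), dtype=float)
            best, best_dist = theta[j], np.inf
            for _ in range(n_eps):                             # n_eps prior-conditional draws, keep the closest
                cand = theta.copy()
                cand[j] = sample_conditional_prior(j, theta)   # theta_j | theta_{-j} from the prior
                d = np.linalg.norm(np.asarray(summaries[j](simulate(cand)), dtype=float) - s_obs)
                if d < best_dist:
                    best, best_dist = cand[j], d
            theta[j] = best
        chain.append(theta.copy())
    return np.array(chain)
```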
  14. 14. OUTLINE 1 Hierarchical models 2 General case 3 Take home messages
  15. 15. OUTLINE 1 Hierarchical models 2 General case 3 Take home messages
  16. 16. ABC within Gibbs: Hierarchical models. [Graphical model: α → µ1, . . . , µn → x1, . . . , xn.] Hierarchical Bayes models often allow for simplified conditional distributions thanks to partial independence properties, e.g., xj | µj ∼ π(xj | µj), µj | α i.i.d. ∼ π(µj | α), α ∼ π(α). Algorithm: Component-wise ABC sampler for hierarchical model. Input: observed dataset xobs, number of iterations N, thresholds εα and εµ, summary statistics sα and sµ. For i = 1, . . . , N: for j = 1, . . . , n, µj^(i) ∼ πεµ(· | xobs_j, sµ, α^(i−1)); then α^(i) ∼ πεα(· | µ^(i), sα).
  17. 17. ABC within Gibbs: Hierarchical models. Assumption: n = 1. Theorem (Clarté et al. [2019]). Assume there exists a non-empty convex set C with positive prior measure such that κ1 = inf_{sα(µ)∈C} π(B_{sα(µ), εα/4}) > 0, κ2 = inf_α inf_{sα(µ)∈C} πεµ(B_{sα(µ), 3εα/2} | xobs, sµ, α) > 0, κ3 = inf_α πεµ(sα(µ) ∈ C | xobs, sµ, α) > 0. Then the Markov chain converges geometrically in total variation distance to a stationary distribution νε^∞, with geometric rate 1 − κ1 κ2 κ3². If the prior on α is defined on a compact set, then the assumptions are satisfied.
  18. 18. ABC within Gibbs: Hierarchical models. Theorem (Clarté et al. [2019]). Assume that L0 = sup_{εα} sup_{µ, µ̃} ‖πεα(· | sα, µ) − π0(· | sα, µ̃)‖_TV < 1/2, L1(εα) = sup_µ ‖πεα(· | sα, µ) − π0(· | sα, µ)‖_TV → 0 as εα → 0, and L2(εµ) = sup_α ‖πεµ(· | xobs, sµ, α) − π0(· | xobs, sµ, α)‖_TV → 0 as εµ → 0. Then ‖νε^∞ − ν0^∞‖_TV ≤ [L1(εα) + L2(εµ)] / (1 − 2 L0) → 0 as ε → 0.
  19. 19. ABC within Gibbs: Hierarchical models. Compatibility issue: ν0^∞ is the limiting distribution associated to Gibbs conditionals with different acceptance events, e.g., different statistics π(α) π(sα(µ) | α) and π(µ) f(sµ(xobs) | α, µ). The conditionals may then be incompatible and the limiting distribution not a genuine posterior [incoherent use of data], unknown [except for a specific version], and possibly far from a genuine posterior. Proposition (Clarté et al. [2019]). If sα is jointly sufficient, when the precision ε goes to zero, ABC within Gibbs and ABC have the same limiting distribution.
  20. 20. Hierarchical models: toy example. Model: α ∼ U([0, 20]), (µ1, . . . , µn) | α ∼ N(α, 1)⊗n, (xi,1, . . . , xi,K) | µi ∼ N(µi, 0.1)⊗K. Numerical experiment: n = 20, K = 10, pseudo-observations generated for α = 1.7, algorithms run for a constant budget Ntot = N × Nε = 21000. We look at the estimates for µ1, whose value for the pseudo-observations is 3.04.
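A runnable instance of ABC within Gibbs on this toy model, under two assumptions of mine: 0.1 is read as a variance, and empirical means are used as the summary statistics sµ and sα; each conditional step keeps the closest of a fixed budget of prior draws.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 20, 10
alpha_true = 1.7
mu_true = rng.normal(alpha_true, 1.0, size=n)
x_obs = rng.normal(mu_true[:, None], np.sqrt(0.1), size=(n, K))   # 0.1 read as a variance

def abc_step(sample_prior, summary_of_draw, s_obs, budget=200):
    """Among `budget` prior draws, keep the one whose simulated summary is closest to s_obs."""
    draws = [sample_prior() for _ in range(budget)]
    dists = [abs(summary_of_draw(d) - s_obs) for d in draws]
    return draws[int(np.argmin(dists))]

def run_abc_gibbs(n_iter=500):
    alpha, mu = 10.0, np.zeros(n)
    chain = []
    for _ in range(n_iter):
        for i in range(n):   # update each mu_i given alpha, using only the i-th data block
            mu[i] = abc_step(lambda: rng.normal(alpha, 1.0),
                             lambda m: rng.normal(m, np.sqrt(0.1), size=K).mean(),
                             x_obs[i].mean())
        # update alpha given mu, using the mean of the mu's as summary statistic
        alpha = abc_step(lambda: rng.uniform(0.0, 20.0),
                         lambda a: rng.normal(a, 1.0, size=n).mean(),
                         mu.mean())
        chain.append((alpha, mu.copy()))
    return chain
```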
  21. 21. Hierarchical models: toy example. Figure: comparison of the sampled densities of µ1 (left) and α (right) for ABC Gibbs and simple ABC [dot-dashed line corresponds to the true posterior].
  22. 22. Hierarchical models: moving average example [introduction]. Pseudo observations: xobs_1 generated for µ1 = (−0.06, −0.22). [Figures: density of the 1st coordinate of the 1st parameter for ABC Gibbs, simple ABC and the prior; contour plots of the 1st parameter (b1, b2) for simple ABC and for Gibbs.] Separation from the prior for identical number of simulations.
  23. 23. Hierarchical models: moving average example [introduction]. Real dataset: measures of 8GHz daily flux intensity emitted by 7 stellar objects from the NRL GBI website: http://ese.nrl.navy.mil/. [Lazio et al., 2008] [Figures: density of the 1st coordinate of the 1st parameter for ABC Gibbs, simple ABC and the prior; contour plot of the 1st parameter (b1, b2) for Gibbs.] Separation from the prior for identical number of simulations.
  24. 24. Hierarchical models: moving average example [introduction]. Real dataset: measures of 8GHz daily flux intensity emitted by 7 stellar objects from the NRL GBI website: http://ese.nrl.navy.mil/. [Lazio et al., 2008] [Figures: density of the 1st coordinate of the 1st parameter for ABC Gibbs, simple ABC and the prior; contour plots of the 1st parameter (b1, b2) for simple ABC and for Gibbs.] Separation from the prior for identical number of simulations.
  25. 25. Hierarchical models: g&k example. Model: the g-and-k distribution is defined through the inverse of its cdf. It is easy to simulate from but there is no closed-form formula for the pdf: r ∈ (0, 1) → A + B [1 + 0.8 (1 − exp(−g Φ⁻¹(r))) / (1 + exp(−g Φ⁻¹(r)))] (1 + Φ⁻¹(r)²)^k Φ⁻¹(r). [Graphical model: α → A1, A2, . . . , An → x1, x2, . . . , xn, with B, g, k shared.]
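Simulation from the g-and-k distribution is indeed immediate from the quantile function above; a short sketch (function and parameter names are mine):

```python
import numpy as np
from scipy.stats import norm

def gk_quantile(r, A, B, g, k, c=0.8):
    """Quantile function of the g-and-k distribution, as on the slide."""
    z = norm.ppf(r)                                   # Phi^{-1}(r)
    return A + B * (1 + c * (1 - np.exp(-g * z)) / (1 + np.exp(-g * z))) * (1 + z ** 2) ** k * z

def simulate_gk(n, A, B, g, k, rng=None):
    """Draw n variates by inversion: X = Q(U) with U ~ U(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    return gk_quantile(rng.uniform(size=n), A, B, g, k)
```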
  26. 26. Hierarchical models: g&k example. Assumption: B, g and k known, inference on α and the Ai solely. [Figure: posterior densities of the hyperparameter for ABC Gibbs, ABC-SMC and vanilla ABC, over four panels.]
  27. 27. OUTLINE 1 Hierarchical models 2 General case 3 Take home messages
  28. 28. ABC within Gibbs: general case. A general two-parameter model: (θ1, θ2) → x. Algorithm: ABC within Gibbs. For i = 1, . . . , N: θ2^(i) ∼ πε2(· | θ1^(i−1), s2, xobs); θ1^(i) ∼ πε1(· | θ2^(i), s1, xobs). Return (θ1^(i), θ2^(i)) for i = 2, . . . , N.
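A hedged sketch of this two-block loop, mirroring the component-wise skeleton shown earlier: `prior1`, `prior2`, `simulate`, `s1` and `s2` are placeholders, and each block update keeps the closest of a fixed budget of prior draws rather than thresholding explicitly.

```python
import numpy as np

def abc_gibbs_two_blocks(x_obs, prior1, prior2, simulate, s1, s2,
                         theta1_0, theta2_0, n_iter=1000, budget=100):
    """Alternate ABC updates of theta_2 | theta_1 and theta_1 | theta_2."""
    def abc_update(sample_block, summary, build_theta):
        s_obs = np.asarray(summary(x_obs), dtype=float)
        best, best_d = None, np.inf
        for _ in range(budget):                       # keep the closest of `budget` prior draws
            cand = sample_block()
            d = np.linalg.norm(np.asarray(summary(simulate(build_theta(cand))), dtype=float) - s_obs)
            if d < best_d:
                best, best_d = cand, d
        return best

    theta1, theta2 = theta1_0, theta2_0
    chain = []
    for _ in range(n_iter):
        theta2 = abc_update(prior2, s2, lambda t2: (theta1, t2))   # theta_2 given current theta_1
        theta1 = abc_update(prior1, s1, lambda t1: (t1, theta2))   # theta_1 given new theta_2
        chain.append((theta1, theta2))
    return chain
```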
  29. 29. ABC within Gibbs: general case. A general two-parameter model: (θ1, θ2) → x. Algorithm: ABC within Gibbs. For i = 1, . . . , N: θ2^(i) ∼ πε2(· | θ1^(i−1), s2, xobs); θ1^(i) ∼ πε1(· | θ2^(i), s1, xobs). Return (θ1^(i), θ2^(i)) for i = 2, . . . , N. Theorem (Clarté et al. [2019]). Assume that there exists 0 < κ < 1/2 such that sup_{θ1, θ̃1} ‖πε2(· | xobs, s2, θ1) − πε2(· | xobs, s2, θ̃1)‖_TV = κ. The Markov chain then converges geometrically in total variation distance to a stationary distribution νε^∞, with geometric rate 1 − 2κ.
  30. 30. ABC within Gibbs: general case. Additional assumption: θ1 and θ2 are a priori independent. Theorem (Clarté et al. [2019]). Assume that κ1 = inf_{θ1,θ2} π(B_{s1(xobs), ε1} | θ1, θ2) > 0, κ2 = inf_{θ1,θ2} π(B_{s2(xobs), ε2} | θ1, θ2) > 0, κ3 = sup_{θ1, θ̃1, θ2} ‖π(· | θ1, θ2) − π(· | θ̃1, θ2)‖_TV < 1/2. Then the Markov chain converges in total variation distance to a stationary distribution νε^∞ with geometric rate 1 − κ1 κ2 (1 − 2κ3).
  31. 31. ABC within Gibbs: general case. For both situations, a limiting distribution exists when the thresholds go to 0. Theorem (Clarté et al. [2019]). Assume that L0 = sup_{ε2} sup_{θ1, θ̃1} ‖πε2(· | xobs, s2, θ1) − π0(· | xobs, s2, θ̃1)‖_TV < 1/2, L1(ε1) = sup_{θ2} ‖πε1(· | xobs, s1, θ2) − π0(· | xobs, s1, θ2)‖_TV → 0 as ε1 → 0, and L2(ε2) = sup_{θ1} ‖πε2(· | xobs, s2, θ1) − π0(· | xobs, s2, θ1)‖_TV → 0 as ε2 → 0. Then ‖νε^∞ − ν0^∞‖_TV ≤ [L1(ε1) + L2(ε2)] / (1 − 2 L0) → 0 as ε → 0.
  32. 32. ABC within Gibbs: general case. Compatibility issue: the general case inherits the compatibility issue already noticed in the hierarchical setting. Proposition (Clarté et al. [2019]). 1. If sθ1 and sθ2 are conditionally sufficient, the conditionals are compatible and, when the precision goes to zero, ABC within Gibbs and ABC have the same limiting distribution. 2. If π(θ1, θ2) = π(θ1)π(θ2) and sθ1 = sθ2, when the precision goes to zero, ABC within Gibbs and ABC have the same limiting distribution.
  33. 33. General case: g&k example. Figure: posterior densities for parameters A1, . . . , A4, comparing ABC Gibbs, ABC-SMC and vanilla ABC.
  34. 34. General case: g&k example. Figure: posterior densities for α, B, g and k, comparing ABC Gibbs, ABC-SMC and vanilla ABC.
  35. 35. Explicit limiting distribution. For the model xj | µj ∼ π(xj | µj), µj | α i.i.d. ∼ π(µj | α), α ∼ π(α), an alternative ABC is based on the joint target π̃(α, µ | xobs) ∝ π(α) q(µ) ∫ π(µ̃ | α) 1{d(sα(µ), sα(µ̃)) < εα} dµ̃ [generate a new µ] × f(x̃ | µ) π(xobs | µ), with q an arbitrary distribution on µ.
  36. 36. Explicit limiting distribution. For the model xj | µj ∼ π(xj | µj), µj | α i.i.d. ∼ π(µj | α), α ∼ π(α), this induces the full conditionals π̃(α | µ) ∝ π(α) ∫ π(µ̃ | α) 1{d(sα(µ), sα(µ̃)) < εα} dµ̃ and π̃(µ | α, xobs) ∝ q(µ) ∫ π(µ̃ | α) 1{d(sα(µ), sα(µ̃)) < εα} dµ̃ × ∫ f(x̃ | µ) π(xobs | µ) 1{d(sµ(xobs), sµ(x̃)) < εµ} dx̃, now compatible with the new artificial joint.
  37. 37. Explicit limiting distribution. For the model xj | µj ∼ π(xj | µj), µj | α i.i.d. ∼ π(µj | α), α ∼ π(α), that is: prior simulations of α ∼ π(α) and of µ̃ ∼ π(µ̃ | α) until d(sα(µ), sα(µ̃)) < εα; simulation of µ from the instrumental q(µ) and of auxiliary variables µ̃ and x̃ until both constraints are satisfied.
  38. 38. Explicit limiting distribution. For the model xj | µj ∼ π(xj | µj), µj | α i.i.d. ∼ π(µj | α), α ∼ π(α), the resulting Gibbs sampler is stationary for the posterior proportional to π(α, µ) q(sα(µ)) f(sµ(xobs) | µ) [both factors by projection], that is, for the likelihood associated with sµ(xobs) and a prior distribution proportional to π(α, µ) q(sα(µ)) [exact!]
  39. 39. OUTLINE 1 Hierarchical models 2 General case 3 Take home messages
  40. 40. Take home messages Under certain conditions, to be specified,
  41. 41. Take home messages We provide theoretical guarantees on the convergence of ABC within Gibbs. • Result n°1: a limiting distribution νε^∞ exists when the sample size grows • Result n°2: a limiting distribution ν0^∞ exists when the threshold goes to 0 • Result n°3: ν0^∞ is the posterior distribution π(θ | s(xobs)). The method inherits issues from vanilla ABC, namely the choice of the statistics [plus compatibility of the conditionals]. In practice, ABC within Gibbs exhibits better performance than vanilla ABC and SMC-ABC [even when the conditions are not satisfied].
  42. 42. Take home messages We provide theoretical guarantees on the convergence of ABC within Gibbs. • Result n°1: a limiting distribution νε^∞ exists when the sample size grows • Result n°2: a limiting distribution ν0^∞ exists when the threshold goes to 0 • Result n°3: ν0^∞ is the posterior distribution π(θ | s(xobs)). The method inherits issues from vanilla ABC, namely the choice of the statistics [plus compatibility of the conditionals]. In practice, ABC within Gibbs exhibits better performance than vanilla ABC and SMC-ABC [even when the conditions are not satisfied]. Thank you!
  43. 43. ABC workshops [A]BayesComp, Gainesville, Florida, Jan 7-10 2020 ABC in Grenoble, France, March 18-19 2020 ISBA(BC), Kunming, China, June 26-30 2020 ABC in Longyearbyen, Svalbard, April 8-9 2021 [??]
  44. 44. Bibliography I M. A. Beaumont, W. Zhang, and D. J. Balding. Approximate Bayesian Computation in Population Genetics. Genetics, 162(4):2025–2035, 2002. G. Biau, F. Cérou, and A. Guyader. New insights into Approximate Bayesian Computation. Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques, in press, 2013. G. Clarté, C. P. Robert, R. Ryder, and J. Stoehr. Component-wise approximate Bayesian computation via Gibbs-like steps. arXiv preprint arXiv:1905.13599, 2019. P. Fearnhead and D. Prangle. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 74(3):419–474, 2012. S. Geman and D. Geman. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.
  45. 45. Bibliography II T. J. W. Lazio, E. B. Waltman, F. D. Ghigo, R. Fiedler, R. S. Foster, and K. J. Johnston. A Dual-Frequency, Multiyear Monitoring Program of Compact Radio Sources. The Astrophysical Journal Supplement Series, 136:265, December 2008. doi: 10.1086/322531. P. Marjoram, J. Molitor, V. Plagnol, and S. Tavaré. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 100(26):15324–15328, 2003. D. Prangle, P. Fearnhead, M. P. Cox, P. J. Biggs, and N. P. French. Semi-automatic selection of summary statistics for ABC model choice. Statistical Applications in Genetics and Molecular Biology, 13(1):67–82, 2014. L. Raynal, J.-M. Marin, P. Pudlo, M. Ribatet, C. P. Robert, and A. Estoup. ABC random forests for Bayesian parameter inference. Bioinformatics, 2018. doi: 10.1093/bioinformatics/bty867. S. Tavaré, D. J. Balding, R. C. Griffiths, and P. Donnelly. Inferring Coalescence Times From DNA Sequence Data. Genetics, 145(2):505–518, 1997.
  46. 46. Bibliography III T. Toni, D. Welch, N. Strelkowa, A. Ipsen, and M. P. H. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6(31):187–202, 2008. R. D. Wilkinson. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Statistical Applications in Genetics and Molecular Biology, 12(2):129–141, 2013.
