There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space. Markov random fields, such as the Ising/Potts model and exponential random graph model (ERGM), are particularly challenging because the number of discrete variables increases linearly with the size of the image or graph. The likelihood of these models cannot be computed directly, due to the presence of an intractable normalising constant. In this context, it is necessary to employ algorithms that provide a suitable compromise between accuracy and computational cost.
Bayesian indirect likelihood (BIL) is a class of methods that approximate the likelihood function using a surrogate model. This model can be trained using a pre-computation step, utilising massively parallel hardware to simulate auxiliary variables. We review various types of surrogate model that can be used in BIL. In the case of the Potts model, we introduce a parametric approximation to the score function that incorporates its known properties, such as heteroskedasticity and critical temperature. We demonstrate this method on 2D satellite remote sensing and 3D computed tomography (CT) images. We achieve a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. Our algorithm has been implemented in the R package “bayesImageS,” which is available from CRAN.
1. Image Analysis Intractable Likelihood Markov Chain Monte Carlo Conclusion
R package bayesImageS:
Scalable Inference for Intractable Likelihoods
Matt Moores
RSS Annual Conference
September 6, 2017
1 / 24
Outline
1 Image Analysis
  - R package bayesImageS
2 Intractable Likelihood
  - Ising/Potts model
3 Markov Chain Monte Carlo
  - Exchange algorithm
  - Approximate Bayesian computation (ABC)
  - Bayesian indirect likelihood (BIL)
Motivation
Image analysis often involves:
Large datasets, with millions of pixels
Multiple images with similar characteristics
For example: satellite remote sensing (Landsat, MODIS) and medical imaging (CT scans, MRI).

Table: Scale of common types of images

Number of pixels | Landsat (900 m²/px) | CT slices (512×512)
-----------------|---------------------|--------------------
2^6              | 0.06 km²            | . . .
5^6              | 14.06 km²           | 0.1
10^6             | 900.00 km²          | 3.8
15^6             | 10,251.56 km²       | 43.5
Statistical Computation
Many statistical algorithms (MCMC, EM) are inherently iterative.
Strategies for improving scalability:
Compiled code (e.g. using Rcpp)
Parallel execution
Offline precomputation
Streaming inference
Subsampling
Multi-level and multi-resolution methods
Dirk Eddelbuettel (2013) Seamless R and C++ integration with Rcpp
bayesImageS
An R package for Bayesian image segmentation using the hidden Potts model:
RcppArmadillo for fast computation in C++
OpenMP for parallelism
library(bayesImageS)
priors <- list("k"=3, "mu"=rep(0,3), "mu.sd"=sigma,
               "sigma"=sigma, "sigma.nu"=c(1,1,1), "beta"=c(0,3))
mh <- list(algorithm="pseudo", bandwidth=0.2)
result <- mcmcPotts(y, neigh, block, NULL, 55000, 5000, priors, mh)
Eddelbuettel & Sanderson (2014) RcppArmadillo: Accelerating R with high-performance C++ linear algebra. CSDA 71
Bayesian computational methods
bayesImageS supports methods for classifying the pixels:
Chequerboard Gibbs sampling (Winkler 2003)
Swendsen-Wang (1987)
and also methods for updating the smoothing parameter β:
Pseudolikelihood (Rydén & Titterington 1998)
Thermodynamic integration (Gelman & Meng 1998)
Exchange algorithm (Murray, Ghahramani & MacKay 2006)
Approximate Bayesian computation (Grelaud et al. 2009)
Sequential Monte Carlo (ABC-SMC) with pre-computation
(Del Moral, Doucet & Jasra 2012; Moores et al. 2015)
Pixel Classification
Joint distribution of observed pixel intensities $\mathbf{y} = \{y_i\}_{i=1}^n$ and latent labels $\mathbf{z} = \{z_i\}_{i=1}^n$:

$$p(\mathbf{y}, \mathbf{z} \mid \boldsymbol\mu, \boldsymbol\sigma^2, \beta) = p(\mathbf{y} \mid \boldsymbol\mu, \boldsymbol\sigma^2, \mathbf{z})\, p(\mathbf{z} \mid \beta) \qquad (1)$$

Additive Gaussian noise:

$$y_i \mid z_i = j \;\overset{\mathrm{iid}}{\sim}\; \mathcal{N}(\mu_j, \sigma_j^2) \qquad (2)$$

Potts model:

$$\pi(z_i \mid z_{\setminus i}, \beta) = \frac{\exp\{\beta \sum_{i \sim \ell} \delta(z_i, z_\ell)\}}{\sum_{j=1}^{k} \exp\{\beta \sum_{i \sim \ell} \delta(j, z_\ell)\}} \qquad (3)$$
Potts (1952) Proceedings of the Cambridge Philosophical Society 48(1)
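The conditional distribution in Eq. (3) depends only on the neighbours of pixel i, so it can be evaluated without the intractable normalising constant. As a minimal illustration (a Python sketch, not the package's C++ implementation), here is the single-site conditional on a 2D lattice with a first-order (4-neighbour) neighbourhood; the 3×3 labelling and parameter values are toy choices:

```python
import math

def neighbours(i, j, rows, cols):
    """First-order (4-neighbour) neighbourhood of pixel (i, j)."""
    return [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < rows and 0 <= b < cols]

def potts_conditional(z, i, j, k, beta):
    """Eq. (3): pi(z_ij = 1..k | neighbouring labels, beta)."""
    rows, cols = len(z), len(z[0])
    counts = [0] * k                 # sum of delta(j, z_l) over i~l, per candidate label
    for a, b in neighbours(i, j, rows, cols):
        counts[z[a][b] - 1] += 1
    weights = [math.exp(beta * c) for c in counts]
    total = sum(weights)
    return [w / total for w in weights]

# toy 3x3 labelling with k = 2; the centre pixel has three neighbours labelled 1
z = [[1, 1, 2],
     [1, 2, 2],
     [1, 1, 2]]
p = potts_conditional(z, 1, 1, k=2, beta=0.8)
```

Since three of the four neighbours carry label 1, the conditional probability of label 1 exceeds that of label 2; at beta = 0 the conditional is uniform.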
Doubly-intractable posterior
$$p(\beta \mid \mathbf{z}) = \frac{\mathcal{C}^{-1}(\beta)\, e^{\beta S(\mathbf{z})}\, \pi(\beta)}{\int_\beta \mathcal{C}^{-1}(\beta)\, e^{\beta S(\mathbf{z})}\, \pi(\mathrm{d}\beta)} \qquad (4)$$

The normalising constant has computational complexity $\mathcal{O}(n k^n)$:

$$\mathcal{C}(\beta) = \sum_{\mathbf{z} \in \mathcal{Z}} e^{\beta S(\mathbf{z})} \qquad (5)$$

$S(\mathbf{z})$ is the sufficient statistic of the Potts model:

$$S(\mathbf{z}) = \sum_{i \sim \ell \in \mathcal{E}} \delta(z_i, z_\ell) \qquad (6)$$

where $\mathcal{E}$ is the set of all unique neighbour pairs.
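Unlike C(β), the sufficient statistic in Eq. (6) is cheap: one pass over the unique neighbour pairs suffices. A Python sketch for a regular 2D lattice, counting each pair exactly once via right and down neighbours (the example labelling is arbitrary):

```python
def suff_stat(z):
    """Eq. (6): S(z) = sum of delta(z_i, z_l) over the unique neighbour pairs E."""
    rows, cols = len(z), len(z[0])
    s = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows and z[i][j] == z[i + 1][j]:   # vertical pair, counted once
                s += 1
            if j + 1 < cols and z[i][j] == z[i][j + 1]:   # horizontal pair, counted once
                s += 1
    return s

z = [[1, 1, 2],
     [1, 2, 2],
     [1, 1, 2]]
s = suff_stat(z)   # 7 of the 12 neighbour pairs on this lattice agree
```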
Expectation of S(z)
Figure: Distribution of E_{z|β}[S(z)] as a function of β ∈ [0, 4]. (a) n = 12 & k ∈ {2, 3, 4}; (b) k = 3 & n ∈ {4, 6, 9, 12}.
Standard deviation of S(z)
Figure: Distribution of σ_{z|β}[S(z)] as a function of β ∈ [0, 4]. (a) n = 12 & k ∈ {2, 3, 4}; (b) k = 3 & n ∈ {4, 6, 9, 12}.
Exchange Algorithm
Algorithm 1 Exchange Algorithm
1: for all iterations t = 1, . . . , T do
2:   Draw proposed parameter value $\beta' \sim q(\beta' \mid \beta_{t-1})$
3:   Generate $\mathbf{w} \mid \beta'$ by (perfect) sampling from Eq. (3)
4:   Calculate the Metropolis-Hastings ratio, in which the intractable normalising constants cancel:
$$\rho = \frac{q(\beta_{t-1} \mid \beta')\, \pi(\beta')\, \mathcal{C}(\beta_{t-1})\, e^{\beta' S(\mathbf{z})}}{q(\beta' \mid \beta_{t-1})\, \pi(\beta_{t-1})\, \mathcal{C}(\beta')\, e^{\beta_{t-1} S(\mathbf{z})}} \times \frac{\mathcal{C}(\beta')\, e^{\beta_{t-1} S(\mathbf{w})}}{\mathcal{C}(\beta_{t-1})\, e^{\beta' S(\mathbf{w})}}$$
5:   Draw $u \sim \mathrm{Uniform}[0, 1]$
6:   if $u < \min(1, \rho)$ then
7:     $\beta_t \leftarrow \beta'$ else $\beta_t \leftarrow \beta_{t-1}$
8:   end if
9: end for
Murray, Ghahramani & MacKay (2006) Proc. 22nd Conf. UAI
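Algorithm 1 can be sketched in a few lines of Python. This is an illustrative toy, not the bayesImageS implementation: perfect sampling from a 2D Potts model requires coupling from the past, so the sketch uses a one-dimensional Potts chain, for which exact forward simulation is available, together with an assumed uniform prior on [0, 2] and a symmetric random-walk proposal:

```python
import math
import random

random.seed(1)

def sample_chain(beta, n, k):
    """Exact draw from a 1D Potts chain with free boundaries: p(z) prop. to exp(beta*S(z))."""
    z = [random.randint(1, k)]
    p_same = math.exp(beta) / (math.exp(beta) + k - 1)
    for _ in range(n - 1):
        if random.random() < p_same:
            z.append(z[-1])            # copy the previous label
        else:
            z.append(random.choice([j for j in range(1, k + 1) if j != z[-1]]))
    return z

def suff_stat(z):
    """S(z): number of neighbour pairs with matching labels."""
    return sum(a == b for a, b in zip(z, z[1:]))

def exchange_step(beta, S_z, n, k, step=0.2, beta_max=2.0):
    beta_prop = beta + random.uniform(-step, step)   # symmetric proposal q
    if not 0.0 <= beta_prop <= beta_max:             # uniform prior pi(beta)
        return beta
    w = sample_chain(beta_prop, n, k)                # step 3: auxiliary draw w | beta'
    # log of rho: the constants C(beta), C(beta') have cancelled
    log_rho = (beta_prop - beta) * (S_z - suff_stat(w))
    if math.log(random.random()) < min(0.0, log_rho):
        return beta_prop
    return beta

z_obs = sample_chain(1.0, 200, 3)                    # pretend observed labels
S_z = suff_stat(z_obs)
beta, trace = 0.5, []
for _ in range(100):
    beta = exchange_step(beta, S_z, n=200, k=3)
    trace.append(beta)
```

With a uniform prior and symmetric proposal, the ratio ρ collapses to exp{(β' − β)(S(z) − S(w))}, which is exactly what the sketch computes on the log scale.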
Approximate Bayesian Computation
Algorithm 2 ABC-MCMC
1: for all iterations t = 1, . . . , T do
2:   Draw proposed parameter value $\beta' \sim q(\beta' \mid \beta_{t-1})$
3:   Generate pseudo-data $\mathbf{w} \mid \beta'$ by sampling from Eq. (3)
4:   Draw $u \sim \mathrm{Uniform}[0, 1]$
5:   if $u < \min\left(1, \frac{\pi(\beta')\, q(\beta_{t-1} \mid \beta')}{\pi(\beta_{t-1})\, q(\beta' \mid \beta_{t-1})}\right)$ and $\|S(\mathbf{w}) - S(\mathbf{z})\| < \epsilon$ then
6:     $\beta_t \leftarrow \beta'$ else $\beta_t \leftarrow \beta_{t-1}$
7:   end if
8: end for
Marjoram, Molitor, Plagnol & Tavaré (2003) PNAS 100(26)
Grelaud, Robert, Marin, Rodolphe & Taly (2009) Bayesian Analysis 4(2)
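The ABC-MCMC step replaces likelihood evaluation with a comparison of summary statistics. A Python sketch under the same toy setup as before (a 1D Potts chain standing in for Eq. (3); the uniform prior, symmetric proposal, and tolerance ε = 10 are illustrative assumptions):

```python
import math
import random

random.seed(2)

def sample_chain(beta, n, k):
    """Exact draw from a 1D Potts chain: a toy stand-in for sampling Eq. (3)."""
    z = [random.randint(1, k)]
    p_same = math.exp(beta) / (math.exp(beta) + k - 1)
    for _ in range(n - 1):
        same = random.random() < p_same
        z.append(z[-1] if same else
                 random.choice([j for j in range(1, k + 1) if j != z[-1]]))
    return z

def suff_stat(z):
    return sum(a == b for a, b in zip(z, z[1:]))

def abc_mcmc_step(beta, S_z, n, k, eps, step=0.2, beta_max=2.0):
    beta_prop = beta + random.uniform(-step, step)   # symmetric proposal q
    if not 0.0 <= beta_prop <= beta_max:             # uniform prior pi(beta)
        return beta
    w = sample_chain(beta_prop, n, k)                # pseudo-data w | beta'
    # uniform prior and symmetric q make the MH ratio equal to 1,
    # so acceptance reduces to the distance condition |S(w) - S(z)| < eps
    if abs(suff_stat(w) - S_z) < eps:
        return beta_prop
    return beta

S_z = suff_stat(sample_chain(1.0, 200, 3))           # pretend observed statistic
beta, trace = 0.5, []
for _ in range(100):
    beta = abc_mcmc_step(beta, S_z, n=200, k=3, eps=10)
    trace.append(beta)
```

Shrinking ε sharpens the approximate posterior but lowers the acceptance rate, which is one reason the per-iteration simulation cost dominates at scale.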
Scalability
Computational cost is dominated by simulation of auxiliary variables (exchange algorithm) or pseudo-data (ABC).

Figure: Elapsed time (hours, log-log scale) against number of pixels (10² to 10⁶) for the exchange algorithm and ABC-MCMC. (a) 2D images, k = 3; (b) 3D images, k = 3.
Moores, Pettitt & Mengersen (2015) arXiv:1503.08066 [stat.CO]
Precomputation Step
- The distribution of the summary statistic f(S(w) | β) is independent of the observed data y and the labels z.
- By simulating pseudo-data for a range of values of β, we can create a binding function φ(β) for a surrogate model f_A(S(w) | φ(β)).
- This binding function can be reused across multiple datasets, amortising its computational cost.
- By replacing S(w) with our surrogate model, we avoid the need to simulate pseudo-data or auxiliary variables during model fitting.
Moores, Drovandi, Mengersen & Robert (2015) Statistics & Computing 25(1)
Drovandi, Pettitt & Lee (2015) Statistical Science 30(1)
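The pre-computation step can be sketched end-to-end: simulate pseudo-data at a grid of sample points β_s, record the mean and standard deviation of S(w) at each, and interpolate between them. The piecewise-linear interpolation and the 1D Potts chain below are illustrative stand-ins for the binding functions and the 2D model used in the talk:

```python
import math
import random
import statistics

random.seed(3)

def sample_chain(beta, n, k):
    """Exact draw from a toy 1D Potts chain, standing in for pseudo-data simulation."""
    z = [random.randint(1, k)]
    p_same = math.exp(beta) / (math.exp(beta) + k - 1)
    for _ in range(n - 1):
        same = random.random() < p_same
        z.append(z[-1] if same else
                 random.choice([j for j in range(1, k + 1) if j != z[-1]]))
    return z

def suff_stat(z):
    return sum(a == b for a, b in zip(z, z[1:]))

# offline step: embarrassingly parallel over the grid of sample points beta_s
n, k, reps = 200, 3, 100
grid = [0.1 * s for s in range(21)]              # beta_s on [0, 2]
mu, sd = [], []
for b in grid:
    draws = [suff_stat(sample_chain(b, n, k)) for _ in range(reps)]
    mu.append(statistics.mean(draws))
    sd.append(statistics.stdev(draws))

def interp(x, xs, ys):
    """Piecewise-linear binding function, evaluated at x."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]

phi_mu = lambda b: interp(b, grid, mu)           # binding function for the mean
phi_sd = lambda b: interp(b, grid, sd)           # binding function for the std dev
```

The loop over the grid has no sequential dependence, which is what makes this step suitable for massively parallel hardware.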
Piecewise linear model
Figure: Piecewise linear binding functions for S(w) | β with n = 5⁶, k = 3. (a) φ̂_µ(β): expectation of S(z) against β; (b) φ̂_σ(β): standard deviation of S(z) against β.
Parametric surrogate model for $\mathrm{E}_{z|\beta}[S(\mathbf{z})]$

The binding function for the expectation is available as an integral curve:

$$\hat\phi_\mu(\beta) = \begin{cases} E_0 + \beta V_0 + \int_0^\beta (V_{\max} - V_0)\, e^{-\phi_1 \sqrt{\beta_{crit} - \beta}}\, \mathrm{d}\beta & 0 \le \beta < \beta_{crit} \\ E_{\beta_{crit}} + \int_{\beta_{crit}}^\beta V_{\max}\, e^{-\phi_2 \sqrt{\beta - \beta_{crit}}}\, \mathrm{d}\beta & \beta \ge \beta_{crit} \end{cases}$$

Figure: $\hat\phi_\mu(\beta)$ for $0 \le \beta \le 3$, with $S(\mathbf{z})$ on the vertical axis.
Bayesian indirect likelihood using $f_A(S(\mathbf{w}) \mid \phi(\beta))$

Algorithm 3 BIL
1: Generate $\mathbf{w}_s \mid \beta_s$ for sample points $\beta_s$, where s = 1, . . . , S
2: Fit the binding functions $\hat\phi_{\sigma^2}(\beta)$ & $\hat\phi_\mu(\beta)$
3: for all iterations t = 1, . . . , T do
4:   Draw proposed parameter value $\beta' \sim q(\beta' \mid \beta_{t-1})$
5:   Approximate the Radon-Nikodym derivative:
$$\rho = \frac{q(\beta_{t-1} \mid \beta')\, \pi(\beta')\, f_A\left(S(\mathbf{z}) \mid \hat\phi_\mu(\beta'), \hat\phi_{\sigma^2}(\beta')\right)}{q(\beta' \mid \beta_{t-1})\, \pi(\beta_{t-1})\, f_A\left(S(\mathbf{z}) \mid \hat\phi_\mu(\beta_{t-1}), \hat\phi_{\sigma^2}(\beta_{t-1})\right)}$$
6:   Draw $u \sim \mathrm{Uniform}[0, 1]$
7:   if $u < \min(1, \rho)$ then
8:     $\beta_t \leftarrow \beta'$ else $\beta_t \leftarrow \beta_{t-1}$
9:   end if
10: end for
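With the binding functions in hand, the inner loop of Algorithm 3 needs no simulation at all: the intractable likelihood is replaced by a Gaussian surrogate f_A evaluated at the observed statistic. A Python sketch, assuming a uniform prior and symmetric proposal; the linear φ̂ functions here are placeholders for binding functions fitted in a real pre-computation step:

```python
import math
import random

random.seed(4)

def gauss_logpdf(x, mu, sigma):
    """Log density of the Gaussian surrogate f_A(x | mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def bil_step(beta, S_z, phi_mu, phi_sd, step=0.2, beta_max=2.0):
    beta_prop = beta + random.uniform(-step, step)   # symmetric proposal q
    if not 0.0 <= beta_prop <= beta_max:             # uniform prior pi(beta)
        return beta
    # surrogate log-likelihood ratio: no auxiliary variables are simulated
    log_rho = (gauss_logpdf(S_z, phi_mu(beta_prop), phi_sd(beta_prop))
               - gauss_logpdf(S_z, phi_mu(beta), phi_sd(beta)))
    if math.log(random.random()) < min(0.0, log_rho):
        return beta_prop
    return beta

# placeholder binding functions (illustrative, not fitted to real data)
phi_mu = lambda b: 100.0 + 60.0 * b
phi_sd = lambda b: 5.0 + 2.0 * b

S_obs = 190.0
beta, trace = 0.5, []
for _ in range(500):
    beta = bil_step(beta, S_obs, phi_mu, phi_sd)
    trace.append(beta)
```

Each iteration costs two evaluations of the binding functions, independent of the number of pixels, which is where the order-of-magnitude runtime improvement comes from.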
Summary
- It is feasible to use MCMC for image analysis of realistic datasets:
  - auxiliary variable methods don't scale well
  - a parallelised implementation in C++ or Fortran is required; RcppArmadillo & OpenMP are a good combination
  - faster approximate algorithms are available, such as iterated conditional modes (ICM) or variational Bayes (VB)
- Scalability of Bayesian computation for intractable likelihoods can be improved by pre-computing a surrogate model f_A(S(w) | φ(β)):
  - pre-computation took 1.4 hours on a 16-core Xeon server for 987 values of β with 15,625 pixels (13.4 hours for 978,380 pixels)
  - average runtime for model fitting improved from 107 hours (exchange algorithm) or 115 hours (ABC-MCMC) to only 4 hours using the parametric auxiliary model
Appendix
For Further Reading I
M. Moores, A. N. Pettitt & K. Mengersen
Scalable Bayesian inference for the inverse temperature of a hidden Potts model.
arXiv:1503.08066 [stat.CO], 2015.
M. Moores, C. C. Drovandi, K. Mengersen & C. P. Robert
Pre-processing for approximate Bayesian computation in image analysis.
Statistics & Computing 25(1): 23–33, 2015.
C. C. Drovandi, M. Moores & R. J. Boys
Accelerating pseudo-marginal MCMC using Gaussian processes.
To appear in Computational Statistics & Data Analysis, 2017.
M. Moores & K. Mengersen
bayesImageS: Bayesian methods for image segmentation using a Potts model.
R package version 0.4-0, 2017. https://CRAN.R-project.org/package=bayesImageS
For Further Reading II
C. C. Drovandi, A. N. Pettitt & A. Lee
Bayesian indirect inference using a parametric auxiliary model.
Statist. Sci. 30(1): 72–95, 2015.
C. C. Drovandi, A. N. Pettitt & M. J. Faddy
Approximate Bayesian computation using indirect inference.
J. R. Stat. Soc. Ser. C 60(3): 317–37, 2011.
R. G. Everitt
Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks.
J. Comput. Graph. Stat., 21(4): 940–60, 2012.
A. Grelaud, C. P. Robert, J.-M. Marin, F. Rodolphe & J.-F. Taly
ABC likelihood-free methods for model choice in Gibbs random fields.
Bayesian Analysis, 4(2): 317–36, 2009.
For Further Reading III
D. K. Pickard
Inference for Discrete Markov Fields: The Simplest Nontrivial Case.
J. Am. Stat. Assoc., 82(397): 90–96, 1987.
D. Feng & L. Tierney
PottsUtils: Utility Functions of the Potts Models.
R package version 0.3-2 http://CRAN.R-project.org/package=PottsUtils
I. Murray, Z. Ghahramani & D. J. C. MacKay
MCMC for Doubly-intractable Distributions.
In Proc. 22nd Conf. UAI, AUAI Press, 359–366, 2006.
P. Marjoram, J. Molitor, V. Plagnol & S. Tavaré
Markov chain Monte Carlo without likelihoods.
Proc. Natl Acad. Sci., 100(26): 15324–15328, 2003.