There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space. Markov random fields, such as the Ising/Potts model and exponential random graph model (ERGM), are particularly challenging because the number of discrete variables increases linearly with the size of the image or graph. The likelihood of these models cannot be computed directly, due to the presence of an intractable normalising constant. In this context, it is necessary to employ algorithms that provide a suitable compromise between accuracy and computational cost.
Bayesian indirect likelihood (BIL) is a class of methods that approximate the likelihood function using a surrogate model. This model can be trained using a pre-computation step, utilising massively parallel hardware to simulate auxiliary variables. We review various types of surrogate model that can be used in BIL. In the case of the Potts model, we introduce a parametric approximation to the score function that incorporates its known properties, such as heteroskedasticity and critical temperature. We demonstrate this method on 2D satellite remote sensing and 3D computed tomography (CT) images. We achieve a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. Our algorithm has been implemented in the R package “bayesImageS,” which is available from CRAN.
1. Image Analysis Intractable Likelihood Markov Chain Monte Carlo Conclusion
R package bayesImageS:
Scalable Inference for Intractable Likelihoods
Matt Moores
RSS Annual Conference
September 6, 2017
1 / 24
Outline
1 Image Analysis
  - R package bayesImageS
2 Intractable Likelihood
  - Ising/Potts model
3 Markov Chain Monte Carlo
  - Exchange algorithm
  - Approximate Bayesian computation (ABC)
  - Bayesian indirect likelihood (BIL)
Motivation
Image analysis often involves:
Large datasets, with millions of pixels
Multiple images with similar characteristics
For example: satellite remote sensing (Landsat, MODIS) and medical imaging (CT scans, MRI).

Table: Scale of common types of images

Number of pixels | Landsat (900 m²/px) | CT slices (512×512)
-----------------|---------------------|--------------------
2^6              | 0.06 km²            | . . .
5^6              | 14.06 km²           | 0.1
10^6             | 900.00 km²          | 3.8
15^6             | 10,251.56 km²       | 43.5
Statistical Computation
Many statistical algorithms (MCMC, EM) are inherently iterative.
Strategies for improving scalability:
Compiled code (e.g. using Rcpp)
Parallel execution
Offline precomputation
Streaming inference
Subsampling
Multi-level and multi-resolution methods
Dirk Eddelbuettel (2013) Seamless R and C++ integration with Rcpp
bayesImageS
An R package for Bayesian image segmentation using the hidden Potts model:
RcppArmadillo for fast computation in C++
OpenMP for parallelism
library(bayesImageS)
priors <- list("k"=3, "mu"=rep(0,3), "mu.sd"=sigma,
               "sigma"=sigma, "sigma.nu"=c(1,1,1), "beta"=c(0,3))
mh <- list(algorithm="pseudo", bandwidth=0.2)
result <- mcmcPotts(y, neigh, block, NULL, 55000, 5000, priors, mh)
Eddelbuettel & Sanderson (2014) RcppArmadillo: Accelerating R with high-performance C++ linear algebra. CSDA 71
Bayesian computational methods
bayesImageS supports methods for classifying the pixels:
Chequerboard Gibbs sampling (Winkler 2003)
Swendsen-Wang (1987)
and also methods for updating the smoothing parameter β:
Pseudolikelihood (Rydén & Titterington 1998)
Thermodynamic integration (Gelman & Meng 1998)
Exchange algorithm (Murray, Ghahramani & MacKay 2006)
Approximate Bayesian computation (Grelaud et al. 2009)
Sequential Monte Carlo (ABC-SMC) with pre-computation
(Del Moral, Doucet & Jasra 2012; Moores et al. 2015)
Pixel Classification
Joint distribution of observed pixel intensities $\mathbf{y} = \{y_i\}_{i=1}^n$ and latent labels $\mathbf{z} = \{z_i\}_{i=1}^n$:

$$p(\mathbf{y}, \mathbf{z} \mid \boldsymbol\mu, \boldsymbol\sigma^2, \beta) = p(\mathbf{y} \mid \boldsymbol\mu, \boldsymbol\sigma^2, \mathbf{z})\, p(\mathbf{z} \mid \beta) \qquad (1)$$

Additive Gaussian noise:

$$y_i \mid z_i = j \;\overset{\mathrm{iid}}{\sim}\; \mathcal{N}(\mu_j, \sigma_j^2) \qquad (2)$$

Potts model:

$$\pi(z_i \mid z_{\setminus i}, \beta) = \frac{\exp\{\beta \sum_{i \sim \ell} \delta(z_i, z_\ell)\}}{\sum_{j=1}^{k} \exp\{\beta \sum_{i \sim \ell} \delta(j, z_\ell)\}} \qquad (3)$$
Potts (1952) Proceedings of the Cambridge Philosophical Society 48(1)
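The conditional distribution in Eq. (3) depends only on the neighbours of pixel i, so it can be evaluated without the intractable normalising constant. As a minimal illustration (a Python sketch, not the package's C++ implementation), here is the single-site conditional on a 2D lattice with a first-order (4-neighbour) neighbourhood; the 3×3 labelling and parameter values are toy choices:

```python
import math

def neighbours(i, j, rows, cols):
    """First-order (4-neighbour) neighbourhood of pixel (i, j)."""
    return [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < rows and 0 <= b < cols]

def potts_conditional(z, i, j, k, beta):
    """Eq. (3): pi(z_ij = 1..k | neighbouring labels, beta)."""
    rows, cols = len(z), len(z[0])
    counts = [0] * k                 # sum of delta(j, z_l) over i~l, per candidate label
    for a, b in neighbours(i, j, rows, cols):
        counts[z[a][b] - 1] += 1
    weights = [math.exp(beta * c) for c in counts]
    total = sum(weights)
    return [w / total for w in weights]

# toy 3x3 labelling with k = 2; the centre pixel has three neighbours labelled 1
z = [[1, 1, 2],
     [1, 2, 2],
     [1, 1, 2]]
p = potts_conditional(z, 1, 1, k=2, beta=0.8)
```

Since three of the four neighbours carry label 1, the conditional probability of label 1 exceeds that of label 2; at beta = 0 the conditional is uniform.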
Doubly-intractable posterior
$$p(\beta \mid \mathbf{z}) = \frac{\mathcal{C}^{-1}(\beta)\, e^{\beta S(\mathbf{z})}\, \pi(\beta)}{\int_\beta \mathcal{C}^{-1}(\beta)\, e^{\beta S(\mathbf{z})}\, \pi(\mathrm{d}\beta)} \qquad (4)$$

The normalising constant has computational complexity $\mathcal{O}(n k^n)$:

$$\mathcal{C}(\beta) = \sum_{\mathbf{z} \in \mathcal{Z}} e^{\beta S(\mathbf{z})} \qquad (5)$$

$S(\mathbf{z})$ is the sufficient statistic of the Potts model:

$$S(\mathbf{z}) = \sum_{i \sim \ell \in \mathcal{E}} \delta(z_i, z_\ell) \qquad (6)$$

where $\mathcal{E}$ is the set of all unique neighbour pairs.
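Unlike C(β), the sufficient statistic in Eq. (6) is cheap: one pass over the unique neighbour pairs suffices. A Python sketch for a regular 2D lattice, counting each pair exactly once via right and down neighbours (the example labelling is arbitrary):

```python
def suff_stat(z):
    """Eq. (6): S(z) = sum of delta(z_i, z_l) over the unique neighbour pairs E."""
    rows, cols = len(z), len(z[0])
    s = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows and z[i][j] == z[i + 1][j]:   # vertical pair, counted once
                s += 1
            if j + 1 < cols and z[i][j] == z[i][j + 1]:   # horizontal pair, counted once
                s += 1
    return s

z = [[1, 1, 2],
     [1, 2, 2],
     [1, 1, 2]]
s = suff_stat(z)   # 7 of the 12 neighbour pairs on this lattice agree
```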
Expectation of S(z)
Figure: Distribution of E_{z|β}[S(z)] as a function of β ∈ [0, 4]. (a) n = 12 & k ∈ {2, 3, 4}; (b) k = 3 & n ∈ {4, 6, 9, 12}.
Standard deviation of S(z)
Figure: Distribution of σ_{z|β}[S(z)] as a function of β ∈ [0, 4]. (a) n = 12 & k ∈ {2, 3, 4}; (b) k = 3 & n ∈ {4, 6, 9, 12}.
Exchange Algorithm
Algorithm 1 Exchange Algorithm
1: for all iterations t = 1, . . . , T do
2:   Draw proposed parameter value $\beta' \sim q(\beta' \mid \beta_{t-1})$
3:   Generate $\mathbf{w} \mid \beta'$ by (perfect) sampling from Eq. (3)
4:   Calculate the Metropolis-Hastings ratio, in which the intractable normalising constants cancel:
$$\rho = \frac{q(\beta_{t-1} \mid \beta')\, \pi(\beta')\, \mathcal{C}(\beta_{t-1})\, e^{\beta' S(\mathbf{z})}}{q(\beta' \mid \beta_{t-1})\, \pi(\beta_{t-1})\, \mathcal{C}(\beta')\, e^{\beta_{t-1} S(\mathbf{z})}} \times \frac{\mathcal{C}(\beta')\, e^{\beta_{t-1} S(\mathbf{w})}}{\mathcal{C}(\beta_{t-1})\, e^{\beta' S(\mathbf{w})}}$$
5:   Draw $u \sim \mathrm{Uniform}[0, 1]$
6:   if $u < \min(1, \rho)$ then
7:     $\beta_t \leftarrow \beta'$ else $\beta_t \leftarrow \beta_{t-1}$
8:   end if
9: end for
Murray, Ghahramani & MacKay (2006) Proc. 22nd Conf. UAI
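Algorithm 1 can be sketched in a few lines of Python. This is an illustrative toy, not the bayesImageS implementation: perfect sampling from a 2D Potts model requires coupling from the past, so the sketch uses a one-dimensional Potts chain, for which exact forward simulation is available, together with an assumed uniform prior on [0, 2] and a symmetric random-walk proposal:

```python
import math
import random

random.seed(1)

def sample_chain(beta, n, k):
    """Exact draw from a 1D Potts chain with free boundaries: p(z) prop. to exp(beta*S(z))."""
    z = [random.randint(1, k)]
    p_same = math.exp(beta) / (math.exp(beta) + k - 1)
    for _ in range(n - 1):
        if random.random() < p_same:
            z.append(z[-1])            # copy the previous label
        else:
            z.append(random.choice([j for j in range(1, k + 1) if j != z[-1]]))
    return z

def suff_stat(z):
    """S(z): number of neighbour pairs with matching labels."""
    return sum(a == b for a, b in zip(z, z[1:]))

def exchange_step(beta, S_z, n, k, step=0.2, beta_max=2.0):
    beta_prop = beta + random.uniform(-step, step)   # symmetric proposal q
    if not 0.0 <= beta_prop <= beta_max:             # uniform prior pi(beta)
        return beta
    w = sample_chain(beta_prop, n, k)                # step 3: auxiliary draw w | beta'
    # log of rho: the constants C(beta), C(beta') have cancelled
    log_rho = (beta_prop - beta) * (S_z - suff_stat(w))
    if math.log(random.random()) < min(0.0, log_rho):
        return beta_prop
    return beta

z_obs = sample_chain(1.0, 200, 3)                    # pretend observed labels
S_z = suff_stat(z_obs)
beta, trace = 0.5, []
for _ in range(100):
    beta = exchange_step(beta, S_z, n=200, k=3)
    trace.append(beta)
```

With a uniform prior and symmetric proposal, the ratio ρ collapses to exp{(β' − β)(S(z) − S(w))}, which is exactly what the sketch computes on the log scale.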
Approximate Bayesian Computation
Algorithm 2 ABC-MCMC
1: for all iterations t = 1, . . . , T do
2:   Draw proposed parameter value $\beta' \sim q(\beta' \mid \beta_{t-1})$
3:   Generate pseudo-data $\mathbf{w} \mid \beta'$ by sampling from Eq. (3)
4:   Draw $u \sim \mathrm{Uniform}[0, 1]$
5:   if $u < \min\left(1, \frac{\pi(\beta')\, q(\beta_{t-1} \mid \beta')}{\pi(\beta_{t-1})\, q(\beta' \mid \beta_{t-1})}\right)$ and $\|S(\mathbf{w}) - S(\mathbf{z})\| < \epsilon$ then
6:     $\beta_t \leftarrow \beta'$ else $\beta_t \leftarrow \beta_{t-1}$
7:   end if
8: end for
Marjoram, Molitor, Plagnol & Tavaré (2003) PNAS 100(26)
Grelaud, Robert, Marin, Rodolphe & Taly (2009) Bayesian Analysis 4(2)
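The ABC-MCMC step replaces likelihood evaluation with a comparison of summary statistics. A Python sketch under the same toy setup as before (a 1D Potts chain standing in for Eq. (3); the uniform prior, symmetric proposal, and tolerance ε = 10 are illustrative assumptions):

```python
import math
import random

random.seed(2)

def sample_chain(beta, n, k):
    """Exact draw from a 1D Potts chain: a toy stand-in for sampling Eq. (3)."""
    z = [random.randint(1, k)]
    p_same = math.exp(beta) / (math.exp(beta) + k - 1)
    for _ in range(n - 1):
        same = random.random() < p_same
        z.append(z[-1] if same else
                 random.choice([j for j in range(1, k + 1) if j != z[-1]]))
    return z

def suff_stat(z):
    return sum(a == b for a, b in zip(z, z[1:]))

def abc_mcmc_step(beta, S_z, n, k, eps, step=0.2, beta_max=2.0):
    beta_prop = beta + random.uniform(-step, step)   # symmetric proposal q
    if not 0.0 <= beta_prop <= beta_max:             # uniform prior pi(beta)
        return beta
    w = sample_chain(beta_prop, n, k)                # pseudo-data w | beta'
    # uniform prior and symmetric q make the MH ratio equal to 1,
    # so acceptance reduces to the distance condition |S(w) - S(z)| < eps
    if abs(suff_stat(w) - S_z) < eps:
        return beta_prop
    return beta

S_z = suff_stat(sample_chain(1.0, 200, 3))           # pretend observed statistic
beta, trace = 0.5, []
for _ in range(100):
    beta = abc_mcmc_step(beta, S_z, n=200, k=3, eps=10)
    trace.append(beta)
```

Shrinking ε sharpens the approximate posterior but lowers the acceptance rate, which is one reason the per-iteration simulation cost dominates at scale.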
Scalability
Computational cost is dominated by simulation of auxiliary variables (exchange algorithm) or pseudo-data (ABC).

Figure: Elapsed time (hours, log-log scale) against number of pixels (10² to 10⁶) for the exchange algorithm and ABC-MCMC. (a) 2D images, k = 3; (b) 3D images, k = 3.
Moores, Pettitt & Mengersen (2015) arXiv:1503.08066 [stat.CO]
Precomputation Step
- The distribution of the summary statistic f(S(w) | β) is independent of the observed data y and the labels z.
- By simulating pseudo-data for a range of values of β, we can create a binding function φ(β) for a surrogate model f_A(S(w) | φ(β)).
- This binding function can be reused across multiple datasets, amortising its computational cost.
- By replacing S(w) with our surrogate model, we avoid the need to simulate pseudo-data or auxiliary variables during model fitting.
Moores, Drovandi, Mengersen & Robert (2015) Statistics & Computing 25(1)
Drovandi, Pettitt & Lee (2015) Statistical Science 30(1)
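The pre-computation step can be sketched end-to-end: simulate pseudo-data at a grid of sample points β_s, record the mean and standard deviation of S(w) at each, and interpolate between them. The piecewise-linear interpolation and the 1D Potts chain below are illustrative stand-ins for the binding functions and the 2D model used in the talk:

```python
import math
import random
import statistics

random.seed(3)

def sample_chain(beta, n, k):
    """Exact draw from a toy 1D Potts chain, standing in for pseudo-data simulation."""
    z = [random.randint(1, k)]
    p_same = math.exp(beta) / (math.exp(beta) + k - 1)
    for _ in range(n - 1):
        same = random.random() < p_same
        z.append(z[-1] if same else
                 random.choice([j for j in range(1, k + 1) if j != z[-1]]))
    return z

def suff_stat(z):
    return sum(a == b for a, b in zip(z, z[1:]))

# offline step: embarrassingly parallel over the grid of sample points beta_s
n, k, reps = 200, 3, 100
grid = [0.1 * s for s in range(21)]              # beta_s on [0, 2]
mu, sd = [], []
for b in grid:
    draws = [suff_stat(sample_chain(b, n, k)) for _ in range(reps)]
    mu.append(statistics.mean(draws))
    sd.append(statistics.stdev(draws))

def interp(x, xs, ys):
    """Piecewise-linear binding function, evaluated at x."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]

phi_mu = lambda b: interp(b, grid, mu)           # binding function for the mean
phi_sd = lambda b: interp(b, grid, sd)           # binding function for the std dev
```

The loop over the grid has no sequential dependence, which is what makes this step suitable for massively parallel hardware.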
Piecewise linear model
Figure: Piecewise linear binding functions for S(w) | β with n = 5⁶, k = 3. (a) φ̂_µ(β): expectation of S(z) against β; (b) φ̂_σ(β): standard deviation of S(z) against β.
Parametric surrogate model for $\mathrm{E}_{z|\beta}[S(\mathbf{z})]$

The binding function for the expectation is available as an integral curve:

$$\hat\phi_\mu(\beta) = \begin{cases} E_0 + \beta V_0 + \int_0^\beta (V_{\max} - V_0)\, e^{-\phi_1 \sqrt{\beta_{crit} - \beta}}\, \mathrm{d}\beta & 0 \le \beta < \beta_{crit} \\ E_{\beta_{crit}} + \int_{\beta_{crit}}^\beta V_{\max}\, e^{-\phi_2 \sqrt{\beta - \beta_{crit}}}\, \mathrm{d}\beta & \beta \ge \beta_{crit} \end{cases}$$

Figure: $\hat\phi_\mu(\beta)$ for $0 \le \beta \le 3$, with $S(\mathbf{z})$ on the vertical axis.
Bayesian indirect likelihood using $f_A(S(\mathbf{w}) \mid \phi(\beta))$

Algorithm 3 BIL
1: Generate $\mathbf{w}_s \mid \beta_s$ for sample points $\beta_s$, where s = 1, . . . , S
2: Fit the binding functions $\hat\phi_{\sigma^2}(\beta)$ & $\hat\phi_\mu(\beta)$
3: for all iterations t = 1, . . . , T do
4:   Draw proposed parameter value $\beta' \sim q(\beta' \mid \beta_{t-1})$
5:   Approximate the Radon-Nikodym derivative:
$$\rho = \frac{q(\beta_{t-1} \mid \beta')\, \pi(\beta')\, f_A\left(S(\mathbf{z}) \mid \hat\phi_\mu(\beta'), \hat\phi_{\sigma^2}(\beta')\right)}{q(\beta' \mid \beta_{t-1})\, \pi(\beta_{t-1})\, f_A\left(S(\mathbf{z}) \mid \hat\phi_\mu(\beta_{t-1}), \hat\phi_{\sigma^2}(\beta_{t-1})\right)}$$
6:   Draw $u \sim \mathrm{Uniform}[0, 1]$
7:   if $u < \min(1, \rho)$ then
8:     $\beta_t \leftarrow \beta'$ else $\beta_t \leftarrow \beta_{t-1}$
9:   end if
10: end for
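With the binding functions in hand, the inner loop of Algorithm 3 needs no simulation at all: the intractable likelihood is replaced by a Gaussian surrogate f_A evaluated at the observed statistic. A Python sketch, assuming a uniform prior and symmetric proposal; the linear φ̂ functions here are placeholders for binding functions fitted in a real pre-computation step:

```python
import math
import random

random.seed(4)

def gauss_logpdf(x, mu, sigma):
    """Log density of the Gaussian surrogate f_A(x | mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def bil_step(beta, S_z, phi_mu, phi_sd, step=0.2, beta_max=2.0):
    beta_prop = beta + random.uniform(-step, step)   # symmetric proposal q
    if not 0.0 <= beta_prop <= beta_max:             # uniform prior pi(beta)
        return beta
    # surrogate log-likelihood ratio: no auxiliary variables are simulated
    log_rho = (gauss_logpdf(S_z, phi_mu(beta_prop), phi_sd(beta_prop))
               - gauss_logpdf(S_z, phi_mu(beta), phi_sd(beta)))
    if math.log(random.random()) < min(0.0, log_rho):
        return beta_prop
    return beta

# placeholder binding functions (illustrative, not fitted to real data)
phi_mu = lambda b: 100.0 + 60.0 * b
phi_sd = lambda b: 5.0 + 2.0 * b

S_obs = 190.0
beta, trace = 0.5, []
for _ in range(500):
    beta = bil_step(beta, S_obs, phi_mu, phi_sd)
    trace.append(beta)
```

Each iteration costs two evaluations of the binding functions, independent of the number of pixels, which is where the order-of-magnitude runtime improvement comes from.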
Summary
- It is feasible to use MCMC for image analysis of realistic datasets:
  - auxiliary variable methods don't scale well
  - a parallelised implementation in C++ or Fortran is required; RcppArmadillo & OpenMP are a good combination
  - faster approximate algorithms are available, such as iterated conditional modes (ICM) or variational Bayes (VB)
- Scalability of Bayesian computation for intractable likelihoods can be improved by pre-computing a surrogate model f_A(S(w) | φ(β)):
  - pre-computation took 1.4 hours on a 16-core Xeon server for 987 values of β with 15,625 pixels (13.4 hours for 978,380 pixels)
  - average runtime for model fitting improved from 107 hours (exchange algorithm) or 115 hours (ABC-MCMC) to only 4 hours using the parametric auxiliary model
Appendix
For Further Reading I
M. Moores, A. N. Pettitt & K. Mengersen
Scalable Bayesian inference for the inverse temperature of a hidden Potts model.
arXiv:1503.08066 [stat.CO], 2015.
M. Moores, C. C. Drovandi, K. Mengersen & C. P. Robert
Pre-processing for approximate Bayesian computation in image analysis.
Statistics & Computing 25(1): 23–33, 2015.
C. C. Drovandi, M. Moores & R. J. Boys
Accelerating pseudo-marginal MCMC using Gaussian processes.
To appear in Computational Statistics & Data Analysis, 2017.
M. Moores & K. Mengersen
bayesImageS: Bayesian methods for image segmentation using a Potts model.
R package version 0.4-0, 2017. https://CRAN.R-project.org/package=bayesImageS
For Further Reading II
C. C. Drovandi, A. N. Pettitt & A. Lee
Bayesian indirect inference using a parametric auxiliary model.
Statist. Sci. 30(1): 72–95, 2015.
C. C. Drovandi, A. N. Pettitt & M. J. Faddy
Approximate Bayesian computation using indirect inference.
J. R. Stat. Soc. Ser. C 60(3): 317–37, 2011.
R. G. Everitt
Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks.
J. Comput. Graph. Stat., 21(4): 940–60, 2012.
A. Grelaud, C. P. Robert, J.-M. Marin, F. Rodolphe & J.-F. Taly
ABC likelihood-free methods for model choice in Gibbs random fields.
Bayesian Analysis, 4(2): 317–36, 2009.
For Further Reading III
D. K. Pickard
Inference for Discrete Markov Fields: The Simplest Nontrivial Case.
J. Am. Stat. Assoc., 82(397): 90–96, 1987.
D. Feng & L. Tierney
PottsUtils: Utility Functions of the Potts Models.
R package version 0.3-2 http://CRAN.R-project.org/package=PottsUtils
I. Murray, Z. Ghahramani & D. J. C. MacKay
MCMC for Doubly-intractable Distributions.
In Proc. 22nd Conf. UAI, AUAI Press, 359–366, 2006.
P. Marjoram, J. Molitor, V. Plagnol & S. Tavaré
Markov chain Monte Carlo without likelihoods.
Proc. Natl Acad. Sci., 100(26): 15324–15328, 2003.