QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces - Marcelo Pereyra, Mar 22, 2018
1. Proximal algorithms in probability spaces
Dr. Marcelo Pereyra
http://www.macs.hw.ac.uk/∼mp71/
Maxwell Institute for Mathematical Sciences, Heriot-Watt University
March 2018, SAMSI workshop, USA.
M. Pereyra (MI — HWU) LMS 17 0 / 18
3. Imaging inverse problems
We are interested in an unknown x ∈ Rd
.
We measure y, related to x by a statistical model p(y x).
The recovery of x from y is ill-posed or ill-conditioned, resulting in
significant uncertainty about x.
For example, in many imaging problems
y = Ax + w,
for some operator A that is rank-deficient, and additive noise w.
x y ˆx
M. Pereyra (MI — HWU) LMS 17 2 / 18
4. The Bayesian framework
We use priors to reduce uncertainty and deliver accurate results.
Given the prior p(x), the posterior distribution of x given y
p(x y) = p(y x)p(x)/p(y)
models our knowledge about x after observing y.
In this talk we consider that p(x y) is log-concave; i.e.,
p(x y) = exp{−φ(x)}/∫
Rd
exp{−φ(x)}dx ,
where φ(x) is a l.s.c. convex function (often φ ∉ C1
).
M. Pereyra (MI — HWU) LMS 17 3 / 18
5. Maximum-a-posteriori (MAP) estimation
The predominant Bayesian approach in imaging is MAP estimation
ˆxMAP = argmax
x∈Rd
p(x y),
= argmin
x∈Rd
φ(x),
(1)
that can be computed efficiently by proximal convex optimisation.
Typically φ = f + g, with ∇f Lipchitz and g ∉ C1
and prox-easy.
However, knowledge of ˆxMAP provides little information about p(x y)!!!
M. Pereyra (MI — HWU) LMS 17 4 / 18
6. Uncertainty quantification in radio-interferometry
3C2888 ˆxMMSE local credible intervals
M31 ˆxMMSE local credible intervals
Figure : Radio galaxies (size 256 × 256 pixels) - computing time approx. 5 min.
M. Pereyra (MI — HWU) LMS 17 5 / 18
8. Sparse image deblurring
Estimation of reg. param. β by marginal maximum likelihood
ˆβ = argmax
β∈R+
p(y β), with p(y β) ∝ ∫ exp(− y − Ax 2
/2σ2
− β x 1)dx
y ˆxMAP Reg. param β
Figure : Maximum marginal likelihood estimation of regularisation parameter β.
Computing time 0.75 secs..
M. Pereyra (MI — HWU) LMS 17 7 / 18
9. Sparse image deblurring
Bayesian credible region C∗
α = {x p(x y) ≥ γα} with
P[x ∈ Cα y] = 1 − α, and p(x y) ∝ exp(− y − Ax 2
/2σ2
− β x 1)
y ˆxMAP uncertainty estimates
Figure : Live-cell microscopy data (Zhu et al., 2012). Uncertainty analysis
(±78nm × ±125nm) in close agreement with the experimental precision ±80nm.
Computing time 4 minutes. M = 105
iterations.
M. Pereyra (MI — HWU) LMS 17 8 / 18
10. Inference by Markov chain Monte Carlo integration
Monte Carlo integration
Given a set of samples X1,...,XM distributed according to p(x y), we
approximate posterior expectations and probabilities
1
M
M
∑
m=1
h(Xm) → E{h(x) y}, as M → ∞
Guarantees from CLTs, e.g., 1√
M
∑M
m=1 h(Xm) ∼ N[E{h(x) y},Σ].
Markov chain Monte Carlo:
Construct a Markov kernel Xm+1 Xm ∼ K(⋅ Xm) such that the Markov
chain X1,...,XM has p(x y) as stationary distribution.
MCMC simulation in high-dimensional spaces is very challenging.
M. Pereyra (MI — HWU) LMS 17 9 / 18
11. Unadjusted Langevin algorithm
Suppose for now that p(x y) ∈ C1
. Then, we can generate samples by
mimicking a Langevin diffusion process that converges to p(x y) as t → ∞,
X dXt =
1
2
∇log p (Xt y)dt + dWt, 0 ≤ t ≤ T, X(0) = x0.
where W is the n-dimensional Brownian motion.
Because solving Xt exactly is generally not possible, we use an Euler
Maruyama approximation and obtain the “unadjusted Langevin algorithm”
ULA Xm+1 = Xm + δ∇log p(Xm y) +
√
2δZm+1, Zm+1 ∼ N(0,In),
which is remarkably efficient when φ(x) = −log p(x y) is l.s.c. convex and
L-Lipchitz differentiable with δ < L−1
.
M. Pereyra (MI — HWU) LMS 17 10 / 18
12. Proximal ULA
If φ ∉ C1
, we use the (Moreau-Yoshida regularised) proximal ULA
MYULA Xm+1 = (1 − δ
λ )Xm − δ∇f {Xm} + δ
λ proxλ
g {Xm} +
√
2δZm+1,
for any suitable decomposition φ = f + g, λ > 0, and δ < λ/(λLf + 1).
Non-asymptotic estimation error bound:
For any > 0, there exist δ > 0 and M ∈ N such that ∀δ < δ and ∀M ≥ M
δx0 QM
δ − p TV < + λL2
g .
If f + g is strongly convex outside some ball, then M scales as
O(d log(d)). See Durmus et al. (2017) for other convergence results.
M. Pereyra (MI — HWU) LMS 17 11 / 18
14. Wasserstein space
P2(Rd
): set of probability measures with finite second moment.
Wasserstein distance of order 2
W2(µ,ν) = inf
π∈Π(µ,ν)
∫
Rd ×Rd
x − y 2
π(x,y)dxdy
with Π(µ,ν) is the set of measures that have µ and ν as marginals.
Kullbar-Liebler divergence
KL(µ,ν) = ∫
Rd
log [
ν(x)
µ(x)
]ν(x)dx
Note: KL(µ,ν) = 0 iff µ = ν.
M. Pereyra (MI — HWU) LMS 17 13 / 18
15. A functional F P2(Rd
) → (−∞,∞] is λ-displacement convex iff
F(µt) ≤ F(µ0) + F(µ1) − λt(1 − t)W2(µ0,µ1)
where µt is the point t ∈ [0,1] along the constant speed geodesic
between µ0 and µ1, i.e., W2(µt,µs) = (t − s)W2(µ0,µ1).
If π ∈ P2 is λ-strongly log-concave (i.e., −log π strongly convex), then
KL(µ,π) is λ-displacement convex w.r.t. µ.
M. Pereyra (MI — HWU) LMS 17 14 / 18
16. Proximal optimisation in the Wasserstein space
Key Result (R. Jordan, D. Kinderlehrer, and F. Otto. 1998)
The sequence of distributions generated by the iterative algorithm
µ(t+1)
= argmin
ν∈P2(Rd )
KL(ν,π) + W2(µ(t)
,ν)
(proximal-point method in P2 with metric W2) is equivalent to that of
dXt =
1
2
∇log p (Xt y)dt + dWt, 0 ≤ t ≤ T, X(0) ∼ µµ0,
evaluated at times t = 1,... (their results use a different notation and
involve the Fokker-Plank representation of the SDE).
MYULA is essentially a sampling approximation of this algorithm for
minimising KL(µ,πλ) where πλ is the Moreau approximation of π.
M. Pereyra (MI — HWU) LMS 17 15 / 18
17. Open problems:
What do other (e.g., accelerated, primal-dual, etc.) optimisation
algorithms look like in P2 with metric W2?
What are the MCMC methods that we can derive from them?
Do they inherit their good convergence properties?
M. Pereyra (MI — HWU) LMS 17 16 / 18
18. Thank you!
Bibliography:
Durmus, A., Moulines, E., and Pereyra, M. (2017). Efficient Bayesian computation by
proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM J. Imaging
Sci. to appear.
Zhu, L., Zhang, W., Elnatan, D., and Huang, B. (2012). Faster STORM using
compressed sensing. Nat. Meth., 9(7):721–723.
M. Pereyra (MI — HWU) LMS 17 17 / 18