QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces - Marcelo Pereyra, Mar 22, 2018

Proximal algorithms in probability spaces
Dr. Marcelo Pereyra
http://www.macs.hw.ac.uk/∼mp71/
Maxwell Institute for Mathematical Sciences, Heriot-Watt University
March 2018, SAMSI workshop, USA.
M. Pereyra (MI — HWU) LMS 17 0 / 18

Outline
1 Background
2 Proximal algorithms in probability spaces

Imaging inverse problems
We are interested in an unknown x ∈ Rd
.
We measure y, related to x by a statistical model p(y x).
The recovery of x from y is ill-posed or ill-conditioned, resulting in
signiﬁcant uncertainty about x.
For example, in many imaging problems
y = Ax + w,
for some operator A that is rank-deﬁcient, and additive noise w.
x y ˆx

The Bayesian framework
We use priors to reduce uncertainty and deliver accurate results.
Given the prior p(x), the posterior distribution of x given y
p(x y) = p(y x)p(x)/p(y)
models our knowledge about x after observing y.
In this talk we consider that p(x y) is log-concave; i.e.,
p(x y) = exp{−φ(x)}/∫
Rd
exp{−φ(x)}dx ,
where φ(x) is a l.s.c. convex function (often φ ∉ C1
).

Maximum-a-posteriori (MAP) estimation
The predominant Bayesian approach in imaging is MAP estimation
ˆxMAP = argmax
x∈Rd
p(x y),
= argmin
x∈Rd
φ(x),
(1)
that can be computed eﬃciently by proximal convex optimisation.
Typically φ = f + g, with ∇f Lipchitz and g ∉ C1
and prox-easy.
However, knowledge of ˆxMAP provides little information about p(x y)!!!

Uncertainty quantiﬁcation in radio-interferometry
3C2888 ˆxMMSE local credible intervals
M31 ˆxMMSE local credible intervals
Figure : Radio galaxies (size 256 × 256 pixels) - computing time approx. 5 min.

Bayesian model selection
p(Mk y) = p(Mk )∫ p(x,y Mk )dx/p(y) with
p(x,y M1) ∝ exp[−( y − H1x 2
/2σ2
) − βTV (x)],
p(x,y M2) ∝ exp[−( y − H2x 2
/2σ2
) − βTV (x)].
Boat image deblurring experiment (comp. time 30 minutes p/model):
observation y
(5 × 5 uniform blur, BSNR 40dB)
ˆxM1 (PSNR 34dB)
p(M1 y) = 0.96
ˆxM2 (PSNR 33dB)
p(M2 y) = 0.04

Sparse image deblurring
Estimation of reg. param. β by marginal maximum likelihood
ˆβ = argmax
β∈R+
p(y β), with p(y β) ∝ ∫ exp(− y − Ax 2
/2σ2
− β x 1)dx
y ˆxMAP Reg. param β
Figure : Maximum marginal likelihood estimation of regularisation parameter β.
Computing time 0.75 secs..

Sparse image deblurring
Bayesian credible region C∗
α = {x p(x y) ≥ γα} with
P[x ∈ Cα y] = 1 − α, and p(x y) ∝ exp(− y − Ax 2
/2σ2
− β x 1)
y ˆxMAP uncertainty estimates
Figure : Live-cell microscopy data (Zhu et al., 2012). Uncertainty analysis
(±78nm × ±125nm) in close agreement with the experimental precision ±80nm.
Computing time 4 minutes. M = 105
iterations.

Inference by Markov chain Monte Carlo integration
Monte Carlo integration
Given a set of samples X1,...,XM distributed according to p(x y), we
approximate posterior expectations and probabilities
1
M
M
∑
m=1
h(Xm) → E{h(x) y}, as M → ∞
Guarantees from CLTs, e.g., 1√
M
∑M
m=1 h(Xm) ∼ N[E{h(x) y},Σ].
Markov chain Monte Carlo:
Construct a Markov kernel Xm+1 Xm ∼ K(⋅ Xm) such that the Markov
chain X1,...,XM has p(x y) as stationary distribution.
MCMC simulation in high-dimensional spaces is very challenging.

Unadjusted Langevin algorithm
Suppose for now that p(x y) ∈ C1
. Then, we can generate samples by
mimicking a Langevin diffusion process that converges to p(x y) as t → ∞,
X dXt =
1
2
∇log p (Xt y)dt + dWt, 0 ≤ t ≤ T, X(0) = x0.
where W is the n-dimensional Brownian motion.
Because solving Xt exactly is generally not possible, we use an Euler
Maruyama approximation and obtain the “unadjusted Langevin algorithm”
ULA Xm+1 = Xm + δ∇log p(Xm y) +
√
2δZm+1, Zm+1 ∼ N(0,In),
which is remarkably efficient when φ(x) = −log p(x y) is l.s.c. convex and
L-Lipchitz differentiable with δ < L−1
.

Proximal ULA
If φ ∉ C1
, we use the (Moreau-Yoshida regularised) proximal ULA
MYULA Xm+1 = (1 − δ
λ )Xm − δ∇f {Xm} + δ
λ proxλ
g {Xm} +
√
2δZm+1,
for any suitable decomposition φ = f + g, λ > 0, and δ < λ/(λLf + 1).
Non-asymptotic estimation error bound:
For any > 0, there exist δ > 0 and M ∈ N such that ∀δ < δ and ∀M ≥ M
δx0 QM
δ − p TV < + λL2
g .
If f + g is strongly convex outside some ball, then M scales as
O(d log(d)). See Durmus et al. (2017) for other convergence results.

Outline
1 Background
2 Proximal algorithms in probability spaces

Wasserstein space
P2(Rd
): set of probability measures with ﬁnite second moment.
Wasserstein distance of order 2
W2(µ,ν) = inf
π∈Π(µ,ν)
∫
Rd ×Rd
x − y 2
π(x,y)dxdy
with Π(µ,ν) is the set of measures that have µ and ν as marginals.
Kullbar-Liebler divergence
KL(µ,ν) = ∫
Rd
log [
ν(x)
µ(x)
]ν(x)dx
Note: KL(µ,ν) = 0 iﬀ µ = ν.

A functional F P2(Rd
) → (−∞,∞] is λ-displacement convex iﬀ
F(µt) ≤ F(µ0) + F(µ1) − λt(1 − t)W2(µ0,µ1)
where µt is the point t ∈ [0,1] along the constant speed geodesic
between µ0 and µ1, i.e., W2(µt,µs) = (t − s)W2(µ0,µ1).
If π ∈ P2 is λ-strongly log-concave (i.e., −log π strongly convex), then
KL(µ,π) is λ-displacement convex w.r.t. µ.

Proximal optimisation in the Wasserstein space
Key Result (R. Jordan, D. Kinderlehrer, and F. Otto. 1998)
The sequence of distributions generated by the iterative algorithm
µ(t+1)
= argmin
ν∈P2(Rd )
KL(ν,π) + W2(µ(t)
,ν)
(proximal-point method in P2 with metric W2) is equivalent to that of
dXt =
1
2
∇log p (Xt y)dt + dWt, 0 ≤ t ≤ T, X(0) ∼ µµ0,
evaluated at times t = 1,... (their results use a diﬀerent notation and
involve the Fokker-Plank representation of the SDE).
MYULA is essentially a sampling approximation of this algorithm for
minimising KL(µ,πλ) where πλ is the Moreau approximation of π.

Open problems:
What do other (e.g., accelerated, primal-dual, etc.) optimisation
algorithms look like in P2 with metric W2?
What are the MCMC methods that we can derive from them?
Do they inherit their good convergence properties?

Thank you!
Bibliography:
Durmus, A., Moulines, E., and Pereyra, M. (2017). Eﬃcient Bayesian computation by
proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM J. Imaging
Sci. to appear.
Zhu, L., Zhang, W., Elnatan, D., and Huang, B. (2012). Faster STORM using
compressed sensing. Nat. Meth., 9(7):721–723.

Comparison with PxMALA

QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces - Marcelo Pereyra, Mar 22, 2018

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces - Marcelo Pereyra, Mar 22, 2018

Similaire à QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces - Marcelo Pereyra, Mar 22, 2018 (20)

Plus de The Statistical and Applied Mathematical Sciences Institute

Plus de The Statistical and Applied Mathematical Sciences Institute (20)

Dernier

Dernier (20)

QMC: Operator Splitting Workshop, Proximal Algorithms in Probability Spaces - Marcelo Pereyra, Mar 22, 2018