The document describes the anisotropic Metropolis adjusted Langevin algorithm (AMALA) for simulation of random variables in high dimensions. AMALA addresses limitations of the isotropic Metropolis adjusted Langevin algorithm (MALA) by using an anisotropic covariance matrix based on the local gradient of the target distribution. The algorithm is shown to have geometric ergodicity under certain conditions on the target distribution. AMALA is then proposed for use within a stochastic algorithm for maximum likelihood estimation of incomplete data models.
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility in Stochastic EM algorithm
1. Anisotropic Metropolis adjusted Langevin algorithm:
Convergence and utility in stochastic EM algorithm.
Stéphanie Allassonnière
CMAP, École Polytechnique
BigMC, January 2012
Joint work with Estelle Kuhn (INRA, France)
Stéphanie Allassonnière (CMAP) AMALA BigMC, January 2012 1 / 42
6. Introduction
Introduction:
Where does the problem come from?
Image analysis: Compare two observations via the quantification of
the deformation from one to the other (D’Arcy Thompson, 1917)
Registration / Variance
Each element of a population is a smooth deformation of a template
Template estimation / Mean
11. Introduction
Introduction:
Where does the problem come from?
Deformable template model (u = voxel, $v_u$ its position):
\[ y(u) = I_0(v_u - m(v_u)) + \sigma\,\varepsilon(u) \]
Estimation of the template $I_0$ and of the law of the geometry, Law(m)
High-dimensional setting, low sample size
Considering the LDDMM framework through the shooting equations
12. Introduction
Outline:
1. AMALA: simulation of random variables in high dimension
   Anisotropic MALA description
   Convergence property
2. AMALA within stochastic algorithm for parameter estimation
   Maximum likelihood estimation for incomplete data setting
   AMALA-SAEM
   Convergence properties
3. Experiments
   BME-Template model: small deformation setting
   BME-Template model: LDDMM setting
20. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA)
Introduction:
General setting:
Simulation of random variables in high-dimensional settings → the Gibbs sampler is not useful
Metropolis Adjusted Langevin Algorithm (MALA)
Target distribution: π
At iteration k of this algorithm, $X_k$ is the current value
Simulate $X_c$ w.r.t. $\mathcal{N}(X_k + \delta D(X_k), \delta\,\mathrm{Id}_d)$
where $D(x) = \frac{b}{\max(b, |\nabla \log \pi(x)|)} \nabla \log \pi(x)$.
Update $X_{k+1} = X_c$ with probability
\[ \alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\, q_{MALA}(X_c, X_k)}{q_{MALA}(X_k, X_c)\, \pi(X_k)}\right) \]
and $X_{k+1} = X_k$ otherwise.
Problem: isotropic covariance matrix = numerically trapped ($\alpha(X_k, X_c) = 0$)
→ Anisotropic Metropolis Adjusted Langevin Algorithm (AMALA)
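The MALA iteration described on this slide can be sketched in a few lines of Python. This is only an illustration: it reads the drift as the truncated gradient D(x) = b / max(b, |∇ log π(x)|) ∇ log π(x), and the names `truncated_drift`, `mala_step`, `log_pi` and `grad_log_pi` are hypothetical helpers, not part of the talk.

```python
import numpy as np

def truncated_drift(grad, b):
    """D(x) = b / max(b, |grad|) * grad: the gradient rescaled so its norm never exceeds b."""
    return b / max(b, np.linalg.norm(grad)) * grad

def mala_step(x, log_pi, grad_log_pi, delta, b, rng):
    """One MALA iteration with proposal N(x + delta * D(x), delta * Id).

    log_pi and grad_log_pi are user-supplied callables (assumed here).
    Returns the next state and whether the candidate was accepted.
    """
    mean_fwd = x + delta * truncated_drift(grad_log_pi(x), b)
    x_c = mean_fwd + np.sqrt(delta) * rng.standard_normal(x.shape)
    mean_bwd = x_c + delta * truncated_drift(grad_log_pi(x_c), b)
    # Log proposal densities; the isotropic normalising constant cancels in the ratio.
    log_q_fwd = -np.sum((x_c - mean_fwd) ** 2) / (2.0 * delta)
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2.0 * delta)
    log_alpha = min(0.0, log_pi(x_c) + log_q_bwd - log_pi(x) - log_q_fwd)
    if np.log(rng.uniform()) < log_alpha:
        return x_c, True
    return x, False
```

Working with log-densities avoids underflow when π takes very small values, which matters in high dimension.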
23. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA) Description of the algorithm
How to include anisotropy?
Following the magnitude of the gradient
First approximation: independence of directions
Bounded covariance (same as bounded drift)
27. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA) Description of the algorithm
Anisotropic Metropolis Adjusted Langevin Algorithm (AMALA)
For all $k = 1 : k_{end}$ (iterations of the Markov chain):
Sample $X_c$ with respect to
\[ \mathcal{N}(X_k + \delta D(X_k), \delta \Sigma(X_k)) \]
with $D(X_k) = \frac{b}{\max(b, |\nabla \log \pi(X_k)|)} \nabla \log \pi(X_k)$ and
\[ \Sigma(X_k) = \mathrm{Id}_d + \mathrm{diag}\left( [\nabla \log \pi(X_k)]_1^2 \wedge b, \ldots, [\nabla \log \pi(X_k)]_d^2 \wedge b \right) \]
Compute the acceptance ratio
\[ \alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\, q_c(X_c, X_k)}{q_c(X_k, X_c)\, \pi(X_k)}\right) \]
($q_c$ = the pdf of this distribution).
Set $X_{k+1} = X_c$ with probability $\alpha(X_k, X_c)$ and $X_{k+1} = X_k$ with probability $1 - \alpha(X_k, X_c)$ = acceptance/rejection step
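A minimal Python sketch of one AMALA iteration, under the reading that the drift is the gradient truncated at norm b and that Σ(x) is the diagonal matrix Id + diag(min([∇ log π(x)]_i², b)). The function names and the callables `log_pi` and `grad_log_pi` are assumptions made for illustration. Because the covariance depends on the current state, the log-determinant term does not cancel in the acceptance ratio and has to be kept.

```python
import numpy as np

def amala_proposal(x, grad_log_pi, delta, b):
    """Mean and diagonal variance of the AMALA proposal N(x + delta*D(x), delta*Sigma(x))."""
    g = grad_log_pi(x)
    drift = b / max(b, np.linalg.norm(g)) * g          # truncated drift D(x)
    sigma_diag = 1.0 + np.minimum(g ** 2, b)           # diagonal of Sigma(x)
    return x + delta * drift, delta * sigma_diag

def log_q(y, mean, var):
    """Log-density of N(mean, diag(var)) at y, up to the constant -d/2 * log(2*pi)."""
    return -0.5 * np.sum(np.log(var)) - 0.5 * np.sum((y - mean) ** 2 / var)

def amala_step(x, log_pi, grad_log_pi, delta, b, rng):
    """One AMALA iteration: propose, compute the acceptance ratio, accept or reject."""
    mean_fwd, var_fwd = amala_proposal(x, grad_log_pi, delta, b)
    x_c = mean_fwd + np.sqrt(var_fwd) * rng.standard_normal(x.shape)
    mean_bwd, var_bwd = amala_proposal(x_c, grad_log_pi, delta, b)
    log_alpha = (log_pi(x_c) + log_q(x, mean_bwd, var_bwd)
                 - log_pi(x) - log_q(x_c, mean_fwd, var_fwd))
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return x_c, True
    return x, False
```

Keeping Σ diagonal means the proposal costs O(d) per iteration, with no matrix factorisation.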
28. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA) Geometric ergodicity of the chain
Geometric ergodicity of the Markov chain
Condition:
π super-exponential: smoothness condition on the target distribution
(B1) The density π is positive with continuous first derivative such that:
\[ \lim_{|x| \to \infty} n(x) \cdot \nabla \log \pi(x) = -\infty \quad (1) \]
and
\[ \limsup_{|x| \to \infty} n(x) \cdot m(x) < 0 \quad (2) \]
where $\nabla$ is the gradient operator in $\mathbb{R}^d$, $n(x) = \frac{x}{|x|}$ is the unit vector pointing in the direction of x and $m(x) = \frac{\nabla \pi(x)}{|\nabla \pi(x)|}$ is the unit vector in the direction of the gradient of the stationary distribution at point x.
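As a quick sanity check of condition (B1), consider the standard Gaussian density on R^d (an example chosen here for illustration; it is not taken from the talk). Both limits can be computed in closed form:

```latex
% Standard Gaussian target: \pi(x) \propto \exp(-|x|^2 / 2)
\nabla \log \pi(x) = -x
\quad\Longrightarrow\quad
n(x) \cdot \nabla \log \pi(x) = \frac{x}{|x|} \cdot (-x) = -|x|
\xrightarrow[|x| \to \infty]{} -\infty ,
\qquad \text{so (1) holds.}

\nabla \pi(x) = -x\,\pi(x)
\quad\Longrightarrow\quad
m(x) = -\frac{x}{|x|} = -n(x)
\quad\Longrightarrow\quad
n(x) \cdot m(x) = -1 < 0 ,
\qquad \text{so (2) holds.}
```

In words: far from the origin the log-density decreases without bound in the radial direction, and the gradient points straight back towards the centre.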
32. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA) Geometric ergodicity of the chain
Geometric ergodicity of the Markov chain
Result:
Existence of a small set C:
\[ \Pi(x, A) \ge \varepsilon\, \nu(A)\, 1_C(x), \quad \forall x \in X \text{ and } \forall A \in \mathcal{B} \]
Drift condition: pulls the chain back into the small set
\[ \Pi V(x) \le \lambda V(x) + b\, 1_C(x). \]
Geometric ergodicity
\[ \sup_{x \in X} \frac{|\Pi^n V(x) - \pi(x)|}{V(x)} \le R \rho^n. \quad (3) \]
33. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA) Geometric ergodicity of the chain
Experiments on synthetic data
Target: 10-dimensional Gaussian distribution with zero mean and
diagonal covariance matrix with diagonal coefficients randomly picked
between 1 and 2500
Comparison of AMALA and symmetric random walk
500,000 iterations for each algorithm, starting at zero
Mean squared jump distance (MSJD) in stationarity:
AMALA 0.1504, random walk 0.0407.
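For reference, the mean squared jump distance of a stored chain can be computed as follows. This is a generic sketch (the helper name `mean_squared_jump_distance` is ours), not the code used to produce the figures quoted on the slide.

```python
import numpy as np

def mean_squared_jump_distance(chain):
    """Average squared Euclidean distance between consecutive states.

    chain: array of shape (n_iterations, d). Rejected moves contribute
    a jump of zero, so a low acceptance rate lowers the MSJD.
    """
    chain = np.asarray(chain, dtype=float)
    jumps = np.diff(chain, axis=0)
    return float(np.mean(np.sum(jumps ** 2, axis=1)))
```

A larger MSJD at stationarity indicates that the sampler explores the target with bigger effective moves.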
34. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA) Geometric ergodicity of the chain
Experiments on synthetic data
Figure: Autocorrelation functions of the AMALA (red) and the random walk
(blue) samplers for four of the ten components of the Gaussian 10 dimensional
distribution.
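The autocorrelation curves of the kind shown in this figure can be estimated per component with a few lines of NumPy. The sketch below uses the usual biased empirical estimator; the function name `autocorrelation` is an assumption, and the talk does not say which estimator was used.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Empirical autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # lag-0 term, so the value at lag 0 is exactly 1
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])
```

Applied to one coordinate of the chain, a faster decay towards zero indicates better mixing in that direction.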
38. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA) Geometric ergodicity of the chain
Why not use existing MALA-like algorithms?
Optimised MALA-like algorithms are usually adaptive
Good performance in practice
Good theoretical properties
However
Numerical problems at the first iterations (not yet stationary):
convergence time?
Most important: our goal = parameter estimation
AMALA = one tool inside another algorithm
Adaptive + estimation algorithm = numerical issues: too many
degrees of freedom
39. Applying AMALA within SAEM
47. Applying AMALA within SAEM Maximum likelihood estimation for incomplete data setting
Maximum likelihood estimation for incomplete data setting
$y \in \mathbb{R}^n$: observed data
$z \in \mathbb{R}^l$: missing data
$(y, z) \in \mathbb{R}^{n+l}$: complete data
$\mathcal{P} = \{f(y, z; \theta), \theta \in \Theta\}$: family of parametric pdfs on $\mathbb{R}^{n+l}$
Assumption:
$\exists\, \theta \in \Theta$ s.t. the complete data likelihood $q(y, z; \theta) = f(y, z; \theta)$
Observed likelihood:
\[ g(y; \theta) = \int f(y, z; \theta)\, \mu(dz). \quad (4) \]
Given a sample of observations $(y_i)_{1 \le i \le n} = y_1^n$
Find: $\hat{\theta}_g$ in Θ s.t.
\[ \hat{\theta}_g = \arg\max_{\theta \in \Theta} g(y_1^n; \theta) \]
52. Applying AMALA within SAEM Description of the algorithm
AMALA-SAEM
Incomplete data setting + maximum likelihood estimation = EM algorithm
General case → E step not tractable
Stochastic Approximation EM for convergence properties,
with an MCMC method for the simulation step.
→ AMALA-SAEM: using AMALA as the MCMC method
57. Applying AMALA within SAEM Description of the algorithm
Description of the algorithm
Assumption: model in the exponential family = all information carried by
the sufficient statistics S
For $k = 1 : k_{end}$ (iterations of SAEM):
Sample $z_k$ through a single AMALA step (simulation and
acceptance/rejection) using the current parameter $\theta_{k-1}$
Compute the stochastic approximation
\[ s_k = s_{k-1} + \gamma_k \left( S(z_k) - s_{k-1} \right), \]
where $(\gamma_k)_k$ is a sequence of positive step sizes.
Update the parameter
\[ \theta_k = \hat{\theta}(s_k). \]
Can require truncation on random boundaries for convergence purposes
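The three steps of the loop can be sketched on a toy model. Everything specific below is an assumption made for illustration and does not come from the talk: the model z_i ~ N(θ, 1), y_i | z_i ~ N(z_i, 1), for which S(z) = mean(z) and θ̂(s) = s; the step sizes (γ_k = 1 during a burn-in, then 1/(k − burn_in)); the tuning constants; and the helper names. The truncation on random boundaries mentioned on the slide is omitted.

```python
import numpy as np

def amala_step(z, log_pi, grad_log_pi, delta, b, rng):
    """Single AMALA step (proposal + accept/reject) for the target exp(log_pi)."""
    def proposal(x):
        g = grad_log_pi(x)
        mean = x + delta * b / max(b, np.linalg.norm(g)) * g
        var = delta * (1.0 + np.minimum(g ** 2, b))   # diagonal covariance
        return mean, var

    def log_q(y, mean, var):
        return -0.5 * np.sum(np.log(var) + (y - mean) ** 2 / var)

    mean_f, var_f = proposal(z)
    z_c = mean_f + np.sqrt(var_f) * rng.standard_normal(z.shape)
    mean_b, var_b = proposal(z_c)
    log_alpha = (log_pi(z_c) + log_q(z, mean_b, var_b)
                 - log_pi(z) - log_q(z_c, mean_f, var_f))
    return z_c if np.log(rng.uniform()) < log_alpha else z

def amala_saem(y, n_iter=3000, burn_in=500, delta=0.05, b=10.0, seed=0):
    """Toy AMALA-SAEM for z_i ~ N(theta, 1), y_i | z_i ~ N(z_i, 1) (illustrative model)."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    theta, s, z = 0.0, 0.0, y.copy()
    for k in range(1, n_iter + 1):
        # Simulation: one AMALA step targeting p(z | y; theta_{k-1}).
        log_pi = lambda x, t=theta: -0.5 * np.sum((y - x) ** 2) - 0.5 * np.sum((x - t) ** 2)
        grad = lambda x, t=theta: (y - x) - (x - t)
        z = amala_step(z, log_pi, grad, delta, b, rng)
        # Stochastic approximation of the sufficient statistic S(z) = mean(z).
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s = s + gamma * (np.mean(z) - s)
        # Maximisation: for this toy model theta_hat(s) = s.
        theta = s
    return theta
```

For this toy model the observed data satisfy y_i ~ N(θ, 2), so the maximum likelihood estimate is the sample mean of y, which gives a simple reference value for the output of `amala_saem(y)`.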
61. Applying AMALA within SAEM Convergence properties
Convergence properties
Conditions:
Smoothness of the model (classic conditions for convergence of
stochastic approximation and EM)
Condition for AMALA geometric ergodicity (B1)
Results:
Convergence of $(s_k)$ a.s. towards a critical point of the mean field of the
problem
Convergence of the estimated parameters $(\theta_k)$ a.s. towards a critical point
of the observed likelihood
Central limit theorem for $(\theta_k)$ with rate $1/\sqrt{\gamma_k}$
62. Applying AMALA within SAEM Convergence properties
Conditions for the SA to converge
Define for any $V : X \to [1, \infty]$ and any $g : X \to \mathbb{R}^m$ the norm
\[ \|g\|_V = \sup_{z \in X} \frac{\|g(z)\|}{V(z)}. \]
(A1') $\mathcal{S}$ is an open subset of $\mathbb{R}^m$, $h : \mathcal{S} \to \mathbb{R}^m$ is continuous and there exists
a continuously differentiable function $w : \mathcal{S} \to [0, \infty[$ with the
following properties.
(i) There exists an $M_0 > 0$ such that
\[ \mathcal{L} \triangleq \{ s \in \mathcal{S}, \langle \nabla w(s), h(s) \rangle = 0 \} \subset \{ s \in \mathcal{S}, w(s) < M_0 \}. \]
63. Applying AMALA within SAEM Convergence properties
Conditions for the SA to converge (2)
(ii) There exists a closed convex set $\mathcal{S}_a \subset \mathcal{S}$ for which
$s \to s + \rho H_s(z) \in \mathcal{S}_a$ for any $\rho \in [0, 1]$ and $(z, s) \in X \times \mathcal{S}_a$ ($\mathcal{S}_a$ is
absorbing) and such that for any $M_1 \in ]M_0, \infty]$, the set $W_{M_1} \cap \mathcal{S}_a$ is a
compact set of $\mathcal{S}$ where $W_{M_1} \triangleq \{ s \in \mathcal{S}, w(s) \le M_1 \}$.
(iii) For any $s \in \mathcal{S} \setminus \mathcal{L}$, $\langle \nabla w(s), h(s) \rangle < 0$.
(iv) The closure of $w(\mathcal{L})$ has an empty interior.
(A2') For any $s \in \mathcal{S}$, $H_s : X \to \mathcal{S}$ is measurable and $\int \|H_s(z)\|\, \pi_s(dz) < \infty$.
64. Applying AMALA within SAEM Convergence properties
Conditions for the SA to converge (3)
(A3'') There exists a function $V : X \to [1, \infty]$ such that
$\{ z \in X, V(z) < \infty \} \ne \emptyset$, constants $a \in ]0, 1]$, $p \ge 2$, $r > 0$ and
$q \ge 1$ such that for any compact subset $K \subset \mathcal{S}$,
(i)
\[ \sup_{s \in K} \|H_s\|_V < \infty, \quad (5) \]
\[ \sup_{s \in K} \left( \|g_s\|_V + \|\Pi_s g_s\|_V \right) < \infty, \quad (6) \]
\[ \sup_{s, s' \in K} \|s - s'\|^{-a} \left\{ \|g_s - g_{s'}\|_{V^q} + \|\Pi_s g_s - \Pi_{s'} g_{s'}\|_{V^q} \right\} < \infty, \quad (7) \]
where for any $s \in \mathcal{S}$ a solution of the Poisson equation
$g - \Pi_s g = H_s - \pi_s(H_s)$ is denoted by $g_s$.
65. Applying AMALA within SAEM Convergence properties
Conditions for the SA to converge (4)
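(ii) For any sequence $\varepsilon = (\varepsilon_k)_{k \ge 0}$ satisfying $\varepsilon_k < \bar{\varepsilon}$ for an $\bar{\varepsilon}$ sufficiently
small, for any sequence $\gamma = (\gamma_k)_{k \ge 0}$, there exists a constant C such
that for any $z \in X$,
\[ \sup_{s \in K} \sup_{k \ge 0} \mathbb{E}^{\gamma}_{z,s}\left[ V^p(z_k)\, 1_{\sigma(K) \wedge \nu(\varepsilon) \ge k} \right] \le C\, V^{p+r}(z), \quad (8) \]
where $\nu(\varepsilon) = \inf\{ k \ge 1, \|s_k - s_{k-1}\| \ge \varepsilon_k \}$ and
$\sigma(K) = \inf\{ k \ge 1, s_k \notin K \}$ and the expectation is related to the
non-homogeneous Markov chain $((z_k, s_k))_{k \ge 0}$ using the step-size
sequence $\gamma = (\gamma_k)_{k \ge 0}$.
(A4) The sequences $\gamma = (\gamma_k)_{k \ge 0}$ and $\varepsilon = (\varepsilon_k)_{k \ge 0}$ are non-increasing,
positive and satisfy: $\sum_{k=0}^{\infty} \gamma_k = \infty$, $\lim_{k \to \infty} \varepsilon_k = 0$ and
$\sum_{k=1}^{\infty} \{ \gamma_k^2 + \gamma_k \varepsilon_k^a + (\gamma_k \varepsilon_k^{-1})^p \} < \infty$, where a and p are defined in (A3'').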
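To make (A4) concrete, here is one admissible pair of sequences, chosen by us as an illustration (the talk does not specify any). Only (A4) is checked here, not the smallness requirement on ε in (A3'')(ii).

```latex
% Candidate sequences (illustrative choice):
\gamma_k = \frac{1}{k+1}, \qquad \varepsilon_k = (k+1)^{-1/4}, \qquad k \ge 0 .

% Both are positive and non-increasing, \varepsilon_k \to 0, and
\sum_{k \ge 0} \gamma_k = \sum_{k \ge 0} \frac{1}{k+1} = \infty .

% The three summability terms, with a \in ]0,1] and p \ge 2:
\gamma_k^2 = (k+1)^{-2}, \qquad
\gamma_k \varepsilon_k^{a} = (k+1)^{-(1 + a/4)}, \qquad
(\gamma_k \varepsilon_k^{-1})^{p} = (k+1)^{-3p/4} .

% Each exponent exceeds 1 (2 > 1, \; 1 + a/4 > 1, \; 3p/4 \ge 3/2 > 1), hence
\sum_{k \ge 1} \left\{ \gamma_k^2 + \gamma_k \varepsilon_k^{a} + (\gamma_k \varepsilon_k^{-1})^{p} \right\} < \infty .
```

The example shows the trade-off in (A4): ε_k must vanish, but slowly enough relative to γ_k for the last term to remain summable.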
66. Applying AMALA within SAEM Convergence properties
Condition for AMALA-SAEM to converge
(M1) The parameter space Θ is an open subset of $\mathbb{R}^p$. The complete
data likelihood function is given by:
\[ f(y, z; \theta) = \exp\left\{ -\psi(\theta) + \langle S(z), \phi(\theta) \rangle \right\}, \]
where S is a Borel function on $\mathbb{R}^l$ taking its values in an open subset
$\mathcal{S}$ of $\mathbb{R}^m$. Moreover, the convex hull of $S(\mathbb{R}^l)$ is included in $\mathcal{S}$, and,
for all θ in Θ,
\[ \int \|S(z)\|\, p_\theta(z)\, \mu(dz) < \infty. \]
(M2) The functions ψ and φ are twice continuously differentiable on
Θ.
67. Applying AMALA within SAEM Convergence properties
Condition for AMALA-SAEM to converge (2)
(M3) The function $\bar{s} : \Theta \to \mathcal{S}$ defined as
\[ \bar{s}(\theta) \triangleq \int S(z)\, p_\theta(z)\, \mu(dz) \]
is continuously differentiable on Θ.
(M4) The function $l : \Theta \to \mathbb{R}$ defined as the observed-data
log-likelihood
\[ l(\theta) \triangleq \log g(y; \theta) = \log \int f(y, z; \theta)\, \mu(dz) \]
is continuously differentiable on Θ and
\[ \partial_\theta \int f(y, z; \theta)\, \mu(dz) = \int \partial_\theta f(y, z; \theta)\, \mu(dz). \]
68. Applying AMALA within SAEM Convergence properties
Condition for AMALA-SAEM to converge (3)
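(M5) There exists a function $\hat{\theta} : \mathcal{S} \to \Theta$, such that:
\[ \forall s \in \mathcal{S}, \forall \theta \in \Theta, \quad L(s; \hat{\theta}(s)) \ge L(s; \theta). \]
Moreover, the function $\hat{\theta}$ is continuously differentiable on $\mathcal{S}$.
(M6) The functions $l : \Theta \to \mathbb{R}$ and $\hat{\theta} : \mathcal{S} \to \Theta$ are m times
differentiable.
(M7)
(i) There exists an $M_0 > 0$ such that
\[ \left\{ s \in \mathcal{S}, \partial_s l(\hat{\theta}(s)) = 0 \right\} \subset \left\{ s \in \mathcal{S}, -l(\hat{\theta}(s)) < M_0 \right\}. \]
(ii) For all $M_1 > M_0$, the set $\overline{\mathrm{Conv}(S(\mathbb{R}^l))} \cap \{ s \in \mathcal{S}, -l(\hat{\theta}(s)) \le M_1 \}$ is a
compact set of $\mathcal{S}$.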
69. Applying AMALA within SAEM Convergence properties
Condition for AMALA-SAEM to converge (4)
(M8) There exists a polynomial function P of degree 2 such that for
all $z \in X$
\[ \|S(z)\| \le |P(z)|. \]
(B3) For any compact subset K of $\mathcal{S}$, there exists a polynomial
function Q of the hidden variable such that
\[ \sup_{s \in K} |\nabla_z \log p_{\hat{\theta}(s)}(z)| \le |Q(z)|. \]
70. Application on Bayesian Mixed effect template estimation
74. Application on Bayesian Mixed effect template estimation Description of the BME Template model
BME Template model with small deformations
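Deformable template model (u = voxel, $v_u$ its position):
\[ y(u) = I_0(v_u - m(v_u)) + \sigma\,\varepsilon(u), \]
Parametric template and deformation:
\[ I_\alpha(v) = (K_p \alpha)(v) = \sum_{j=1}^{k_p} K_p(v, r_{p,j})\, \alpha^j \quad \text{and} \quad m_z(v) = (K_g z)(v) = \sum_{j=1}^{k_g} K_g(v, r_{g,j})\, z^j. \]
Generative model:
\[ z \sim \otimes_{i=1}^{n} \mathcal{N}_{2 k_g}(0, \Gamma_g) \mid \Gamma_g, \]
\[ y \sim \otimes_{i=1}^{n} \mathcal{N}_{|\Lambda|}(m_{z_i} I_\alpha, \sigma^2 \mathrm{Id}) \mid z, \alpha, \sigma^2, \]
Bayesian framework → MAP estimator (= penalised MLE)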
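To illustrate how an observation is generated from the kernel parametrisation, here is a one-dimensional Python sketch. The talk works with images and 2-D deformation fields; the 1-D grid, the Gaussian kernels, the kernel widths and the function names below are simplifying assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(v, centers, width):
    """Matrix K[u, j] = exp(-(v_u - r_j)^2 / (2 * width^2)); the Gaussian form is an assumption."""
    return np.exp(-(v[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

def sample_observation(v, alpha, r_p, z, r_g, sigma, rng, width_p=0.1, width_g=0.3):
    """Draw y(u) = I_alpha(v_u - m_z(v_u)) + sigma * eps(u) on a 1-D grid v.

    alpha: template coefficients at control points r_p.
    z: deformation coefficients at control points r_g.
    """
    m = gaussian_kernel(v, r_g, width_g) @ z                      # deformation m_z(v_u)
    warped = gaussian_kernel(v - m, r_p, width_p) @ alpha         # I_alpha(v_u - m_z(v_u))
    return warped + sigma * rng.standard_normal(v.shape)
```

With z = 0 and σ = 0 the function returns the undeformed template evaluated on the grid, which is a convenient check of the parametrisation.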
75. Application on Bayesian Mixed effect template estimation Results on the template estimation
Training sets
Figure: Left: Training set (inverse video). Right: Noisy training set (inverse
video).
76. Application on Bayesian Mixed effect template estimation Results on the template estimation
Estimated templates
Figure: Estimated templates using different algorithms (columns: FAM-EM, H.G.-SAEM, AMALA-SAEM) and two levels of noise (rows: no noise; noise of variance 1).
The training set includes 20 images per digit.
77. Application on Bayesian Mixed effect template estimation Results on the covariance matrix estimation
Estimated geometric variability
Figure: Synthetic samples generated with respect to the BME template model.
78. Application on Bayesian Mixed effect template estimation CLT empirical proof
Empirical proof of the CLT
Figure: Evolution of the estimation of the noise variance along the SAEM
iterations. Left: original data. Right: Noisy training set.
79. Application on Bayesian Mixed effect template estimation CLT empirical proof
Figure: Evolution of the estimation of the noise variance along the SAEM
iterations. Test of convergence towards the Gaussian distribution of the estimated
parameters.
80. Application on Bayesian Mixed effect template estimation Medical image template estimation
Corpus callosum database
Figure: Medical image template estimation: 10 corpus callosum and splenium
training images among the 47 available.
Figure: Grey level mean. FAM-EM estimated template. Hybrid Gibbs-SAEM
estimated template. AMALA-SAEM estimated template.
81. Application on Bayesian Mixed effect template estimation Results on the template estimation using LDDMM shooting
BME Template model with LDDMM
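Deformable template model ($u$ = voxel, $v_u$ its position):
$$y(u) = I_0(\phi_{\beta(0)}^{-1}(v_u)) + \sigma \epsilon(u),$$
Parametric template:
$$I_\alpha(v) = (K_p \alpha)(v) = \sum_{j=1}^{k_p} K_p(v, r_{p,j}) \alpha^j$$
and $\phi_{\beta(0)}$ the LDDMM solution of shooting with initial momentum $\beta(0)$.
Generative model:
$$z \sim \otimes_{i=1}^{n} \mathcal{N}_{2k_g}(0, \Gamma_g) \mid \Gamma_g,$$
$$y \sim \otimes_{i=1}^{n} \mathcal{N}_{|\Lambda|}(\phi_{\beta(0)} I_\alpha, \sigma^2 \mathrm{Id}) \mid z, \alpha, \sigma^2,$$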
Bayesian framework → MAP estimator (= penalised MLE)
82. Application on Bayesian Mixed effect template estimation Results on the template estimation using LDDMM shooting
LDDMM: parametric deformation:
Fix some control points: $c(t) = (c_1(t), \dots, c_{n_g}(t))$
Choose a kernel $K_g$
Start from an initial momentum $\beta(0) = (\beta^1(0), \dots, \beta^{n_g}(0))$
Then, Hamiltonian system → time evolution of both momenta and control points:
$$\frac{dc}{dt} = \frac{\partial H}{\partial \beta}(c, \beta) = K_g(c(t)) \beta(t)$$
$$\frac{d\beta}{dt} = -\frac{\partial H}{\partial c}(c, \beta) = -\frac{1}{2} \nabla_{c(t)} K_g(\beta(t), \beta(t)) \quad (9)$$
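The Hamiltonian system (9) can be integrated numerically. The sketch below is one possible way to do it, assuming a Gaussian kernel for $K_g$ (so that $H = \frac{1}{2} \sum_{i,j} K_g(c_i, c_j) \beta^i \cdot \beta^j$) and a classical fourth-order Runge-Kutta scheme over $t \in [0, 1]$. The kernel, the integrator, and the function names are assumptions made for illustration, not the scheme used in the talk.

```python
import numpy as np

def kernel_and_diff(c, scale):
    """Gaussian kernel matrix K_g(c) and pairwise differences c_i - c_j."""
    diff = c[:, None, :] - c[None, :, :]
    return np.exp(-(diff ** 2).sum(-1) / (2.0 * scale ** 2)), diff

def hamiltonian(c, beta, scale):
    """H(c, beta) = 1/2 * sum_ij K(c_i, c_j) <beta_i, beta_j>."""
    K, _ = kernel_and_diff(c, scale)
    return 0.5 * float(np.sum(beta * (K @ beta)))

def rhs(c, beta, scale):
    """Right-hand side of (9): (dH/dbeta, -dH/dc) for the Gaussian kernel."""
    K, diff = kernel_and_diff(c, scale)
    dc = K @ beta
    inner = beta @ beta.T                      # <beta_i, beta_j>
    # -dH/dc_i = sum_j <beta_i, beta_j> K_ij (c_i - c_j) / scale^2
    dbeta = ((inner * K)[:, :, None] * diff).sum(axis=1) / scale ** 2
    return dc, dbeta

def shoot(c0, beta0, scale, n_steps=100):
    """Integrate control points and momenta from t = 0 to t = 1 with RK4."""
    c, beta = c0.copy(), beta0.copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        k1c, k1b = rhs(c, beta, scale)
        k2c, k2b = rhs(c + 0.5 * dt * k1c, beta + 0.5 * dt * k1b, scale)
        k3c, k3b = rhs(c + 0.5 * dt * k2c, beta + 0.5 * dt * k2b, scale)
        k4c, k4b = rhs(c + dt * k3c, beta + dt * k3b, scale)
        c = c + dt / 6.0 * (k1c + 2 * k2c + 2 * k3c + k4c)
        beta = beta + dt / 6.0 * (k1b + 2 * k2b + 2 * k3b + k4b)
    return c, beta

rng = np.random.default_rng(1)
c0 = rng.uniform(-1.0, 1.0, size=(5, 2))       # arbitrary control points
beta0 = 0.3 * rng.standard_normal((5, 2))      # arbitrary initial momenta
c1, beta1 = shoot(c0, beta0, scale=0.5)
```

Because the flow is Hamiltonian, `hamiltonian(c1, beta1, 0.5)` should stay close to `hamiltonian(c0, beta0, 0.5)`; a drift in that quantity is a useful warning that the step size is too large.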
83. Application on Bayesian Mixed effect template estimation Results on the template estimation using LDDMM shooting
LDDMM: parametric deformation (2):
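Interpolating on any point of the domain:
$$v_t(r) = (K_g \beta(t))(r) = \sum_{k=1}^{n_g} K_g(r, c_k(t)) \beta^k(t) \quad \forall r \in D \quad (10)$$
Deformation = solution of the flow equation:
$$\frac{\partial \phi_{\beta(0)}(t)}{\partial t} = v_t \circ \phi_{\beta(0)}(t), \qquad \phi_{\beta(0)}(0) = \mathrm{Id} \quad (11)$$
$$\phi_{\beta(0)} = \phi_{\beta(0)}(1)$$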
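To make (10) and (11) concrete, the following sketch transports a set of points through the time-dependent velocity field with a forward-Euler scheme. It assumes the trajectories of control points and momenta are already available as lists of arrays, and again uses a Gaussian kernel; both are illustrative assumptions. The toy trajectory uses a single control point, for which $c(t) = c(0) + t \beta(0)$ with constant momentum solves the Hamiltonian system when $K_g(c, c) = 1$.

```python
import numpy as np

def velocity(points, controls, momenta, scale):
    """v_t(r) = sum_k K_g(r, c_k(t)) beta^k(t), Gaussian kernel assumed."""
    sq = ((points[:, None, :] - controls[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale ** 2)) @ momenta

def flow(points, control_traj, momenta_traj, scale):
    """Forward-Euler integration of d(phi)/dt = v_t(phi), phi(0) = Id, on [0, 1]."""
    n_steps = len(control_traj) - 1
    dt = 1.0 / n_steps
    x = points.copy()
    for k in range(n_steps):
        x = x + dt * velocity(x, control_traj[k], momenta_traj[k], scale)
    return x

# Toy trajectory: one control point moving in a straight line.
n_steps = 50
c0 = np.array([[0.0, 0.0]])
beta0 = np.array([[0.4, -0.2]])
times = np.linspace(0.0, 1.0, n_steps + 1)
control_traj = [c0 + t * beta0 for t in times]
momenta_traj = [beta0 for _ in times]

# One point sitting on the control point, one far away from it.
points = np.array([[0.0, 0.0], [10.0, 10.0]])
deformed = flow(points, control_traj, momenta_traj, scale=0.5)
```

The point that starts on the control point is carried along with it, while the distant point is essentially left in place, which reflects the local support of the kernel.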
92. Application on Bayesian Mixed effect template estimation Results on the template estimation using LDDMM shooting
Gradient computation
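$$E(c_i, \beta_i) = \sum_k \left(I_0(\phi_1^{-1}(y_k)) - I(y_k)\right)^2 + \sigma^2 \mathrm{Reg}(\phi_1)$$
where $S_0 = \{(c_i, \beta_i)\}_i$, the data term is denoted $A(y_k(0))$ and the regularity term is $L(S_0) = \beta(0)^t \Gamma_g(q(0), q(0)) \beta(0)$.
$$\frac{dS(t)}{dt} = F(S(t)), \qquad S(0) = S_0$$
$$\frac{dy(t)}{dt} = G(S(t), y(t)), \qquad y(1) = y$$
$$\nabla_{S_0} E = d_{S_0} y(0)^T \nabla_{y(0)} A + \nabla_{S_0} L$$
$$\nabla_{y_k(0)} A = 2 \left(I_0(y_k(0)) - I(y_k(1))\right) \nabla_{y_k(0)} I_0$$
- Momenta decrease image discrepancy
- Control points attracted by image contours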
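Part of the gradient above is available in closed form. As a small worked check, the sketch below computes the momentum block of $\nabla_{S_0} L$ for $L(S_0) = \beta(0)^t \Gamma_g \beta(0)$ and compares it with central finite differences. It assumes that $\Gamma_g(q(0), q(0))$ is the Gaussian kernel matrix evaluated at the initial control points, which is an assumption made for illustration; under it, symmetry of the kernel matrix gives $\nabla_{\beta(0)} L = 2 \Gamma_g \beta(0)$. The gradient with respect to the control points and the data term $A$ are not covered here.

```python
import numpy as np

def kernel_matrix(c, scale):
    """Gaussian kernel matrix at the control points (assumed form of Gamma_g)."""
    sq = ((c[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * scale ** 2))

def regularity(c, beta, scale):
    """L(S_0) = beta^t Gamma_g beta, summed over the spatial components."""
    return float(np.sum(beta * (kernel_matrix(c, scale) @ beta)))

def regularity_grad_momenta(c, beta, scale):
    """Closed-form gradient of L with respect to the momenta: 2 * Gamma_g * beta."""
    return 2.0 * kernel_matrix(c, scale) @ beta

def finite_difference_grad(c, beta, scale, eps=1e-6):
    """Central finite differences in each momentum coordinate, for checking."""
    grad = np.zeros_like(beta)
    for idx in np.ndindex(beta.shape):
        step = np.zeros_like(beta)
        step[idx] = eps
        plus = regularity(c, beta + step, scale)
        minus = regularity(c, beta - step, scale)
        grad[idx] = (plus - minus) / (2.0 * eps)
    return grad

rng = np.random.default_rng(2)
c = rng.uniform(-1.0, 1.0, size=(4, 2))     # arbitrary control points
beta = rng.standard_normal((4, 2))          # arbitrary momenta
analytic = regularity_grad_momenta(c, beta, scale=0.5)
numeric = finite_difference_grad(c, beta, scale=0.5)
```

A finite-difference comparison of this kind is a cheap way to validate each piece of a hand-derived gradient before plugging it into a sampler that relies on it.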
95. Application on Bayesian Mixed effect template estimation Results on the template estimation using LDDMM shooting
Gradient computation
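$$E(c_i, \beta_i) = \sum_k \left(I_0(\phi_1^{-1}(y_k)) - I(y_k)\right)^2 + \sigma^2 \mathrm{Reg}(\phi_1)$$
where $S_0 = \{(c_i, \beta_i)\}_i$, the data term is denoted $A(y_k(0))$ and the regularity term is $L(S_0) = \beta(0)^t \Gamma_g(q(0), q(0)) \beta(0)$.
$$\frac{dS(t)}{dt} = F(S(t)), \qquad S(0) = S_0$$
$$\frac{dy(t)}{dt} = G(S(t), y(t)), \qquad y(1) = y$$
$$\frac{d\eta(t)}{dt} = \partial_{S(t)} G^T \eta(t), \qquad \eta(0) = \nabla_{y(0)} A$$
$$\frac{d\xi(t)}{dt} = \partial_{y(t)} G^T \eta(t) - dF^T \xi(t), \qquad \xi(1) = 0$$
$$\nabla_{S_0} E = \xi(0) + \nabla_{S_0} L$$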
96. Application on Bayesian Mixed effect template estimation Results on the template estimation using LDDMM shooting
Using LDDMM deformations via shooting (preliminary results)
Figure panels: AMALA; GH.
98. Conclusion
Conclusion
Good performance (as accurate as other algorithms)
Reduced computational time
Can handle the movement of control points in practice (theory to confirm)
Can handle sparsity of the template (model selection)
Removing control points? In practice, why not... theory?
Thank you!