Anisotropic Metropolis adjusted Langevin algorithm:
    Convergence and utility in stochastic EM algorithm.

                                 Stéphanie Allassonnière
                                    CMAP, École Polytechnique


                                    BigMC, January 2012



Joint work with Estelle Kuhn (INRA, France)


Stéphanie Allassonnière (CMAP)                  AMALA              BigMC, January 2012   1 / 42
Introduction


Introduction:


Where does the problem come from?
      Image analysis: Compare two observations via the quantification of
      the deformation from one to the other (D’Arcy Thompson, 1917)




                                    Registration / Variance
      Each element of a population is a smooth deformation of a template
                                  Template estimation / Mean


      Deformable template model ($u$ = voxel, $v_u$ = its position):

                 $y(u) = I_0(v_u - m(v_u)) + \sigma\,\epsilon(u)$,

      Estimation of the template $I_0$ and of the law of the geometry (deformation $m$)

      High-dimensional setting, low sample size

      LDDMM framework considered through the shooting equations



Outline:



               1. AMALA: simulation of random variables in high dimension
                     Anisotropic MALA description
                     Convergence property
               2. AMALA within stochastic algorithm for parameter estimation
                     Maximum likelihood estimation for incomplete data setting
                     AMALA-SAEM
                     Convergence properties
               3. Experiments
                     BME-Template model: small deformation setting
                     BME-Template model: LDDMM setting


Anisotropic Metropolis Adjusted Langevin algorithm (AMALA)


Introduction:


General setting:
        Simulation of random variables in high-dimensional settings
        → the Gibbs sampler is not useful
        Metropolis Adjusted Langevin Algorithm (MALA)
               Target distribution: $\pi$
               At iteration $k$ of the algorithm, $X_k$ is the current value
               Simulate a candidate $X_c \sim \mathcal{N}(X_k + \delta D(X_k), \delta\, \mathrm{Id}_d)$,
               where $D(x) = \frac{b}{\max(b, |\nabla \log \pi(x)|)} \nabla \log \pi(x)$.
               Set $X_{k+1} = X_c$ with probability
               $\alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\, q_{MALA}(X_c, X_k)}{q_{MALA}(X_k, X_c)\, \pi(X_k)}\right)$,
               and $X_{k+1} = X_k$ otherwise.

        Problem: with an isotropic covariance matrix, the chain gets numerically trapped
        ($\alpha(X_k, X_c) = 0$)
  → Anisotropic Metropolis Adjusted Langevin Algorithm (AMALA)
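
The MALA iteration described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`truncated_drift`, `log_q_mala`, `mala_step`), the 2-dimensional standard Gaussian target, and the values of `delta` and `b` are all hypothetical choices, not taken from the talk.

```python
import math
import random

def truncated_drift(grad, b):
    """Bounded drift D(x) = b / max(b, |grad|) * grad."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = b / max(b, norm)
    return [scale * g for g in grad]

def log_q_mala(x_from, x_to, grad_log_pi, delta, b):
    """Log-density of N(x_from + delta*D(x_from), delta*Id) at x_to.
    The normalising constant is dropped: it is the same in both directions."""
    drift = truncated_drift(grad_log_pi(x_from), b)
    sq = sum((t - f - delta * d) ** 2 for f, t, d in zip(x_from, x_to, drift))
    return -sq / (2.0 * delta)

def mala_step(x, log_pi, grad_log_pi, delta, b, rng):
    """One MALA iteration: propose a candidate, then accept or reject it."""
    drift = truncated_drift(grad_log_pi(x), b)
    cand = [xi + delta * di + math.sqrt(delta) * rng.gauss(0.0, 1.0)
            for xi, di in zip(x, drift)]
    log_alpha = (log_pi(cand) + log_q_mala(cand, x, grad_log_pi, delta, b)
                 - log_pi(x) - log_q_mala(x, cand, grad_log_pi, delta, b))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return cand, True
    return x, False

# Hypothetical target: standard Gaussian in dimension 2.
def log_pi(x):
    return -0.5 * sum(v * v for v in x)

def grad_log_pi(x):
    return [-v for v in x]

rng = random.Random(0)
x = [3.0, -3.0]
n_accept = 0
for _ in range(2000):
    x, accepted = mala_step(x, log_pi, grad_log_pi, delta=0.5, b=10.0, rng=rng)
    n_accept += accepted
```

Because the proposal covariance is the same at every point, the Gaussian normalising constants cancel in the acceptance ratio, which is why `log_q_mala` can omit them.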

Anisotropic Metropolis Adjusted Langevin algorithm (AMALA)   Description of the algorithm


How to include anisotropy?




        Following the magnitude of the gradient



        First approximation: independence of directions



        Bounded covariance (same as bounded drift)




Anisotropic Metropolis Adjusted Langevin Algorithm (AMALA)

        For $k = 1, \dots, k_{end}$ (iterations of the Markov chain):
        Sample a candidate $X_c$ from

                   $\mathcal{N}(X_k + \delta D(X_k), \delta \Sigma(X_k))$

        with

                   $D(X_k) = \frac{b}{\max(b, |\nabla \log \pi(X_k)|)} \nabla \log \pi(X_k)$

        and

                   $\Sigma(X_k) = \mathrm{Id}_d + \mathrm{diag}\left([\nabla \log \pi(X_k)]_1^2 \wedge b, \dots, [\nabla \log \pi(X_k)]_d^2 \wedge b\right)$.

        Compute the acceptance ratio

                   $\alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\, q_c(X_c, X_k)}{q_c(X_k, X_c)\, \pi(X_k)}\right)$

        ($q_c$ = the pdf of this proposal distribution).
        Set $X_{k+1} = X_c$ with probability $\alpha(X_k, X_c)$ and $X_{k+1} = X_k$ with
        probability $1 - \alpha(X_k, X_c)$ (acceptance/rejection step).
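
The AMALA iteration above can be sketched as follows. This is a minimal illustration under assumptions: the helper names (`drift_and_variances`, `log_q`, `amala_step`), the anisotropic Gaussian target with variances 1 and 100, and the values of `delta` and `b` are hypothetical and not taken from the talk.

```python
import math
import random

def drift_and_variances(grad, b):
    """Return the bounded drift D(x) and the diagonal of Sigma(x)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = b / max(b, norm)
    drift = [scale * g for g in grad]
    variances = [1.0 + min(g * g, b) for g in grad]  # Id_d + diag(grad_i^2 ^ b)
    return drift, variances

def log_q(x_from, x_to, grad_log_pi, delta, b):
    """Log-density of N(x_from + delta*D, delta*Sigma(x_from)) at x_to,
    up to the (2*pi) constant. The log-determinant is kept because
    Sigma depends on the starting point."""
    drift, variances = drift_and_variances(grad_log_pi(x_from), b)
    total = 0.0
    for f, t, d, v in zip(x_from, x_to, drift, variances):
        total -= 0.5 * math.log(delta * v)
        total -= (t - f - delta * d) ** 2 / (2.0 * delta * v)
    return total

def amala_step(x, log_pi, grad_log_pi, delta, b, rng):
    """One AMALA iteration: anisotropic proposal, then accept or reject."""
    drift, variances = drift_and_variances(grad_log_pi(x), b)
    cand = [xi + delta * di + math.sqrt(delta * vi) * rng.gauss(0.0, 1.0)
            for xi, di, vi in zip(x, drift, variances)]
    log_alpha = (log_pi(cand) + log_q(cand, x, grad_log_pi, delta, b)
                 - log_pi(x) - log_q(x, cand, grad_log_pi, delta, b))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return cand, True
    return x, False

# Hypothetical target: zero-mean Gaussian with variances 1 and 100.
scales = [1.0, 100.0]

def log_pi(x):
    return -0.5 * sum(v * v / s for v, s in zip(x, scales))

def grad_log_pi(x):
    return [-v / s for v, s in zip(x, scales)]

rng = random.Random(1)
x = [0.0, 0.0]
n_accept = 0
for _ in range(2000):
    x, accepted = amala_step(x, log_pi, grad_log_pi, delta=0.5, b=10.0, rng=rng)
    n_accept += accepted
```

Unlike the isotropic case, the forward and backward proposals have different covariances, so the determinant term must appear in both `log_q` evaluations.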

Anisotropic Metropolis Adjusted Langevin algorithm (AMALA)   Geometric ergodicity of the chain


Geometric ergodicity of the Markov chain



                                   Condition:
        $\pi$ super-exponential: smoothness condition on the target distribution
       (B1) The density $\pi$ is positive with continuous first derivative such that

                   $\lim_{|x| \to \infty} n(x) \cdot \nabla \log \pi(x) = -\infty$      (1)

               and

                   $\limsup_{|x| \to \infty} n(x) \cdot m(x) < 0$,      (2)

               where $\nabla$ is the gradient operator in $\mathbb{R}^d$, $n(x) = \frac{x}{|x|}$ is the unit vector
               pointing in the direction of $x$ and $m(x) = \frac{\nabla \pi(x)}{|\nabla \pi(x)|}$ is the unit vector in
               the direction of the gradient of the stationary distribution at point $x$.
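
As a quick sanity check of condition (B1), here is a worked example for a target that is not from the talk: the standard Gaussian density on $\mathbb{R}^d$, chosen only because both limits can be computed by hand.

```latex
\begin{align*}
\log \pi(x) &= -\tfrac{1}{2}|x|^2 + \text{const}, &
\nabla \log \pi(x) &= -x, \\
n(x) \cdot \nabla \log \pi(x) &= -\frac{x \cdot x}{|x|} = -|x|
  \xrightarrow[|x| \to \infty]{} -\infty, \\
\nabla \pi(x) &= -x\,\pi(x), &
m(x) &= -\frac{x}{|x|}, \\
n(x) \cdot m(x) &= -1 < 0 .
\end{align*}
```

Both (1) and (2) hold, so the standard Gaussian satisfies (B1).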



                                                         Result:
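        Existence of a small set $C$:

                   $\Pi(x, A) \ge \varepsilon\, \nu(A)\, \mathbb{1}_C(x)$, $\forall x \in \mathcal{X}$ and $\forall A \in \mathcal{B}$

        Drift condition: pulls the chain back into the small set

                   $\Pi V(x) \le \lambda V(x) + b\, \mathbb{1}_C(x)$.

        Geometric ergodicity

                   $\sup_{x \in \mathcal{X}} \frac{|\Pi^n V(x) - \pi(x)|}{V(x)} \le R \rho^n$.      (3)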



Experiments on synthetic data




        Target: 10-dimensional Gaussian distribution with zero mean and
        diagonal covariance matrix whose diagonal coefficients are picked at random
        between 1 and 2500


        Comparison of AMALA and the symmetric random walk


        500,000 iterations for each algorithm, starting at zero


        Mean squared jump distance (MSJD) in stationarity:
        AMALA 0.1504, random walk 0.0407.
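
For reference, the MSJD of a stored chain can be computed as below. This sketch uses the usual definition (average squared Euclidean distance between consecutive states, with rejected moves counting as zero jumps); the exact normalisation used for the figures above is not stated in the slides, and the function name and the tiny example chain are hypothetical.

```python
def mean_squared_jump_distance(chain):
    """Average squared Euclidean distance between consecutive states."""
    if len(chain) < 2:
        raise ValueError("need at least two states")
    total = 0.0
    for prev, cur in zip(chain, chain[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, cur))
    return total / (len(chain) - 1)

# Tiny hand-made chain: the repeated state mimics a rejected proposal.
chain = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
msjd = mean_squared_jump_distance(chain)
```

A larger MSJD at a comparable acceptance behaviour indicates that the sampler explores the target in bigger moves.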



Figure: Autocorrelation functions of the AMALA (red) and random walk
(blue) samplers for four of the ten components of the 10-dimensional Gaussian
distribution.
Why not use existing MALA-like algorithms?



        Optimised MALA-like algorithms are usually adaptive
        Good performance in practice
        Good theoretical properties

                                                     However

        Numerical problems during the first iterations (chain not yet stationary):
        convergence time?
        Most important: our goal is parameter estimation
        AMALA is one tool inside another algorithm
        Adaptive sampler + estimation algorithm = numerical issues: too many
        degrees of freedom


Applying AMALA within SAEM   Maximum likelihood estimation for incomplete data setting


Maximum likelihood estimation for incomplete data setting

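      $y \in \mathbb{R}^n$: observed data
      $z \in \mathbb{R}^l$: missing data
      $(y, z) \in \mathbb{R}^{n+l}$: complete data
      $\mathcal{P} = \{f(y, z; \theta), \theta \in \Theta\}$: family of parametric pdfs on $\mathbb{R}^{n+l}$
      Assumption:
      $\exists \theta \in \Theta$ s.t. the complete data likelihood $q(y, z; \theta) = f(y, z; \theta)$
      Observed likelihood:

                 $g(y; \theta) = \int f(y, z; \theta)\, \mu(dz)$.      (4)

      Given a sample of observations $(y_i)_{1 \le i \le n} = y_1^n$,
      find $\hat{\theta}_g$ in $\Theta$ s.t.

                 $\hat{\theta}_g = \arg\max_{\theta \in \Theta} g(y_1^n; \theta)$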
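
To make the observed likelihood (4) concrete, here is a small sketch on a toy model that is not from the talk: $z \sim \mathcal{N}(0, 1)$ and $y \mid z \sim \mathcal{N}(\theta + z, 1)$, with independent scalar observations. The integral over $z$ is approximated by a Riemann sum and the maximiser is found by grid search; the function names, the sample values, and the grids are all hypothetical.

```python
import math

def complete_density(y, z, theta):
    """f(y, z; theta) for the toy model z ~ N(0, 1), y | z ~ N(theta + z, 1)."""
    p_y_given_z = math.exp(-0.5 * (y - theta - z) ** 2) / math.sqrt(2.0 * math.pi)
    p_z = math.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return p_y_given_z * p_z

def observed_likelihood(y, theta, z_max=8.0, step=0.1):
    """g(y; theta): Riemann-sum approximation of the integral of f over z."""
    n = int(round(2.0 * z_max / step))
    return step * sum(complete_density(y, -z_max + i * step, theta)
                      for i in range(n + 1))

def log_observed_likelihood(sample, theta):
    """Log-likelihood of independent observations."""
    return sum(math.log(observed_likelihood(y, theta)) for y in sample)

sample = [0.3, 1.1, -0.4, 0.9, 0.6]
theta_grid = [-3.0 + 0.05 * j for j in range(121)]
theta_hat = max(theta_grid, key=lambda t: log_observed_likelihood(sample, t))
```

In this toy model the integral has a closed form, $g(y; \theta) = \mathcal{N}(y; \theta, 2)$, so the maximiser is the sample mean. In the general case the integral is intractable, which is what motivates the stochastic algorithm of the next slides.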

Applying AMALA within SAEM   Description of the algorithm


AMALA-SAEM



      Incomplete data setting + maximum likelihood estimation → EM
      algorithm

      General case → E step not tractable

      Stochastic Approximation EM (SAEM) for its convergence properties

      with an MCMC method for the simulation step.

           → AMALA-SAEM: using AMALA as the MCMC method



Description of the algorithm

Assumption: the model belongs to the exponential family, so all the
information is carried by the sufficient statistics S.
      For k = 1 : k_end (iteration of SAEM)
      Sample $z_k$ through a single AMALA step (simulation and
      acceptance/rejection) using the current parameter $\theta_{k-1}$
      Compute the stochastic approximation

          $s_k = s_{k-1} + \gamma_k \left( S(z_k) - s_{k-1} \right)$,

      where $(\gamma_k)_k$ is a sequence of positive step sizes.
      Update the parameter

          $\theta_k = \hat\theta(s_k)$.

Truncation on random boundaries may be required for convergence purposes.
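
The loop above can be sketched in a few lines. This is only an illustration under loud assumptions: the toy model (hidden z_i ~ N(theta, 1), observed y_i | z_i ~ N(z_i, 1)), the function names and the step-size schedule are not from the talk, and the sampler is a plain isotropic MALA step standing in for AMALA (the anisotropic proposal covariance is not implemented here). For this toy model S(z) is the mean of z and the maximiser is theta_hat(s) = s.

```python
import numpy as np

def log_target(z, y, theta):
    # log p(z | y, theta) up to a constant, for the assumed toy model
    return -0.5 * (y - z) ** 2 - 0.5 * (z - theta) ** 2

def grad_log_target(z, y, theta):
    return (y - z) - (z - theta)

def mala_step(z, y, theta, delta, rng):
    # One isotropic MALA step per coordinate (stand-in for AMALA).
    mean_fwd = z + 0.5 * delta * grad_log_target(z, y, theta)
    prop = mean_fwd + np.sqrt(delta) * rng.standard_normal(z.shape)
    mean_bwd = prop + 0.5 * delta * grad_log_target(prop, y, theta)
    log_q_fwd = -((prop - mean_fwd) ** 2) / (2.0 * delta)
    log_q_bwd = -((z - mean_bwd) ** 2) / (2.0 * delta)
    log_alpha = (log_target(prop, y, theta) - log_target(z, y, theta)
                 + log_q_bwd - log_q_fwd)
    accept = np.log(rng.uniform(size=z.shape)) < log_alpha
    return np.where(accept, prop, z)

def saem(y, n_iter=2000, burn_in=200, delta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    z = y.copy()          # initial hidden variables
    s = 0.0               # stochastic approximation of the sufficient statistic
    theta = 0.0           # initial parameter
    for k in range(1, n_iter + 1):
        # Simulation: a single MCMC step using the current parameter.
        z = mala_step(z, y, theta, delta, rng)
        # Stochastic approximation: s_k = s_{k-1} + gamma_k (S(z_k) - s_{k-1}).
        gamma = 1.0 if k <= burn_in else (k - burn_in) ** -0.7
        s = s + gamma * (z.mean() - s)
        # Maximisation: theta_k = theta_hat(s_k), the identity for this toy model.
        theta = s
    return theta
```

In this toy setting the observed-data maximum likelihood estimate is the empirical mean of y, which gives a simple sanity check for the output of `saem`.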

Applying AMALA within SAEM   Convergence properties


Convergence properties




Conditions:
      Smoothness of the model (classic conditions for convergence of
      stochastic approximation and EM)
      Condition for AMALA geometric ergodicity (B1)
Results:
      Almost sure convergence of $(s_k)$ towards a critical point of the
      mean field of the problem
      Almost sure convergence of the estimated parameters $(\theta_k)$
      towards a critical point of the observed likelihood
      Central limit theorem for $(\theta_k)$ with rate $1/\sqrt{\gamma_k}$





 Conditions for the SA to converge




 Define, for any $V : X \to [1, \infty]$ and any $g : X \to \mathbb{R}^m$, the norm

     $\|g\|_V = \sup_{z \in X} \dfrac{\|g(z)\|}{V(z)}$.

(A1') $S$ is an open subset of $\mathbb{R}^m$, $h : S \to \mathbb{R}^m$ is continuous and there exists
      a continuously differentiable function $w : S \to [0, \infty[$ with the
      following properties.
         (i) There exists an $M_0 > 0$ such that

             $L \triangleq \{ s \in S, \ \langle \nabla w(s), h(s) \rangle = 0 \} \subset \{ s \in S, \ w(s) < M_0 \}$.






 Conditions for the SA to converge (2)




         (ii) There exists a closed convex set $S_a \subset S$ for which
              $s \to s + \rho H_s(z) \in S_a$ for any $\rho \in [0, 1]$ and $(z, s) \in X \times S_a$ ($S_a$ is
              absorbing) and such that for any $M_1 \in ]M_0, \infty]$, the set $W_{M_1} \cap S_a$ is a
              compact set of $S$, where $W_{M_1} \triangleq \{ s \in S, \ w(s) \le M_1 \}$.
        (iii) For any $s \in S \setminus L$, $\langle \nabla w(s), h(s) \rangle < 0$.
         (iv) The closure of $w(L)$ has an empty interior.

(A2') For any $s \in S$, $H_s : X \to S$ is measurable and $\int \|H_s(z)\| \, \pi_s(dz) < \infty$.






 Conditions for the SA to converge (3)



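(A3'') There exists a function $V : X \to [1, \infty]$ such that
      $\{ z \in X, \ V(z) < \infty \} \neq \emptyset$, constants $a \in ]0, 1]$, $p \ge 2$, $r > 0$ and
      $q \ge 1$ such that for any compact subset $K \subset S$,
          (i)
              $\sup_{s \in K} \|H_s\|_V < \infty$,    (5)

              $\sup_{s \in K} \left( \|g_s\|_V + \|\Pi_s g_s\|_V \right) < \infty$,    (6)

              $\sup_{s, s' \in K} \|s - s'\|^{-a} \left\{ \|g_s - g_{s'}\|_{V^q} + \|\Pi_s g_s - \Pi_{s'} g_{s'}\|_{V^q} \right\} < \infty$,    (7)

              where, for any $s \in S$, a solution of the Poisson equation
              $g - \Pi_s g = H_s - \pi_s(H_s)$ is denoted by $g_s$.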





Conditions for the SA to converge (4)


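        (ii) For any sequence $\varepsilon = (\varepsilon_k)_{k \ge 0}$ satisfying $\varepsilon_k < \bar\varepsilon$ for an $\bar\varepsilon$ sufficiently
             small, and for any sequence $\gamma = (\gamma_k)_{k \ge 0}$, there exists a constant $C$ such
             that, for any $z \in X$,

                 $\sup_{s \in K} \sup_{k \ge 0} E^{\gamma}_{z,s} \left[ V^p(z_k) \, 1_{\sigma(K) \wedge \nu(\varepsilon) \ge k} \right] \le C \, V^{p+r}(z)$,    (8)

             where $\nu(\varepsilon) = \inf\{ k \ge 1, \ \|s_k - s_{k-1}\| \ge \varepsilon_k \}$ and
             $\sigma(K) = \inf\{ k \ge 1, \ s_k \notin K \}$, and the expectation is related to the
             non-homogeneous Markov chain $((z_k, s_k))_{k \ge 0}$ using the step-size
             sequence $\gamma = (\gamma_k)_{k \ge 0}$.
(A4) The sequences $\gamma = (\gamma_k)_{k \ge 0}$ and $\varepsilon = (\varepsilon_k)_{k \ge 0}$ are non-increasing,
       positive and satisfy: $\sum_{k=0}^{\infty} \gamma_k = \infty$, $\lim_{k \to \infty} \varepsilon_k = 0$ and
       $\sum_{k=1}^{\infty} \left\{ \gamma_k^2 + \gamma_k \varepsilon_k^a + (\gamma_k \varepsilon_k^{-1})^p \right\} < \infty$, where $a$ and $p$ are defined in (A3'').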
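
As a worked illustration of (A4), take the polynomial schedules gamma_k = k^(-alpha) and eps_k = k^(-beta). This choice of schedule and the helper name below are assumptions for illustration, not part of the talk. By the usual p-series criterion each condition becomes an inequality on the exponents, which the following sketch checks.

```python
def a4_exponents_ok(alpha, beta, a, p):
    """Check (A4) for gamma_k = k**(-alpha) and eps_k = k**(-beta).

    Hypothetical helper: polynomial schedules are an assumption made here.
    """
    # sum gamma_k diverges and gamma_k is positive, non-increasing
    gamma_diverges = 0.0 < alpha <= 1.0
    # eps_k is positive, non-increasing and tends to 0
    eps_vanishes = beta > 0.0
    # sum gamma_k^2 < infinity
    square_summable = 2.0 * alpha > 1.0
    # sum gamma_k * eps_k^a < infinity
    cross_summable = alpha + a * beta > 1.0
    # sum (gamma_k / eps_k)^p < infinity
    ratio_summable = p * (alpha - beta) > 1.0
    return all([gamma_diverges, eps_vanishes, square_summable,
                cross_summable, ratio_summable])
```

For instance, with a = 1 and p = 2, the pair alpha = 1, beta = 0.4 satisfies every inequality, whereas alpha = 0.5 fails because the squared step sizes are no longer summable.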




Condition for AMALA-SAEM to converge



      (M1) The parameter space $\Theta$ is an open subset of $\mathbb{R}^p$. The complete
      data likelihood function is given by:

          $f(y, z; \theta) = \exp\left\{ -\psi(\theta) + \langle S(z), \phi(\theta) \rangle \right\}$,

      where $S$ is a Borel function on $\mathbb{R}^l$ taking its values in an open subset
      $S$ of $\mathbb{R}^m$. Moreover, the convex hull of $S(\mathbb{R}^l)$ is included in $S$, and,
      for all $\theta$ in $\Theta$,

          $\int \|S(z)\| \, p_\theta(z) \, \mu(dz) < \infty$.

      (M2) The functions $\psi$ and $\phi$ are twice continuously differentiable on
      $\Theta$.





Condition for AMALA-SAEM to converge (2)


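      (M3) The function $\bar{s} : \Theta \to S$ defined as

          $\bar{s}(\theta) \triangleq \int S(z) \, p_\theta(z) \, \mu(dz)$

      is continuously differentiable on $\Theta$.
      (M4) The function $l : \Theta \to \mathbb{R}$ defined as the observed-data
      log-likelihood

          $l(\theta) \triangleq \log g(y; \theta) = \log \int f(y, z; \theta) \, \mu(dz)$

      is continuously differentiable on $\Theta$ and

          $\partial_\theta \int f(y, z; \theta) \, \mu(dz) = \int \partial_\theta f(y, z; \theta) \, \mu(dz)$.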




Condition for AMALA-SAEM to converge (3)


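      (M5) There exists a function $\hat\theta : S \to \Theta$ such that:

          $\forall s \in S, \ \forall \theta \in \Theta, \quad L(s; \hat\theta(s)) \ge L(s; \theta)$.

      Moreover, the function $\hat\theta$ is continuously differentiable on $S$.
      (M6) The functions $l : \Theta \to \mathbb{R}$ and $\hat\theta : S \to \Theta$ are $m$ times
      differentiable.

      (M7)
        (i) There exists an $M_0 > 0$ such that

            $\left\{ s \in S, \ \partial_s l(\hat\theta(s)) = 0 \right\} \subset \left\{ s \in S, \ -l(\hat\theta(s)) < M_0 \right\}$.

       (ii) For all $M_1 > M_0$, the set $\overline{\mathrm{Conv}(S(\mathbb{R}^l))} \cap \{ s \in S, \ -l(\hat\theta(s)) \le M_1 \}$ is a
            compact set of $S$.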
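
As a small worked example of (M1) and (M5), consider a toy model that is not in the talk: a hidden $z \sim \mathcal{N}(\theta, 1)$ and an observation $y \mid z \sim \mathcal{N}(z, 1)$. The derivation assumes the usual SAEM definition $L(s; \theta) = -\psi(\theta) + \langle s, \phi(\theta) \rangle$, which the slides do not restate, and it absorbs the factor that does not depend on $\theta$ into the dominating measure $\mu$.

```latex
\begin{align*}
f(y, z; \theta)
  &\propto \exp\Big\{ -\tfrac{1}{2}(y - z)^2 - \tfrac{1}{2}(z - \theta)^2 \Big\} \\
  &= \exp\Big\{ -\tfrac{1}{2}(y - z)^2 - \tfrac{1}{2} z^2 \Big\}
     \, \exp\Big\{ -\tfrac{\theta^2}{2} + z\,\theta \Big\}, \\
\psi(\theta) &= \tfrac{\theta^2}{2}, \qquad S(z) = z, \qquad \phi(\theta) = \theta, \\
L(s; \theta) &= -\tfrac{\theta^2}{2} + s\,\theta
  \quad \Longrightarrow \quad \hat\theta(s) = s .
\end{align*}
```

Here $\psi$, $\phi$ and $\hat\theta$ are polynomial, so the smoothness requirements (M2), (M5) and (M6) hold trivially for this toy case.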



Condition for AMALA-SAEM to converge (4)




      (M8) There exists a polynomial function $P$ of degree 2 such that for
      all $z \in X$,

          $\|S(z)\| \le |P(z)|$.

      (B3) For any compact subset $K$ of $S$, there exists a polynomial
      function $Q$ of the hidden variable such that

          $\sup_{s \in K} \left| \nabla_z \log p_{\hat\theta(s)}(z) \right| \le |Q(z)|$.




Application on Bayesian Mixed effect template estimation


Outline:



               1. AMALA: simulation of random variables in high dimension
                     Anisotropic MALA description
                     Convergence property
               2. AMALA within stochastic algorithm for parameter estimation

                            Maximum likelihood estimation for incomplete data
                            setting
                            AMALA-SAEM
                            Convergence properties
               3. Experiments
                      BME-Template model: small deformation setting
                      BME-Template model: LDDMM setting


Application on Bayesian Mixed effect template estimation   Description of the BME Template model


BME Template model with small deformations

      Deformable template model: (u = voxel, $v_u$ its position)

          $y(u) = I_0(v_u - m(v_u)) + \sigma \epsilon(u)$,

      Parametric template and deformation:

          $I_\alpha(v) = (K_p \alpha)(v) = \sum_{k=1}^{k_p} K_p(v, r_{p,k}) \, \alpha^k$ and
          $m_z(v) = (K_g z)(v) = \sum_{k=1}^{k_g} K_g(v, r_{g,k}) \, z^k$.

      Generative model:

          $z \sim \otimes_{i=1}^{n} \mathcal{N}_{2k_g}(0, \Gamma_g) \mid \Gamma_g$,
          $y \sim \otimes_{i=1}^{n} \mathcal{N}_{|\Lambda|}(m_{z_i} I_\alpha, \sigma^2 \mathrm{Id}) \mid z, \alpha, \sigma^2$,

      Bayesian framework → MAP estimator (= penalised MLE)
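
The generative model can be illustrated with a minimal sampling sketch. Everything specific here is an assumption for illustration: Gaussian kernels for both $K_p$ and $K_g$, the kernel scales, a 2D pixel grid, and all function names. It only shows how one image is obtained from the template weights alpha and one deformation vector z.

```python
import numpy as np

def gaussian_kernel(points, centers, scale):
    # Matrix of K(v, r) for every point v and control point r (assumed Gaussian).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * scale ** 2))

def template(points, r_p, alpha, scale_p):
    # I_alpha(v) = sum_k K_p(v, r_{p,k}) alpha^k
    return gaussian_kernel(points, r_p, scale_p) @ alpha

def deformation(points, r_g, z, scale_g):
    # m_z(v) = sum_k K_g(v, r_{g,k}) z^k, with z of shape (k_g, 2)
    return gaussian_kernel(points, r_g, scale_g) @ z

def sample_image(points, r_p, alpha, r_g, z, sigma, scale_p, scale_g, rng):
    # y(u) = I_alpha(v_u - m_z(v_u)) + sigma * eps(u)
    warped = points - deformation(points, r_g, z, scale_g)
    noise = rng.standard_normal(len(points))
    return template(warped, r_p, alpha, scale_p) + sigma * noise

# Usage: one noisy deformed image on an 8 x 8 grid.
rng = np.random.default_rng(0)
axis = np.linspace(-1.0, 1.0, 8)
points = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)
r_p = rng.uniform(-1.0, 1.0, size=(5, 2))   # template control points
r_g = rng.uniform(-1.0, 1.0, size=(3, 2))   # deformation control points
alpha = rng.standard_normal(5)
z = 0.1 * rng.standard_normal((3, 2))
y = sample_image(points, r_p, alpha, r_g, z, 0.1, 0.3, 0.5, rng)
```

With z set to zero and sigma set to zero the sampled image reduces to the template itself, which is a convenient sanity check.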
Application on Bayesian Mixed effect template estimation   Results on the template estimation


Training sets




Figure: Left: Training set (inverse video). Right: Noisy training set (inverse
video).






Estimated templates


       Image grid (images not reproduced). Columns: FAM-EM, H.G.-SAEM, AMALA-SAEM.
       Rows: no noise; noise of variance 1.

Figure: Estimated templates using different algorithms and two levels of noise.
The training set includes 20 images per digit.

Application on Bayesian Mixed effect template estimation   Results on the covariance matrix estimation


Estimated geometric variability




Figure: Synthetic samples generated with respect to the BME template model.
Application on Bayesian Mixed effect template estimation   CLT empirical proof


Empirical proof of the CLT




Figure: Evolution of the estimation of the noise variance along the SAEM
iterations. Left: original data. Right: Noisy training set.








Figure: Evolution of the estimation of the noise variance along the SAEM
iterations. Test of convergence towards the Gaussian distribution of the estimated
parameters.


Application on Bayesian Mixed effect template estimation   Medical image template estimation


Corpus callosum data base




Figure: Medical image template estimation: 10 Corpus callosum and splenium
training images among the 47 available.




Figure: Grey-level mean; FAM-EM estimated template; Hybrid Gibbs-SAEM
estimated template; AMALA-SAEM estimated template.

Application on Bayesian Mixed effect template estimation   Results on the template estimation using LDDMM shooting


BME Template model with LDDMM


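      Deformable template model: (u = voxel, $v_u$ its position)

          $y(u) = I_0(\phi_{\beta(0)}^{-1}(v_u)) + \sigma \epsilon(u)$,

      Parametric template: $I_\alpha(v) = (K_p \alpha)(v) = \sum_{k=1}^{k_p} K_p(v, r_{p,k}) \, \alpha^k$, and $\phi_{\beta(0)}$ is the
      LDDMM solution of shooting with initial momentum $\beta(0)$.
      Generative model:

          $z \sim \otimes_{i=1}^{n} \mathcal{N}_{2k_g}(0, \Gamma_g) \mid \Gamma_g$,
          $y \sim \otimes_{i=1}^{n} \mathcal{N}_{|\Lambda|}(\phi_{\beta(0)} I_\alpha, \sigma^2 \mathrm{Id}) \mid z, \alpha, \sigma^2$,

      Bayesian framework → MAP estimator (= penalised MLE)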



LDDMM: parametric deformation:



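      Fix some control points: $c(t) = (c_1(t), \dots, c_{n_g}(t))$
      Choose a kernel $K_g$
      Start from an initial momentum $\beta(0) = (\beta^1(0), \dots, \beta^{n_g}(0))$
Then, Hamiltonian system → time evolution of both momenta and
control points:

    $\begin{cases}
      \dfrac{dc}{dt} = \dfrac{\partial H}{\partial \beta}(c, \beta) = K_g(c(t)) \, \beta(t) \\[2mm]
      \dfrac{d\beta}{dt} = -\dfrac{\partial H}{\partial c}(c, \beta) = -\dfrac{1}{2} \nabla_{c(t)} K_g(\beta(t), \beta(t))
    \end{cases}$    (9)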
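
The shooting system (9) can be integrated numerically. The sketch below is an illustration only: it assumes a Gaussian kernel, the Hamiltonian H(c, beta) = (1/2) sum_ij K_g(c_i, c_j) <beta_i, beta_j>, and an explicit midpoint scheme on [0, 1]. None of these choices, nor the function names, are stated in the talk.

```python
import numpy as np

def kernel_matrix(c, scale):
    # Gaussian kernel K_g(c_i, c_j) (assumed) and pairwise differences c_i - c_j.
    diff = c[:, None, :] - c[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * scale ** 2)), diff

def hamiltonian(c, beta, scale):
    K, _ = kernel_matrix(c, scale)
    return 0.5 * np.sum(K * (beta @ beta.T))

def rhs(c, beta, scale):
    K, diff = kernel_matrix(c, scale)
    dc = K @ beta                                   # dc/dt = dH/dbeta
    weights = K * (beta @ beta.T) / scale ** 2
    dbeta = (weights[:, :, None] * diff).sum(axis=1)  # dbeta/dt = -dH/dc
    return dc, dbeta

def shoot(c0, beta0, scale, n_steps=100):
    # Explicit midpoint integration from t = 0 to t = 1.
    c, beta = c0.astype(float), beta0.astype(float)
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        dc1, db1 = rhs(c, beta, scale)
        dc2, db2 = rhs(c + 0.5 * dt * dc1, beta + 0.5 * dt * db1, scale)
        c, beta = c + dt * dc2, beta + dt * db2
    return c, beta

# Usage: two control points in the plane.
c0 = np.array([[0.0, 0.0], [1.0, 0.0]])
beta0 = np.array([[0.0, 1.0], [0.5, -0.5]])
c1, beta1 = shoot(c0, beta0, scale=1.0)
```

Since the flow is Hamiltonian, `hamiltonian(c1, beta1, 1.0)` should stay close to its initial value, which is a cheap check on the integrator.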





LDDMM: parametric deformation (2):


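Interpolating on any point of the domain:

    $v_t(r) = (K_g \beta(t))(r) = \sum_{k=1}^{n_g} K_g(r, c_k(t)) \, \beta^k(t) \quad \forall r \in D$    (10)

Deformation = solution of the flow equation:

    $\begin{cases}
      \dfrac{\partial \phi_{\beta(0)}(t)}{\partial t} = v_t \circ \phi_{\beta(0)}(t) \\[2mm]
      \phi_{\beta(0)}(0) = \mathrm{Id}
    \end{cases}$    (11)

    $\phi_{\beta(0)} = \phi_{\beta(0)}(1)$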





Gradient computation

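$E(\underbrace{c_i, \beta_i}_{S_0}) = \underbrace{\sum_k \left( I_0(\phi_1^{-1}(y_k)) - I(y_k) \right)^2}_{A(y_k(0))} + \sigma^2 \underbrace{\mathrm{Reg}(\phi_1)}_{L(S_0) = \beta(0)^t \Gamma_g(q(0), q(0)) \beta(0)}$

    $\begin{cases}
      S_0 = \{ (c_i, \beta_i) \}_i \\[1mm]
      \dfrac{dS(t)}{dt} = F(S(t)), \quad S(0) = S_0 \\[2mm]
      \dfrac{dy(t)}{dt} = G(S(t), y(t)), \quad y(1) = y
    \end{cases}$

    $\nabla_{S_0} E = d_{S_0} y(0)^T \, \nabla_{y(0)} A + \nabla_{S_0} L$

    $\nabla_{y_k(0)} A = 2 \left( I_0(y_k(0)) - I(y_k(1)) \right) \nabla_{y_k(0)} I_0$

    - Momenta decrease image discrepancy
    - Control points attracted by image contours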






Gradient computation

With the same energy $E$ and the same system for $(S(t), y(t))$ as on the
previous slide, the variable $\eta$ satisfies

    $\dfrac{d\eta(t)}{dt} = \partial_{S(t)} G^T \, \eta(t), \qquad \eta(0) = \nabla_{y(0)} A$




 St´phanie Allassonni`re (CMAP)
   e                 e                                  AMALA                          BigMC, January 2012       38 / 42
Application on Bayesian Mixed effect template estimation   Results on the template estimation using LDDMM shooting


Gradient computation

E (ci , βi ) =        k   (I0 (φ−1 (yk )) − I (yk ))2 +σ 2
                                1                                                Reg(φ1 )
     S0                               A(yk (0))                      L(S0 )=β(0)t Γg (q(0),q(0))β(0)
                                           
                                            S0 = {(ci , βi )}i
                                           
                                           
                                            dS(t)
                                           
                                                     = F (S(t))          S(0) = S0
                                            dt
                                            dy (t)
                                           
                                                    = G (S(t), y (t)) y (1) = y
                                           
                                           
                                               dt
                                        dη(t)
                                              = ∂S(t) G T η(t), η(0) = y (0) A
                                         dt
                                        dξ(t)
                                              = ∂y (t) G T η(t) − dF T ξ(t), ξ(1) = 0
                                         dt




 St´phanie Allassonni`re (CMAP)
   e                 e                                  AMALA                          BigMC, January 2012       38 / 42
Application on Bayesian Mixed effect template estimation   Results on the template estimation using LDDMM shooting


Gradient computation

E (ci , βi ) =        k   (I0 (φ−1 (yk )) − I (yk ))2 +σ 2
                                1                                                Reg(φ1 )
     S0                               A(yk (0))                      L(S0 )=β(0)t Γg (q(0),q(0))β(0)
                                           
                                            S0 = {(ci , βi )}i
                                           
                                           
                                            dS(t)
                                           
                                                     = F (S(t))          S(0) = S0
                                            dt
                                            dy (t)
                                           
                                                    = G (S(t), y (t)) y (1) = y
                                           
                                           
                                               dt
                                        dη(t)
                                              = ∂S(t) G T η(t), η(0) = y (0) A
                                         dt
                                        dξ(t)
                                              = ∂y (t) G T η(t) − dF T ξ(t), ξ(1) = 0
                                         dt

                                       S0 E   = ξ(0) +          S0 L




 St´phanie Allassonni`re (CMAP)
   e                 e                                  AMALA                          BigMC, January 2012       38 / 42
Application on Bayesian Mixed effect template estimation   Results on the template estimation using LDDMM shooting

Using LDDMM deformations via shooting (preliminary results)

[Figure: two panels, labelled "AMALA" and "GH"]

Conclusion

- Good performance (as accurate as other algorithms)
- Reduced computational time
- Can handle the movement of control points in practice (theory still to be confirmed)
- Can handle sparsity of the template (model selection)
- Removing control points? In practice, why not... theory?

Thank you!

Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility in Stochastic EM algorithm

  • 1. Anisotropic Metropolis adjusted Langevin algorithm: Convergence and utility in stochastic EM algorithm. Stéphanie Allassonnière, CMAP, École Polytechnique. BigMC, January 2012. Joint work with Estelle Kuhn (INRA, France).
  • 6. Introduction: Where does the problem come from? Image analysis: compare two observations by quantifying the deformation from one to the other (D'Arcy Thompson, 1917): registration / variance. Each element of a population is a smooth deformation of a template: template estimation / mean.
  • 11. Introduction: Where does the problem come from? Deformable template model ($u$ = voxel, $v_u$ its position): $y(u) = I_0(v_u - m(v_u)) + \sigma \varepsilon(u)$. Estimation of the template $I_0$ and of the law of the geometry $m$. High dimensional setting, low sample size. Considering the LDDMM framework through the shooting equations.
  • 12. Outline: 1. AMALA: simulation of random variables in high dimension (anisotropic MALA description; convergence property). 2. AMALA within a stochastic algorithm for parameter estimation (maximum likelihood estimation for the incomplete data setting; AMALA-SAEM; convergence properties). 3. Experiments (BME-Template model: small deformation setting; BME-Template model: LDDMM setting).
  • 20. Anisotropic Metropolis Adjusted Langevin algorithm (AMALA), introduction. General setting: simulation of random variables in high dimension → Gibbs sampler not useful. Metropolis Adjusted Langevin Algorithm (MALA) with target distribution $\pi$. At iteration $k$, with $X_k$ the current value: simulate $X_c \sim \mathcal{N}(X_k + \delta D(X_k), \delta\,\mathrm{Id}_d)$, where $D(x) = \frac{b}{\max(b, |\nabla \log \pi(x)|)} \nabla \log \pi(x)$. Update $X_{k+1} = X_c$ with probability $\alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\, q_{MALA}(X_c, X_k)}{q_{MALA}(X_k, X_c)\, \pi(X_k)}\right)$ and $X_{k+1} = X_k$ otherwise. Problem: isotropic covariance matrix = numerically trapped ($\alpha(X_k, X_c) = 0$) → Anisotropic Metropolis Adjusted Langevin Algorithm (AMALA).
  • 23. AMALA, description of the algorithm. How to include anisotropy? Follow the magnitude of the gradient. First approximation: independence of directions. Bounded covariance (same as bounded drift).
  • 27. AMALA, description of the algorithm. For all $k = 1 : k_{end}$ (iterates of the Markov chain): (1) Sample $X_c \sim \mathcal{N}(X_k + \delta D(X_k), \delta \Sigma(X_k))$ with $D(X_k) = \frac{b}{\max(b, |\nabla \log \pi(X_k)|)} \nabla \log \pi(X_k)$ and $\Sigma(X_k) = \mathrm{Id}_d + \mathrm{diag}\left([\nabla \log \pi(X_k)]_1^2 \wedge b, \ldots, [\nabla \log \pi(X_k)]_d^2 \wedge b\right)$. (2) Compute the acceptance ratio $\alpha(X_k, X_c) = \min\left(1, \frac{\pi(X_c)\, q_c(X_c, X_k)}{q_c(X_k, X_c)\, \pi(X_k)}\right)$, where $q_c$ is the pdf of this proposal distribution. (3) Set $X_{k+1} = X_c$ with probability $\alpha(X_k, X_c)$ and $X_{k+1} = X_k$ with probability $1 - \alpha(X_k, X_c)$ (accept/reject).
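The three steps of the slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `amala_step`, the toy Gaussian target and the values of `delta` and `b` are hypothetical choices, not taken from the talk, and the drift and diagonal covariance follow the formulas as written on the slide.

```python
import numpy as np

def amala_step(x, log_pi, grad_log_pi, delta, b, rng):
    """One AMALA iteration: anisotropic Langevin proposal, then accept/reject."""

    def proposal_mean_and_var(point):
        g = grad_log_pi(point)
        # Truncated drift D = b / max(b, |grad|) * grad, so that |D| <= b.
        drift = b / max(b, np.linalg.norm(g)) * g
        # Diagonal of Sigma = Id + diag(min(grad_i^2, b)).
        sigma_diag = 1.0 + np.minimum(g ** 2, b)
        return point + delta * drift, delta * sigma_diag

    def log_q(src, dst):
        # Log-density at dst of the Gaussian proposal built from src.
        mean, var = proposal_mean_and_var(src)
        return -0.5 * np.sum((dst - mean) ** 2 / var + np.log(2.0 * np.pi * var))

    mean, var = proposal_mean_and_var(x)
    candidate = mean + np.sqrt(var) * rng.standard_normal(x.shape)

    # Metropolis-Hastings ratio, evaluated in log scale for numerical stability.
    log_ratio = (log_pi(candidate) + log_q(candidate, x)
                 - log_pi(x) - log_q(x, candidate))
    alpha = float(np.exp(min(0.0, log_ratio)))
    accepted = rng.random() < alpha
    return (candidate if accepted else x), alpha, accepted


# Hypothetical toy target: centred Gaussian with very different scales.
variances = np.array([1.0, 100.0])
log_pi = lambda x: -0.5 * np.sum(x ** 2 / variances)
grad_log_pi = lambda x: -x / variances

rng = np.random.default_rng(0)
x = np.array([3.0, -20.0])
chain = [x]
for _ in range(200):
    x, alpha, accepted = amala_step(x, log_pi, grad_log_pi, delta=0.5, b=10.0, rng=rng)
    chain.append(x)
chain = np.array(chain)
```

Because the proposal covariance depends on the current point, the forward and backward proposal densities differ in both mean and variance, which is why `log_q` recomputes them from its first argument.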
  • 28. AMALA, geometric ergodicity of the Markov chain. Condition: $\pi$ super-exponential (smoothness condition on the target distribution). (B1) The density $\pi$ is positive with continuous first derivative such that $\lim_{|x| \to \infty} n(x) \cdot \nabla \log \pi(x) = -\infty$ (1) and $\limsup_{|x| \to \infty} n(x) \cdot m(x) < 0$ (2), where $\nabla$ is the gradient operator in $\mathbb{R}^d$, $n(x) = \frac{x}{|x|}$ is the unit vector pointing in the direction of $x$ and $m(x) = \frac{\nabla \pi(x)}{|\nabla \pi(x)|}$ is the unit vector in the direction of the gradient of the stationary distribution at point $x$.
  • 32. AMALA, geometric ergodicity of the Markov chain. Result: existence of a small set $C$: $\Pi(x, A) \geq \varepsilon \nu(A) 1_C(x)$, $\forall x \in \mathcal{X}$ and $\forall A \in \mathcal{B}$. Drift condition (pulls the chain back into the small set): $\Pi V(x) \leq \lambda V(x) + b 1_C(x)$. Geometric ergodicity: $\sup_{x \in \mathcal{X}} \frac{|\Pi^n V(x) - \pi(x)|}{V(x)} \leq R \rho^n$ (3).
  • 33. AMALA, experiments on synthetic data. Target: 10-dimensional Gaussian distribution with zero mean and diagonal covariance matrix, with diagonal coefficients randomly picked between 1 and 2500. Comparison of AMALA and symmetric random walk: 500,000 iterations for each algorithm, starting at zero. Mean squared jump distance (MSJD) in stationarity: AMALA 0.1504, random walk 0.0407.
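The slide does not spell out how the MSJD is computed. A minimal sketch, assuming the usual definition (the average squared Euclidean distance between consecutive states of the chain, with rejected moves counting as zero jumps), could look as follows; the function name and the toy chain are illustrative only.

```python
import numpy as np

def mean_squared_jump_distance(chain):
    """Average squared Euclidean distance between consecutive states."""
    chain = np.asarray(chain, dtype=float)
    jumps = np.diff(chain, axis=0)  # X_{k+1} - X_k for every k
    return float(np.mean(np.sum(jumps ** 2, axis=1)))

# Toy chain: one accepted move, one rejection, one accepted move.
toy_chain = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
msjd = mean_squared_jump_distance(toy_chain)
```

A larger MSJD at a comparable acceptance behaviour indicates that the sampler moves further per iteration, which is the sense in which the figures on the slide favour AMALA over the random walk.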
  • 34. AMALA, experiments on synthetic data. Figure: autocorrelation functions of the AMALA (red) and the random walk (blue) samplers for four of the ten components of the 10-dimensional Gaussian distribution.
  • 38. Why not use existing MALA-like algorithms? Optimised MALA-like algorithms are usually adaptive: good performance in practice, good theoretical properties. However: numerical problems during the first iterations (not yet stationary): convergence time? Most important: our goal is parameter estimation, so AMALA is one tool inside another algorithm. Adaptive + estimation algorithm = numerical issues: too many degrees of freedom.
  • 47. Applying AMALA within SAEM: maximum likelihood estimation for the incomplete data setting. $y \in \mathbb{R}^n$: observed data. $z \in \mathbb{R}^l$: missing data. $(y, z) \in \mathbb{R}^{n+l}$: complete data. $\mathcal{P} = \{f(y, z; \theta), \theta \in \Theta\}$: family of parametric pdfs on $\mathbb{R}^{n+l}$. Assumption: $\exists \theta \in \Theta$ s.t. the complete data likelihood $q(y, z; \theta) = f(y, z; \theta)$. Observed likelihood: $g(y; \theta) = \int f(y, z; \theta) \mu(dz)$ (4). Given a sample of observations $(y_i)_{1 \leq i \leq n} = y_1^n$, find $\hat{\theta}_g$ in $\Theta$ s.t. $\hat{\theta}_g = \arg\max_{\theta \in \Theta} g(y_1^n; \theta)$.
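To make equation (4) concrete, the sketch below evaluates the observed likelihood numerically on a hypothetical one-dimensional toy model (not from the talk): $z \sim \mathcal{N}(\theta, 1)$ is missing and $y \mid z \sim \mathcal{N}(z, 1)$ is observed, so the marginal of $y$ is $\mathcal{N}(\theta, 2)$ and the integral can be checked against a closed form. In the models of the talk no such closed form is available, which is what motivates the stochastic algorithm.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def complete_likelihood(y, z, theta):
    # Hypothetical toy model: z ~ N(theta, 1) and y | z ~ N(z, 1).
    return normal_pdf(z, theta, 1.0) * normal_pdf(y, z, 1.0)

def observed_likelihood(y, theta, half_width=12.0, n_points=4001):
    # Riemann-sum approximation of g(y; theta) = integral of f(y, z; theta) dz.
    z = np.linspace(theta - half_width, theta + half_width, n_points)
    dz = z[1] - z[0]
    return float(np.sum(complete_likelihood(y, z, theta)) * dz)

y, theta = 0.7, -0.4
g_numeric = observed_likelihood(y, theta)
g_closed_form = float(normal_pdf(y, theta, 2.0))  # marginal of y is N(theta, 2)
```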
  • 52. Applying AMALA within SAEM: AMALA-SAEM. Incomplete data setting + maximum likelihood estimation = EM algorithm. General case → E step not tractable. Stochastic Approximation EM for convergence properties, with an MCMC method for the simulation step. → AMALA-SAEM: using AMALA as the MCMC method.
  • 57. Applying AMALA within SAEM: description of the algorithm. Assumption: the model is in the exponential family, so all information is carried by the sufficient statistics $S$. For $k = 1 : k_{end}$ (iterations of SAEM): (1) Sample $z_k$ through a single AMALA step (simulation and accept/reject) using the current parameter $\theta_{k-1}$. (2) Compute the stochastic approximation $s_k = s_{k-1} + \gamma_k (S(z_k) - s_{k-1})$, where $(\gamma_k)_k$ is a sequence of positive step sizes. (3) Update the parameter $\theta_k = \hat{\theta}(s_k)$. Can require truncation on random boundaries for convergence purposes.
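The loop structure can be illustrated on a hypothetical toy model: $z_i \sim \mathcal{N}(\theta, 1)$ missing, $y_i \mid z_i \sim \mathcal{N}(z_i, 1)$ observed, for which $S(z)$ is the mean of the $z_i$, $\hat{\theta}(s) = s$, and the maximum likelihood estimate is the sample mean of $y$. To keep the sketch short, the simulation step uses a plain componentwise random-walk Metropolis move as a stand-in for the single AMALA step, and the step sizes $\gamma_k = 1/k$ are an assumed choice; neither comes from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: marginally y_i ~ N(2, 2).
n = 50
y = 2.0 + np.sqrt(2.0) * rng.standard_normal(n)

def log_complete_terms(z, theta):
    # log f(y_i, z_i; theta) up to a constant, one term per observation.
    return -0.5 * (z - theta) ** 2 - 0.5 * (y - z) ** 2

def metropolis_step(z, theta, step=1.0):
    # Stand-in for the single AMALA step: componentwise random-walk Metropolis.
    candidate = z + step * rng.standard_normal(z.shape)
    log_ratio = log_complete_terms(candidate, theta) - log_complete_terms(z, theta)
    accept = np.log(rng.random(z.shape)) < log_ratio
    return np.where(accept, candidate, z)

theta, s, z = 0.0, 0.0, y.copy()
for k in range(1, 3001):
    z = metropolis_step(z, theta)     # simulation step with theta_{k-1}
    gamma = 1.0 / k                   # positive, decreasing step sizes
    s = s + gamma * (np.mean(z) - s)  # s_k = s_{k-1} + gamma_k (S(z_k) - s_{k-1})
    theta = s                         # theta_k = theta_hat(s_k), here theta_hat(s) = s
```

In this toy case `theta` should settle near `np.mean(y)`. The truncation on random boundaries mentioned on the slide is omitted here.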
  • 61. Applying AMALA within SAEM: Convergence properties
    Conditions:
      - Smoothness of the model (classical conditions for the convergence of stochastic approximation and EM).
      - Condition (B1) for the geometric ergodicity of AMALA.
    Results:
      - Almost sure convergence of $(s_k)$ towards a critical point of the mean field of the problem.
      - Almost sure convergence of the estimated parameters $(\theta_k)$ towards a critical point of the observed likelihood.
      - Central limit theorem for $(\theta_k)$ with rate $1/\sqrt{\gamma_k}$.
  • 62. Applying AMALA within SAEM: Conditions for the SA to converge
    Define, for any $V : \mathcal{X} \to [1, \infty]$ and any $g : \mathcal{X} \to \mathbb{R}^m$, the norm
    \[ \|g\|_V = \sup_{z \in \mathcal{X}} \frac{\|g(z)\|}{V(z)} . \]
    (A1') $\mathcal{S}$ is an open subset of $\mathbb{R}^m$, $h : \mathcal{S} \to \mathbb{R}^m$ is continuous and there exists a continuously differentiable function $w : \mathcal{S} \to [0, \infty[$ with the following properties.
      (i) There exists an $M_0 > 0$ such that
      \[ \mathcal{L} \triangleq \{ s \in \mathcal{S},\ \langle \nabla w(s), h(s) \rangle = 0 \} \subset \{ s \in \mathcal{S},\ w(s) < M_0 \} . \]
  • 63. Applying AMALA within SAEM: Conditions for the SA to converge (2)
      (ii) There exists a closed convex set $\mathcal{S}_a \subset \mathcal{S}$ for which $s \to s + \rho H_s(z) \in \mathcal{S}_a$ for any $\rho \in [0, 1]$ and $(z, s) \in \mathcal{X} \times \mathcal{S}_a$ ($\mathcal{S}_a$ is absorbing), and such that for any $M_1 \in ]M_0, \infty]$ the set $\mathcal{W}_{M_1} \cap \mathcal{S}_a$ is a compact set of $\mathcal{S}$, where $\mathcal{W}_{M_1} \triangleq \{ s \in \mathcal{S},\ w(s) \le M_1 \}$.
      (iii) For any $s \in \mathcal{S} \setminus \mathcal{L}$, $\langle \nabla w(s), h(s) \rangle < 0$.
      (iv) The closure of $w(\mathcal{L})$ has an empty interior.
    (A2') For any $s \in \mathcal{S}$, $H_s : \mathcal{X} \to \mathcal{S}$ is measurable and $\int \|H_s(z)\| \, \pi_s(dz) < \infty$.
  • 64. Applying AMALA within SAEM: Conditions for the SA to converge (3)
    (A3'') There exist a function $V : \mathcal{X} \to [1, \infty]$ such that $\{ z \in \mathcal{X},\ V(z) < \infty \} \neq \emptyset$, and constants $a \in ]0, 1]$, $p \ge 2$, $r > 0$ and $q \ge 1$ such that for any compact subset $\mathcal{K} \subset \mathcal{S}$:
      (i)
      \[ \sup_{s \in \mathcal{K}} \|H_s\|_V < \infty , \tag{5} \]
      \[ \sup_{s \in \mathcal{K}} \left( \|g_s\|_V + \|\Pi_s g_s\|_V \right) < \infty , \tag{6} \]
      \[ \sup_{s, s' \in \mathcal{K}} \|s - s'\|^{-a} \left\{ \|g_s - g_{s'}\|_{V^q} + \|\Pi_s g_s - \Pi_{s'} g_{s'}\|_{V^q} \right\} < \infty , \tag{7} \]
      where, for any $s \in \mathcal{S}$, a solution of the Poisson equation $g - \Pi_s g = H_s - \pi_s(H_s)$ is denoted by $g_s$.
  • 65. Applying AMALA within SAEM: Conditions for the SA to converge (4)
      (ii) For any sequence $\varepsilon = (\varepsilon_k)_{k \ge 0}$ satisfying $\varepsilon_k < \bar{\varepsilon}$ for an $\bar{\varepsilon}$ sufficiently small, and for any sequence $\gamma = (\gamma_k)_{k \ge 0}$, there exists a constant $C$ such that, for any $z \in \mathcal{X}$,
      \[ \sup_{s \in \mathcal{K}} \sup_{k \ge 0} \mathbb{E}^{\gamma}_{z,s} \left[ V^p(z_k) \mathbf{1}_{\sigma(\mathcal{K}) \wedge \nu(\varepsilon) \ge k} \right] \le C \, V^{p+r}(z) , \tag{8} \]
      where $\nu(\varepsilon) = \inf\{ k \ge 1,\ \|s_k - s_{k-1}\| \ge \varepsilon_k \}$ and $\sigma(\mathcal{K}) = \inf\{ k \ge 1,\ s_k \notin \mathcal{K} \}$, and the expectation is related to the non-homogeneous Markov chain $((z_k, s_k))_{k \ge 0}$ using the step-size sequence $\gamma = (\gamma_k)_{k \ge 0}$.
    (A4) The sequences $\gamma = (\gamma_k)_{k \ge 0}$ and $\varepsilon = (\varepsilon_k)_{k \ge 0}$ are non-increasing, positive and satisfy
    \[ \sum_{k=0}^{\infty} \gamma_k = \infty , \qquad \lim_{k \to \infty} \varepsilon_k = 0 \qquad \text{and} \qquad \sum_{k=1}^{\infty} \left\{ \gamma_k^2 + \gamma_k \varepsilon_k^{a} + (\gamma_k \varepsilon_k^{-1})^{p} \right\} < \infty , \]
    where $a$ and $p$ are defined in (A3'').
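As a worked check of (A4), assume (this choice is not made on the slides) that both sequences are polynomial, $\gamma_k = k^{-\alpha}$ and $\varepsilon_k = k^{-\beta}$ for $k \ge 1$. Each requirement then reduces to an inequality on the exponents:

```latex
\begin{align*}
\sum_k \gamma_k = \infty &\iff \alpha \le 1, \\
(\varepsilon_k) \text{ non-increasing with } \lim_k \varepsilon_k = 0 &\iff \beta > 0, \\
\sum_k \gamma_k^2 < \infty &\iff \alpha > 1/2, \\
\sum_k \gamma_k \varepsilon_k^{a} < \infty &\iff \alpha + a\beta > 1, \\
\sum_k (\gamma_k \varepsilon_k^{-1})^{p} < \infty &\iff p(\alpha - \beta) > 1.
\end{align*}
```

For instance, $\alpha = 1$ and $\beta = 1/4$ satisfy all five conditions for every $a \in ]0, 1]$ and $p \ge 2$, since $\alpha + a\beta = 1 + a/4 > 1$ and $p(\alpha - \beta) = 3p/4 \ge 3/2 > 1$.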
  • 66. Applying AMALA within SAEM: Condition for AMALA-SAEM to converge
    (M1) The parameter space $\Theta$ is an open subset of $\mathbb{R}^p$. The complete data likelihood function is given by
    \[ f(y, z; \theta) = \exp\left\{ -\psi(\theta) + \langle S(z), \phi(\theta) \rangle \right\} , \]
    where $S$ is a Borel function on $\mathbb{R}^l$ taking its values in an open subset $\mathcal{S}$ of $\mathbb{R}^m$. Moreover, the convex hull of $S(\mathbb{R}^l)$ is included in $\mathcal{S}$ and, for all $\theta$ in $\Theta$,
    \[ \int \|S(z)\| \, p_\theta(z) \, \mu(dz) < \infty . \]
    (M2) The functions $\psi$ and $\phi$ are twice continuously differentiable on $\Theta$.
  • 67. Applying AMALA within SAEM: Condition for AMALA-SAEM to converge (2)
    (M3) The function $\bar{s} : \Theta \to \mathcal{S}$ defined as
    \[ \bar{s}(\theta) \triangleq \int S(z) \, p_\theta(z) \, \mu(dz) \]
    is continuously differentiable on $\Theta$.
    (M4) The function $l : \Theta \to \mathbb{R}$ defined as the observed-data log-likelihood
    \[ l(\theta) \triangleq \log g(y; \theta) = \log \int f(y, z; \theta) \, \mu(dz) \]
    is continuously differentiable on $\Theta$ and
    \[ \partial_\theta \int f(y, z; \theta) \, \mu(dz) = \int \partial_\theta f(y, z; \theta) \, \mu(dz) . \]
  • 68. Applying AMALA within SAEM: Condition for AMALA-SAEM to converge (3)
    (M5) There exists a function $\hat{\theta} : \mathcal{S} \to \Theta$ such that
    \[ \forall s \in \mathcal{S},\ \forall \theta \in \Theta, \quad L(s; \hat{\theta}(s)) \ge L(s; \theta) . \]
    Moreover, the function $\hat{\theta}$ is continuously differentiable on $\mathcal{S}$.
    (M6) The functions $l : \Theta \to \mathbb{R}$ and $\hat{\theta} : \mathcal{S} \to \Theta$ are $m$ times differentiable.
    (M7) (i) There exists an $M_0 > 0$ such that
      \[ \left\{ s \in \mathcal{S},\ \partial_s l(\hat{\theta}(s)) = 0 \right\} \subset \left\{ s \in \mathcal{S},\ -l(\hat{\theta}(s)) < M_0 \right\} . \]
      (ii) For all $M_1 > M_0$, the set $\overline{\mathrm{Conv}(S(\mathbb{R}^l))} \cap \{ s \in \mathcal{S},\ -l(\hat{\theta}(s)) \le M_1 \}$ is a compact set of $\mathcal{S}$.
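For intuition about (M1) and (M5), consider a toy model that is not part of the slides: a scalar hidden variable $z \sim \mathcal{N}(\theta, 1)$ and an observation $y \mid z \sim \mathcal{N}(z, 1)$. The sketch below also assumes the usual SAEM convention $L(s; \theta) = -\psi(\theta) + \langle s, \phi(\theta) \rangle$, which this part of the talk does not restate.

```latex
\begin{align*}
\log f(y, z; \theta) &= -\tfrac12 (y - z)^2 - \tfrac12 (z - \theta)^2 + \text{const} \\
&= -\underbrace{\tfrac12 \theta^2}_{\psi(\theta)}
   + \underbrace{z}_{S(z)} \cdot \underbrace{\theta}_{\phi(\theta)}
   - \tfrac12 (y - z)^2 - \tfrac12 z^2 + \text{const}, \\
\hat{\theta}(s) &= \arg\max_{\theta} \left\{ -\tfrac12 \theta^2 + s\,\theta \right\} = s .
\end{align*}
```

The terms that do not involve $\theta$ can be absorbed into the reference measure $\mu(dz)$. Here $m = 1$, $\mathcal{S} = \mathbb{R}$, and $\hat{\theta}$ is the identity, hence continuously differentiable as (M5) requires.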
  • 69. Applying AMALA within SAEM: Condition for AMALA-SAEM to converge (4)
    (M8) There exists a polynomial function $P$ of degree 2 such that, for all $z \in \mathcal{X}$,
    \[ \|S(z)\| \le |P(z)| . \]
    (B3) For any compact subset $\mathcal{K}$ of $\mathcal{S}$, there exists a polynomial function $Q$ of the hidden variable such that
    \[ \sup_{s \in \mathcal{K}} \left| \nabla_z \log p_{\hat{\theta}(s)}(z) \right| \le |Q(z)| . \]
  • 70. Application on Bayesian Mixed effect template estimation: Outline
    1. AMALA: simulation of random variables in high dimension
       - Anisotropic MALA description
       - Convergence property
    2. AMALA within stochastic algorithm for parameter estimation
       - Maximum likelihood estimation for incomplete data setting
       - AMALA-SAEM
       - Convergence properties
    3. Experiments
       - BME-Template model: small deformation setting
       - BME-Template model: LDDMM setting
  • 74. Application on Bayesian Mixed effect template estimation: Description of the BME Template model. BME Template model with small deformations
    Deformable template model ($u$ = voxel, $v_u$ its position):
    \[ y(u) = I_0\left( v_u - m(v_u) \right) + \sigma \epsilon(u) , \]
    Parametric template and deformation:
    \[ I_\alpha(v) = (K_p \alpha)(v) = \sum_{j=1}^{k_p} K_p(v, r_{p,j}) \, \alpha^j \qquad \text{and} \qquad m_z(v) = (K_g z)(v) = \sum_{j=1}^{k_g} K_g(v, r_{g,j}) \, z^j . \]
    Generative model:
    \[ \begin{cases} z \sim \otimes_{i=1}^{n} \mathcal{N}_{2 k_g}(0, \Gamma_g) \mid \Gamma_g , \\ y \sim \otimes_{i=1}^{n} \mathcal{N}_{|\Lambda|}(m_{z_i} I_\alpha, \sigma^2 \mathrm{Id}) \mid z, \alpha, \sigma^2 . \end{cases} \]
    Bayesian framework → MAP estimator (= penalised MLE)
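The generative model above can be simulated with a few lines of code. The sketch below is a simplified illustration, not the authors' implementation: it works in one dimension instead of on 2D images, and it assumes Gaussian kernels for both $K_p$ and $K_g$. The function names and the kernel scales `scale_p` and `scale_g` are hypothetical.

```python
import numpy as np

def gaussian_kernel(v, centers, scale):
    """K(v_i, r_j) = exp(-(v_i - r_j)^2 / (2 scale^2)); shape (len(v), len(centers))."""
    diff = v[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / scale) ** 2)

def template(v, alpha, r_p, scale_p):
    """I_alpha(v) = sum_j K_p(v, r_{p,j}) alpha^j."""
    return gaussian_kernel(v, r_p, scale_p) @ alpha

def deformation(v, z, r_g, scale_g):
    """m_z(v) = sum_j K_g(v, r_{g,j}) z^j."""
    return gaussian_kernel(v, r_g, scale_g) @ z

def sample_observation(v, alpha, r_p, scale_p, r_g, scale_g, gamma_g, sigma, rng):
    """Draw z ~ N(0, Gamma_g), then y(u) = I_alpha(v_u - m_z(v_u)) + sigma * eps(u)."""
    z = rng.multivariate_normal(np.zeros(len(r_g)), gamma_g)
    warped = v - deformation(v, z, r_g, scale_g)
    noise = sigma * rng.standard_normal(v.shape)
    return template(warped, alpha, r_p, scale_p) + noise, z
```

Calling `sample_observation` $n$ times with the same `alpha`, `gamma_g` and `sigma` gives a synthetic training set of the kind the estimation algorithm takes as input.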
  • 75. Application on Bayesian Mixed effect template estimation: Results on the template estimation. Training sets
    Figure: Left: training set (inverse video). Right: noisy training set (inverse video).
  • 76. Application on Bayesian Mixed effect template estimation: Results on the template estimation. Estimated templates
    Figure: Estimated templates using different algorithms (FAM-EM, H.G.-SAEM, AMALA-SAEM) and two levels of noise (no noise, noise of variance 1). The training set includes 20 images per digit.
  • 77. Application on Bayesian Mixed effect template estimation: Results on the covariance matrix estimation. Estimated geometric variability
    Figure: Synthetic samples generated with respect to the BME template model.
  • 78. Application on Bayesian Mixed effect template estimation: Empirical proof of the CLT
    Figure: Evolution of the estimation of the noise variance along the SAEM iterations. Left: original data. Right: noisy training set.
  • 79. Application on Bayesian Mixed effect template estimation: Empirical proof of the CLT
    Figure: Evolution of the estimation of the noise variance along the SAEM iterations. Test of convergence towards the Gaussian distribution of the estimated parameters.
  • 80. Application on Bayesian Mixed effect template estimation: Medical image template estimation. Corpus callosum database
    Figure: Medical image template estimation: 10 corpus callosum and splenium training images among the 47 available.
    Figure: Grey level mean. FAM-EM estimated template. Hybrid Gibbs-SAEM estimated template. AMALA-SAEM estimation.
  • 81. Application on Bayesian Mixed effect template estimation: Results on the template estimation using LDDMM shooting. BME Template model with LDDMM
    Deformable template model ($u$ = voxel, $v_u$ its position):
    \[ y(u) = I_0\left( \phi_{\beta(0)}^{-1}(v_u) \right) + \sigma \epsilon(u) , \]
    Parametric template: $I_\alpha(v) = (K_p \alpha)(v) = \sum_{j=1}^{k_p} K_p(v, r_{p,j}) \, \alpha^j$, and $\phi_{\beta(0)}$ is the LDDMM solution of shooting with initial momentum $\beta(0)$.
    Generative model:
    \[ \begin{cases} z \sim \otimes_{i=1}^{n} \mathcal{N}_{2 k_g}(0, \Gamma_g) \mid \Gamma_g , \\ y \sim \otimes_{i=1}^{n} \mathcal{N}_{|\Lambda|}(\phi_{\beta(0)} I_\alpha, \sigma^2 \mathrm{Id}) \mid z, \alpha, \sigma^2 . \end{cases} \]
    Bayesian framework → MAP estimator (= penalised MLE)
  • 82. Application on Bayesian Mixed effect template estimation: Results on the template estimation using LDDMM shooting. LDDMM: parametric deformation
    - Fix some control points: $c(t) = (c_1(t), \dots, c_{n_g}(t))$
    - Choose a kernel $K_g$
    - Start from an initial momentum $\beta(0) = (\beta^1(0), \dots, \beta^{n_g}(0))$
    Then the Hamiltonian system gives the time evolution of both momenta and control points:
    \[ \begin{cases} \dfrac{dc}{dt} = \dfrac{\partial H}{\partial \beta}(c, \beta) = K_g(c(t)) \, \beta(t) \\[2ex] \dfrac{d\beta}{dt} = -\dfrac{\partial H}{\partial c}(c, \beta) = -\dfrac{1}{2} \nabla_{c(t)} K_g(\beta(t), \beta(t)) \end{cases} \tag{9} \]
  • 83. Application on Bayesian Mixed effect template estimation: Results on the template estimation using LDDMM shooting. LDDMM: parametric deformation (2)
    Interpolating on any point of the domain:
    \[ v_t(r) = (K_g \beta(t))(r) = \sum_{k=1}^{n_g} K_g(r, c_k(t)) \, \beta^k(t) \qquad \forall r \in D \tag{10} \]
    Deformation = solution of the flow equation:
    \[ \begin{cases} \dfrac{\partial \phi_{\beta(0)}(t)}{\partial t} = v_t \circ \phi_{\beta(0)}(t) \\[2ex] \phi_0 = \mathrm{Id} \end{cases} \tag{11} \]
    \[ \phi_{\beta(0)} = \phi_{\beta(0)}(1) \]
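Equations (9) to (11) can be integrated numerically as sketched below. This is a minimal illustration under stated assumptions, not the scheme used in the talk: it assumes a Gaussian kernel $K_g(x, y) = \exp(-\|x - y\|^2 / (2\,\mathrm{scale}^2))$, the Hamiltonian $H(c, \beta) = \frac12 \sum_{k,l} \langle \beta^k, \beta^l \rangle K_g(c_k, c_l)$, and an explicit Euler scheme. It computes the forward flow $\phi_{\beta(0)}(1)$ of a set of points, whereas the image model uses its inverse. All function names are hypothetical.

```python
import numpy as np

def kernel_matrix(x, y, scale):
    """Gaussian kernel K(x_i, y_j); x has shape (n, d), y has shape (m, d)."""
    sq = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * scale ** 2))

def hamiltonian_rhs(c, beta, scale):
    """Right-hand side of the Hamiltonian system (9) for a Gaussian kernel."""
    K = kernel_matrix(c, c, scale)                # (n, n)
    dc = K @ beta                                 # dc/dt = K_g(c) beta
    diff = c[:, None, :] - c[None, :, :]          # c_k - c_l, shape (n, n, d)
    w = (beta @ beta.T) * K / scale ** 2          # <beta_k, beta_l> K_kl / scale^2
    dbeta = np.sum(w[:, :, None] * diff, axis=1)  # dbeta/dt = -dH/dc
    return dc, dbeta

def shoot_and_flow(c0, beta0, points, scale, n_steps=100):
    """Euler integration of (9) on [0, 1], transporting `points` along v_t (10)-(11)."""
    c, beta, x = c0.copy(), beta0.copy(), points.copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        dc, dbeta = hamiltonian_rhs(c, beta, scale)
        dx = kernel_matrix(x, c, scale) @ beta    # v_t(x) = sum_k K_g(x, c_k) beta_k
        c, beta, x = c + dt * dc, beta + dt * dbeta, x + dt * dx
    return c, beta, x
```

Two properties are easy to verify on this sketch: a single control point travels in a straight line with constant momentum, and the total momentum $\sum_k \beta^k(t)$ is preserved because the kernel only depends on $c_k - c_l$.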
  • 92. Application on Bayesian Mixed effect template estimation: Results on the template estimation using LDDMM shooting. Gradient computation
    \[ E(c_i, \beta_i) = \underbrace{\sum_k \left( I_0(\phi_1^{-1}(y_k)) - I(y_k) \right)^2}_{A(y_k(0))} + \underbrace{\sigma^2 \, \mathrm{Reg}(\phi_1)}_{L(S_0) = \beta(0)^t \Gamma_g(q(0), q(0)) \beta(0)} \]
    \[ \begin{cases} S_0 = \{ (c_i, \beta_i) \}_i \\[1ex] \dfrac{dS(t)}{dt} = F(S(t)), & S(0) = S_0 \\[2ex] \dfrac{dy(t)}{dt} = G(S(t), y(t)), & y(1) = y \end{cases} \]
    \[ \nabla_{S_0} E = \left( d_{S_0} y(0) \right)^T \nabla_{y(0)} A + \nabla_{S_0} L \]
    \[ \nabla_{y_k(0)} A = 2 \left( I_0(y_k(0)) - I(y_k(1)) \right) \nabla_{y_k(0)} I_0 \]
    - Momenta decrease image discrepancy
    - Control points are attracted by image contours
  • 95. Application on Bayesian Mixed effect template estimation: Results on the template estimation using LDDMM shooting. Gradient computation (auxiliary equations)
    Same energy $E = A + L$ and same systems for $S(t)$ and $y(t)$ as in the preceding gradient computation slide:
    \[ \frac{dS(t)}{dt} = F(S(t)),\ S(0) = S_0, \qquad \frac{dy(t)}{dt} = G(S(t), y(t)),\ y(1) = y . \]
    \[ \frac{d\eta(t)}{dt} = \left( \partial_{S(t)} G \right)^T \eta(t), \qquad \eta(0) = \nabla_{y(0)} A \]
    \[ \frac{d\xi(t)}{dt} = \left( \partial_{y(t)} G \right)^T \eta(t) - dF^T \xi(t), \qquad \xi(1) = 0 \]
    \[ \nabla_{S_0} E = \xi(0) + \nabla_{S_0} L \]
  • 96. Application on Bayesian Mixed effect template estimation: Results on the template estimation using LDDMM shooting. Using LDDMM deformations via shooting (preliminary results)
    Figure: results for AMALA and for GH.
  • 98. Conclusion
    - Good performance (as accurate as other algorithms)
    - Reduced computational time
    - Can handle the movement of control points in practice (theory still to be confirmed)
    - Can handle sparsity of the template (model selection)
    - Removing control points? In practice, why not... but what about the theory?
    Thank you!