SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
The Metropolis adjusted Langevin Algorithm
for log-concave probability measures in high
                dimensions
                Andreas Eberle
                 June 9, 2011


                     0-0
1    INTRODUCTION

                      1 2
            U (x) =     |x| + V (x) ,     x ∈ Rd ,     V ∈ C 4 (Rd ),
                      2
                     1 −U (x) d       (2π)d/2 −V (x) d
             µ(dx) =   e     λ (dx) =        e      γ (dx),
                     Z                   Z

γd = N (0, Id ) standard normal distribution in Rd .

AIM :

    • Approximate Sampling from µ.

    • Rigorous error and complexity estimates, d → ∞.
RUNNING EXAMPLE: TRANSITION PATH SAMPLING

                  dYt = dBt −         H(Yt ) dt ,   Y0 = y0 ∈ Rn ,
µ = conditional distribution on C([0, T ], Rn ) of (Yt )t∈[0,T ] given YT = yT .
By Girsanov‘s Theorem:

                        µ(dy) = Z −1 exp(−V (y)) γ(dy),
γ = distribution of Brownian bridge from y0 to yT ,
                                  T
                                      1
                   V (y) =              ∆H(yt ) + | H(yt )|2    dt.
                              0       2


Finite dimensional approximation via Karhunen-Loève expansion:

         γ(dy) → γ d (dx) ,           V (y) → Vd (x)           setup above
MARKOV CHAIN MONTE CARLO APPROACH

  • Simulate an ergodic Markov process (Xn ) with stationary distribution µ.
                  −1
  • n large: P ◦ Xn ≈ µ

  • Continuous time: (over-damped) Langevin diffusion
                           1        1
                    dXt = − Xt dt −   V (Xt ) dt + dBt
                           2        2


  • Discrete time: Metropolis-Hastings Algorithms, Gibbs Samplers
METROPOLIS-HASTINGS ALGORITHM
(Metropolis et al 1953, Hastings 1970)

              µ(x) := Z −1 exp(−U (x))      density of µ w.r.t. λd ,

          p(x, y) stochastic kernel on Rd     proposal density, > 0,


ALGORITHM

  1. Choose an initial state X0 .

  2. For n := 0, 1, 2, . . . do

        • Sample Yn ∼ p(Xn , y)dy, Un ∼ Unif(0, 1) independently.
        • If Un < α(Xn , Yn ) then accept the proposal and set Xn+1 := Yn ;
          else reject the proposal and set Xn+1 := Xn .
METROPOLIS-HASTINGS ACCEPTANCE PROBABILITY
                      µ(y)p(y, x)
     α(x, y) = min                ,1   = exp (−G(x, y)+ ),    x, y ∈ Rd ,
                      µ(x)p(x, y)

              µ(x)p(x, y)                   p(x, y)                  γ d (x)p(x, y)
G(x, y) = log             = U (y)−U (x)+log         = V (y)−V (x)+log d
              µ(y)p(y, x)                   p(y, x)                  γ (y)p(y, x)


   • (Xn ) is a time-homogeneous Markov chain with transition kernel

       q(x, dy) = α(x, y)p(x, y)dy + q(x)δx (dy), q(x) = 1 − q(x, Rd  {x}).


   • Detailed Balance:

                         µ(dx) q(x, dy) = µ(dy) q(y, dx).
PROPOSAL DISTRIBUTIONS FOR METROPOLIS-HASTINGS
            x → Yh (x) proposed move,          h > 0 step size,
          ph (x, dy) = P [Yh (x) ∈ dy] proposal distribution,
       αh (x, y) = exp(−Gh (x, y)+ ) acceptance probability.


  • Random Walk Proposals (       Random Walk Metropolis)
                                       √
                       Yh (x)    = x + h · Z,   Z ∼ γd,
                    ph (x, dy)   = N (x, h · Id ),
                    Gh (x, y)    = U (y) − U (x).
  • Ornstein-Uhlenbeck Proposals
                        h               h2
       Yh (x) =      1−      x+      h−    · Z,       Z ∼ γd,
                        2               4
    ph (x, dy) = N ((1 − h/2)x, (h − h2 /4) · Id ),      det. balance w.r.t. γ d
    Gh (x, y) = V (y) − V (x).
• Euler Proposals (     Metropolis Adjusted Langevin Algorithm)

                          h       h              √
          Yh (x) =     1−      x−      V (x) +       h · Z,       Z ∼ γd.
                          2       2

  (Euler step for Langevin equation dXt = − 1 Xt dt −
                                            2
                                                              1
                                                              2   V (Xt ) dt + dBt )

                               h      h
        ph (x, dy)   = N ((1 − )x −       V (x), h · Id ),
                               2      2
        Gh (x, y)    = V (y) − V (x) − (y − x) · ( V (y) +          V (x))/2
                        +h(| U (y)|2 − | U (x)|2 )/4.


  REMARK. Even for V ≡ 0, γ d is not a stationary distribution for pEuler .
                                                                     h
  Stationarity only holds asymptotically as h → 0. This causes substantial
  problems in high dimensions.
• Semi-implicit Euler Proposals (   Semi-implicit MALA)

                        h       h                  h2
     Yh (x) =        1−    x−       V (x) + h −       · Z,   Z ∼ γd,
                        2       2                  4
                         h       h              h2
  ph (x, dy) =   N ((1 − )x −        V (x), (h − ) · Id )  ( = pOU if V ≡ 0 )
                                                                h
                         2       2               4
  Gh (x, y) =    V (y) − V (x) − (y − x) · ( V (y) + V (x))/2
                      h
                 +          (y + x) · ( V (y) − V (x)) + | V (y)|2 − | V (x)|2 .
                   8 − 2h


  REMARK. Semi-implicit discretization of Langevin equation
                       1       1
         dXt     =   − Xt dt −    V (Xt ) dt + dBt
                       2       2
                       ε Xn+1 + Xn   ε             √
  Xn+1 − Xn      =   −             −      V (Xn ) + εZn+1 ,     Zi i.i.d. ∼ γ d
                       2     2       2
Solve for Xn and substitute h = ε/(1 + ε/4):

                      h        h                   h2
       Xn+1 =      1−     Xn −      V (Xn ) +   h−    · Zn+1 .
                      2        2                   4
KNOWN RESULTS FOR METROPOLIS-HASTINGS IN HIGH DIMENSIONS

  • Scaling of acceptance probabilities and mean square jumps as d → ∞

  • Diffusion limits as d → ∞
  • Ergodicity, Geometric Ergodicity

  • Quantitative bounds for mixing times, rigorous complexity estimates
Optimal Scaling and diffusion limits

   • Roberts, Gelman, Gilks 1997: Diffusion limit for RWM with product tar-
     get, h = O(d−1 )

   • Roberts, Rosenthal 1998: Diffusion limit for MALA with product target,
     h = O(d−1/3 )

   • Beskos, Roberts, Stuart, Voss 2008: Semi-implicit MALA applied to
     Transition Path Sampling, Scaling h = O(1)

   • Beskos, Roberts, Stuart 2009: Optimal Scaling for non-product targets

   • Mattingly, Pillai, Stuart 2010: Diffusion limit for RWM with non-product
     target, h = O(d−1 )

   • Pillai, Stuart, Thiéry 2011: Diffusion limit for MALA with non-product
     target, h = O(d−1/3 )
Geometric ergodicity for MALA

   • Roberts, Tweedie 1996: Geometric convergence holds if      U is globally
     Lipschitz but fails in general

   • Bou Rabee, van den Eijnden 2009: Strong accuracy for truncated MALA

   • Bou Rabee, Hairer, van den Eijnden 2010: Convergence to equilibrium
     for MALA at exponential rate up to term exponentially small in time step
     size
BOUNDS FOR MIXING TIME, COMPLEXITY
Metropolis with ball walk proposals
   • Dyer, Frieze, Kannan 1991: µ = U nif (K), K ⊂ Rd convex
     ⇒ Total variation mixing time is polynomial in d and diam(K)
   • Applegate, Kannan 1991, ... , Lovasz, Vempala 2006: U : K → R
     concave, K ⊂ Rd convex
     ⇒ Total variation mixing time is polynomial in d and diam(K)
Metropolis adjusted Langevin :
   • No rigorous complexity estimates so far.
   • Classical results for Langevin diffusions. In particular: If µ is strictly
     log-concave, i.e.,

                      ∃ κ > 0 : ∂ 2 U (x) ≥ κ · Id     ∀ x ∈ Rd

      then
                    dK ( law(Xt ) , µ ) ≤ e−κt dK ( law(X0 ) , µ ).
If µ is strictly log-concave, i.e.,

                     ∃ κ > 0 : ∂ 2 U (x) ≥ κ · Id     ∀ x ∈ Rd

then
                   dK ( law(Xt ) , µ ) ≤ e−κt dK ( law(X0 ) , µ ).


   • Bound is independent of dimension, sharp !

   • Under additional conditions, a corresponding result holds for the Euler
     discretization.

   • This suggests that comparable bounds might hold for MALA, or even for
     Ornstein-Uhlenbeck proposals.
2   Main result and strategy of proof
Semi-implicit MALA:

                   h       h                  h2
    Yh (x) =    1−      x−       V (x) +   h−    · Z,      Z ∼ γ d , h > 0,
                   2       2                  4

Coupling of proposal distributions ph (x, dy), x ∈ Rd ,

            Yh (x)   if U ≤ αh (x, Yh (x))
Wh (x) =                                   , U ∼ U nif (0, 1) independent of Z,
            x        if U > α(x, Y (x, x))
                                       ˜

Coupling of MALA transition kernels qh (x, dy), x ∈ Rd .
We fix a radius R ∈ (0, ∞) and a norm              ·   −    on Rd such that x            −   ≤ |x| for
any x ∈ Rd , and set

       d(x, x) := min( x − x
            ˜              ˜    − , R),       B := {x ∈ Rd :                  x   −   < R/2}.


EXAMPLE: Transition Path Sampling

   • |x|Rd is finite dimensional projection of Cameron-Martin norm

                                                           2        1/2
                                              T
                                                      dx
                             |x|CM =                           dt         .
                                          0           dt

   •    x   −   is finite dimensional approximation of supremum or L2 norm.
GOAL:

E [d(Wh (x), Wh (˜))] ≤
                 x          1 − κh + Ch3/2 d(x, x)
                                                ˜            ∀ x, x ∈ B, h ∈ (0, 1)
                                                                  ˜

 with explicit constants κ, C ∈ (0, ∞) that do depend on the dimension d only
through the moments

                                        k
                    mk :=           x   −   γ d (dx) ,   k ∈ N.
                              Rd


CONSEQUENCE:
   • Contractivity of MALA transition kernel qh for small h w.r.t. Kantorovich-
     Wasserstein distance

              dK (ν, η) =     sup       E[d(X, Y )] ,     ν, η ∈ P rob(Rd ).
                            X∼ν,Y ∼η


         dK (νqh , µ) ≤ (1 − κh + Ch3/2 ) dK (ν, µ) + R · (ν(B c ) + µ(B c )).
• Upper bound for mixing time

         Tmix (ε) = inf {n ≥ 0 : dK (νqh , µ) < ε for any ν ∈ P rob(Rd )}.
                                       n




EXAMPLE: Transition Path Sampling
Dimension-independent bounds hold under appropriate assumptions.
STRATEGY OF PROOF: Let

    A(x) := {U ≤ αh (x, Yh (x))}       (proposed move from x is accepted)

Then

          E[d(Wh (x), Wh (˜))] ≤
                          x           E[d(Yh (x), Yh (˜)); A(x) ∩ A(˜)]
                                                      x             x
                                      + d(x, x) · P [A(x)C ∩ A(˜)C ]
                                             ˜                 x
                                      + R · P [A(x) A(˜)].x

We prove under appropriate assumptions:

  1. E[d(Yh (x), Yh (˜))] ≤ (1 − κh) · d(x, x),
                     x                      ˜

  2. P [A(x)C ] = E[1 − αh (x, Yh (x))] ≤ C1 h3/2 ,

  3. P [A(x)    A(˜)] ≤ E[|αh (x, Yh (x)) − αh (˜, Yh (˜))|] ≤ C2 h3/2 x − x
                  x                             x      x                   ˜   −,

with explicit constants κ, C1 , C2 ∈ (0, ∞).
3    Contractivity of proposal step
PROPOSITION. Suppose there exists a constant α ∈ (0, 1) such that
                       2
                           V (x) · η        −   ≤ α η   −    ∀x ∈ B, η ∈ Rd .                  (1)

Then
                                               1−α
       Yh (x) − Yh (˜)
                    x       −   ≤           1−     h        x−x
                                                              ˜   −        ∀x, x ∈ B, h > 0.
                                                                               ˜
                                                2

Proof.
                                        1
 Yh (x) − Yh (˜)
              x    −       ≤                ∂x−˜ Yh (tx + (1 − t)˜)
                                               x                 x    −   dt
                                    0
                                        1
                                        h            h 2
                           =              )(x − x) −
                                            (1 −˜        V (tx + (1 − t)˜) · (x − x)
                                                                        x         ˜                  −   dt
                                 0      2            2
                                    h              h
                           ≤    (1 − ) x − x − + α x − x −
                                            ˜             ˜
                                    2              2
REMARK.
The assumption (1) implies strict convexity of U (x) = 1 |x|2 + V (x).
                                                       2

EXAMPLE.
For Transition Path Sampling, (A1) holds for small T with α independent of
the dimension.
4    Bounds for MALA rejection probabilities

αh (x, y) = exp(−Gh (x, y)+ ) Acceptance probability for semi-implicit MALA,



Gh (x, y)   = V (y) − V (x) − (y − x) · ( V (y) + V (x))/2
                   h
              +          (y + x) · ( V (y) − V (x)) + | V (y)|2 − | V (x)|2 .
                8 − 2h

ASSUMPTION. There exist finite constants Cn , pn ∈ [0, ∞) such that
                n                                       pn
             |(∂ξ1 ,...,ξn V )(x)| ≤ Cn max(1, x      −)     ξ1   −   · · · ξn   −


for any x ∈ Rd , ξ1 , . . . , ξn ∈ Rd , and n = 2, 3, 4.
THEOREM. If the assumption above is satisfied then there exists a polyno-
mial P : R2 → R+ of degree max(p3 + 3, 2p2 + 2) such that

   E[1 − αh (x, Yh (x))] ≤ E[Gh (x, Yh (x))+ ] ≤ P( x    −,    U (x)   −)   · h3/2

for all x ∈ Rd , h ∈ (0, 2).

REMARK.

   • The polynomial P is explicit. It depends only on the values C2 , C3 , p2 , p3
     and on the moments
                               k
                  mk = E[ Z    − ],    0 ≤ k ≤ max(p3 + 3, 2p2 + 2)

      but it does not depend on the dimension d.

   • For MALA with explicit Euler proposals, a corresponding estimate holds
     with mk replaced by mk = E[|Z|k ]. Note, however, that mk → ∞ as
                          ˜                                    ˜
     d → ∞.
Proof.

   |V (y) − V (x) − (y − x) · ( V (y) +              V (x))/2|                                   (2)
              1 1           3
       =          t(1 − t)∂y−x V (x + t(y − x)) dt
              2 0
             1
       ≤        y − x 3 · sup{∂η V (z) : z ∈ [x, y], η
                      −
                                3
                                                                  −   ≤ 1}
             12
                        3
   E       Yh (x) − x   −   ≤ const. · h3/2                                                      (3)

   |(y + x) · ( V (y) −          V (x))|                                                         (4)
       ≤      y+x   −   · sup |∂ξ V (y) − ∂ξ V (x)|
                            ξ   − ≤1

                                                  2
       ≤      y+x   −   · y−x          −   · sup{∂ξη V (z) : z ∈ [x, y], ξ   −,   η   −   ≤ 1}
5     Dependence of rejection on current state
THEOREM. If the assumption above is satisfied then there exists a polyno-
mial Q : R2 → R+ of degree max(p4 + 2, p3 + p2 + 2, 3p2 + 1) such that
             E     x Gh (x, Yh (x)) +    ≤ Q( x   −,        U (x)   −)   · h3/2
for all x ∈ Rd , h ∈ (0, 2), where
                          η   +   := sup{ξ · η : ξ   −   ≤ 1}.


CONSEQUENCE.
    P [Accept(x)   Accept(˜)] ≤ E[|αh (x, Yh (x)) − αh (˜, Yh (˜))|]
                          x                             x      x
                                   ≤ E[|Gh (x, Yh (x)) − Gh (˜, Yh (˜))|]
                                                             x      x
                                   ≤ x − x − · sup
                                         ˜                z Gh (z, Yh (z))          +
                                                  z∈[x,˜]
                                                       x

                                   ≤ h3/2 x − x
                                              ˜      −    sup Q( x       −,       U (x)   −)
                                                         z∈[x,˜]
                                                              x

Contenu connexe

Tendances

Introduction to inverse problems
Introduction to inverse problemsIntroduction to inverse problems
Introduction to inverse problems
Delta Pi Systems
 
Natalini nse slide_giu2013
Natalini nse slide_giu2013Natalini nse slide_giu2013
Natalini nse slide_giu2013
Madd Maths
 
Stuff You Must Know Cold for the AP Calculus BC Exam!
Stuff You Must Know Cold for the AP Calculus BC Exam!Stuff You Must Know Cold for the AP Calculus BC Exam!
Stuff You Must Know Cold for the AP Calculus BC Exam!
A Jorge Garcia
 

Tendances (20)

Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
Low Complexity Regularization of Inverse Problems - Course #3 Proximal Splitt...
 
Low Complexity Regularization of Inverse Problems
Low Complexity Regularization of Inverse ProblemsLow Complexity Regularization of Inverse Problems
Low Complexity Regularization of Inverse Problems
 
Low Complexity Regularization of Inverse Problems - Course #1 Inverse Problems
Low Complexity Regularization of Inverse Problems - Course #1 Inverse ProblemsLow Complexity Regularization of Inverse Problems - Course #1 Inverse Problems
Low Complexity Regularization of Inverse Problems - Course #1 Inverse Problems
 
Signal Processing Course : Convex Optimization
Signal Processing Course : Convex OptimizationSignal Processing Course : Convex Optimization
Signal Processing Course : Convex Optimization
 
Geodesic Method in Computer Vision and Graphics
Geodesic Method in Computer Vision and GraphicsGeodesic Method in Computer Vision and Graphics
Geodesic Method in Computer Vision and Graphics
 
Lesson 27: Integration by Substitution, part II (Section 10 version)
Lesson 27: Integration by Substitution, part II (Section 10 version)Lesson 27: Integration by Substitution, part II (Section 10 version)
Lesson 27: Integration by Substitution, part II (Section 10 version)
 
Introduction to inverse problems
Introduction to inverse problemsIntroduction to inverse problems
Introduction to inverse problems
 
Scattering theory analogues of several classical estimates in Fourier analysis
Scattering theory analogues of several classical estimates in Fourier analysisScattering theory analogues of several classical estimates in Fourier analysis
Scattering theory analogues of several classical estimates in Fourier analysis
 
Tales on two commuting transformations or flows
Tales on two commuting transformations or flowsTales on two commuting transformations or flows
Tales on two commuting transformations or flows
 
Monte-Carlo method for Two-Stage SLP
Monte-Carlo method for Two-Stage SLPMonte-Carlo method for Two-Stage SLP
Monte-Carlo method for Two-Stage SLP
 
Quantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averagesQuantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averages
 
Signal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse ProblemsSignal Processing Course : Sparse Regularization of Inverse Problems
Signal Processing Course : Sparse Regularization of Inverse Problems
 
Nested sampling
Nested samplingNested sampling
Nested sampling
 
Natalini nse slide_giu2013
Natalini nse slide_giu2013Natalini nse slide_giu2013
Natalini nse slide_giu2013
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Stuff You Must Know Cold for the AP Calculus BC Exam!
Stuff You Must Know Cold for the AP Calculus BC Exam!Stuff You Must Know Cold for the AP Calculus BC Exam!
Stuff You Must Know Cold for the AP Calculus BC Exam!
 
Lecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dualLecture 2: linear SVM in the dual
Lecture 2: linear SVM in the dual
 
ABC in Venezia
ABC in VeneziaABC in Venezia
ABC in Venezia
 
Lecture5 kernel svm
Lecture5 kernel svmLecture5 kernel svm
Lecture5 kernel svm
 
The multilayer perceptron
The multilayer perceptronThe multilayer perceptron
The multilayer perceptron
 

En vedette

En vedette (7)

Hedibert Lopes' talk at BigMC
Hedibert Lopes' talk at  BigMCHedibert Lopes' talk at  BigMC
Hedibert Lopes' talk at BigMC
 
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i...
 
Comparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering modelsComparing estimation algorithms for block clustering models
Comparing estimation algorithms for block clustering models
 
Computation of the marginal likelihood
Computation of the marginal likelihoodComputation of the marginal likelihood
Computation of the marginal likelihood
 
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ...
 
Stability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithmsStability of adaptive random-walk Metropolis algorithms
Stability of adaptive random-walk Metropolis algorithms
 
"Monte-Carlo Tree Search for the game of Go"
"Monte-Carlo Tree Search for the game of Go""Monte-Carlo Tree Search for the game of Go"
"Monte-Carlo Tree Search for the game of Go"
 

Similaire à Andreas Eberle

Unconditionally stable fdtd methods
Unconditionally stable fdtd methodsUnconditionally stable fdtd methods
Unconditionally stable fdtd methods
physics Imposible
 

Similaire à Andreas Eberle (20)

CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
 
1 hofstad
1 hofstad1 hofstad
1 hofstad
 
Holographic Cotton Tensor
Holographic Cotton TensorHolographic Cotton Tensor
Holographic Cotton Tensor
 
Norm-variation of bilinear averages
Norm-variation of bilinear averagesNorm-variation of bilinear averages
Norm-variation of bilinear averages
 
Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...
 
Physical Chemistry Assignment Help
Physical Chemistry Assignment HelpPhysical Chemistry Assignment Help
Physical Chemistry Assignment Help
 
Funcion gamma
Funcion gammaFuncion gamma
Funcion gamma
 
02 2d systems matrix
02 2d systems matrix02 2d systems matrix
02 2d systems matrix
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - Introduction
 
Fdtd
FdtdFdtd
Fdtd
 
Fdtd
FdtdFdtd
Fdtd
 
Unconditionally stable fdtd methods
Unconditionally stable fdtd methodsUnconditionally stable fdtd methods
Unconditionally stable fdtd methods
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
Bird’s-eye view of Gaussian harmonic analysis
Bird’s-eye view of Gaussian harmonic analysisBird’s-eye view of Gaussian harmonic analysis
Bird’s-eye view of Gaussian harmonic analysis
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
CS369-FDM-FEM.pptx
CS369-FDM-FEM.pptxCS369-FDM-FEM.pptx
CS369-FDM-FEM.pptx
 
Monopole zurich
Monopole zurichMonopole zurich
Monopole zurich
 
Alternating direction
Alternating directionAlternating direction
Alternating direction
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Andreas Eberle

  • 1. The Metropolis adjusted Langevin Algorithm for log-concave probability measures in high dimensions Andreas Eberle June 9, 2011 0-0
  • 2. 1 INTRODUCTION 1 2 U (x) = |x| + V (x) , x ∈ Rd , V ∈ C 4 (Rd ), 2 1 −U (x) d (2π)d/2 −V (x) d µ(dx) = e λ (dx) = e γ (dx), Z Z γd = N (0, Id ) standard normal distribution in Rd . AIM : • Approximate Sampling from µ. • Rigorous error and complexity estimates, d → ∞.
  • 3. RUNNING EXAMPLE: TRANSITION PATH SAMPLING dYt = dBt − H(Yt ) dt , Y0 = y0 ∈ Rn , µ = conditional distribution on C([0, T ], Rn ) of (Yt )t∈[0,T ] given YT = yT . By Girsanov‘s Theorem: µ(dy) = Z −1 exp(−V (y)) γ(dy), γ = distribution of Brownian bridge from y0 to yT , T 1 V (y) = ∆H(yt ) + | H(yt )|2 dt. 0 2 Finite dimensional approximation via Karhunen-Loève expansion: γ(dy) → γ d (dx) , V (y) → Vd (x) setup above
  • 4. MARKOV CHAIN MONTE CARLO APPROACH • Simulate an ergodic Markov process (Xn ) with stationary distribution µ. −1 • n large: P ◦ Xn ≈ µ • Continuous time: (over-damped) Langevin diffusion 1 1 dXt = − Xt dt − V (Xt ) dt + dBt 2 2 • Discrete time: Metropolis-Hastings Algorithms, Gibbs Samplers
  • 5. METROPOLIS-HASTINGS ALGORITHM (Metropolis et al 1953, Hastings 1970) µ(x) := Z −1 exp(−U (x)) density of µ w.r.t. λd , p(x, y) stochastic kernel on Rd proposal density, > 0, ALGORITHM 1. Choose an initial state X0 . 2. For n := 0, 1, 2, . . . do • Sample Yn ∼ p(Xn , y)dy, Un ∼ Unif(0, 1) independently. • If Un < α(Xn , Yn ) then accept the proposal and set Xn+1 := Yn ; else reject the proposal and set Xn+1 := Xn .
  • 6. METROPOLIS-HASTINGS ACCEPTANCE PROBABILITY µ(y)p(y, x) α(x, y) = min ,1 = exp (−G(x, y)+ ), x, y ∈ Rd , µ(x)p(x, y) µ(x)p(x, y) p(x, y) γ d (x)p(x, y) G(x, y) = log = U (y)−U (x)+log = V (y)−V (x)+log d µ(y)p(y, x) p(y, x) γ (y)p(y, x) • (Xn ) is a time-homogeneous Markov chain with transition kernel q(x, dy) = α(x, y)p(x, y)dy + q(x)δx (dy), q(x) = 1 − q(x, Rd {x}). • Detailed Balance: µ(dx) q(x, dy) = µ(dy) q(y, dx).
  • 7. PROPOSAL DISTRIBUTIONS FOR METROPOLIS-HASTINGS x → Yh (x) proposed move, h > 0 step size, ph (x, dy) = P [Yh (x) ∈ dy] proposal distribution, αh (x, y) = exp(−Gh (x, y)+ ) acceptance probability. • Random Walk Proposals ( Random Walk Metropolis) √ Yh (x) = x + h · Z, Z ∼ γd, ph (x, dy) = N (x, h · Id ), Gh (x, y) = U (y) − U (x). • Ornstein-Uhlenbeck Proposals h h2 Yh (x) = 1− x+ h− · Z, Z ∼ γd, 2 4 ph (x, dy) = N ((1 − h/2)x, (h − h2 /4) · Id ), det. balance w.r.t. γ d Gh (x, y) = V (y) − V (x).
  • 8. • Euler Proposals ( Metropolis Adjusted Langevin Algorithm) h h √ Yh (x) = 1− x− V (x) + h · Z, Z ∼ γd. 2 2 (Euler step for Langevin equation dXt = − 1 Xt dt − 2 1 2 V (Xt ) dt + dBt ) h h ph (x, dy) = N ((1 − )x − V (x), h · Id ), 2 2 Gh (x, y) = V (y) − V (x) − (y − x) · ( V (y) + V (x))/2 +h(| U (y)|2 − | U (x)|2 )/4. REMARK. Even for V ≡ 0, γ d is not a stationary distribution for pEuler . h Stationarity only holds asymptotically as h → 0. This causes substantial problems in high dimensions.
  • 9. • Semi-implicit Euler Proposals ( Semi-implicit MALA) h h h2 Yh (x) = 1− x− V (x) + h − · Z, Z ∼ γd, 2 2 4 h h h2 ph (x, dy) = N ((1 − )x − V (x), (h − ) · Id ) ( = pOU if V ≡ 0 ) h 2 2 4 Gh (x, y) = V (y) − V (x) − (y − x) · ( V (y) + V (x))/2 h + (y + x) · ( V (y) − V (x)) + | V (y)|2 − | V (x)|2 . 8 − 2h REMARK. Semi-implicit discretization of Langevin equation 1 1 dXt = − Xt dt − V (Xt ) dt + dBt 2 2 ε Xn+1 + Xn ε √ Xn+1 − Xn = − − V (Xn ) + εZn+1 , Zi i.i.d. ∼ γ d 2 2 2
  • 10. Solve for Xn and substitute h = ε/(1 + ε/4): h h h2 Xn+1 = 1− Xn − V (Xn ) + h− · Zn+1 . 2 2 4
  • 11. KNOWN RESULTS FOR METROPOLIS-HASTINGS IN HIGH DIMENSIONS • Scaling of acceptance probabilities and mean square jumps as d → ∞ • Diffusion limits as d → ∞ • Ergodicity, Geometric Ergodicity • Quantitative bounds for mixing times, rigorous complexity estimates
  • 12. Optimal Scaling and diffusion limits • Roberts, Gelman, Gilks 1997: Diffusion limit for RWM with product tar- get, h = O(d−1 ) • Roberts, Rosenthal 1998: Diffusion limit for MALA with product target, h = O(d−1/3 ) • Beskos, Roberts, Stuart, Voss 2008: Semi-implicit MALA applied to Transition Path Sampling, Scaling h = O(1) • Beskos, Roberts, Stuart 2009: Optimal Scaling for non-product targets • Mattingly, Pillai, Stuart 2010: Diffusion limit for RWM with non-product target, h = O(d−1 ) • Pillai, Stuart, Thiéry 2011: Diffusion limit for MALA with non-product target, h = O(d−1/3 )
  • 13. Geometric ergodicity for MALA • Roberts, Tweedie 1996: Geometric convergence holds if U is globally Lipschitz but fails in general • Bou Rabee, van den Eijnden 2009: Strong accuracy for truncated MALA • Bou Rabee, Hairer, van den Eijnden 2010: Convergence to equilibrium for MALA at exponential rate up to term exponentially small in time step size
  • 14. BOUNDS FOR MIXING TIME, COMPLEXITY Metropolis with ball walk proposals • Dyer, Frieze, Kannan 1991: µ = U nif (K), K ⊂ Rd convex ⇒ Total variation mixing time is polynomial in d and diam(K) • Applegate, Kannan 1991, ... , Lovasz, Vempala 2006: U : K → R concave, K ⊂ Rd convex ⇒ Total variation mixing time is polynomial in d and diam(K) Metropolis adjusted Langevin : • No rigorous complexity estimates so far. • Classical results for Langevin diffusions. In particular: If µ is strictly log-concave, i.e., ∃ κ > 0 : ∂ 2 U (x) ≥ κ · Id ∀ x ∈ Rd then dK ( law(Xt ) , µ ) ≤ e−κt dK ( law(X0 ) , µ ).
  • 15. If µ is strictly log-concave, i.e., ∃ κ > 0 : ∂ 2 U (x) ≥ κ · Id ∀ x ∈ Rd then dK ( law(Xt ) , µ ) ≤ e−κt dK ( law(X0 ) , µ ). • Bound is independent of dimension, sharp ! • Under additional conditions, a corresponding result holds for the Euler discretization. • This suggests that comparable bounds might hold for MALA, or even for Ornstein-Uhlenbeck proposals.
  • 16. 2 Main result and strategy of proof Semi-implicit MALA: h h h2 Yh (x) = 1− x− V (x) + h− · Z, Z ∼ γ d , h > 0, 2 2 4 Coupling of proposal distributions ph (x, dy), x ∈ Rd , Yh (x) if U ≤ αh (x, Yh (x)) Wh (x) = , U ∼ U nif (0, 1) independent of Z, x if U > α(x, Y (x, x)) ˜ Coupling of MALA transition kernels qh (x, dy), x ∈ Rd .
  • 17. We fix a radius R ∈ (0, ∞) and a norm · − on Rd such that x − ≤ |x| for any x ∈ Rd , and set d(x, x) := min( x − x ˜ ˜ − , R), B := {x ∈ Rd : x − < R/2}. EXAMPLE: Transition Path Sampling • |x|Rd is finite dimensional projection of Cameron-Martin norm 2 1/2 T dx |x|CM = dt . 0 dt • x − is finite dimensional approximation of supremum or L2 norm.
  • 18. GOAL: E [d(Wh (x), Wh (˜))] ≤ x 1 − κh + Ch3/2 d(x, x) ˜ ∀ x, x ∈ B, h ∈ (0, 1) ˜ with explicit constants κ, C ∈ (0, ∞) that do depend on the dimension d only through the moments k mk := x − γ d (dx) , k ∈ N. Rd CONSEQUENCE: • Contractivity of MALA transition kernel qh for small h w.r.t. Kantorovich- Wasserstein distance dK (ν, η) = sup E[d(X, Y )] , ν, η ∈ P rob(Rd ). X∼ν,Y ∼η dK (νqh , µ) ≤ (1 − κh + Ch3/2 ) dK (ν, µ) + R · (ν(B c ) + µ(B c )).
  • 19. • Upper bound for mixing time Tmix (ε) = inf {n ≥ 0 : dK (νqh , µ) < ε for any ν ∈ P rob(Rd )}. n EXAMPLE: Transition Path Sampling Dimension-independent bounds hold under appropriate assumptions.
  • 20. STRATEGY OF PROOF: Let A(x) := {U ≤ αh (x, Yh (x))} (proposed move from x is accepted) Then E[d(Wh (x), Wh (˜))] ≤ x E[d(Yh (x), Yh (˜)); A(x) ∩ A(˜)] x x + d(x, x) · P [A(x)C ∩ A(˜)C ] ˜ x + R · P [A(x) A(˜)].x We prove under appropriate assumptions: 1. E[d(Yh (x), Yh (˜))] ≤ (1 − κh) · d(x, x), x ˜ 2. P [A(x)C ] = E[1 − αh (x, Yh (x))] ≤ C1 h3/2 , 3. P [A(x) A(˜)] ≤ E[|αh (x, Yh (x)) − αh (˜, Yh (˜))|] ≤ C2 h3/2 x − x x x x ˜ −, with explicit constants κ, C1 , C2 ∈ (0, ∞).
  • 21. 3 Contractivity of proposal step PROPOSITION. Suppose there exists a constant α ∈ (0, 1) such that 2 V (x) · η − ≤ α η − ∀x ∈ B, η ∈ Rd . (1) Then 1−α Yh (x) − Yh (˜) x − ≤ 1− h x−x ˜ − ∀x, x ∈ B, h > 0. ˜ 2 Proof. 1 Yh (x) − Yh (˜) x − ≤ ∂x−˜ Yh (tx + (1 − t)˜) x x − dt 0 1 h h 2 = )(x − x) − (1 −˜ V (tx + (1 − t)˜) · (x − x) x ˜ − dt 0 2 2 h h ≤ (1 − ) x − x − + α x − x − ˜ ˜ 2 2
  • 22. REMARK. The assumption (1) implies strict convexity of U (x) = 1 |x|2 + V (x). 2 EXAMPLE. For Transition Path Sampling, (A1) holds for small T with α independent of the dimension.
  • 23. 4 Bounds for MALA rejection probabilities αh (x, y) = exp(−Gh (x, y)+ ) Acceptance probability for semi-implicit MALA, Gh (x, y) = V (y) − V (x) − (y − x) · ( V (y) + V (x))/2 h + (y + x) · ( V (y) − V (x)) + | V (y)|2 − | V (x)|2 . 8 − 2h ASSUMPTION. There exist finite constants Cn , pn ∈ [0, ∞) such that n pn |(∂ξ1 ,...,ξn V )(x)| ≤ Cn max(1, x −) ξ1 − · · · ξn − for any x ∈ Rd , ξ1 , . . . , ξn ∈ Rd , and n = 2, 3, 4.
  • 24. THEOREM. If the assumption above is satisfied then there exists a polyno- mial P : R2 → R+ of degree max(p3 + 3, 2p2 + 2) such that E[1 − αh (x, Yh (x))] ≤ E[Gh (x, Yh (x))+ ] ≤ P( x −, U (x) −) · h3/2 for all x ∈ Rd , h ∈ (0, 2). REMARK. • The polynomial P is explicit. It depends only on the values C2 , C3 , p2 , p3 and on the moments k mk = E[ Z − ], 0 ≤ k ≤ max(p3 + 3, 2p2 + 2) but it does not depend on the dimension d. • For MALA with explicit Euler proposals, a corresponding estimate holds with mk replaced by mk = E[|Z|k ]. Note, however, that mk → ∞ as ˜ ˜ d → ∞.
  • 25. Proof. |V (y) − V (x) − (y − x) · ( V (y) + V (x))/2| (2) 1 1 3 = t(1 − t)∂y−x V (x + t(y − x)) dt 2 0 1 ≤ y − x 3 · sup{∂η V (z) : z ∈ [x, y], η − 3 − ≤ 1} 12 3 E Yh (x) − x − ≤ const. · h3/2 (3) |(y + x) · ( V (y) − V (x))| (4) ≤ y+x − · sup |∂ξ V (y) − ∂ξ V (x)| ξ − ≤1 2 ≤ y+x − · y−x − · sup{∂ξη V (z) : z ∈ [x, y], ξ −, η − ≤ 1}
  • 26. 5 Dependence of rejection on current state THEOREM. If the assumption above is satisfied then there exists a polyno- mial Q : R2 → R+ of degree max(p4 + 2, p3 + p2 + 2, 3p2 + 1) such that E x Gh (x, Yh (x)) + ≤ Q( x −, U (x) −) · h3/2 for all x ∈ Rd , h ∈ (0, 2), where η + := sup{ξ · η : ξ − ≤ 1}. CONSEQUENCE. P [Accept(x) Accept(˜)] ≤ E[|αh (x, Yh (x)) − αh (˜, Yh (˜))|] x x x ≤ E[|Gh (x, Yh (x)) − Gh (˜, Yh (˜))|] x x ≤ x − x − · sup ˜ z Gh (z, Yh (z)) + z∈[x,˜] x ≤ h3/2 x − x ˜ − sup Q( x −, U (x) −) z∈[x,˜] x