Vanilla Rao–Blackwellisation of
        Metropolis–Hastings algorithms

                    Christian P. Robert
Université Paris-Dauphine, IUF, and CREST
Joint work with Randal Douc, Pierre Jacob and Murray Smith




            LGM2012, Trondheim, May 30, 2012



                                                              1 / 32
Main themes



   1   Rao–Blackwellisation of MCMC
   2   Can be performed in any Metropolis–Hastings algorithm
   3   Asymptotically more efficient than usual MCMC, with a
       controlled additional computing cost
   4   Takes advantage of parallel capacities at a very basic level
       (GPUs)




                                                                      2 / 32
Metropolis Hastings revisited
                         Rao–Blackwellisation
                      Rao-Blackwellisation (2)


Metropolis Hastings algorithm


    1   We wish to approximate

                I = ∫ h(x) π(x) dx / ∫ π(x) dx = ∫ h(x) π̄(x) dx

    2   π(x) is known but not ∫ π(x) dx.
    3   Approximate I with δ = (1/n) Σ_{t=1}^{n} h(x^(t)), where (x^(t)) is a Markov
        chain with limiting distribution π̄.
    4   Convergence obtained from the Law of Large Numbers or the CLT for
        Markov chains.


                                                                          3 / 32
Metropolis–Hastings algorithm

  Suppose that x^(t) is drawn.
    1   Simulate y_t ∼ q(·|x^(t)).
    2   Set x^(t+1) = y_t with probability

                α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] }

        Otherwise, set x^(t+1) = x^(t).
    3   α is such that the detailed balance equation is satisfied:

                π(x) q(y|x) α(x, y) = π(y) q(x|y) α(y, x).

        ⊲ π̄ is the stationary distribution of (x^(t)).
  ◮ The accepted candidates are simulated with the rejection algorithm.
                                                                            4 / 32
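The two steps above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the target (a standard normal known up to its constant) and the symmetric random-walk proposal are our own choices, and all names are ours.

```python
import numpy as np

def mh(log_target, x0, n_iter, scale=1.0, rng=None):
    """Random-walk Metropolis-Hastings sketch.

    log_target: log of the unnormalised density pi(x).
    Returns the chain (x^(t)) of length n_iter."""
    rng = np.random.default_rng() if rng is None else rng
    chain = np.empty(n_iter)
    x = x0
    lp = log_target(x)
    for t in range(n_iter):
        y = x + scale * rng.standard_normal()   # propose y_t ~ q(.|x^(t)), symmetric
        lpy = log_target(y)
        # alpha(x, y) = min(1, pi(y)/pi(x)) since the symmetric q cancels
        if np.log(rng.uniform()) < lpy - lp:
            x, lp = y, lpy                      # accept: x^(t+1) = y_t
        chain[t] = x                            # on rejection: x^(t+1) = x^(t)
    return chain

# Toy run: estimate I = E[h(X)] with h(x) = x^2 under a N(0, 1) target
chain = mh(lambda x: -0.5 * x * x, x0=0.0, n_iter=50_000,
           scale=2.4, rng=np.random.default_rng(0))
delta = np.mean(chain**2)   # the ergodic-average estimator of I
```

Note the unnormalised log-target suffices, exactly as item 2 of the previous slide requires.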
Metropolis Hastings revisited
                           Rao–Blackwellisation
                        Rao-Blackwellisation (2)


Metropolis Hasting Algorithm
  Suppose that x(t) is drawn.
    1   Simulate yt ∼ q(·|x(t) ).
    2   Set x(t+1) = yt with probability

                                                     π(yt ) q(x(t) |yt )
                       α(x(t) , yt ) = min 1,
                                                    π(x(t) ) q(yt |x(t) )

        Otherwise, set x(t+1) = x(t) .
    3   α is such that the detailed balance equation is satisfied:

                        π(x)q(y|x)α(x, y) = π(y)q(x|y)α(y, x).

        ⊲ π is the stationary distribution of (x(t) ).
          ¯
  ◮ The accepted candidates are simulated with the rejection algorithm.
                                                                            4 / 32
Some properties of the HM algorithm

    1   Alternative representation of the estimator δ is

                δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{M_n} n_i h(z_i) ,

        where
             the z_i's are the accepted y_j's,
             M_n is the number of accepted y_j's till time n,
             n_i is the number of times z_i appears in the sequence (x^(t))_t.
                                                                                 5 / 32
                q̃(·|z_i) = α(z_i, ·) q(·|z_i) / p(z_i) ≤ q(·|z_i) / p(z_i) ,

where p(z_i) = ∫ α(z_i, y) q(y|z_i) dy. To simulate from q̃(·|z_i):
  1   Propose a candidate y ∼ q(·|z_i).
  2   Accept with probability

                q̃(y|z_i) / [ q(y|z_i) / p(z_i) ] = α(z_i, y) .

      Otherwise, reject it and start again.
◮ This is the transition of the HM algorithm. The transition kernel q̃
admits π̃ as a stationary distribution:

                π̃(x) q̃(y|x) = [ π(x) p(x) / ∫ π(u) p(u) du ] × [ α(x, y) q(y|x) / p(x) ]
                             = π(x) α(x, y) q(y|x) / ∫ π(u) p(u) du
                             = π(y) α(y, x) q(x|y) / ∫ π(u) p(u) du
                             = π̃(y) q̃(x|y) ,
                                                                           6 / 32
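The rejection mechanism above, seen as a sampler for q̃(·|z), can be sketched as follows. The target and proposal are illustrative choices of ours (unnormalised N(0, 1) target, N(z, 1) proposal), not from the slides.

```python
import numpy as np

def alpha(z, y, log_pi):
    # symmetric proposal, so the q terms cancel in the HM ratio
    return min(1.0, np.exp(log_pi(y) - log_pi(z)))

def draw_qtilde(z, log_pi, rng):
    """One step of q~(.|z) ∝ alpha(z,.) q(.|z) by rejection:
    propose y ~ q(.|z), accept with probability alpha(z, y)."""
    trials = 0
    while True:
        trials += 1
        y = z + rng.standard_normal()        # y ~ q(.|z) = N(z, 1)
        if rng.uniform() < alpha(z, y, log_pi):
            return y, trials                 # the accepted candidate z_{i+1}

rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * x * x
z, zs = 0.0, []
for _ in range(20_000):
    z, _ = draw_qtilde(z, log_pi, rng)
    zs.append(z)
# (z_i) is the chain of accepted candidates; its stationary
# distribution is pi~(.) ∝ pi(.) p(.), not pi itself
```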
Lemma (Douc & X., AoS, 2011)

The sequence (z_i, n_i) satisfies
  1   (z_i, n_i)_i is a Markov chain;
  2   z_{i+1} and n_i are independent given z_i;
  3   n_i is distributed as a geometric random variable with probability
      parameter

                p(z_i) := ∫ α(z_i, y) q(y|z_i) dy ;                     (1)

  4   (z_i)_i is a Markov chain with transition kernel Q̃(z, dy) = q̃(y|z) dy
      and stationary distribution π̃ such that

                q̃(·|z) ∝ α(z, ·) q(·|z)    and    π̃(·) ∝ π(·) p(·) .


                                                                                     7 / 32
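Item 3 of the Lemma can be checked by simulation: hold z_i fixed, count how long the chain stays there, and compare E[n_i] with 1/p(z_i), the latter obtained by quadrature. The setup (N(0, 1) target, N(z, 1) proposal, z_i = 1.5) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
log_pi = lambda x: -0.5 * x * x
z = 1.5                                        # a fixed current state z_i

def sample_n(z):
    """Number of times z_i appears in (x^(t)) before the next accepted move."""
    n = 1
    while True:
        y = z + rng.standard_normal()          # y ~ q(.|z)
        if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(z))):
            return n                           # move accepted: stop counting
        n += 1                                 # rejection: one more copy of z_i

ns = np.array([sample_n(z) for _ in range(100_000)])

# p(z) = ∫ alpha(z, y) q(y|z) dy by a simple Riemann sum
ys = np.linspace(z - 10, z + 10, 20_001)
q = np.exp(-0.5 * (ys - z) ** 2) / np.sqrt(2 * np.pi)
a = np.minimum(1.0, np.exp(log_pi(ys) - log_pi(z)))
p_z = np.sum(a * q) * (ys[1] - ys[0])
# np.mean(ns) should be close to 1 / p_z, as the Lemma predicts
```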
Old bottle, new wine [or vice-versa]

                z_{i−1} —(indep)→ z_i —(indep)→ z_{i+1}
                   |(indep)          |(indep)
                n_{i−1}            n_i


                δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{M_n} n_i h(z_i) .
                                                                            8 / 32
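The identity above is purely algebraic: grouping the chain by accepted values must reproduce the ergodic average exactly. A quick check on a simulated chain (illustrative target and proposal of ours; z_1 is taken to be the initial state):

```python
import numpy as np

rng = np.random.default_rng(3)
log_pi = lambda x: -0.5 * x * x
n = 10_000
x = 0.0
chain = []
zs, ns = [x], [0]                  # z_1 = initial state, n_1 = its occupation count
for _ in range(n):
    y = x + rng.standard_normal()
    if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(x))):
        x = y
        zs.append(x); ns.append(0) # a new accepted value z_{i+1}
    ns[-1] += 1                    # current z_i appears once more in (x^(t))
    chain.append(x)

h = lambda x: x * x
delta_chain = np.mean([h(xt) for xt in chain])
delta_grouped = sum(ni * h(zi) for zi, ni in zip(zs, ns)) / n
# delta_chain == delta_grouped up to floating-point rounding
```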
Formal importance sampling · Variance reduction · Asymptotic results · Illustrations


Importance sampling perspective

    1   A natural idea:

                δ* = (1/n) Σ_{i=1}^{M_n} h(z_i) / p(z_i) ,

        or, self-normalised,

                δ* ≃ [ Σ_{i=1}^{M_n} h(z_i)/p(z_i) ] / [ Σ_{i=1}^{M_n} 1/p(z_i) ]
                   = [ Σ_{i=1}^{M_n} {π(z_i)/π̃(z_i)} h(z_i) ] / [ Σ_{i=1}^{M_n} π(z_i)/π̃(z_i) ] .

    2   But p is not available in closed form.
    3   The geometric n_i is the replacement, an obvious solution that is
        used in the original Metropolis–Hastings estimate since
        E[n_i] = 1/p(z_i).
                                                                                    9 / 32
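In one dimension p(z) can be approximated by quadrature, so the self-normalised δ* is computable on a toy problem even though item 2 rules this out in general. Everything below (target, proposal, h, the quadrature helper `p_of`) is an illustrative construction of ours.

```python
import numpy as np

rng = np.random.default_rng(4)
log_pi = lambda x: -0.5 * x * x

def p_of(z, width=10.0, m=4001):
    """p(z) = ∫ alpha(z, y) q(y|z) dy by a Riemann sum (1-D toy only)."""
    ys = np.linspace(z - width, z + width, m)
    q = np.exp(-0.5 * (ys - z) ** 2) / np.sqrt(2 * np.pi)
    a = np.minimum(1.0, np.exp(log_pi(ys) - log_pi(z)))
    return np.sum(a * q) * (ys[1] - ys[0])

# Run MH and keep only the accepted values z_i
x, zs = 0.0, []
for _ in range(20_000):
    y = x + rng.standard_normal()
    if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(x))):
        x = y
        zs.append(x)

w = np.array([1.0 / p_of(z) for z in zs])   # weights 1/p(z_i) ∝ pi(z_i)/pi~(z_i)
h = np.array(zs) ** 2
delta_star = np.sum(w * h) / np.sum(w)      # self-normalised delta*, ≈ E_pi[X^2] = 1
```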
The Bernoulli factory

  The crude estimate of 1/p(z_i),

                n_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)} ,

  can be improved:

  Lemma (Douc & X., AoS, 2011)
  If (y_j)_j is an iid sequence with distribution q(y|z_i), the quantity

                ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}

  is an unbiased estimator of 1/p(z_i) whose variance, conditional on z_i, is
  lower than the conditional variance of n_i, {1 − p(z_i)}/p²(z_i).
                                                                                   10 / 32
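The variance reduction can be observed numerically by computing both estimators from the same proposal stream, replacing each indicator I{u_ℓ ≥ α} by its conditional expectation 1 − α(z_i, y_ℓ). The setup is illustrative (N(0, 1) target, N(z, 1) proposal), and ξ̂'s infinite sum is truncated once the running product is negligible.

```python
import numpy as np

rng = np.random.default_rng(5)
log_pi = lambda x: -0.5 * x * x
z = 1.5                                          # a fixed z_i

def one_draw():
    """Return (n_i, xi_i) computed from the same stream (y_l, u_l)."""
    n, xi, prod_n, prod_xi = 1.0, 1.0, 1.0, 1.0
    while prod_xi > 1e-12 or prod_n > 0:
        y = z + rng.standard_normal()
        a = min(1.0, np.exp(log_pi(y) - log_pi(z)))
        prod_n *= (rng.uniform() >= a)           # running prod of I{u_l >= alpha}
        prod_xi *= (1.0 - a)                     # its Rao-Blackwellised version
        n += prod_n
        xi += prod_xi
    return n, xi

draws = np.array([one_draw() for _ in range(50_000)])
mean_n, mean_xi = draws.mean(axis=0)             # both unbiased for 1/p(z)
var_n, var_xi = draws.var(axis=0)                # var_xi should be the smaller one
```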
Rao-Blackwellised, for sure?

                ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}

    1   An infinite sum, but one whose product vanishes as soon as some
        α(z_i, y_ℓ) = 1, which happens with positive probability:

                α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] }

        For example: take a symmetric random walk as a proposal.
    2   What if we wish to be sure that the sum is finite?

  Finite horizon k version:

        ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}

                                                                                             11 / 32
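The finite-horizon estimator can be sketched directly from the formula: Rao-Blackwellise only the first k factors and keep the raw indicators beyond, so the sum terminates almost surely for every k (k = 0 recovers n_i). Same illustrative setup as before; the small truncation of the deterministic factors is a numerical safeguard of ours.

```python
import numpy as np

rng = np.random.default_rng(6)
log_pi = lambda x: -0.5 * x * x

def xi_k(z, k):
    """Finite-horizon estimator of 1/p(z): (1 - alpha) factors up to
    rank k, raw indicators I{u_l >= alpha} beyond rank k."""
    total, prod, l = 1.0, 1.0, 0
    while prod > 0.0:
        l += 1
        y = z + rng.standard_normal()            # y_l ~ q(.|z)
        a = min(1.0, np.exp(log_pi(y) - log_pi(z)))
        if l <= k:
            prod *= (1.0 - a)                    # Rao-Blackwellised factor
            if prod < 1e-12:
                prod = 0.0                       # truncate once negligible
        else:
            prod *= (rng.uniform() >= a)         # raw indicator, sum ends a.s.
        total += prod
    return total

est = np.array([xi_k(1.5, k=3) for _ in range(50_000)])
# np.mean(est) estimates 1/p(1.5) for any choice of k
```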
Variance improvement

Proposition (Douc & X., AoS, 2011)
If (y_j)_j is an iid sequence with distribution q(y|z_i) and (u_j)_j is an iid uniform sequence, for any k ≥ 0, the quantity

\[
\hat\xi_i^k = 1 + \sum_{j=1}^{\infty}\; \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
\]

is an unbiased estimator of 1/p(z_i) with an almost surely finite number of terms. Moreover, for k ≥ 1,

\[
\mathbb{V}\big[\hat\xi_i^k \mid z_i\big] = \frac{1 - p(z_i)}{p^2(z_i)} - \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)}\,\frac{2 - p(z_i)}{p^2(z_i)}\,\big(p(z_i) - r(z_i)\big)\,,
\]

where p(z_i) := \int \alpha(z_i, y)\, q(y|z_i)\, dy and r(z_i) := \int \alpha^2(z_i, y)\, q(y|z_i)\, dy.

                                                                                12 / 32
In particular, the conditional variance decreases as the horizon k grows:

\[
\mathbb{V}\big[\hat\xi_i \mid z_i\big] \le \mathbb{V}\big[\hat\xi_i^k \mid z_i\big] \le \mathbb{V}\big[\hat\xi_i^0 \mid z_i\big] = \mathbb{V}[n_i \mid z_i]\,.
\]

                                                                                12 / 32
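The variance ordering shows up empirically as the horizon k increases. A minimal sketch (my own code, for a N(0,1) target with Gaussian random-walk proposal; the truncation tolerance `tol` is mine):

```python
import math
import random

def alpha(z, y):
    # MH acceptance probability, target N(0,1), symmetric random-walk proposal.
    return min(1.0, math.exp(-(y * y - z * z) / 2.0))

def xi_hat_k(z, k, tau, rng, tol=1e-12):
    # Finite-horizon Rao-Blackwellised weight; k = 0 gives the raw count n_i.
    total, prod, ell = 1.0, 1.0, 0
    while prod >= tol:
        ell += 1
        a = alpha(z, rng.gauss(z, tau))
        if ell <= k:
            prod *= 1.0 - a
        elif rng.random() < a:
            prod = 0.0
        total += prod
    return total

rng = random.Random(11)
z, tau, N = 1.0, 2.0, 50_000
variances = []
for k in (0, 2, 50):
    vals = [xi_hat_k(z, k, tau, rng) for _ in range(N)]
    m = sum(vals) / N
    variances.append(sum((v - m) ** 2 for v in vals) / N)
# conditional variance decreases from V[n_i | z] (k = 0) towards V[xi_hat_i | z]
print(variances[0] > variances[1] > variances[2])
```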
[Diagram: successive accepted states z_{i−1}, z_i, z_{i+1} are not independent of one another, and each weight \hat\xi_i^k is not independent of the neighbouring states either.]

\[
\hat\xi_i^k = 1 + \sum_{j=1}^{\infty}\; \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
\]

                                                                                13 / 32
These correlated weights are plugged into the self-normalised estimator

\[
\delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k}\,.
\]

                                                                                13 / 32
                                                                                14 / 32
Formal importance sampling
                    Metropolis Hastings revisited
                                                        Variance reduction
                           Rao–Blackwellisation
                                                        Asymptotic results
                        Rao-Blackwellisation (2)
                                                        Illustrations



Let
                                                    M ˆk
                                    k               i=1 ξi h(zi )
                                   δM =               M ˆk
                                                                    .
                                                      i=1 ξi

For any positive function ϕ, we denote Cϕ = {h; |h/ϕ|∞ < ∞}. Assume
that there exists a positive function ϕ ≥ 1 such that
                                             M
                                             i=1 h(zi )/p(zi )      P
                         ∀h ∈ Cϕ ,             M
                                                                  −→ π(h)
                                               i=1 1/p(zi )



Theorem (Douc & X., AoS, 2011)

Under the assumption that π(p) > 0, the following convergence property holds:
   i) If h is in Cϕ , then

                               k      P
                              δM −→M →∞ π(h) (◮Consistency)



                                                                                     14 / 32
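A sketch of the whole scheme (my own toy implementation for a N(0,1) target with a Gaussian random-walk proposal; `weight_and_move` and `delta_k` are hypothetical names): the accepted states z_i and the finite-horizon weights are generated jointly from the same proposal stream, and δ_M^k is the weighted average.

```python
import math
import random

def alpha(z, y):
    # acceptance probability, target N(0,1), symmetric random-walk proposal
    return min(1.0, math.exp(-(y * y - z * z) / 2.0))

def weight_and_move(z, k, tau, rng):
    # The same stream of proposals drives the chain move (first u < alpha)
    # and the finite-horizon weight xi_k of the current state z.
    total, prod, ell, z_next = 1.0, 1.0, 0, None
    while prod > 0.0 or z_next is None:
        ell += 1
        y = rng.gauss(z, tau)
        a = alpha(z, y)
        u = rng.random()
        if z_next is None and u < a:
            z_next = y                      # first accepted proposal
        if prod > 0.0:
            if ell <= k:
                prod *= 1.0 - a             # exact factor within the horizon
            elif u < a:
                prod = 0.0                  # indicator factor kills the sum
            total += prod
    return total, z_next

def delta_k(M, k, tau, rng, h):
    # self-normalised estimator over M accepted states
    z, num, den = 0.0, 0.0, 0.0
    for _ in range(M):
        xi, z_next = weight_and_move(z, k, tau, rng)
        num += xi * h(z)
        den += xi
        z = z_next
    return num / den

rng = random.Random(3)
est = delta_k(20_000, 2, 2.0, rng, lambda x: x * x)
# consistent estimate of E[X^2] = 1 under the N(0,1) target
print(abs(est - 1.0) < 0.2)
```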
Assume further that there exists a positive function ψ such that

\[
\forall h \in C_\psi,\qquad \sqrt{M}\left(\frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} - \pi(h)\right) \overset{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, \Gamma(h))\,.
\]

Theorem (Douc & X., AoS, 2011)

Under the assumption that π(p) > 0, the following convergence property holds:
  ii) If, in addition, h^2/p \in C_\varphi and h \in C_\psi, then

\[
\sqrt{M}\,\big(\delta_M^k - \pi(h)\big) \overset{\mathcal{L}}{\underset{M\to\infty}{\longrightarrow}} \mathcal{N}(0, V_k[h - \pi(h)])\,, \qquad (\text{CLT})
\]

where V_k(h) := \pi(p) \int \pi(dz)\, \mathbb{V}\big[\hat\xi_i^k \mid z\big]\, h^2(z)\, p(z) + \Gamma(h)\,.

                                                                                14 / 32
We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that, for any starting point x,

\[
\forall h \in C_\zeta,\qquad \mathbb{P}_x\left( \sup_{0 \le i \le n} \left| \sum_{j=0}^{i} \big[h(z_j) - \tilde\pi(h)\big] \right| > \epsilon \right) \le \frac{n\, C_h(x)}{\epsilon^2}\,.
\]

Moreover, assume that there exists φ ≥ 1 such that, for any starting point x,

\[
\forall h \in C_\phi,\qquad \tilde Q^n(x, h) \overset{P}{\longrightarrow} \tilde\pi(h) = \pi(ph)/\pi(p)\,.
\]

                                                                                15 / 32
Theorem (Douc & X., AoS, 2011)

Assume that h is such that h/p ∈ C_ζ and \{C_{h/p}, h^2/p^2\} \subset C_\phi. Assume moreover that

\[
\sqrt{M}\,\big(\delta_M^0 - \pi(h)\big) \overset{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, V_0[h - \pi(h)])\,.
\]

Then, for any starting point x,

\[
\sqrt{M_n}\left( \frac{\sum_{t=1}^{n} h(x^{(t)})}{n} - \pi(h) \right) \overset{n\to+\infty}{\longrightarrow} \mathcal{N}(0, V_0[h - \pi(h)])\,,
\]

where M_n is defined by

\[
\sum_{i=1}^{M_n} \hat\xi_i^0 \le n < \sum_{i=1}^{M_n+1} \hat\xi_i^0\,.
\]

                                                                                15 / 32
Variance gain (1)

                  h(x)          x            x2       IX>0        p(x)
                  τ = .1        0.971        0.953    0.957       0.207
                  τ =2          0.965        0.942    0.875       0.861
                  τ =5          0.913        0.982    0.785       0.826
                  τ =7          0.899        0.982    0.768       0.820

  Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]:
  100 MCMC iterations over 10^3 replications of a random walk Gaussian
  proposal with scale τ.

                                                                                16 / 32
Illustration (1)

  Figure: Overlay of the variations of 250 iid realisations of the estimates
  δ (gold) and δ^∞ (grey) of E[X] = 0 for 1000 iterations, along with the
  90% interquantile range for the estimates δ (brown) and δ^∞ (pink), in
  the setting of a random walk Gaussian proposal with scale τ = 10.

                                                                                17 / 32
Extra computational effort

                       median             mean     q.8     q.9      time
        τ   = .25      0.0                8.85     4.9     13       4.2
        τ   = .50      0.0                6.76     4       11       2.25
        τ   = 1.0      0.25               6.15     4       10       2.5
        τ   = 2.0      0.20               5.90     3.5     8.5      4.5

  Additional computing effort: median and mean numbers of additional
  iterations, 80% and 90% quantiles of the additional iterations, and ratio
  of the average R computing times, obtained over 10^5 simulations.

                                                                                18 / 32
Illustration (2)

  Figure: Overlay of the variations of 500 iid realisations of the estimates
  δ (deep grey), δ^∞ (medium grey) and of the importance sampling version
  (light grey) of E[X] = 10 when X ∼ Exp(.1), for 100 iterations, along
  with the 90% interquantile ranges (same colour code), in the setting of
  an independent exponential proposal with scale µ = 0.02.

                                                                                19 / 32
Integrating out white noise [C+X, 96]

  In Casella+X. (1996), averaging over possible past and future histories
  (by integrating out the uniforms) improves the weights of the accepted values.
  Given the whole sequence of proposed values y_t ∼ µ(y_t), averaging over the
  uniforms is possible: starting with y_1, we can compute

\[
\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[h(X_t) \mid y_1, \ldots, y_T] = \frac{1}{T} \sum_{t=1}^{T} \varphi_t\, h(y_t)
\]

  through a recurrence relation:

                                                                                20 / 32
\[
\varphi_t = \delta_t \sum_{j=t}^{T} \xi_{tj}
\]

  with

\[
\delta_0 = 1\,, \qquad \delta_t = \sum_{j=0}^{t-1} \delta_j\, \xi_{j(t-1)}\, \rho_{jt}\,,
\]

  and

\[
\xi_{tt} = 1\,, \qquad \xi_{tj} = \prod_{u=t+1}^{j} (1 - \rho_{tu})\,,
\]

  the occurrence and survival probabilities of the y_t's, associated with the
  Metropolis–Hastings ratios

\[
\omega_t = \pi(y_t)/\mu(y_t)\,, \qquad \rho_{tu} = \omega_u/\omega_t \wedge 1\,.
\]

                                                                                20 / 32
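The recurrence costs O(T^2). A sketch (my own code, with hypothetical names) that also checks an internal consistency property: at every time m the occupancy probabilities δ_j ξ_{jm} sum to one, so the φ_t sum to the number of time steps.

```python
import math
import random

def rb_weights(log_w):
    # Occupancy weights for an independence MH chain started at y_0 with
    # proposals y_1..y_T, from log_w[t] = log(pi(y_t) / mu(y_t)).
    T = len(log_w) - 1
    rho = [[min(1.0, math.exp(log_w[u] - log_w[t])) if u > t else 0.0
            for u in range(T + 1)] for t in range(T + 1)]
    # xi[t][j]: probability of rejecting y_{t+1}, ..., y_j while sitting at y_t
    xi = [[1.0] * (T + 1) for _ in range(T + 1)]
    for t in range(T + 1):
        for j in range(t + 1, T + 1):
            xi[t][j] = xi[t][j - 1] * (1.0 - rho[t][j])
    # delta[t]: probability that y_t is accepted when it is proposed at time t
    delta = [1.0] + [0.0] * T
    for t in range(1, T + 1):
        delta[t] = sum(delta[j] * xi[j][t - 1] * rho[j][t] for j in range(t))
    # phi[t]: expected number of times the chain occupies y_t over times 0..T
    return [delta[t] * sum(xi[t][j] for j in range(t, T + 1))
            for t in range(T + 1)]

rng = random.Random(1)
ys = [rng.expovariate(0.5) for _ in range(21)]          # y_0..y_20 from mu = Exp(1/2)
log_w = [-0.5 * y - math.log(0.5) for y in ys]          # log(pi/mu) for target pi = Exp(1)
phi = rb_weights(log_w)
# total occupancy over the 21 time points 0..20 must equal 21 exactly
print(abs(sum(phi) - 21.0) < 1e-9)
```

The sum-to-one check follows by induction from the δ recurrence, which is a convenient way to test any implementation of these weights.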
Integrating out white noise (2)

  Extension to generic M-H feasible (C+X, 96).
  Potentially large variance improvement, but at a cost of O(T^2)...
  Possible recovery of efficiency thanks to parallelisation:
  moving from (\epsilon_1, \ldots, \epsilon_p) towards

\[
(\epsilon_{(1)}, \ldots, \epsilon_{(p)})
\]

  by averaging over "all" possible orders.

                                                                                21 / 32
Case of the independent Metropolis–Hastings algorithm

  Starting at time t with p processors and a pool of p proposed values,

\[
(y_1, \ldots, y_p)\,,
\]

  use the processors to examine in parallel p different "histories".

                                                                                22 / 32
Improvement

  The standard estimator \hat\tau_1 of E_π[h(X)],

\[
\hat\tau_1(x_t, y_{1:p}) = \frac{1}{p} \sum_{k=1}^{p} h(x_{t+k})\,,
\]

  is necessarily dominated by the average

\[
\hat\tau_2(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} n_k\, h(y_k)\,,
\]

  where y_0 = x_t and n_0 is the number of times x_t is repeated.

                                                                                23 / 32
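A toy sketch of the domination (my own code, under the assumption that each of the p processors runs an independence-MH history of length p on the shared pool of proposals with its own uniforms; target Exp(1), proposal Exp(1/2), h(x) = x):

```python
import math
import random

def occupancy(log_w, rng):
    # One independence-MH history of length p over the shared pool:
    # index 0 is the starting point, indices 1..p are the proposals.
    p = len(log_w) - 1
    counts = [0] * (p + 1)
    cur = 0
    for k in range(1, p + 1):
        if rng.random() < min(1.0, math.exp(log_w[k] - log_w[cur])):
            cur = k
        counts[cur] += 1
    return counts

def estimators(p, rng):
    # shared pool: starting value y_0 and proposals y_1..y_p from mu = Exp(1/2)
    ys = [rng.expovariate(0.5) for _ in range(p + 1)]
    log_w = [-0.5 * y - math.log(0.5) for y in ys]   # log(pi/mu) for pi = Exp(1)
    hist = [occupancy(log_w, rng) for _ in range(p)]
    tau1 = sum(n * y for n, y in zip(hist[0], ys)) / p           # one history
    pooled = [sum(h[k] for h in hist) for k in range(p + 1)]
    tau2 = sum(n * y for n, y in zip(pooled, ys)) / p ** 2       # p histories
    return tau1, tau2

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

rng = random.Random(5)
reps = [estimators(10, rng) for _ in range(2000)]
t1, t2 = zip(*reps)
# tau2 averages p single-history estimators built on the same pool,
# so its variance is necessarily lower than that of a single history
print(var(t2) < var(t1))
```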
Further Rao-Blackwellisation

  E.g., use of the Metropolis–Hastings weights w_j: with j the index such
  that x_{t+i-1} = y_j, update the weights at each time t + i:

\[
w_j = w_j + 1 - \rho(x_{t+i-1}, y_i)\,, \qquad w_i = w_i + \rho(x_{t+i-1}, y_i)\,,
\]

  resulting in a more stable estimator

\[
\hat\tau_3(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} w_k\, h(y_k)\,.
\]

  E.g., Casella+X. (1996):

\[
\hat\tau_4(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} \varphi_k\, h(y_k)\,.
\]

                                                                                24 / 32
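A sketch of the weight update for a single history (my own code; normalised by p rather than p^2 since only one history is run, and started from the target itself so the estimator has expectation E_π[h] exactly):

```python
import math
import random

def tau3_one_history(p, rng):
    # Independence MH, target pi = Exp(1), proposal mu = Exp(1/2), h(x) = x.
    # At each step the current state's weight gains 1 - rho and the new
    # proposal's weight gains rho, before the actual accept/reject draw.
    ys = [rng.expovariate(1.0)] + [rng.expovariate(0.5) for _ in range(p)]
    log_w = [-0.5 * y - math.log(0.5) for y in ys]   # log importance ratios log(pi/mu)
    w = [0.0] * (p + 1)
    cur = 0
    for i in range(1, p + 1):
        rho = min(1.0, math.exp(log_w[i] - log_w[cur]))
        w[cur] += 1.0 - rho
        w[i] += rho
        if rng.random() < rho:
            cur = i
    return sum(wk * y for wk, y in zip(w, ys)) / p

rng = random.Random(9)
R, p = 4000, 20
est = sum(tau3_one_history(p, rng) for _ in range(R)) / R
# the chain starts from pi itself, so E[tau3] = E_pi[X] = 1
print(abs(est - 1.0) < 0.1)
```

At each step the update distributes one unit of mass as the conditional expectation of the next state given the proposal, which is exactly the Rao-Blackwellisation of the ergodic average over the current uniform.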
Markovian continuity

  The Markov validity of the chain is not jeopardised! The chain continues
  by picking one sequence at random and taking the corresponding x^{(j)}_{t+p}
  as starting point of the next parallel block.

                                                                          25 / 32
Impact of Rao-Blackwellisations

  Comparison of
      τ̂₁, the basic IMH estimator of E_π[h(X)],
      τ̂₂, improving by averaging over permutations of proposed values and
      using p times more uniforms,
      τ̂₃, improving upon τ̂₂ by a basic Rao-Blackwell argument,
      τ̂₄, improving upon τ̂₂ by integrating out ancillary uniforms, at a cost
      of O(p²).

                                                                          26 / 32
Illustration

  Variations of estimates based on RB and standard versions of parallel
  chains and on a standard MCMC chain for the mean and variance of the
  target N(0, 1) distribution (based on 10,000 independent replicas).

                                                                          27 / 32
Impact of the order

  Parallelisation allows for the partial integration of the uniforms.
  What about the permutation order?
  Comparison of
      τ̂₂N with no permutation,
      τ̂₂C with circular permutations,
      τ̂₂R with random permutations,
      τ̂₂H with half-random permutations,
      τ̂₂S with stratified permutations.

                                                                          28 / 32
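For concreteness, a sketch of the orders in which parallel chain j could work through the p shared proposals. The slide does not spell out the half-random and stratified constructions, so only the first three variants are shown; the function name and signature are illustrative:

```python
import numpy as np

def permutation_order(p, kind, j, rng=None):
    """Order in which parallel chain j processes the p shared proposals
    (only the variants fully specified on the slide are sketched)."""
    if kind == "none":         # tau_2N: every chain uses the same order
        return np.arange(p)
    if kind == "circular":     # tau_2C: chain j starts at offset j
        return np.roll(np.arange(p), -j)
    if kind == "random":       # tau_2R: an independent uniform permutation
        return rng.permutation(p)
    raise ValueError(f"unknown kind: {kind}")

rng = np.random.default_rng(0)
orders = [permutation_order(6, k, 2, rng) for k in ("none", "circular", "random")]
```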
Importance target

  Comparison with the ultimate importance sampling

                                                                          29 / 32
Extension to the general case

  The same principle can be applied to any Markov update: if

      x_{t+1} = Ψ(x_t, ε_t)

  then generate (ε₁, . . . , ε_p) in advance and distribute them to the p
  processors in different permutation orders.
  Plus use of Douc & X's (2011) Rao–Blackwellisation ξ̂_i^k

                                                                          30 / 32
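A minimal sketch of the generic update (the random-walk choice of Ψ is a placeholder for illustration): the innovation ε bundles the proposal noise and the acceptance uniform, so the whole step is a deterministic function of (x_t, ε_t) and the ε's can indeed be drawn in advance and handed to each processor in a different permutation order.

```python
import numpy as np

def Psi(x, eps, log_pi, step=1.0):
    """One Markov update x_{t+1} = Psi(x_t, eps_t).  Here eps = (z, u):
    z is the pre-generated proposal innovation and u the acceptance
    uniform; a random-walk kernel stands in for the generic case."""
    z, u = eps
    y = x + step * z
    return y if np.log(u) < log_pi(y) - log_pi(x) else x

rng = np.random.default_rng(0)
p = 8
# generate (eps_1, ..., eps_p) in advance; each of the p processors then
# consumes them in a different (here circular) permutation order
eps = list(zip(rng.normal(size=p), rng.random(size=p)))

log_pi = lambda x: -0.5 * x**2
chains = []
for j in range(p):
    x, path = 0.0, []
    for i in np.roll(np.arange(p), -j):
        x = Psi(x, eps[i], log_pi)
        path.append(x)
    chains.append(path)
```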
Implementation

  Similar run of p parallel chains (x^{(j)}_{t+i}), use of averages

      τ̂₂(x^{(1:p)}_{1:p}) = (1/p²) ∑_{k=1}^{p} ∑_{j=1}^{p} n_k h(x^{(j)}_{t+k})

  and selection of new starting value at random at time t + p:

                                                                          31 / 32
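Putting the pieces together, a sketch of one parallel block (assumptions: random-walk kernel and circular permutations of pre-generated innovations, both placeholder choices). Averaging the p × p stored states realises the n_k-weighted sum above, since each accepted value appears in the chain exactly n_k times:

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_block(x_start, p, log_pi, step=1.0):
    """One block of the scheme: p chains share the same pre-generated
    innovations, consumed in circular permutation orders; the average of
    the p*p stored states gives the within-block estimate, and a random
    endpoint starts the next block, preserving the Markov validity
    mentioned earlier."""
    z = step * rng.normal(size=p)          # shared proposal innovations
    u = rng.random(size=p)                 # shared acceptance uniforms
    states = np.empty((p, p))
    for j in range(p):
        x = x_start
        for k, i in enumerate(np.roll(np.arange(p), -j)):
            y = x + z[i]
            if np.log(u[i]) < log_pi(y) - log_pi(x):
                x = y
            states[j, k] = x
    return states.mean(), states[rng.integers(p), -1]

# run 50 blocks targeting N(0,1); the sizes are hypothetical
log_pi = lambda x: -0.5 * x**2
x, ests = 0.0, []
for _ in range(50):
    est, x = parallel_block(x, 16, log_pi)
    ests.append(est)
```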
Illustration

  Variations of estimates based on RB and standard versions of parallel
  chains and on a standard MCMC chain for the mean and variance of the
  target distribution (based on p = 64 parallel processors, 50 blocks of p
  MCMC steps and 500 independent replicas).

  [Parallel boxplots of the RB, parallel (par) and original (org) versions:
  mean estimates (left panel, scale −0.10 to 0.10) and variance estimates
  (right panel, scale 0.9 to 1.3).]

                                                                          32 / 32

Contenu connexe

Tendances

RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010
Christian Robert
 
Introduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methodsIntroduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methods
Christian Robert
 
Sienna 4 divideandconquer
Sienna 4 divideandconquerSienna 4 divideandconquer
Sienna 4 divideandconquer
chidabdu
 

Tendances (19)

MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
Fractal dimension versus Computational Complexity
Fractal dimension versus Computational ComplexityFractal dimension versus Computational Complexity
Fractal dimension versus Computational Complexity
 
Ma2520962099
Ma2520962099Ma2520962099
Ma2520962099
 
Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo
 
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010
 
Metropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialMetropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short Tutorial
 
Towards a stable definition of Algorithmic Randomness
Towards a stable definition of Algorithmic RandomnessTowards a stable definition of Algorithmic Randomness
Towards a stable definition of Algorithmic Randomness
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010
 
Particle filtering
Particle filteringParticle filtering
Particle filtering
 
Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...
Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...
Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...
 
Hastings 1970
Hastings 1970Hastings 1970
Hastings 1970
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMC
 
TIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATA
TIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATATIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATA
TIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATA
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
Introduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methodsIntroduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methods
 
Sienna 4 divideandconquer
Sienna 4 divideandconquerSienna 4 divideandconquer
Sienna 4 divideandconquer
 
Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...
 

Similaire à Trondheim, LGM2012

Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solution
cheekeong1231
 
InternshipReport
InternshipReportInternshipReport
InternshipReport
Hamza Ameur
 

Similaire à Trondheim, LGM2012 (20)

Rdnd2008
Rdnd2008Rdnd2008
Rdnd2008
 
Adc
AdcAdc
Adc
 
Pres metabief2020jmm
Pres metabief2020jmmPres metabief2020jmm
Pres metabief2020jmm
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3
 
Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solution
 
Dr09 Slide
Dr09 SlideDr09 Slide
Dr09 Slide
 
An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Shanghai tutorial
Shanghai tutorialShanghai tutorial
Shanghai tutorial
 
Statistical Physics Assignment Help
Statistical Physics Assignment HelpStatistical Physics Assignment Help
Statistical Physics Assignment Help
 
Berans qm overview
Berans qm overviewBerans qm overview
Berans qm overview
 
Least squares support Vector Machine Classifier
Least squares support Vector Machine ClassifierLeast squares support Vector Machine Classifier
Least squares support Vector Machine Classifier
 
InternshipReport
InternshipReportInternshipReport
InternshipReport
 
Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
The Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal FunctionThe Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal Function
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplings
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 

Plus de Christian Robert

Plus de Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 

Dernier

Dernier (20)

Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 

Trondheim, LGM2012

  • 1. Vanilla Rao–Blackwellisation of Metropolis–Hastings algorithms Christian P. Robert Universit´ Paris-Dauphine, IuF, and CREST e Joint works with Randal Douc, Pierre Jacob and Murray Smith LGM2012, Trondheim, May 30, 2012 1 / 32
  • 2. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
  • 3. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
  • 4. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
  • 5. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
•   6. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Metropolis–Hastings algorithm
    1 We wish to approximate I = ∫ h(x)π(x)dx / ∫ π(x)dx = ∫ h(x)π̄(x)dx.
    2 π(x) is known, but not ∫ π(x)dx.
    3 Approximate I with δ = (1/n) Σ_{t=1}^{n} h(x^(t)), where (x^(t)) is a Markov chain with limiting distribution π̄.
    4 Convergence obtained from the Law of Large Numbers or the CLT for Markov chains. 3 / 32
•   10. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Metropolis–Hastings Algorithm
    Suppose that x^(t) is drawn.
    1 Simulate y_t ∼ q(·|x^(t)).
    2 Set x^(t+1) = y_t with probability
      α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] } ;
      otherwise, set x^(t+1) = x^(t).
    3 α is such that the detailed balance equation is satisfied:
      π(x) q(y|x) α(x, y) = π(y) q(x|y) α(y, x) ,
      ⊲ so π̄ is the stationary distribution of (x^(t)).
    ◮ The accepted candidates are simulated with the rejection algorithm. 4 / 32
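As an illustration (not part of the slides), a minimal random-walk sketch of this transition in Python; the target, scale, and names (`mh_chain`, `log_pi`, `tau`) are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_chain(log_pi, x0, tau, n):
    """Random-walk Metropolis-Hastings: symmetric proposal y ~ N(x, tau^2),
    so q(x|y)/q(y|x) = 1 and alpha reduces to min(1, pi(y)/pi(x))."""
    x = x0
    chain = np.empty(n)
    for t in range(n):
        y = x + tau * rng.normal()
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):  # accept y_t w.p. alpha
            x = y
        chain[t] = x                                       # else x^(t+1) = x^(t)
    return chain

# Toy target pi ~ N(0,1), known up to a constant; delta = ergodic average of h(x) = x.
chain = mh_chain(lambda x: -0.5 * x**2, x0=0.0, tau=2.0, n=20000)
delta = chain.mean()
```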
•   14. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Some properties of the HM algorithm
    1 Alternative representation of the estimator δ is
      δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{Mn} n_i h(z_i) ,
      where the z_i's are the accepted y_j's, Mn is the number of accepted y_j's till time n, and n_i is the number of times z_i appears in the sequence (x^(t))_t. 5 / 32
•   15. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2)
    q̃(·|z_i) = α(z_i, ·) q(·|z_i) / p(z_i) ≤ q(·|z_i) / p(z_i) ,
    where p(z_i) = ∫ α(z_i, y) q(y|z_i) dy. To simulate from q̃(·|z_i):
    1 Propose a candidate y ∼ q(·|z_i)
    2 Accept with probability q̃(y|z_i) p(z_i) / q(y|z_i) = α(z_i, y);
      otherwise, reject it and start again.
    ◮ this is the transition of the HM algorithm. The transition kernel q̃ admits π̃ as a stationary distribution:
    π̃(x) q̃(y|x) = [π(x)p(x) / ∫ π(u)p(u)du] × [α(x, y)q(y|x) / p(x)]
                 = π(x) α(x, y) q(y|x) / ∫ π(u)p(u)du
                 = π(y) α(y, x) q(x|y) / ∫ π(u)p(u)du
                 = π̃(y) q̃(x|y) . 6 / 32
•   22. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2)
    Lemma (Douc & X., AoS, 2011) The sequence (z_i, n_i) satisfies
    1 (z_i, n_i)_i is a Markov chain;
    2 z_{i+1} and n_i are independent given z_i;
    3 n_i is distributed as a geometric random variable with probability parameter
      p(z_i) := ∫ α(z_i, y) q(y|z_i) dy ; (1)
    4 (z_i)_i is a Markov chain with transition kernel Q̃(z, dy) = q̃(y|z)dy and stationary distribution π̃ such that
      q̃(·|z) ∝ α(z, ·) q(·|z) and π̃(·) ∝ π(·)p(·) . 7 / 32
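The decomposition behind the Lemma is easy to mimic empirically. A sketch (the N(0,1) target and all names are our own): rewrite an MH path as accepted values z_i with multiplicities n_i, and check that the two representations of δ coincide.

```python
import numpy as np

rng = np.random.default_rng(1)

# An illustrative random-walk MH run on a N(0,1) target.
x, chain = 0.0, []
for _ in range(5000):
    y = x + 2.0 * rng.normal()
    if np.log(rng.uniform()) < -0.5 * (y**2 - x**2):
        x = y
    chain.append(x)
chain = np.array(chain)

def decompose(chain):
    """Rewrite (x^(1),...,x^(n)) as accepted values z_i with multiplicities n_i."""
    z, n = [chain[0]], [1]
    for x in chain[1:]:
        if x == z[-1]:
            n[-1] += 1        # rejection: the current z_i is repeated
        else:
            z.append(x)       # acceptance: a new z_i starts with n_i = 1
            n.append(1)
    return np.array(z), np.array(n)

z, n = decompose(chain)
delta_direct = chain.mean()               # (1/n) sum_t h(x^(t))
delta_zn = (n * z).sum() / len(chain)     # (1/n) sum_i n_i h(z_i): identical
```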
•   26. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Old bottle, new wine [or vice-versa]
    [Diagram: the chain z_{i−1} → z_i → z_{i+1}, with n_{i−1}, n_i attached below; each z_{i+1} and each n_i are independent given z_i]
    δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{Mn} n_i h(z_i) . 8 / 32
•   31. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Importance sampling perspective
    1 A natural idea:
      δ* = (1/n) Σ_{i=1}^{Mn} h(z_i)/p(z_i)
         ≃ [Σ_{i=1}^{Mn} h(z_i)/p(z_i)] / [Σ_{i=1}^{Mn} 1/p(z_i)]
         = [Σ_{i=1}^{Mn} h(z_i) π(z_i)/π̃(z_i)] / [Σ_{i=1}^{Mn} π(z_i)/π̃(z_i)] .
    2 But p is not available in closed form.
    3 The geometric n_i is the replacement, an obvious solution that is used in the original Metropolis–Hastings estimate, since E[n_i] = 1/p(z_i). 9 / 32
•   35. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations The Bernoulli factory
    The crude estimate of 1/p(z_i),
    n_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)} ,
    can be improved:
    Lemma (Douc & X., AoS, 2011) If (y_j)_j is an iid sequence with distribution q(y|z_i), the quantity
    ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}
    is an unbiased estimator of 1/p(z_i) whose variance, conditional on z_i, is lower than the conditional variance of n_i, {1 − p(z_i)}/p²(z_i). 10 / 32
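A quick simulation check of this Lemma (a sketch; the N(0,1) target, proposal scale, and names are our own): both quantities estimate 1/p(z) unbiasedly, but the Rao-Blackwellised ξ̂, with the uniforms integrated out, has smaller variance.

```python
import numpy as np

rng = np.random.default_rng(2)
tau = 2.0

def alpha(z, y):
    # N(0,1) target with a symmetric N(z, tau^2) random-walk proposal
    return min(1.0, float(np.exp(-0.5 * (y**2 - z**2))))

def n_geom(z):
    """Crude estimate of 1/p(z): geometric number of trials until acceptance."""
    n = 1
    while rng.uniform() >= alpha(z, z + tau * rng.normal()):
        n += 1
    return n

def xi_hat(z, tol=1e-12):
    """Rao-Blackwellised estimate: xi = 1 + sum_j prod_{l<=j} (1 - alpha(z, y_l))."""
    total, prod = 1.0, 1.0
    while prod > tol:          # truncate once the remaining terms are negligible
        prod *= 1.0 - alpha(z, z + tau * rng.normal())
        total += prod
    return total

z = 1.0
ns = np.array([n_geom(z) for _ in range(20000)])
xis = np.array([xi_hat(z) for _ in range(20000)])
# Same mean (both unbiased for 1/p(z)), but a smaller empirical variance for xi_hat.
```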
•   36. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Rao-Blackwellised, for sure?
    ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}
    1 An infinite sum, but with only finitely many nonzero terms whenever some α(z_i, y_ℓ) = 1, with
      α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] } ;
      this happens with positive probability — for example, take a symmetric random walk as a proposal.
    2 What if we wish to be sure that the sum is finite? Finite horizon k version:
      ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)} 11 / 32
•   38. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Variance improvement
    Proposition (Douc & X., AoS, 2011) If (y_j)_j is an iid sequence with distribution q(y|z_i) and (u_j)_j is an iid uniform sequence, for any k ≥ 0, the quantity
    ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}
    is an unbiased estimator of 1/p(z_i) with an almost surely finite number of terms. Moreover, for k ≥ 1,
    V[ξ̂_i^k | z_i] = {1 − p(z_i)}/p²(z_i) − (p(z_i) − r(z_i)) × [1 − (1 − 2p(z_i) + r(z_i))^k] / [2p(z_i) − r(z_i)] × {2 − p(z_i)}/p²(z_i) ,
    where p(z_i) := ∫ α(z_i, y) q(y|z_i) dy and r(z_i) := ∫ α²(z_i, y) q(y|z_i) dy.
    Therefore, we have
    V[ξ̂_i | z_i] ≤ V[ξ̂_i^k | z_i] ≤ V[ξ̂_i^0 | z_i] = V[n_i | z_i] . 12 / 32
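A sketch of the finite-horizon estimator (our own toy target and names): the first k factors are the integrated-out 1 − α terms, the remaining ones are plain rejection indicators, so the sum terminates almost surely; k = 0 recovers the geometric n_i, and all k share the expectation 1/p(z).

```python
import numpy as np

rng = np.random.default_rng(5)
tau = 2.0

def alpha(z, y):                           # N(0,1) target, N(z, tau^2) proposal
    return min(1.0, float(np.exp(-0.5 * (y**2 - z**2))))

def xi_hat_k(z, k):
    """Unbiased estimator of 1/p(z): integrate out the first k uniforms,
    simulate the remaining ones as indicators I{u >= alpha}."""
    total, prod, j = 1.0, 1.0, 0
    while True:
        j += 1
        y = z + tau * rng.normal()
        if j <= k:
            prod *= 1.0 - alpha(z, y)                      # Rao-Blackwellised factor
        else:
            prod *= float(rng.uniform() >= alpha(z, y))    # plain rejection indicator
        total += prod
        if prod < 1e-15:                                   # the sum has terminated
            return total

z = 1.0
means = {k: np.mean([xi_hat_k(z, k) for _ in range(20000)]) for k in (0, 1, 5)}
# The three empirical means agree (same expectation 1/p(z)); variance drops with k.
```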
•   41. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations
    [Diagram: the chain z_{i−1} → z_i → z_{i+1}, with ξ̂^k_{i−1}, ξ̂^k_i attached below; contrary to the (z_i, n_i) case, these quantities are not independent]
    ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}
    δ_M^k = Σ_{i=1}^{M} ξ̂_i^k h(z_i) / Σ_{i=1}^{M} ξ̂_i^k . 13 / 32
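Putting the pieces together, a sketch of the self-normalised estimator δ_M on an illustrative N(0,1) target (all names are ours): run MH keeping the accepted z_i, draw fresh proposals to form each ξ̂_i, and compare with the standard average.

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 2.0

def alpha(x, y):                      # N(0,1) target, N(x, tau^2) random-walk proposal
    return min(1.0, float(np.exp(-0.5 * (y**2 - x**2))))

# MH run, stored directly as accepted values z_i and multiplicities n_i.
x, zs, ns = 0.0, [0.0], [1]
for _ in range(20000):
    y = x + tau * rng.normal()
    if rng.uniform() < alpha(x, y):
        x = y; zs.append(x); ns.append(1)
    else:
        ns[-1] += 1

def xi_hat(z, tol=1e-12):
    """Estimate 1/p(z) with fresh proposals, uniforms integrated out."""
    total, prod = 1.0, 1.0
    while prod > tol:
        prod *= 1.0 - alpha(z, z + tau * rng.normal())
        total += prod
    return total

zs, ns = np.array(zs), np.array(ns)
xis = np.array([xi_hat(z) for z in zs])
delta = (ns * zs).sum() / ns.sum()       # standard MH estimate of E[X] = 0
delta_rb = (xis * zs).sum() / xis.sum()  # Rao-Blackwellised, self-normalised version
```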
•   46. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations
    Let
    δ_M^k = Σ_{i=1}^{M} ξ̂_i^k h(z_i) / Σ_{i=1}^{M} ξ̂_i^k .
    For any positive function ϕ, we denote C_ϕ = {h; |h/ϕ|_∞ < ∞}. Assume that there exists a positive function ϕ ≥ 1 such that
    ∀h ∈ C_ϕ ,  [Σ_{i=1}^{M} h(z_i)/p(z_i)] / [Σ_{i=1}^{M} 1/p(z_i)] →(P) π(h) ,
    and a positive function ψ such that
    ∀h ∈ C_ψ ,  √M ( [Σ_{i=1}^{M} h(z_i)/p(z_i)] / [Σ_{i=1}^{M} 1/p(z_i)] − π(h) ) →(L) N(0, Γ(h)) .
    Theorem (Douc & X., AoS, 2011) Under the assumption that π(p) > 0, the following convergence properties hold:
    i) If h ∈ C_ϕ, then δ_M^k →(P) π(h) as M → ∞ (◮ Consistency)
    ii) If, in addition, h²/p ∈ C_ϕ and h ∈ C_ψ, then
        √M (δ_M^k − π(h)) →(L) N(0, V_k[h − π(h)]) as M → ∞ (◮ CLT),
        where V_k(h) := π(p) ∫ π(dz) V[ξ̂_i^k | z] h²(z) p(z) + Γ(h) . 14 / 32
•   49. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations
    We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that, for any starting point x,
    ∀h ∈ C_ζ ,  P_x( sup_{0≤i≤n} | Σ_{j=0}^{i} [h(z_j) − π̃(h)] | > ǫ ) ≤ n C_h(x)/ǫ² .
    Moreover, assume that there exists φ ≥ 1 such that, for any starting point x,
    ∀h ∈ C_φ ,  Q̃^n(x, h) →(P) π̃(h) = π(ph)/π(p) .
    Theorem (Douc & X., AoS, 2011) Assume that h is such that h/p ∈ C_ζ and {C_{h/p}, h²/p²} ⊂ C_φ. Assume moreover that
    √M (δ_M^0 − π(h)) →(L) N(0, V_0[h − π(h)]) .
    Then, for any starting point x,
    √M_n ( (1/n) Σ_{t=1}^{n} h(x^(t)) − π(h) ) →(L) N(0, V_0[h − π(h)]) as n → +∞ ,
    where M_n is defined by
    Σ_{i=1}^{M_n} ξ̂_i^0 ≤ n < Σ_{i=1}^{M_n+1} ξ̂_i^0 . 15 / 32
•   54. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Variance gain (1)
    h(x):      x       x²      I_{X>0}   p(x)
    τ = .1     0.971   0.953   0.957     0.207
    τ = 2      0.965   0.942   0.875     0.861
    τ = 5      0.913   0.982   0.785     0.826
    τ = 7      0.899   0.982   0.768     0.820
    Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]: 100 MCMC iterations over 10³ replications of a random walk Gaussian proposal with scale τ. 16 / 32
  • 55. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Illustration (1) Figure: Overlay of the variations of 250 iid realisations of the estimates δ (gold) and δ ∞ (grey) of E[X] = 0 for 1000 iterations, along with the 90% interquantile range for the estimates δ (brown) and δ ∞ (pink), in the setting of a random walk Gaussian proposal with scale τ = 10. 17 / 32
•   56. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Extra computational effort
               median   mean   q.8   q.9   time
    τ = .25    0.0      8.85   4.9   13    4.2
    τ = .50    0.0      6.76   4     11    2.25
    τ = 1.0    0.25     6.15   4     10    2.5
    τ = 2.0    0.20     5.90   3.5   8.5   4.5
    Additional computing effort: median and mean numbers of additional iterations, 80% and 90% quantiles of the additional iterations, and ratio of the average R computing times, obtained over 10⁵ simulations. 18 / 32
  • 57. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Illustration (2) Figure: Overlay of the variations of 500 iid realisations of the estimates δ (deep grey), δ ∞ (medium grey) and of the importance sampling version (light grey) of E[X] = 10 when X ∼ Exp(.1) for 100 iterations, along with the 90% interquantile ranges (same colour code), in the setting of an independent exponential proposal with scale µ = 0.02. 19 / 32
•   58. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Integrating out white noise [C+X, 96]
    In Casella+X. (1996), averaging over possible past and future histories (by integrating out uniforms) to improve weights of accepted values.
    Given the whole sequence of proposed values y_t ∼ µ(y_t), averaging over uniforms is possible: starting with y_1, we can compute
    E[ (1/T) Σ_{t=1}^{T} h(X_t) | y_1, . . . , y_T ] = (1/T) Σ_{t=1}^{T} ϕ_t h(y_t)
    through a recurrence relation:
    ϕ_t = δ_t Σ_{j=t}^{T} ξ_{tj} ,  with δ_0 = 1 , δ_t = Σ_{j=0}^{t−1} δ_j ξ_{j(t−1)} ρ_{jt} ,
    and ξ_{tt} = 1 , ξ_{tj} = Π_{u=t+1}^{j} (1 − ρ_{tu}) ,
    the occurrence survivals of the y_t's, associated with the Metropolis–Hastings ratios
    ω_t = π(y_t)/µ(y_t) ,  ρ_{tu} = ω_u/ω_t ∧ 1 . 20 / 32
•   61. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Integrating out white noise (2)
    Extension to generic M-H feasible (C+X, 96). Potentially large variance improvement, but at a cost of O(T²)...
    Possible recovery of efficiency thanks to parallelisation: moving from (ǫ_1, . . . , ǫ_p) towards (ǫ_(1), . . . , ǫ_(p)) by averaging over ”all” possible orders. 21 / 32
•   64. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Case of the independent Metropolis–Hastings algorithm
    Starting at time t with p processors and a pool of p proposed values, (y_1, . . . , y_p), use the processors to examine in parallel p different “histories”. 22 / 32
•   66. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Improvement
    The standard estimator τ̂_1 of E_π[h(X)],
    τ̂_1(x_t, y_{1:p}) = (1/p) Σ_{k=1}^{p} h(x_{t+k}) ,
    is necessarily dominated by the average
    τ̂_2(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} n_k h(y_k) ,
    where y_0 = x_t and n_0 is the number of times x_t is repeated. 23 / 32
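A sketch of this block scheme for the independent case (the target, proposal, and all names are our own): the same pool of p proposals is pushed through p randomly permuted histories, and the occupation counts n_k are averaged.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 64
# Independent MH: target N(0,1), proposal mu = N(0, 2^2); log-weights log(pi/mu),
# up to an additive constant that cancels in the acceptance ratios.
ys = 2.0 * rng.normal(size=p)
lw = -0.5 * ys**2 + ys**2 / 8.0
x0, lw0 = 0.5, -0.5 * 0.5**2 + 0.5**2 / 8.0

def history_counts(order):
    """One IMH pass through the pool in the given order; n_k counts how many
    times slot k (0 = the start x_t, 1..p = the pooled y's) is occupied."""
    counts = np.zeros(p + 1)
    cur, cur_lw = 0, lw0
    for j in order:
        if np.log(rng.uniform()) < lw[j] - cur_lw:   # accept w.p. (omega_j/omega_cur) ∧ 1
            cur, cur_lw = j + 1, lw[j]
        counts[cur] += 1
    return counts

h = np.concatenate(([x0], ys))                       # h(x) = x at y_0 = x_t, y_1, ..., y_p
tau1 = history_counts(np.arange(p)) @ h / p          # single history: the usual estimator
tau2 = sum(history_counts(rng.permutation(p)) for _ in range(p)) @ h / p**2
```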
•   67. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Further Rao-Blackwellisation
    E.g., use of the Metropolis–Hastings weights w_j: j being the index such that x_{t+i−1} = y_j, update of the weights at each time t + i:
    w_j = w_j + 1 − ρ(x_{t+i−1}, y_i) ,  w_i = w_i + ρ(x_{t+i−1}, y_i) ,
    resulting in a more stable estimator
    τ̂_3(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} w_k h(y_k) .
    E.g., Casella+X. (1996):
    τ̂_4(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} ϕ_k h(y_k) . 24 / 32
•   69. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Markovian continuity
    The Markov validity of the chain is not jeopardised! The chain continues by picking one sequence at random and taking the corresponding x^(j)_{t+p} as starting point of the next parallel block. 25 / 32
•   71. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Impact of Rao-Blackwellisations
    Comparison of:
    τ̂_1, the basic IMH estimator of E_π[h(X)];
    τ̂_2, improving by averaging over permutations of proposed values and using p times more uniforms;
    τ̂_3, improving upon τ̂_2 by a basic Rao-Blackwell argument;
    τ̂_4, improving upon τ̂_2 by integrating out ancillary uniforms, at a cost of O(p²). 26 / 32
•   72. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Illustration
    Variations of estimates based on RB and standard versions of parallel chains and on a standard MCMC chain for the mean and variance of the target N (0, 1) distribution (based on 10,000 independent replicas). 27 / 32
•   76. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Impact of the order
    Parallelisation allows for the partial integration of the uniforms. What about the permutation order? Comparison of:
    τ̂_2N with no permutation,
    τ̂_2C with circular permutations,
    τ̂_2R with random permutations,
    τ̂_2H with half-random permutations,
    τ̂_2S with stratified permutations. 28 / 32
•   81. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Importance target
    Comparison with the ultimate importance sampling 29 / 32
•   85. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Extension to the general case
    The same principle can be applied to any Markov update: if x_{t+1} = Ψ(x_t, ǫ_t), then generate (ǫ_1, . . . , ǫ_p) in advance and distribute them to the p processors in different permutation orders. Plus use of Douc & X's (2011) Rao–Blackwellisation ξ̂_i^k. 30 / 32
•   87. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Implementation
    Similar run of p parallel chains (x^(j)_{t+i}), use of the averages
    τ̂_2(x^(1:p)_{1:p}) = (1/p²) Σ_{k=1}^{p} Σ_{j=1}^{p} n_k h(x^(j)_{t+k}) ,
    and selection of a new starting value at random at time t + p. 31 / 32
•   89. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Illustration
    Variations of estimates based on RB and standard versions of parallel chains and on a standard MCMC chain for the mean and variance of the target distribution (based on p = 64 parallel processors, 50 blocks of p MCMC steps and 500 independent replicas). [Boxplots labelled RB / par / org for each estimate] 32 / 32