Vanilla Rao–Blackwellisation of
        Metropolis–Hastings algorithms

                    Christian P. Robert
Université Paris-Dauphine, IUF, and CREST
Joint work with Randal Douc, Pierre Jacob and Murray Smith




            LGM2012, Trondheim, May 30, 2012



                                                              1 / 32
Main themes



   1   Rao–Blackwellisation of MCMC
   2   Can be performed in any Metropolis–Hastings algorithm
   3   Asymptotically more efficient than usual MCMC, with a
       controlled additional computing cost
   4   Takes advantage of parallel capacities at a very basic level
       (GPUs)




                                                                      2 / 32
Metropolis Hastings revisited
                         Rao–Blackwellisation
                      Rao-Blackwellisation (2)


Metropolis Hastings algorithm


    1   We wish to approximate

                I = ∫ h(x) π(x) dx / ∫ π(x) dx = ∫ h(x) π̄(x) dx

    2   π(x) is known but not ∫ π(x) dx.
    3   Approximate I with δ = (1/n) Σ_{t=1}^{n} h(x^(t)), where (x^(t)) is a Markov
        chain with limiting distribution π̄.
    4   Convergence obtained from the Law of Large Numbers or the CLT for
        Markov chains.


                                                                          3 / 32
Metropolis–Hastings algorithm

  Suppose that x^(t) is drawn.
    1   Simulate y_t ∼ q(·|x^(t)).
    2   Set x^(t+1) = y_t with probability

                α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] }

        Otherwise, set x^(t+1) = x^(t).
    3   α is such that the detailed balance equation is satisfied:

                π(x) q(y|x) α(x, y) = π(y) q(x|y) α(y, x).

        ⊲ π̄ is the stationary distribution of (x^(t)).
  ◮ The accepted candidates are simulated with the rejection algorithm.
                                                                            4 / 32
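The two steps above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the target (a standard normal known up to its constant) and the symmetric random-walk proposal are our own choices, and all names are ours.

```python
import numpy as np

def mh(log_target, x0, n_iter, scale=1.0, rng=None):
    """Random-walk Metropolis-Hastings sketch.

    log_target: log of the unnormalised density pi(x).
    Returns the chain (x^(t)) of length n_iter."""
    rng = np.random.default_rng() if rng is None else rng
    chain = np.empty(n_iter)
    x = x0
    lp = log_target(x)
    for t in range(n_iter):
        y = x + scale * rng.standard_normal()   # propose y_t ~ q(.|x^(t)), symmetric
        lpy = log_target(y)
        # alpha(x, y) = min(1, pi(y)/pi(x)) since the symmetric q cancels
        if np.log(rng.uniform()) < lpy - lp:
            x, lp = y, lpy                      # accept: x^(t+1) = y_t
        chain[t] = x                            # on rejection: x^(t+1) = x^(t)
    return chain

# Toy run: estimate I = E[h(X)] with h(x) = x^2 under a N(0, 1) target
chain = mh(lambda x: -0.5 * x * x, x0=0.0, n_iter=50_000,
           scale=2.4, rng=np.random.default_rng(0))
delta = np.mean(chain**2)   # the ergodic-average estimator of I
```

Note the unnormalised log-target suffices, exactly as item 2 of the previous slide requires.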
Metropolis Hastings revisited
                           Rao–Blackwellisation
                        Rao-Blackwellisation (2)


Metropolis Hasting Algorithm
  Suppose that x(t) is drawn.
    1   Simulate yt ∼ q(·|x(t) ).
    2   Set x(t+1) = yt with probability

                                                     π(yt ) q(x(t) |yt )
                       α(x(t) , yt ) = min 1,
                                                    π(x(t) ) q(yt |x(t) )

        Otherwise, set x(t+1) = x(t) .
    3   α is such that the detailed balance equation is satisfied:

                        π(x)q(y|x)α(x, y) = π(y)q(x|y)α(y, x).

        ⊲ π is the stationary distribution of (x(t) ).
          ¯
  ◮ The accepted candidates are simulated with the rejection algorithm.
                                                                            4 / 32
Some properties of the HM algorithm

    1   Alternative representation of the estimator δ is

                δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{M_n} n_i h(z_i) ,

        where
             the z_i's are the accepted y_j's,
             M_n is the number of accepted y_j's till time n,
             n_i is the number of times z_i appears in the sequence (x^(t))_t.
                                                                                 5 / 32
                q̃(·|z_i) = α(z_i, ·) q(·|z_i) / p(z_i) ≤ q(·|z_i) / p(z_i) ,

where p(z_i) = ∫ α(z_i, y) q(y|z_i) dy. To simulate from q̃(·|z_i):
  1   Propose a candidate y ∼ q(·|z_i).
  2   Accept with probability

                q̃(y|z_i) / [ q(y|z_i) / p(z_i) ] = α(z_i, y) .

      Otherwise, reject it and start again.
◮ This is the transition of the HM algorithm. The transition kernel q̃
admits π̃ as a stationary distribution:

                π̃(x) q̃(y|x) = [ π(x) p(x) / ∫ π(u) p(u) du ] × [ α(x, y) q(y|x) / p(x) ]
                             = π(x) α(x, y) q(y|x) / ∫ π(u) p(u) du
                             = π(y) α(y, x) q(x|y) / ∫ π(u) p(u) du
                             = π̃(y) q̃(x|y) ,
                                                                           6 / 32
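The rejection mechanism above, seen as a sampler for q̃(·|z), can be sketched as follows. The target and proposal are illustrative choices of ours (unnormalised N(0, 1) target, N(z, 1) proposal), not from the slides.

```python
import numpy as np

def alpha(z, y, log_pi):
    # symmetric proposal, so the q terms cancel in the HM ratio
    return min(1.0, np.exp(log_pi(y) - log_pi(z)))

def draw_qtilde(z, log_pi, rng):
    """One step of q~(.|z) ∝ alpha(z,.) q(.|z) by rejection:
    propose y ~ q(.|z), accept with probability alpha(z, y)."""
    trials = 0
    while True:
        trials += 1
        y = z + rng.standard_normal()        # y ~ q(.|z) = N(z, 1)
        if rng.uniform() < alpha(z, y, log_pi):
            return y, trials                 # the accepted candidate z_{i+1}

rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * x * x
z, zs = 0.0, []
for _ in range(20_000):
    z, _ = draw_qtilde(z, log_pi, rng)
    zs.append(z)
# (z_i) is the chain of accepted candidates; its stationary
# distribution is pi~(.) ∝ pi(.) p(.), not pi itself
```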
Lemma (Douc & X., AoS, 2011)

The sequence (z_i, n_i) satisfies
  1   (z_i, n_i)_i is a Markov chain;
  2   z_{i+1} and n_i are independent given z_i;
  3   n_i is distributed as a geometric random variable with probability
      parameter

                p(z_i) := ∫ α(z_i, y) q(y|z_i) dy ;                     (1)

  4   (z_i)_i is a Markov chain with transition kernel Q̃(z, dy) = q̃(y|z) dy
      and stationary distribution π̃ such that

                q̃(·|z) ∝ α(z, ·) q(·|z)    and    π̃(·) ∝ π(·) p(·) .


                                                                                     7 / 32
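Item 3 of the Lemma can be checked by simulation: hold z_i fixed, count how long the chain stays there, and compare E[n_i] with 1/p(z_i), the latter obtained by quadrature. The setup (N(0, 1) target, N(z, 1) proposal, z_i = 1.5) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
log_pi = lambda x: -0.5 * x * x
z = 1.5                                        # a fixed current state z_i

def sample_n(z):
    """Number of times z_i appears in (x^(t)) before the next accepted move."""
    n = 1
    while True:
        y = z + rng.standard_normal()          # y ~ q(.|z)
        if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(z))):
            return n                           # move accepted: stop counting
        n += 1                                 # rejection: one more copy of z_i

ns = np.array([sample_n(z) for _ in range(100_000)])

# p(z) = ∫ alpha(z, y) q(y|z) dy by a simple Riemann sum
ys = np.linspace(z - 10, z + 10, 20_001)
q = np.exp(-0.5 * (ys - z) ** 2) / np.sqrt(2 * np.pi)
a = np.minimum(1.0, np.exp(log_pi(ys) - log_pi(z)))
p_z = np.sum(a * q) * (ys[1] - ys[0])
# np.mean(ns) should be close to 1 / p_z, as the Lemma predicts
```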
Old bottle, new wine [or vice-versa]

                z_{i−1} —(indep)→ z_i —(indep)→ z_{i+1}
                   |(indep)          |(indep)
                n_{i−1}            n_i


                δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{M_n} n_i h(z_i) .
                                                                            8 / 32
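The identity above is purely algebraic: grouping the chain by accepted values must reproduce the ergodic average exactly. A quick check on a simulated chain (illustrative target and proposal of ours; z_1 is taken to be the initial state):

```python
import numpy as np

rng = np.random.default_rng(3)
log_pi = lambda x: -0.5 * x * x
n = 10_000
x = 0.0
chain = []
zs, ns = [x], [0]                  # z_1 = initial state, n_1 = its occupation count
for _ in range(n):
    y = x + rng.standard_normal()
    if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(x))):
        x = y
        zs.append(x); ns.append(0) # a new accepted value z_{i+1}
    ns[-1] += 1                    # current z_i appears once more in (x^(t))
    chain.append(x)

h = lambda x: x * x
delta_chain = np.mean([h(xt) for xt in chain])
delta_grouped = sum(ni * h(zi) for zi, ni in zip(zs, ns)) / n
# delta_chain == delta_grouped up to floating-point rounding
```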
Formal importance sampling · Variance reduction · Asymptotic results · Illustrations


Importance sampling perspective

    1   A natural idea:

                δ* = (1/n) Σ_{i=1}^{M_n} h(z_i) / p(z_i) ,

        or, self-normalised,

                δ* ≃ [ Σ_{i=1}^{M_n} h(z_i)/p(z_i) ] / [ Σ_{i=1}^{M_n} 1/p(z_i) ]
                   = [ Σ_{i=1}^{M_n} {π(z_i)/π̃(z_i)} h(z_i) ] / [ Σ_{i=1}^{M_n} π(z_i)/π̃(z_i) ] .

    2   But p is not available in closed form.
    3   The geometric n_i is the replacement, an obvious solution that is
        used in the original Metropolis–Hastings estimate since
        E[n_i] = 1/p(z_i).
                                                                                    9 / 32
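In one dimension p(z) can be approximated by quadrature, so the self-normalised δ* is computable on a toy problem even though item 2 rules this out in general. Everything below (target, proposal, h, the quadrature helper `p_of`) is an illustrative construction of ours.

```python
import numpy as np

rng = np.random.default_rng(4)
log_pi = lambda x: -0.5 * x * x

def p_of(z, width=10.0, m=4001):
    """p(z) = ∫ alpha(z, y) q(y|z) dy by a Riemann sum (1-D toy only)."""
    ys = np.linspace(z - width, z + width, m)
    q = np.exp(-0.5 * (ys - z) ** 2) / np.sqrt(2 * np.pi)
    a = np.minimum(1.0, np.exp(log_pi(ys) - log_pi(z)))
    return np.sum(a * q) * (ys[1] - ys[0])

# Run MH and keep only the accepted values z_i
x, zs = 0.0, []
for _ in range(20_000):
    y = x + rng.standard_normal()
    if rng.uniform() < min(1.0, np.exp(log_pi(y) - log_pi(x))):
        x = y
        zs.append(x)

w = np.array([1.0 / p_of(z) for z in zs])   # weights 1/p(z_i) ∝ pi(z_i)/pi~(z_i)
h = np.array(zs) ** 2
delta_star = np.sum(w * h) / np.sum(w)      # self-normalised delta*, ≈ E_pi[X^2] = 1
```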
The Bernoulli factory

  The crude estimate of 1/p(z_i),

                n_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)} ,

  can be improved:

  Lemma (Douc & X., AoS, 2011)
  If (y_j)_j is an iid sequence with distribution q(y|z_i), the quantity

                ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}

  is an unbiased estimator of 1/p(z_i) whose variance, conditional on z_i, is
  lower than the conditional variance of n_i, {1 − p(z_i)}/p²(z_i).
                                                                                   10 / 32
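The variance reduction can be observed numerically by computing both estimators from the same proposal stream, replacing each indicator I{u_ℓ ≥ α} by its conditional expectation 1 − α(z_i, y_ℓ). The setup is illustrative (N(0, 1) target, N(z, 1) proposal), and ξ̂'s infinite sum is truncated once the running product is negligible.

```python
import numpy as np

rng = np.random.default_rng(5)
log_pi = lambda x: -0.5 * x * x
z = 1.5                                          # a fixed z_i

def one_draw():
    """Return (n_i, xi_i) computed from the same stream (y_l, u_l)."""
    n, xi, prod_n, prod_xi = 1.0, 1.0, 1.0, 1.0
    while prod_xi > 1e-12 or prod_n > 0:
        y = z + rng.standard_normal()
        a = min(1.0, np.exp(log_pi(y) - log_pi(z)))
        prod_n *= (rng.uniform() >= a)           # running prod of I{u_l >= alpha}
        prod_xi *= (1.0 - a)                     # its Rao-Blackwellised version
        n += prod_n
        xi += prod_xi
    return n, xi

draws = np.array([one_draw() for _ in range(50_000)])
mean_n, mean_xi = draws.mean(axis=0)             # both unbiased for 1/p(z)
var_n, var_xi = draws.var(axis=0)                # var_xi should be the smaller one
```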
Rao-Blackwellised, for sure?

                ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}

    1   An infinite sum, but one whose product vanishes as soon as some
        α(z_i, y_ℓ) = 1, which happens with positive probability:

                α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] }

        For example: take a symmetric random walk as a proposal.
    2   What if we wish to be sure that the sum is finite?

  Finite horizon k version:

        ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}

                                                                                             11 / 32
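The finite-horizon estimator can be sketched directly from the formula: Rao-Blackwellise only the first k factors and keep the raw indicators beyond, so the sum terminates almost surely for every k (k = 0 recovers n_i). Same illustrative setup as before; the small truncation of the deterministic factors is a numerical safeguard of ours.

```python
import numpy as np

rng = np.random.default_rng(6)
log_pi = lambda x: -0.5 * x * x

def xi_k(z, k):
    """Finite-horizon estimator of 1/p(z): (1 - alpha) factors up to
    rank k, raw indicators I{u_l >= alpha} beyond rank k."""
    total, prod, l = 1.0, 1.0, 0
    while prod > 0.0:
        l += 1
        y = z + rng.standard_normal()            # y_l ~ q(.|z)
        a = min(1.0, np.exp(log_pi(y) - log_pi(z)))
        if l <= k:
            prod *= (1.0 - a)                    # Rao-Blackwellised factor
            if prod < 1e-12:
                prod = 0.0                       # truncate once negligible
        else:
            prod *= (rng.uniform() >= a)         # raw indicator, sum ends a.s.
        total += prod
    return total

est = np.array([xi_k(1.5, k=3) for _ in range(50_000)])
# np.mean(est) estimates 1/p(1.5) for any choice of k
```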
Variance improvement

Proposition (Douc & X., AoS, 2011)
If (y_j)_j is an iid sequence with distribution q(y|z_i) and (u_j)_j is an iid uniform sequence, for any k ≥ 0, the quantity

\[
\hat\xi_i^k = 1 + \sum_{j=1}^{\infty}\; \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
\]

is an unbiased estimator of 1/p(z_i) with an almost surely finite number of terms. Moreover, for k ≥ 1,

\[
\mathbb{V}\big[\hat\xi_i^k \mid z_i\big] = \frac{1 - p(z_i)}{p^2(z_i)} - \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)}\,\frac{2 - p(z_i)}{p^2(z_i)}\,\big(p(z_i) - r(z_i)\big)\,,
\]

where p(z_i) := \int \alpha(z_i, y)\, q(y|z_i)\, dy and r(z_i) := \int \alpha^2(z_i, y)\, q(y|z_i)\, dy.

                                                                                12 / 32
In particular, the conditional variance decreases as the horizon k grows:

\[
\mathbb{V}\big[\hat\xi_i \mid z_i\big] \le \mathbb{V}\big[\hat\xi_i^k \mid z_i\big] \le \mathbb{V}\big[\hat\xi_i^0 \mid z_i\big] = \mathbb{V}[n_i \mid z_i]\,.
\]

                                                                                12 / 32
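The variance ordering shows up empirically as the horizon k increases. A minimal sketch (my own code, for a N(0,1) target with Gaussian random-walk proposal; the truncation tolerance `tol` is mine):

```python
import math
import random

def alpha(z, y):
    # MH acceptance probability, target N(0,1), symmetric random-walk proposal.
    return min(1.0, math.exp(-(y * y - z * z) / 2.0))

def xi_hat_k(z, k, tau, rng, tol=1e-12):
    # Finite-horizon Rao-Blackwellised weight; k = 0 gives the raw count n_i.
    total, prod, ell = 1.0, 1.0, 0
    while prod >= tol:
        ell += 1
        a = alpha(z, rng.gauss(z, tau))
        if ell <= k:
            prod *= 1.0 - a
        elif rng.random() < a:
            prod = 0.0
        total += prod
    return total

rng = random.Random(11)
z, tau, N = 1.0, 2.0, 50_000
variances = []
for k in (0, 2, 50):
    vals = [xi_hat_k(z, k, tau, rng) for _ in range(N)]
    m = sum(vals) / N
    variances.append(sum((v - m) ** 2 for v in vals) / N)
# conditional variance decreases from V[n_i | z] (k = 0) towards V[xi_hat_i | z]
print(variances[0] > variances[1] > variances[2])
```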
[Diagram: successive accepted states z_{i−1}, z_i, z_{i+1} are not independent of one another, and each weight \hat\xi_i^k is not independent of the neighbouring states either.]

\[
\hat\xi_i^k = 1 + \sum_{j=1}^{\infty}\; \prod_{1 \le \ell \le k \wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1 \le \ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}
\]

                                                                                13 / 32
These correlated weights are plugged into the self-normalised estimator

\[
\delta_M^k = \frac{\sum_{i=1}^{M} \hat\xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \hat\xi_i^k}\,.
\]

                                                                                13 / 32
                                                                                14 / 32
Formal importance sampling
                    Metropolis Hastings revisited
                                                        Variance reduction
                           Rao–Blackwellisation
                                                        Asymptotic results
                        Rao-Blackwellisation (2)
                                                        Illustrations



Let
                                                    M ˆk
                                    k               i=1 ξi h(zi )
                                   δM =               M ˆk
                                                                    .
                                                      i=1 ξi

For any positive function ϕ, we denote Cϕ = {h; |h/ϕ|∞ < ∞}. Assume
that there exists a positive function ϕ ≥ 1 such that
                                             M
                                             i=1 h(zi )/p(zi )      P
                         ∀h ∈ Cϕ ,             M
                                                                  −→ π(h)
                                               i=1 1/p(zi )



Theorem (Douc & X., AoS, 2011)

Under the assumption that π(p) > 0, the following convergence property holds:
   i) If h is in Cϕ , then

                               k      P
                              δM −→M →∞ π(h) (◮Consistency)



                                                                                     14 / 32
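A sketch of the whole scheme (my own toy implementation for a N(0,1) target with a Gaussian random-walk proposal; `weight_and_move` and `delta_k` are hypothetical names): the accepted states z_i and the finite-horizon weights are generated jointly from the same proposal stream, and δ_M^k is the weighted average.

```python
import math
import random

def alpha(z, y):
    # acceptance probability, target N(0,1), symmetric random-walk proposal
    return min(1.0, math.exp(-(y * y - z * z) / 2.0))

def weight_and_move(z, k, tau, rng):
    # The same stream of proposals drives the chain move (first u < alpha)
    # and the finite-horizon weight xi_k of the current state z.
    total, prod, ell, z_next = 1.0, 1.0, 0, None
    while prod > 0.0 or z_next is None:
        ell += 1
        y = rng.gauss(z, tau)
        a = alpha(z, y)
        u = rng.random()
        if z_next is None and u < a:
            z_next = y                      # first accepted proposal
        if prod > 0.0:
            if ell <= k:
                prod *= 1.0 - a             # exact factor within the horizon
            elif u < a:
                prod = 0.0                  # indicator factor kills the sum
            total += prod
    return total, z_next

def delta_k(M, k, tau, rng, h):
    # self-normalised estimator over M accepted states
    z, num, den = 0.0, 0.0, 0.0
    for _ in range(M):
        xi, z_next = weight_and_move(z, k, tau, rng)
        num += xi * h(z)
        den += xi
        z = z_next
    return num / den

rng = random.Random(3)
est = delta_k(20_000, 2, 2.0, rng, lambda x: x * x)
# consistent estimate of E[X^2] = 1 under the N(0,1) target
print(abs(est - 1.0) < 0.2)
```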
Assume further that there exists a positive function ψ such that

\[
\forall h \in C_\psi,\qquad \sqrt{M}\left(\frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} - \pi(h)\right) \overset{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, \Gamma(h))\,.
\]

Theorem (Douc & X., AoS, 2011)

Under the assumption that π(p) > 0, the following convergence property holds:
  ii) If, in addition, h^2/p \in C_\varphi and h \in C_\psi, then

\[
\sqrt{M}\,\big(\delta_M^k - \pi(h)\big) \overset{\mathcal{L}}{\underset{M\to\infty}{\longrightarrow}} \mathcal{N}(0, V_k[h - \pi(h)])\,, \qquad (\text{CLT})
\]

where V_k(h) := \pi(p) \int \pi(dz)\, \mathbb{V}\big[\hat\xi_i^k \mid z\big]\, h^2(z)\, p(z) + \Gamma(h)\,.

                                                                                14 / 32
We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that, for any starting point x,

\[
\forall h \in C_\zeta,\qquad \mathbb{P}_x\left( \sup_{0 \le i \le n} \left| \sum_{j=0}^{i} \big[h(z_j) - \tilde\pi(h)\big] \right| > \epsilon \right) \le \frac{n\, C_h(x)}{\epsilon^2}\,.
\]

Moreover, assume that there exists φ ≥ 1 such that, for any starting point x,

\[
\forall h \in C_\phi,\qquad \tilde Q^n(x, h) \overset{P}{\longrightarrow} \tilde\pi(h) = \pi(ph)/\pi(p)\,.
\]

                                                                                15 / 32
Theorem (Douc & X., AoS, 2011)

Assume that h is such that h/p ∈ C_ζ and \{C_{h/p}, h^2/p^2\} \subset C_\phi. Assume moreover that

\[
\sqrt{M}\,\big(\delta_M^0 - \pi(h)\big) \overset{\mathcal{L}}{\longrightarrow} \mathcal{N}(0, V_0[h - \pi(h)])\,.
\]

Then, for any starting point x,

\[
\sqrt{M_n}\left( \frac{\sum_{t=1}^{n} h(x^{(t)})}{n} - \pi(h) \right) \overset{n\to+\infty}{\longrightarrow} \mathcal{N}(0, V_0[h - \pi(h)])\,,
\]

where M_n is defined by

\[
\sum_{i=1}^{M_n} \hat\xi_i^0 \le n < \sum_{i=1}^{M_n+1} \hat\xi_i^0\,.
\]

                                                                                15 / 32
Variance gain (1)

                  h(x)          x            x2       IX>0        p(x)
                  τ = .1        0.971        0.953    0.957       0.207
                  τ =2          0.965        0.942    0.875       0.861
                  τ =5          0.913        0.982    0.785       0.826
                  τ =7          0.899        0.982    0.768       0.820

  Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]:
  100 MCMC iterations over 10^3 replications of a random walk Gaussian
  proposal with scale τ.

                                                                                16 / 32
Illustration (1)

  Figure: Overlay of the variations of 250 iid realisations of the estimates
  δ (gold) and δ^∞ (grey) of E[X] = 0 for 1000 iterations, along with the
  90% interquantile range for the estimates δ (brown) and δ^∞ (pink), in
  the setting of a random walk Gaussian proposal with scale τ = 10.

                                                                                17 / 32
Extra computational effort

                       median             mean     q.8     q.9      time
        τ   = .25      0.0                8.85     4.9     13       4.2
        τ   = .50      0.0                6.76     4       11       2.25
        τ   = 1.0      0.25               6.15     4       10       2.5
        τ   = 2.0      0.20               5.90     3.5     8.5      4.5

  Additional computing effort: median and mean numbers of additional
  iterations, 80% and 90% quantiles of the additional iterations, and ratio
  of the average R computing times, obtained over 10^5 simulations.

                                                                                18 / 32
Illustration (2)

  Figure: Overlay of the variations of 500 iid realisations of the estimates
  δ (deep grey), δ^∞ (medium grey) and of the importance sampling version
  (light grey) of E[X] = 10 when X ∼ Exp(.1), for 100 iterations, along
  with the 90% interquantile ranges (same colour code), in the setting of
  an independent exponential proposal with scale µ = 0.02.

                                                                                19 / 32
Integrating out white noise [C+X, 96]

  In Casella+X. (1996), averaging over possible past and future histories
  (by integrating out the uniforms) improves the weights of the accepted values.
  Given the whole sequence of proposed values y_t ∼ µ(y_t), averaging over the
  uniforms is possible: starting with y_1, we can compute

\[
\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[h(X_t) \mid y_1, \ldots, y_T] = \frac{1}{T} \sum_{t=1}^{T} \varphi_t\, h(y_t)
\]

  through a recurrence relation:

                                                                                20 / 32
\[
\varphi_t = \delta_t \sum_{j=t}^{T} \xi_{tj}
\]

  with

\[
\delta_0 = 1\,, \qquad \delta_t = \sum_{j=0}^{t-1} \delta_j\, \xi_{j(t-1)}\, \rho_{jt}\,,
\]

  and

\[
\xi_{tt} = 1\,, \qquad \xi_{tj} = \prod_{u=t+1}^{j} (1 - \rho_{tu})\,,
\]

  the occurrence and survival probabilities of the y_t's, associated with the
  Metropolis–Hastings ratios

\[
\omega_t = \pi(y_t)/\mu(y_t)\,, \qquad \rho_{tu} = \omega_u/\omega_t \wedge 1\,.
\]

                                                                                20 / 32
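The recurrence costs O(T^2). A sketch (my own code, with hypothetical names) that also checks an internal consistency property: at every time m the occupancy probabilities δ_j ξ_{jm} sum to one, so the φ_t sum to the number of time steps.

```python
import math
import random

def rb_weights(log_w):
    # Occupancy weights for an independence MH chain started at y_0 with
    # proposals y_1..y_T, from log_w[t] = log(pi(y_t) / mu(y_t)).
    T = len(log_w) - 1
    rho = [[min(1.0, math.exp(log_w[u] - log_w[t])) if u > t else 0.0
            for u in range(T + 1)] for t in range(T + 1)]
    # xi[t][j]: probability of rejecting y_{t+1}, ..., y_j while sitting at y_t
    xi = [[1.0] * (T + 1) for _ in range(T + 1)]
    for t in range(T + 1):
        for j in range(t + 1, T + 1):
            xi[t][j] = xi[t][j - 1] * (1.0 - rho[t][j])
    # delta[t]: probability that y_t is accepted when it is proposed at time t
    delta = [1.0] + [0.0] * T
    for t in range(1, T + 1):
        delta[t] = sum(delta[j] * xi[j][t - 1] * rho[j][t] for j in range(t))
    # phi[t]: expected number of times the chain occupies y_t over times 0..T
    return [delta[t] * sum(xi[t][j] for j in range(t, T + 1))
            for t in range(T + 1)]

rng = random.Random(1)
ys = [rng.expovariate(0.5) for _ in range(21)]          # y_0..y_20 from mu = Exp(1/2)
log_w = [-0.5 * y - math.log(0.5) for y in ys]          # log(pi/mu) for target pi = Exp(1)
phi = rb_weights(log_w)
# total occupancy over the 21 time points 0..20 must equal 21 exactly
print(abs(sum(phi) - 21.0) < 1e-9)
```

The sum-to-one check follows by induction from the δ recurrence, which is a convenient way to test any implementation of these weights.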
Integrating out white noise (2)

  Extension to generic M-H feasible (C+X, 96).
  Potentially large variance improvement, but at a cost of O(T^2)...
  Possible recovery of efficiency thanks to parallelisation:
  moving from (\epsilon_1, \ldots, \epsilon_p) towards

\[
(\epsilon_{(1)}, \ldots, \epsilon_{(p)})
\]

  by averaging over "all" possible orders.

                                                                                21 / 32
Case of the independent Metropolis–Hastings algorithm

  Starting at time t with p processors and a pool of p proposed values,

\[
(y_1, \ldots, y_p)\,,
\]

  use the processors to examine in parallel p different "histories".

                                                                                22 / 32
Improvement

  The standard estimator \hat\tau_1 of E_π[h(X)],

\[
\hat\tau_1(x_t, y_{1:p}) = \frac{1}{p} \sum_{k=1}^{p} h(x_{t+k})\,,
\]

  is necessarily dominated by the average

\[
\hat\tau_2(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} n_k\, h(y_k)\,,
\]

  where y_0 = x_t and n_0 is the number of times x_t is repeated.

                                                                                23 / 32
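A toy sketch of the domination (my own code, under the assumption that each of the p processors runs an independence-MH history of length p on the shared pool of proposals with its own uniforms; target Exp(1), proposal Exp(1/2), h(x) = x):

```python
import math
import random

def occupancy(log_w, rng):
    # One independence-MH history of length p over the shared pool:
    # index 0 is the starting point, indices 1..p are the proposals.
    p = len(log_w) - 1
    counts = [0] * (p + 1)
    cur = 0
    for k in range(1, p + 1):
        if rng.random() < min(1.0, math.exp(log_w[k] - log_w[cur])):
            cur = k
        counts[cur] += 1
    return counts

def estimators(p, rng):
    # shared pool: starting value y_0 and proposals y_1..y_p from mu = Exp(1/2)
    ys = [rng.expovariate(0.5) for _ in range(p + 1)]
    log_w = [-0.5 * y - math.log(0.5) for y in ys]   # log(pi/mu) for pi = Exp(1)
    hist = [occupancy(log_w, rng) for _ in range(p)]
    tau1 = sum(n * y for n, y in zip(hist[0], ys)) / p           # one history
    pooled = [sum(h[k] for h in hist) for k in range(p + 1)]
    tau2 = sum(n * y for n, y in zip(pooled, ys)) / p ** 2       # p histories
    return tau1, tau2

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

rng = random.Random(5)
reps = [estimators(10, rng) for _ in range(2000)]
t1, t2 = zip(*reps)
# tau2 averages p single-history estimators built on the same pool,
# so its variance is necessarily lower than that of a single history
print(var(t2) < var(t1))
```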
Further Rao-Blackwellisation

  E.g., use of the Metropolis–Hastings weights w_j: with j the index such
  that x_{t+i-1} = y_j, update the weights at each time t + i:

\[
w_j = w_j + 1 - \rho(x_{t+i-1}, y_i)\,, \qquad w_i = w_i + \rho(x_{t+i-1}, y_i)\,,
\]

  resulting in a more stable estimator

\[
\hat\tau_3(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} w_k\, h(y_k)\,.
\]

  E.g., Casella+X. (1996):

\[
\hat\tau_4(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} \varphi_k\, h(y_k)\,.
\]

                                                                                24 / 32
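A sketch of the weight update for a single history (my own code; normalised by p rather than p^2 since only one history is run, and started from the target itself so the estimator has expectation E_π[h] exactly):

```python
import math
import random

def tau3_one_history(p, rng):
    # Independence MH, target pi = Exp(1), proposal mu = Exp(1/2), h(x) = x.
    # At each step the current state's weight gains 1 - rho and the new
    # proposal's weight gains rho, before the actual accept/reject draw.
    ys = [rng.expovariate(1.0)] + [rng.expovariate(0.5) for _ in range(p)]
    log_w = [-0.5 * y - math.log(0.5) for y in ys]   # log importance ratios log(pi/mu)
    w = [0.0] * (p + 1)
    cur = 0
    for i in range(1, p + 1):
        rho = min(1.0, math.exp(log_w[i] - log_w[cur]))
        w[cur] += 1.0 - rho
        w[i] += rho
        if rng.random() < rho:
            cur = i
    return sum(wk * y for wk, y in zip(w, ys)) / p

rng = random.Random(9)
R, p = 4000, 20
est = sum(tau3_one_history(p, rng) for _ in range(R)) / R
# the chain starts from pi itself, so E[tau3] = E_pi[X] = 1
print(abs(est - 1.0) < 0.1)
```

At each step the update distributes one unit of mass as the conditional expectation of the next state given the proposal, which is exactly the Rao-Blackwellisation of the ergodic average over the current uniform.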
Markovian continuity

  The Markov validity of the chain is not jeopardised! The chain continues
  by picking one sequence at random and taking the corresponding x^{(j)}_{t+p}
  as starting point of the next parallel block.

                                                                          25 / 32
Impact of Rao-Blackwellisations

  Comparison of
      τ̂₁, the basic IMH estimator of E_π[h(X)],
      τ̂₂, improving by averaging over permutations of proposed values and
      using p times more uniforms,
      τ̂₃, improving upon τ̂₂ by a basic Rao-Blackwell argument,
      τ̂₄, improving upon τ̂₂ by integrating out ancillary uniforms, at a cost
      of O(p²).

                                                                          26 / 32
Illustration

  Variations of estimates based on RB and standard versions of parallel
  chains and on a standard MCMC chain for the mean and variance of the
  target N(0, 1) distribution (based on 10,000 independent replicas).

                                                                          27 / 32
Impact of the order

  Parallelisation allows for the partial integration of the uniforms.
  What about the permutation order?
  Comparison of
      τ̂₂N with no permutation,
      τ̂₂C with circular permutations,
      τ̂₂R with random permutations,
      τ̂₂H with half-random permutations,
      τ̂₂S with stratified permutations.

                                                                          28 / 32
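For concreteness, a sketch of the orders in which parallel chain j could work through the p shared proposals. The slide does not spell out the half-random and stratified constructions, so only the first three variants are shown; the function name and signature are illustrative:

```python
import numpy as np

def permutation_order(p, kind, j, rng=None):
    """Order in which parallel chain j processes the p shared proposals
    (only the variants fully specified on the slide are sketched)."""
    if kind == "none":         # tau_2N: every chain uses the same order
        return np.arange(p)
    if kind == "circular":     # tau_2C: chain j starts at offset j
        return np.roll(np.arange(p), -j)
    if kind == "random":       # tau_2R: an independent uniform permutation
        return rng.permutation(p)
    raise ValueError(f"unknown kind: {kind}")

rng = np.random.default_rng(0)
orders = [permutation_order(6, k, 2, rng) for k in ("none", "circular", "random")]
```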
Importance target

  Comparison with the ultimate importance sampling

                                                                          29 / 32
Extension to the general case

  The same principle can be applied to any Markov update: if

      x_{t+1} = Ψ(x_t, ε_t)

  then generate (ε₁, . . . , ε_p) in advance and distribute them to the p
  processors in different permutation orders.
  Plus use of Douc & X's (2011) Rao–Blackwellisation ξ̂_i^k

                                                                          30 / 32
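A minimal sketch of the generic update (the random-walk choice of Ψ is a placeholder for illustration): the innovation ε bundles the proposal noise and the acceptance uniform, so the whole step is a deterministic function of (x_t, ε_t) and the ε's can indeed be drawn in advance and handed to each processor in a different permutation order.

```python
import numpy as np

def Psi(x, eps, log_pi, step=1.0):
    """One Markov update x_{t+1} = Psi(x_t, eps_t).  Here eps = (z, u):
    z is the pre-generated proposal innovation and u the acceptance
    uniform; a random-walk kernel stands in for the generic case."""
    z, u = eps
    y = x + step * z
    return y if np.log(u) < log_pi(y) - log_pi(x) else x

rng = np.random.default_rng(0)
p = 8
# generate (eps_1, ..., eps_p) in advance; each of the p processors then
# consumes them in a different (here circular) permutation order
eps = list(zip(rng.normal(size=p), rng.random(size=p)))

log_pi = lambda x: -0.5 * x**2
chains = []
for j in range(p):
    x, path = 0.0, []
    for i in np.roll(np.arange(p), -j):
        x = Psi(x, eps[i], log_pi)
        path.append(x)
    chains.append(path)
```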
Implementation

  Similar run of p parallel chains (x^{(j)}_{t+i}), use of averages

      τ̂₂(x^{(1:p)}_{1:p}) = (1/p²) ∑_{k=1}^{p} ∑_{j=1}^{p} n_k h(x^{(j)}_{t+k})

  and selection of new starting value at random at time t + p:

                                                                          31 / 32
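Putting the pieces together, a sketch of one parallel block (assumptions: random-walk kernel and circular permutations of pre-generated innovations, both placeholder choices). Averaging the p × p stored states realises the n_k-weighted sum above, since each accepted value appears in the chain exactly n_k times:

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_block(x_start, p, log_pi, step=1.0):
    """One block of the scheme: p chains share the same pre-generated
    innovations, consumed in circular permutation orders; the average of
    the p*p stored states gives the within-block estimate, and a random
    endpoint starts the next block, preserving the Markov validity
    mentioned earlier."""
    z = step * rng.normal(size=p)          # shared proposal innovations
    u = rng.random(size=p)                 # shared acceptance uniforms
    states = np.empty((p, p))
    for j in range(p):
        x = x_start
        for k, i in enumerate(np.roll(np.arange(p), -j)):
            y = x + z[i]
            if np.log(u[i]) < log_pi(y) - log_pi(x):
                x = y
            states[j, k] = x
    return states.mean(), states[rng.integers(p), -1]

# run 50 blocks targeting N(0,1); the sizes are hypothetical
log_pi = lambda x: -0.5 * x**2
x, ests = 0.0, []
for _ in range(50):
    est, x = parallel_block(x, 16, log_pi)
    ests.append(est)
```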
Illustration

  Variations of estimates based on RB and standard versions of parallel
  chains and on a standard MCMC chain for the mean and variance of the
  target distribution (based on p = 64 parallel processors, 50 blocks of p
  MCMC steps and 500 independent replicas).

  [Parallel boxplots of the RB, parallel (par) and original (org) versions:
  mean estimates (left panel, scale −0.10 to 0.10) and variance estimates
  (right panel, scale 0.9 to 1.3).]

                                                                          32 / 32

Contenu connexe

Tendances

RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010
Christian Robert
 
Introduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methodsIntroduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methods
Christian Robert
 
Sienna 4 divideandconquer
Sienna 4 divideandconquerSienna 4 divideandconquer
Sienna 4 divideandconquer
chidabdu
 

Tendances (19)

MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
Fractal dimension versus Computational Complexity
Fractal dimension versus Computational ComplexityFractal dimension versus Computational Complexity
Fractal dimension versus Computational Complexity
 
Ma2520962099
Ma2520962099Ma2520962099
Ma2520962099
 
Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo
 
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...
Fractal Dimension of Space-time Diagrams and the Runtime Complexity of Small ...
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010RSS discussion of Girolami and Calderhead, October 13, 2010
RSS discussion of Girolami and Calderhead, October 13, 2010
 
Metropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialMetropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short Tutorial
 
Towards a stable definition of Algorithmic Randomness
Towards a stable definition of Algorithmic RandomnessTowards a stable definition of Algorithmic Randomness
Towards a stable definition of Algorithmic Randomness
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010
 
Particle filtering
Particle filteringParticle filtering
Particle filtering
 
Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...
Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...
Zvonimir Vlah "Lagrangian perturbation theory for large scale structure forma...
 
Hastings 1970
Hastings 1970Hastings 1970
Hastings 1970
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMC
 
TIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATA
TIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATATIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATA
TIME-ABSTRACTING BISIMULATION FOR MARKOVIAN TIMED AUTOMATA
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
Introduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methodsIntroduction to advanced Monte Carlo methods
Introduction to advanced Monte Carlo methods
 
Sienna 4 divideandconquer
Sienna 4 divideandconquerSienna 4 divideandconquer
Sienna 4 divideandconquer
 
Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...
 

Similaire à Trondheim, LGM2012

Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solution
cheekeong1231
 
InternshipReport
InternshipReportInternshipReport
InternshipReport
Hamza Ameur
 

Similaire à Trondheim, LGM2012 (20)

Rdnd2008
Rdnd2008Rdnd2008
Rdnd2008
 
Adc
AdcAdc
Adc
 
Pres metabief2020jmm
Pres metabief2020jmmPres metabief2020jmm
Pres metabief2020jmm
 
Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3Random Matrix Theory and Machine Learning - Part 3
Random Matrix Theory and Machine Learning - Part 3
 
Sns mid term-test2-solution
Sns mid term-test2-solutionSns mid term-test2-solution
Sns mid term-test2-solution
 
Dr09 Slide
Dr09 SlideDr09 Slide
Dr09 Slide
 
An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...An investigation of inference of the generalized extreme value distribution b...
An investigation of inference of the generalized extreme value distribution b...
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Shanghai tutorial
Shanghai tutorialShanghai tutorial
Shanghai tutorial
 
Statistical Physics Assignment Help
Statistical Physics Assignment HelpStatistical Physics Assignment Help
Statistical Physics Assignment Help
 
Berans qm overview
Berans qm overviewBerans qm overview
Berans qm overview
 
Least squares support Vector Machine Classifier
Least squares support Vector Machine ClassifierLeast squares support Vector Machine Classifier
Least squares support Vector Machine Classifier
 
InternshipReport
InternshipReportInternshipReport
InternshipReport
 
Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
The Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal FunctionThe Gaussian Hardy-Littlewood Maximal Function
The Gaussian Hardy-Littlewood Maximal Function
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplings
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 

Plus de Christian Robert

Plus de Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 

Dernier

Dernier (20)

Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 

Trondheim, LGM2012

  • 1. Vanilla Rao–Blackwellisation of Metropolis–Hastings algorithms Christian P. Robert Universit´ Paris-Dauphine, IuF, and CREST e Joint works with Randal Douc, Pierre Jacob and Murray Smith LGM2012, Trondheim, May 30, 2012 1 / 32
  • 2. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
  • 3. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
  • 4. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
  • 5. Main themes 1 Rao–Blackwellisation on MCMC 2 Can be performed in any Hastings Metropolis algorithm 3 Asymptotically more efficient than usual MCMC with a controlled additional computing 4 Takes advantage of parallel capacities at a very basic level (GPUs) 2 / 32
•   6. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Metropolis–Hastings algorithm
    1 We wish to approximate I = ∫ h(x)π(x)dx / ∫ π(x)dx = ∫ h(x)π̄(x)dx.
    2 π(x) is known, but not ∫ π(x)dx.
    3 Approximate I with δ = (1/n) Σ_{t=1}^{n} h(x^(t)), where (x^(t)) is a Markov chain with limiting distribution π̄.
    4 Convergence obtained from the Law of Large Numbers or the CLT for Markov chains. 3 / 32
•   10. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Metropolis–Hastings Algorithm
    Suppose that x^(t) is drawn.
    1 Simulate y_t ∼ q(·|x^(t)).
    2 Set x^(t+1) = y_t with probability
      α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] } ;
      otherwise, set x^(t+1) = x^(t).
    3 α is such that the detailed balance equation is satisfied:
      π(x) q(y|x) α(x, y) = π(y) q(x|y) α(y, x) ,
      ⊲ so π̄ is the stationary distribution of (x^(t)).
    ◮ The accepted candidates are simulated with the rejection algorithm. 4 / 32
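As an illustration (not part of the slides), a minimal random-walk sketch of this transition in Python; the target, scale, and names (`mh_chain`, `log_pi`, `tau`) are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_chain(log_pi, x0, tau, n):
    """Random-walk Metropolis-Hastings: symmetric proposal y ~ N(x, tau^2),
    so q(x|y)/q(y|x) = 1 and alpha reduces to min(1, pi(y)/pi(x))."""
    x = x0
    chain = np.empty(n)
    for t in range(n):
        y = x + tau * rng.normal()
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):  # accept y_t w.p. alpha
            x = y
        chain[t] = x                                       # else x^(t+1) = x^(t)
    return chain

# Toy target pi ~ N(0,1), known up to a constant; delta = ergodic average of h(x) = x.
chain = mh_chain(lambda x: -0.5 * x**2, x0=0.0, tau=2.0, n=20000)
delta = chain.mean()
```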
•   14. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Some properties of the HM algorithm
    1 Alternative representation of the estimator δ is
      δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{Mn} n_i h(z_i) ,
      where the z_i's are the accepted y_j's, Mn is the number of accepted y_j's till time n, and n_i is the number of times z_i appears in the sequence (x^(t))_t. 5 / 32
•   15. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2)
    q̃(·|z_i) = α(z_i, ·) q(·|z_i) / p(z_i) ≤ q(·|z_i) / p(z_i) ,
    where p(z_i) = ∫ α(z_i, y) q(y|z_i) dy. To simulate from q̃(·|z_i):
    1 Propose a candidate y ∼ q(·|z_i)
    2 Accept with probability q̃(y|z_i) p(z_i) / q(y|z_i) = α(z_i, y);
      otherwise, reject it and start again.
    ◮ this is the transition of the HM algorithm. The transition kernel q̃ admits π̃ as a stationary distribution:
    π̃(x) q̃(y|x) = [π(x)p(x) / ∫ π(u)p(u)du] × [α(x, y)q(y|x) / p(x)]
                 = π(x) α(x, y) q(y|x) / ∫ π(u)p(u)du
                 = π(y) α(y, x) q(x|y) / ∫ π(u)p(u)du
                 = π̃(y) q̃(x|y) . 6 / 32
•   22. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2)
    Lemma (Douc & X., AoS, 2011) The sequence (z_i, n_i) satisfies
    1 (z_i, n_i)_i is a Markov chain;
    2 z_{i+1} and n_i are independent given z_i;
    3 n_i is distributed as a geometric random variable with probability parameter
      p(z_i) := ∫ α(z_i, y) q(y|z_i) dy ; (1)
    4 (z_i)_i is a Markov chain with transition kernel Q̃(z, dy) = q̃(y|z)dy and stationary distribution π̃ such that
      q̃(·|z) ∝ α(z, ·) q(·|z) and π̃(·) ∝ π(·)p(·) . 7 / 32
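The decomposition behind the Lemma is easy to mimic empirically. A sketch (the N(0,1) target and all names are our own): rewrite an MH path as accepted values z_i with multiplicities n_i, and check that the two representations of δ coincide.

```python
import numpy as np

rng = np.random.default_rng(1)

# An illustrative random-walk MH run on a N(0,1) target.
x, chain = 0.0, []
for _ in range(5000):
    y = x + 2.0 * rng.normal()
    if np.log(rng.uniform()) < -0.5 * (y**2 - x**2):
        x = y
    chain.append(x)
chain = np.array(chain)

def decompose(chain):
    """Rewrite (x^(1),...,x^(n)) as accepted values z_i with multiplicities n_i."""
    z, n = [chain[0]], [1]
    for x in chain[1:]:
        if x == z[-1]:
            n[-1] += 1        # rejection: the current z_i is repeated
        else:
            z.append(x)       # acceptance: a new z_i starts with n_i = 1
            n.append(1)
    return np.array(z), np.array(n)

z, n = decompose(chain)
delta_direct = chain.mean()               # (1/n) sum_t h(x^(t))
delta_zn = (n * z).sum() / len(chain)     # (1/n) sum_i n_i h(z_i): identical
```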
•   26. Metropolis Hastings revisited Rao–Blackwellisation Rao-Blackwellisation (2) Old bottle, new wine [or vice-versa]
    [Diagram: the chain z_{i−1} → z_i → z_{i+1}, with n_{i−1}, n_i attached below; each z_{i+1} and each n_i are independent given z_i]
    δ = (1/n) Σ_{t=1}^{n} h(x^(t)) = (1/n) Σ_{i=1}^{Mn} n_i h(z_i) . 8 / 32
•   31. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Importance sampling perspective
    1 A natural idea:
      δ* = (1/n) Σ_{i=1}^{Mn} h(z_i)/p(z_i)
         ≃ [Σ_{i=1}^{Mn} h(z_i)/p(z_i)] / [Σ_{i=1}^{Mn} 1/p(z_i)]
         = [Σ_{i=1}^{Mn} h(z_i) π(z_i)/π̃(z_i)] / [Σ_{i=1}^{Mn} π(z_i)/π̃(z_i)] .
    2 But p is not available in closed form.
    3 The geometric n_i is the replacement, an obvious solution that is used in the original Metropolis–Hastings estimate, since E[n_i] = 1/p(z_i). 9 / 32
•   35. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations The Bernoulli factory
    The crude estimate of 1/p(z_i),
    n_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)} ,
    can be improved:
    Lemma (Douc & X., AoS, 2011) If (y_j)_j is an iid sequence with distribution q(y|z_i), the quantity
    ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}
    is an unbiased estimator of 1/p(z_i) whose variance, conditional on z_i, is lower than the conditional variance of n_i, {1 − p(z_i)}/p²(z_i). 10 / 32
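A quick simulation check of this Lemma (a sketch; the N(0,1) target, proposal scale, and names are our own): both quantities estimate 1/p(z) unbiasedly, but the Rao-Blackwellised ξ̂, with the uniforms integrated out, has smaller variance.

```python
import numpy as np

rng = np.random.default_rng(2)
tau = 2.0

def alpha(z, y):
    # N(0,1) target with a symmetric N(z, tau^2) random-walk proposal
    return min(1.0, float(np.exp(-0.5 * (y**2 - z**2))))

def n_geom(z):
    """Crude estimate of 1/p(z): geometric number of trials until acceptance."""
    n = 1
    while rng.uniform() >= alpha(z, z + tau * rng.normal()):
        n += 1
    return n

def xi_hat(z, tol=1e-12):
    """Rao-Blackwellised estimate: xi = 1 + sum_j prod_{l<=j} (1 - alpha(z, y_l))."""
    total, prod = 1.0, 1.0
    while prod > tol:          # truncate once the remaining terms are negligible
        prod *= 1.0 - alpha(z, z + tau * rng.normal())
        total += prod
    return total

z = 1.0
ns = np.array([n_geom(z) for _ in range(20000)])
xis = np.array([xi_hat(z) for _ in range(20000)])
# Same mean (both unbiased for 1/p(z)), but a smaller empirical variance for xi_hat.
```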
•   36. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Rao-Blackwellised, for sure?
    ξ̂_i = 1 + Σ_{j=1}^{∞} Π_{ℓ≤j} {1 − α(z_i, y_ℓ)}
    1 An infinite sum, but with only finitely many nonzero terms whenever some α(z_i, y_ℓ) = 1, with
      α(x^(t), y_t) = min{ 1, [π(y_t) q(x^(t)|y_t)] / [π(x^(t)) q(y_t|x^(t))] } ;
      this happens with positive probability — for example, take a symmetric random walk as a proposal.
    2 What if we wish to be sure that the sum is finite? Finite horizon k version:
      ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)} 11 / 32
•   38. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Variance improvement
    Proposition (Douc & X., AoS, 2011) If (y_j)_j is an iid sequence with distribution q(y|z_i) and (u_j)_j is an iid uniform sequence, for any k ≥ 0, the quantity
    ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}
    is an unbiased estimator of 1/p(z_i) with an almost surely finite number of terms. Moreover, for k ≥ 1,
    V[ξ̂_i^k | z_i] = {1 − p(z_i)}/p²(z_i) − (p(z_i) − r(z_i)) × [1 − (1 − 2p(z_i) + r(z_i))^k] / [2p(z_i) − r(z_i)] × {2 − p(z_i)}/p²(z_i) ,
    where p(z_i) := ∫ α(z_i, y) q(y|z_i) dy and r(z_i) := ∫ α²(z_i, y) q(y|z_i) dy.
    Therefore, we have
    V[ξ̂_i | z_i] ≤ V[ξ̂_i^k | z_i] ≤ V[ξ̂_i^0 | z_i] = V[n_i | z_i] . 12 / 32
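A sketch of the finite-horizon estimator (our own toy target and names): the first k factors are the integrated-out 1 − α terms, the remaining ones are plain rejection indicators, so the sum terminates almost surely; k = 0 recovers the geometric n_i, and all k share the expectation 1/p(z).

```python
import numpy as np

rng = np.random.default_rng(5)
tau = 2.0

def alpha(z, y):                           # N(0,1) target, N(z, tau^2) proposal
    return min(1.0, float(np.exp(-0.5 * (y**2 - z**2))))

def xi_hat_k(z, k):
    """Unbiased estimator of 1/p(z): integrate out the first k uniforms,
    simulate the remaining ones as indicators I{u >= alpha}."""
    total, prod, j = 1.0, 1.0, 0
    while True:
        j += 1
        y = z + tau * rng.normal()
        if j <= k:
            prod *= 1.0 - alpha(z, y)                      # Rao-Blackwellised factor
        else:
            prod *= float(rng.uniform() >= alpha(z, y))    # plain rejection indicator
        total += prod
        if prod < 1e-15:                                   # the sum has terminated
            return total

z = 1.0
means = {k: np.mean([xi_hat_k(z, k) for _ in range(20000)]) for k in (0, 1, 5)}
# The three empirical means agree (same expectation 1/p(z)); variance drops with k.
```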
•   41. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations
    [Diagram: the chain z_{i−1} → z_i → z_{i+1}, with ξ̂^k_{i−1}, ξ̂^k_i attached below; contrary to the (z_i, n_i) case, these quantities are not independent]
    ξ̂_i^k = 1 + Σ_{j=1}^{∞} Π_{1≤ℓ≤k∧j} {1 − α(z_i, y_ℓ)} Π_{k+1≤ℓ≤j} I{u_ℓ ≥ α(z_i, y_ℓ)}
    δ_M^k = Σ_{i=1}^{M} ξ̂_i^k h(z_i) / Σ_{i=1}^{M} ξ̂_i^k . 13 / 32
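Putting the pieces together, a sketch of the self-normalised estimator δ_M on an illustrative N(0,1) target (all names are ours): run MH keeping the accepted z_i, draw fresh proposals to form each ξ̂_i, and compare with the standard average.

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 2.0

def alpha(x, y):                      # N(0,1) target, N(x, tau^2) random-walk proposal
    return min(1.0, float(np.exp(-0.5 * (y**2 - x**2))))

# MH run, stored directly as accepted values z_i and multiplicities n_i.
x, zs, ns = 0.0, [0.0], [1]
for _ in range(20000):
    y = x + tau * rng.normal()
    if rng.uniform() < alpha(x, y):
        x = y; zs.append(x); ns.append(1)
    else:
        ns[-1] += 1

def xi_hat(z, tol=1e-12):
    """Estimate 1/p(z) with fresh proposals, uniforms integrated out."""
    total, prod = 1.0, 1.0
    while prod > tol:
        prod *= 1.0 - alpha(z, z + tau * rng.normal())
        total += prod
    return total

zs, ns = np.array(zs), np.array(ns)
xis = np.array([xi_hat(z) for z in zs])
delta = (ns * zs).sum() / ns.sum()       # standard MH estimate of E[X] = 0
delta_rb = (xis * zs).sum() / xis.sum()  # Rao-Blackwellised, self-normalised version
```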
•   46. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations
    Let
    δ_M^k = Σ_{i=1}^{M} ξ̂_i^k h(z_i) / Σ_{i=1}^{M} ξ̂_i^k .
    For any positive function ϕ, we denote C_ϕ = {h; |h/ϕ|_∞ < ∞}. Assume that there exists a positive function ϕ ≥ 1 such that
    ∀h ∈ C_ϕ ,  [Σ_{i=1}^{M} h(z_i)/p(z_i)] / [Σ_{i=1}^{M} 1/p(z_i)] →(P) π(h) ,
    and a positive function ψ such that
    ∀h ∈ C_ψ ,  √M ( [Σ_{i=1}^{M} h(z_i)/p(z_i)] / [Σ_{i=1}^{M} 1/p(z_i)] − π(h) ) →(L) N(0, Γ(h)) .
    Theorem (Douc & X., AoS, 2011) Under the assumption that π(p) > 0, the following convergence properties hold:
    i) If h ∈ C_ϕ, then δ_M^k →(P) π(h) as M → ∞ (◮ Consistency)
    ii) If, in addition, h²/p ∈ C_ϕ and h ∈ C_ψ, then
        √M (δ_M^k − π(h)) →(L) N(0, V_k[h − π(h)]) as M → ∞ (◮ CLT),
        where V_k(h) := π(p) ∫ π(dz) V[ξ̂_i^k | z] h²(z) p(z) + Γ(h) . 14 / 32
•   49. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations
    We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that, for any starting point x,
    ∀h ∈ C_ζ ,  P_x( sup_{0≤i≤n} | Σ_{j=0}^{i} [h(z_j) − π̃(h)] | > ǫ ) ≤ n C_h(x)/ǫ² .
    Moreover, assume that there exists φ ≥ 1 such that, for any starting point x,
    ∀h ∈ C_φ ,  Q̃^n(x, h) →(P) π̃(h) = π(ph)/π(p) .
    Theorem (Douc & X., AoS, 2011) Assume that h is such that h/p ∈ C_ζ and {C_{h/p}, h²/p²} ⊂ C_φ. Assume moreover that
    √M (δ_M^0 − π(h)) →(L) N(0, V_0[h − π(h)]) .
    Then, for any starting point x,
    √M_n ( (1/n) Σ_{t=1}^{n} h(x^(t)) − π(h) ) →(L) N(0, V_0[h − π(h)]) as n → +∞ ,
    where M_n is defined by
    Σ_{i=1}^{M_n} ξ̂_i^0 ≤ n < Σ_{i=1}^{M_n+1} ξ̂_i^0 . 15 / 32
•   54. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Variance gain (1)
    h(x):      x       x²      I_{X>0}   p(x)
    τ = .1     0.971   0.953   0.957     0.207
    τ = 2      0.965   0.942   0.875     0.861
    τ = 5      0.913   0.982   0.785     0.826
    τ = 7      0.899   0.982   0.768     0.820
    Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]: 100 MCMC iterations over 10³ replications of a random walk Gaussian proposal with scale τ. 16 / 32
  • 55. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Illustration (1) Figure: Overlay of the variations of 250 iid realisations of the estimates δ (gold) and δ ∞ (grey) of E[X] = 0 for 1000 iterations, along with the 90% interquantile range for the estimates δ (brown) and δ ∞ (pink), in the setting of a random walk Gaussian proposal with scale τ = 10. 17 / 32
•   56. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Extra computational effort
               median   mean   q.8   q.9   time
    τ = .25    0.0      8.85   4.9   13    4.2
    τ = .50    0.0      6.76   4     11    2.25
    τ = 1.0    0.25     6.15   4     10    2.5
    τ = 2.0    0.20     5.90   3.5   8.5   4.5
    Additional computing effort: median and mean numbers of additional iterations, 80% and 90% quantiles of the additional iterations, and ratio of the average R computing times, obtained over 10⁵ simulations. 18 / 32
  • 57. Formal importance sampling Metropolis Hastings revisited Variance reduction Rao–Blackwellisation Asymptotic results Rao-Blackwellisation (2) Illustrations Illustration (2) Figure: Overlay of the variations of 500 iid realisations of the estimates δ (deep grey), δ ∞ (medium grey) and of the importance sampling version (light grey) of E[X] = 10 when X ∼ Exp(.1) for 100 iterations, along with the 90% interquantile ranges (same colour code), in the setting of an independent exponential proposal with scale µ = 0.02. 19 / 32
•   58. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Integrating out white noise [C+X, 96]
    In Casella+X. (1996), averaging over possible past and future histories (by integrating out uniforms) to improve weights of accepted values.
    Given the whole sequence of proposed values y_t ∼ µ(y_t), averaging over uniforms is possible: starting with y_1, we can compute
    E[ (1/T) Σ_{t=1}^{T} h(X_t) | y_1, . . . , y_T ] = (1/T) Σ_{t=1}^{T} ϕ_t h(y_t)
    through a recurrence relation:
    ϕ_t = δ_t Σ_{j=t}^{T} ξ_{tj} ,  with δ_0 = 1 , δ_t = Σ_{j=0}^{t−1} δ_j ξ_{j(t−1)} ρ_{jt} ,
    and ξ_{tt} = 1 , ξ_{tj} = Π_{u=t+1}^{j} (1 − ρ_{tu}) ,
    the occurrence survivals of the y_t's, associated with the Metropolis–Hastings ratios
    ω_t = π(y_t)/µ(y_t) ,  ρ_{tu} = ω_u/ω_t ∧ 1 . 20 / 32
•   61. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Integrating out white noise (2)
    Extension to generic M-H feasible (C+X, 96). Potentially large variance improvement, but at a cost of O(T²)...
    Possible recovery of efficiency thanks to parallelisation: moving from (ǫ_1, . . . , ǫ_p) towards (ǫ_(1), . . . , ǫ_(p)) by averaging over ”all” possible orders. 21 / 32
•   64. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Case of the independent Metropolis–Hastings algorithm
    Starting at time t with p processors and a pool of p proposed values, (y_1, . . . , y_p), use the processors to examine in parallel p different “histories”. 22 / 32
•   66. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Improvement
    The standard estimator τ̂_1 of E_π[h(X)],
    τ̂_1(x_t, y_{1:p}) = (1/p) Σ_{k=1}^{p} h(x_{t+k}) ,
    is necessarily dominated by the average
    τ̂_2(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} n_k h(y_k) ,
    where y_0 = x_t and n_0 is the number of times x_t is repeated. 23 / 32
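A sketch of this block scheme for the independent case (the target, proposal, and all names are our own): the same pool of p proposals is pushed through p randomly permuted histories, and the occupation counts n_k are averaged.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 64
# Independent MH: target N(0,1), proposal mu = N(0, 2^2); log-weights log(pi/mu),
# up to an additive constant that cancels in the acceptance ratios.
ys = 2.0 * rng.normal(size=p)
lw = -0.5 * ys**2 + ys**2 / 8.0
x0, lw0 = 0.5, -0.5 * 0.5**2 + 0.5**2 / 8.0

def history_counts(order):
    """One IMH pass through the pool in the given order; n_k counts how many
    times slot k (0 = the start x_t, 1..p = the pooled y's) is occupied."""
    counts = np.zeros(p + 1)
    cur, cur_lw = 0, lw0
    for j in order:
        if np.log(rng.uniform()) < lw[j] - cur_lw:   # accept w.p. (omega_j/omega_cur) ∧ 1
            cur, cur_lw = j + 1, lw[j]
        counts[cur] += 1
    return counts

h = np.concatenate(([x0], ys))                       # h(x) = x at y_0 = x_t, y_1, ..., y_p
tau1 = history_counts(np.arange(p)) @ h / p          # single history: the usual estimator
tau2 = sum(history_counts(rng.permutation(p)) for _ in range(p)) @ h / p**2
```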
•   67. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Further Rao-Blackwellisation
    E.g., use of the Metropolis–Hastings weights w_j: j being the index such that x_{t+i−1} = y_j, update of the weights at each time t + i:
    w_j = w_j + 1 − ρ(x_{t+i−1}, y_i) ,  w_i = w_i + ρ(x_{t+i−1}, y_i) ,
    resulting in a more stable estimator
    τ̂_3(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} w_k h(y_k) .
    E.g., Casella+X. (1996):
    τ̂_4(x_t, y_{1:p}) = (1/p²) Σ_{k=0}^{p} ϕ_k h(y_k) . 24 / 32
•   69. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Markovian continuity
    The Markov validity of the chain is not jeopardised! The chain continues by picking one sequence at random and taking the corresponding x^(j)_{t+p} as starting point of the next parallel block. 25 / 32
•   71. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Impact of Rao-Blackwellisations
    Comparison of:
    τ̂_1, the basic IMH estimator of E_π[h(X)];
    τ̂_2, improving by averaging over permutations of proposed values and using p times more uniforms;
    τ̂_3, improving upon τ̂_2 by a basic Rao-Blackwell argument;
    τ̂_4, improving upon τ̂_2 by integrating out ancillary uniforms, at a cost of O(p²). 26 / 32
•   72. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Illustration
    Variations of estimates based on RB and standard versions of parallel chains and on a standard MCMC chain for the mean and variance of the target N (0, 1) distribution (based on 10,000 independent replicas). 27 / 32
•   76. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Impact of the order
    Parallelisation allows for the partial integration of the uniforms. What about the permutation order? Comparison of:
    τ̂_2N with no permutation,
    τ̂_2C with circular permutations,
    τ̂_2R with random permutations,
    τ̂_2H with half-random permutations,
    τ̂_2S with stratified permutations. 28 / 32
•   81. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Importance target
    Comparison with the ultimate importance sampling 29 / 32
•   85. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Extension to the general case
    The same principle can be applied to any Markov update: if x_{t+1} = Ψ(x_t, ǫ_t), then generate (ǫ_1, . . . , ǫ_p) in advance and distribute them to the p processors in different permutation orders. Plus use of Douc & X's (2011) Rao–Blackwellisation ξ̂_i^k. 30 / 32
•   87. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Implementation
    Similar run of p parallel chains (x^(j)_{t+i}), use of the averages
    τ̂_2(x^(1:p)_{1:p}) = (1/p²) Σ_{k=1}^{p} Σ_{j=1}^{p} n_k h(x^(j)_{t+k}) ,
    and selection of a new starting value at random at time t + p. 31 / 32
•   89. Metropolis Hastings revisited Independent case Rao–Blackwellisation General MH algorithms Rao-Blackwellisation (2) Illustration
    Variations of estimates based on RB and standard versions of parallel chains and on a standard MCMC chain for the mean and variance of the target distribution (based on p = 64 parallel processors, 50 blocks of p MCMC steps and 500 independent replicas). [Boxplots labelled RB / par / org for each estimate] 32 / 32