A Short History of Markov Chain Monte Carlo:
Subjective Recollections from Incomplete Data

Christian P. Robert and George Casella
Université Paris-Dauphine, IUF, & CREST
and University of Florida

April 2, 2011
In memoriam, Julian Besag, 1945–2010
Introduction

Markov chain Monte Carlo (MCMC) methods have been around for almost as long as Monte Carlo techniques themselves, even though their impact on Statistics was not truly felt until the late 1980s / early 1990s.

Contents: distinction between Metropolis–Hastings based algorithms and those related to Gibbs sampling, and a brief entry into the "second-generation MCMC revolution".
A few landmarks

The realization that Markov chains could be used in a wide variety of situations only came to "mainstream statisticians" with Gelfand and Smith (1990), despite earlier publications in the statistical literature like Hastings (1970) and growing awareness in spatial statistics (Besag, 1986).

Several reasons:
lack of computing machinery
lack of background on Markov chains
lack of trust in the practicality of the method
Before the revolution
Los Alamos
Bombs before the revolution
Monte Carlo methods were born in Los Alamos, New Mexico, during WWII, mostly by physicists working on atomic bombs, eventually producing the Metropolis algorithm in the early 1950s.
[Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]
Monte Carlo genesis

The Monte Carlo method is usually traced to Ulam and von Neumann:
Stanislaw Ulam associated the idea with an intractable combinatorial computation attempted in 1946 about "solitaire"
the idea was enthusiastically adopted by John von Neumann for implementation on neutron diffusion
the name "Monte Carlo" was suggested by Nicholas Metropolis
[Eckhardt, 1987]
Monte Carlo with computers

A very close "coincidence" with the appearance of the very first computer, ENIAC, born Feb. 1946, on which von Neumann implemented Monte Carlo in 1947.

The same year, Ulam and von Neumann (re)invented inversion and accept-reject techniques.

In 1949, the very first symposium on Monte Carlo and the very first paper
[Metropolis and Ulam, 1949]
Metropolis et al., 1953
The Metropolis et al. (1953) paper
Very first MCMC algorithm associated
with the second computer, MANIAC,
Los Alamos, early 1952.
Besides Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller contributed to creating the Metropolis algorithm...
Motivating problem
Computation of integrals of the form
$$ I = \frac{\int F(p,q)\,\exp\{-E(p,q)/kT\}\,dp\,dq}{\int \exp\{-E(p,q)/kT\}\,dp\,dq}, $$
with energy $E$ defined as
$$ E(p,q) = \frac{1}{2}\sum_{i=1}^{N}\sum_{\substack{j=1 \\ j\neq i}}^{N} V(d_{ij}), $$
with $N$ the number of particles, $V$ a potential function, and $d_{ij}$ the distance between particles $i$ and $j$.
Boltzmann distribution
The Boltzmann distribution $\exp\{-E(p,q)/kT\}$ is parameterised by the temperature $T$, $k$ being the Boltzmann constant, with a normalisation factor
$$ Z(T) = \int \exp\{-E(p,q)/kT\}\,dp\,dq $$
not available in closed form.
Computational challenge
Since $p$ and $q$ are $2N$-dimensional vectors, numerical integration is impossible.

Moreover, standard Monte Carlo techniques fail to correctly approximate $I$: $\exp\{-E(p,q)/kT\}$ is very small for most realizations of random configurations $(p,q)$ of the particle system.
Metropolis algorithm
Consider a random walk modification of the $N$ particles: for each $1 \leq i \leq N$, the values
$$ x_i' = x_i + \alpha\,\xi_{1i} \qquad\text{and}\qquad y_i' = y_i + \alpha\,\xi_{2i} $$
are proposed, where both $\xi_{1i}$ and $\xi_{2i}$ are uniform $\mathcal U(-1,1)$. The energy difference between the new and the previous configuration is $\Delta E$, and the new configuration is accepted with probability
$$ 1 \wedge \exp\{-\Delta E/kT\}, $$
and otherwise the previous configuration is replicated*

*counting one more time in the average of the $F(p_t, q_t)$'s over the $\tau$ moves of the random walk.
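The accept-or-replicate step above translates directly into code; a minimal sketch, where the quadratic toy energy E(x) = x²/2 (a standard normal target) and all parameter values are our assumptions, not the rigid-sphere model of the paper:

```python
import math
import random

def metropolis(energy, x0, alpha=1.0, kT=1.0, n_steps=10_000, seed=42):
    """Random-walk Metropolis: propose x' = x + alpha*xi with xi ~ U(-1, 1),
    accept with probability min(1, exp(-dE/kT)); a rejected move replicates
    the current state (it is counted again in the running average)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + alpha * rng.uniform(-1.0, 1.0)
        dE = energy(proposal) - energy(x)
        if dE <= 0 or rng.random() < math.exp(-dE / kT):
            x = proposal          # accept the move
        samples.append(x)         # rejected moves replicate the state
    return samples

# Toy run: E(x) = x^2/2, i.e. a target proportional to exp(-x^2/2).
chain = metropolis(lambda x: 0.5 * x * x, x0=0.0)
```

Averaging any F over `chain` then approximates the expectation under exp{-E/kT}, exactly as in the 1953 scheme.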
Convergence

Validity of the algorithm established by proving
1. irreducibility
2. ergodicity, that is, convergence to the stationary distribution.

The second part is obtained via a discretization of the space: Metropolis et al. note that the proposal is reversible, then establish that $\exp\{-E/kT\}$ is invariant.

Application to the specific problem of the rigid-sphere collision model. The number of iterations of the Metropolis algorithm seems to be limited: 16 steps for burn-in and 48 to 64 subsequent iterations (which still required four to five hours on the Los Alamos MANIAC).
Physics and chemistry

The method of Markov chain Monte Carlo immediately had wide use in physics and chemistry.
[Geyer & Thompson, 1992]

Statistics has always been fuelled by energetic mining of the physics literature.
[Clifford, 1993]

Hammersley and Handscomb, 1967
Piekaar and Clarenburg, 1967
Kennedy and Kutil, 1985
Sokal, 1989
&tc...
Hastings, 1970

A fair generalisation

In Biometrika 1970, Hastings defines the MCMC methodology for finite and reversible Markov chains, the continuous case being discretised:

The generic acceptance probability for a move from state $i$ to state $j$ is
$$ \alpha_{ij} = \frac{s_{ij}}{1 + \dfrac{\pi_i\, q_{ij}}{\pi_j\, q_{ji}}}, $$
where $s_{ij}$ is a symmetric function.
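The generic form can be checked numerically. A small sketch (the helper names are ours, not Hastings'): with r = π_i q_ij / (π_j q_ji), Barker's rule corresponds to s_ij = 1 and Metropolis' rule to the symmetric choice s_ij = 1 + min(r, 1/r), which recovers min(1, 1/r):

```python
def hastings_alpha(s, pi_i, pi_j, q_ij, q_ji):
    """Hastings' generic acceptance: alpha_ij = s_ij / (1 + pi_i q_ij / (pi_j q_ji))."""
    r = (pi_i * q_ij) / (pi_j * q_ji)
    return s(r) / (1.0 + r)

# Barker's (1965) choice: s_ij = 1, giving alpha = 1 / (1 + r).
barker = lambda r: 1.0
# Metropolis et al.'s choice: s_ij = 1 + min(r, 1/r), giving alpha = min(1, 1/r).
metropolis = lambda r: 1.0 + min(r, 1.0 / r)
```

With a symmetric proposal (q_ij = q_ji), r reduces to π_i/π_j and the Metropolis choice gives the familiar min(1, π_j/π_i).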
State of the art

Note
A generic form that encompasses both Metropolis et al. (1953) and Barker (1965).

Peskun's ordering not yet discovered: Hastings mentions that little is known about the relative merits of those two choices (even though) Metropolis's method may be preferable.

Warning against high rejection rates as indicative of a poor choice of transition matrix, but no mention of the opposite pitfall of low rejection rates.
What else?!

Items included in the paper are
a Poisson target with a ±1 random walk proposal;
a normal target with a uniform random walk proposal mixed with its reflection (i.e. centered at −X(t) rather than X(t));
a multivariate target where Hastings introduces Gibbs sampling, updating one component at a time and defining the composed transition as satisfying the stationarity condition because each component leaves the target invariant;
a reference to Erhman, Fosdick and Handscomb (1960) as a preliminary if specific instance of this Metropolis-within-Gibbs sampler;
an importance sampling version of MCMC;
some remarks about error assessment;
a Gibbs sampler for random orthogonal matrices.
Three years later

Peskun (1973) compares Metropolis' and Barker's acceptance probabilities and shows (again in a discrete setup) that Metropolis' is optimal (in terms of the asymptotic variance of any empirical average).

The proof is a direct consequence of Kemeny and Snell (1960) on the asymptotic variance. Peskun also establishes that this variance can improve upon the iid case if and only if the eigenvalues of P − A are all negative, where A is the transition matrix corresponding to the iid simulation and P the transition matrix corresponding to the Metropolis algorithm, but he concludes that the trace of P − A is always positive.
Julian's early works

Julian's early works (1)

In the early 1970's, Hammersley, Clifford, and Besag were working on the specification of joint distributions from conditional distributions and on necessary and sufficient conditions for the conditional distributions to be compatible with a joint distribution.
[Hammersley and Clifford, 1971]

What is the most general form of the conditional probability functions that define a coherent joint function? And what will the joint look like?
[Besag, 1972]
Hammersley-Clifford theorem

Theorem (Hammersley-Clifford)
The joint distribution of a vector associated with a dependence graph must be represented as a product of functions over the cliques of the graph, i.e., of functions depending only on the components indexed by the labels in the clique.
[Cressie, 1993; Lauritzen, 1996]
Theorem (Hammersley-Clifford)
A probability distribution P with positive and continuous density f satisfies the pairwise Markov property with respect to an undirected graph G if and only if it factorizes according to G, i.e., (F) ≡ (G).
[Cressie, 1993; Lauritzen, 1996]
Theorem (Hammersley-Clifford)
Under the positivity condition, the joint distribution $g$ satisfies
$$ g(y_1,\dots,y_p) \propto \prod_{j=1}^{p} \frac{g_j(y_j \mid y_1,\dots,y_{j-1},\, y'_{j+1},\dots,y'_p)}{g_j(y'_j \mid y_1,\dots,y_{j-1},\, y'_{j+1},\dots,y'_p)} $$
for every permutation on $\{1, 2, \dots, p\}$ and every $y' \in \mathcal Y$.
[Cressie, 1993; Lauritzen, 1996]
An apocryphal theorem

The Hammersley-Clifford theorem was never published by its authors, but only through Grimmett (1973), Preston (1973), Sherman (1973), and Besag (1974). The authors were dissatisfied with the positivity constraint: the joint density could only be recovered from the full conditionals when the support of the joint was the product of the supports of the full conditionals (with obvious counter-examples). Moussouris' counter-example put a full stop to their endeavors.
[Hammersley, 1974]
To Gibbs or not to Gibbs?

Julian Besag should certainly be credited to a large extent with the (re?-)discovery of the Gibbs sampler.

The simulation procedure is to consider the sites cyclically and, at each stage, to amend or leave unaltered the particular site value in question, according to a probability distribution whose elements depend upon the current value at neighboring sites (...) However, the technique is unlikely to be particularly helpful in many other than binary situations and the Markov chain itself has no practical interpretation.
[Besag, 1974]
Broader perspective

In 1964, Hammersley and Handscomb wrote a (the first?) textbook on Monte Carlo methods. They cover such topics as
"Crude Monte Carlo";
importance sampling;
control variates; and
"Conditional Monte Carlo", which looks surprisingly like a missing-data Gibbs completion approach.

They state in the Preface:
We are convinced nevertheless that Monte Carlo methods will one day reach an impressive maturity.
Clicking in

After Peskun (1973), MCMC lay mostly dormant in the mainstream statistical world for about 10 years; then several papers/books highlighted its usefulness in specific settings:
Geman and Geman (1984)
Besag (1986)
Strauss (1986)
Ripley (Stochastic Simulation, 1987)
Tanner and Wong (1987)
Younes (1988)
Enters the Gibbs sampler

Geman and Geman (1984), building on Metropolis et al. (1953), Hastings (1970), and Peskun (1973), constructed a Gibbs sampler for optimisation in a discrete image processing problem without completion.

Responsible for the name Gibbs sampling, because the method was used for the Bayesian study of Gibbs random fields, linked to the physicist Josiah Willard Gibbs (1839–1903).

Back to Metropolis et al., 1953: the Gibbs sampler is used as a simulated annealing algorithm and ergodicity is proven on the collection of global maxima.
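The single-site update at the heart of this scheme is easy to sketch. Below is a toy version, not Geman and Geman's actual image model: single-site Gibbs updates on a small Ising-type field of ±1 "pixels", where the grid size, inverse temperature beta, and sweep count are assumed parameters chosen for illustration.

```python
import math
import random

def gibbs_ising(n=8, beta=0.6, sweeps=200, seed=0):
    """Single-site Gibbs sampling on an n x n Ising-type random field:
    each sweep visits every pixel and redraws it from its full conditional,
    which depends only on the (up to 4) neighboring values."""
    rng = random.Random(seed)
    x = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # sum of neighboring spins (free boundary conditions)
                s = sum(x[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < n and 0 <= b < n)
                # P(x_ij = +1 | neighbors) for energy -beta * sum of products
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * s))
                x[i][j] = 1 if rng.random() < p_plus else -1
    return x

grid = gibbs_ising()
```

Cycling through the pixels leaves the Gibbs field invariant; raising beta makes large aligned patches increasingly likely.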
Besag (1986) integrates GS for SA...

...easy to construct the transition matrix Q, of a discrete time Markov chain, with state space Ω and limit distribution (4). Simulated annealing proceeds by running an associated time-inhomogeneous Markov chain with transition matrices Q_T, where T is progressively decreased according to a prescribed "schedule" to a value close to zero.
[Besag, 1986]
...and links with Metropolis-Hastings...

There are various related methods of constructing a manageable Q_T (Hastings, 1970). Geman and Geman (1984) adopt the simplest, which they term the "Gibbs sampler" (...) time reversibility, a common ingredient in this type of problem (see, for example, Besag, 1977a), is present at individual stages but not over complete cycles, though Peter Green has pointed out that it returns if Q_T is taken over a pair of cycles, the second of which visits pixels in reverse order.
[Besag, 1986]
...seeing the larger picture,...

As Geman and Geman (1984) point out, any property of the (posterior) distribution P(x|y) can be simulated by running the Gibbs sampler at "temperature" T = 1. Thus, if $\hat x_i$ maximizes P(x_i|y), then it is the most frequently occurring colour at pixel i in an infinite realization of the Markov chain with transition matrix Q of Section 2.3. The $\hat x_i$'s can therefore be simultaneously estimated from a single finite realization of the chain. It is not yet clear how long the realization needs to be, particularly for estimation near colour boundaries, but the amount of computation required is generally prohibitive for routine purposes.
[Besag, 1986]
...seeing the larger picture,...
P (x|y) can be simulated using the Gibbs sampler, as
suggested by Grenander (1983) and by Geman and
Geman (1984). My dismissal of such an approach for
routine applications was somewhat cavalier:
purpose-built array processors could become relatively
inexpensive (...) suppose that, for 100 complete cycles
say, images have been collected from the Gibbs sampler
(or by Metropolis’ method), following a “settling-in”
period of perhaps another 100 cycles, which should cater
for fairly intricate priors (...) These 100 images should
often be adequate for estimating properties of the
posterior (...) and for making approximate associated
confidence statements, as mentioned by Mr Haslett.
[Besag, 1986]
...if not going fully Bayes!

...a neater and more efficient procedure [for parameter estimation] is to adopt maximum "pseudo-likelihood" estimation (Besag, 1975)

I have become increasingly enamoured with the Bayesian paradigm
[Besag, 1986]

The pair (x_i, β_i) is then a (bivariate) Markov field and can be reconstructed as a bivariate process by the methods described in Professor Besag's paper.
[Clifford, 1986]

The simulation-based estimator $E_{\text{post}}\,\Psi(X)$ will differ from the m.a.p. estimator $\hat\Psi(x)$.
[Silverman, 1986]
Discussants of Besag (1986)

An impressive who's who: D.M. Titterington, P. Clifford, P. Green, P. Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C. Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J. Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc
A comment on Besag (1986)

While special purpose algorithms will determine the utility of the Bayesian methods, the general purpose methods – stochastic relaxation and simulation of solutions of the Langevin equation (Grenander, 1983; Geman and Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986) – have proven enormously convenient and versatile. We are able to apply a single computer program to every new problem by merely changing the subroutine that computes the energy function in the Gibbs representation of the posterior distribution.
[Geman and McClure, 1986]
Another one
It is easy to compute exact marginal and joint posterior
probabilities of currently unobserved features, conditional
on those clinical findings currently available
(Spiegelhalter, 1986a,b), the updating taking the form of
‘propagating evidence’ through the network (...) it would
be interesting to see if the techniques described tonight,
which are of intermediate complexity, may have any
applications in this new and exciting area [causal
networks].
[Spiegelhalter, 1986]
The candidate's formula

Representation of the marginal likelihood as
$$ m(x) = \frac{\pi(\theta)\, f(x\mid\theta)}{\pi(\theta\mid x)} $$
or of the marginal predictive as
$$ p_n(y'\mid y) = \frac{f(y'\mid\theta)\,\pi_n(\theta\mid y)}{\pi_{n+1}(\theta\mid y, y')} $$
[Besag, 1989]

Why candidate?
"Equation (2) appeared without explanation in a Durham University undergraduate final examination script of 1984. Regrettably, the student's name is no longer known to me."
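The representation is Bayes' theorem rearranged; as a one-line check:

```latex
% Bayes' theorem gives the posterior for every \theta,
% and solving for the marginal likelihood yields the candidate's formula:
\pi(\theta\mid x) = \frac{\pi(\theta)\, f(x\mid\theta)}{m(x)}
\quad\Longrightarrow\quad
m(x) = \frac{\pi(\theta)\, f(x\mid\theta)}{\pi(\theta\mid x)}
\qquad \text{for any } \theta \text{ with } \pi(\theta\mid x) > 0 .
```

The identity holds pointwise in θ, which is what makes it usable with a single (e.g. MCMC) estimate of the posterior density at one point.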
Implications

Newton and Raftery (1994) used this representation to derive the [infamous] harmonic mean approximation to the marginal likelihood
Gelfand and Dey (1994) also relied on this formula for the same purpose, in a more general perspective
Geyer and Thompson (1995) derived MLEs by a Monte Carlo approximation to the normalising constant
Chib (1995) uses this representation to build an MCMC approximation to the marginal likelihood
The Revolution
Final steps to
Impact

This is surely a revolution.
[Clifford, 1993]

[Gibbs sampler] use seems to have been isolated in the spatial statistics community until Gelfand and Smith (1990)
[Geyer, 1990]

Geman and Geman (1984) is one more spark that led to the explosion, as it had a clear influence on Gelfand, Green, Smith, Spiegelhalter and others.

It sparked new interest in Bayesian methods, statistical computing, algorithms, and stochastic processes through the use of computing algorithms such as the Gibbs sampler and the Metropolis–Hastings algorithm.
Data augmentation

Tanner and Wong (1987) has essentially the same ingredients as Gelfand and Smith (1990): simulating from the conditionals is simulating from the joint.

Lower impact:
emphasis on missing data problems (hence data augmentation)
MCMC approximation to the target at every iteration
$$ \hat\pi(\theta\mid x) \approx \frac{1}{K} \sum_{k=1}^{K} \pi(\theta\mid x, z_{t,k}), \qquad z_{t,k} \sim \hat\pi_{t-1}(z\mid x), $$
too close to Rubin's (1978) multiple imputation
theoretical backup based on functional analysis (the Markov kernel had to be uniformly bounded and equicontinuous)
Gelfand and Smith, 1990
Epiphany

In June 1989, at a Bayesian workshop in Sherbrooke, Québec, Adrian Smith exposed for the first time (?) the generic features of the Gibbs sampler, exhibiting a ten-line Fortran program handling a random effect model
$$ Y_{ij} = \theta_i + \varepsilon_{ij}, \qquad i = 1, \dots, K, \quad j = 1, \dots, J, $$
$$ \theta_i \sim N(\mu, \sigma_\theta^2), \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), $$
by full conditionals on $\mu$, $\sigma_\theta$, $\sigma_\varepsilon$...
[Gelfand and Smith, 1990]

This was enough to convince the whole audience!

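A modern rendering of that ten-line program fits in a short Python function. This is a sketch under assumed priors that the slide does not specify: flat on µ and inverse-gamma IG(a, b) on both variances, with the hyperparameters a, b chosen for illustration.

```python
import math
import random

def gibbs_random_effects(y, n_iter=2000, a=2.0, b=1.0, seed=1):
    """Gibbs sampler for Y_ij = theta_i + eps_ij, theta_i ~ N(mu, s2_t),
    eps_ij ~ N(0, s2_e), cycling through the full conditionals of
    theta_1..theta_K, mu, s2_t, s2_e (flat prior on mu, IG(a, b) on the
    variances -- these priors are an assumption made for this sketch)."""
    rng = random.Random(seed)
    K, J = len(y), len(y[0])
    theta = [sum(row) / J for row in y]          # initialise at group means
    mu, s2_t, s2_e = sum(theta) / K, 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        # theta_i | rest : normal with precision J/s2_e + 1/s2_t
        for i in range(K):
            prec = J / s2_e + 1.0 / s2_t
            mean = (sum(y[i]) / s2_e + mu / s2_t) / prec
            theta[i] = rng.gauss(mean, math.sqrt(1.0 / prec))
        # mu | rest : normal around the average of the theta_i (flat prior)
        mu = rng.gauss(sum(theta) / K, math.sqrt(s2_t / K))
        # s2_t | rest : inverse gamma (drawn as 1 / Gamma(shape, scale))
        ss_t = sum((t - mu) ** 2 for t in theta)
        s2_t = 1.0 / rng.gammavariate(a + K / 2.0, 1.0 / (b + ss_t / 2.0))
        # s2_e | rest : inverse gamma
        ss_e = sum((y[i][j] - theta[i]) ** 2 for i in range(K) for j in range(J))
        s2_e = 1.0 / rng.gammavariate(a + K * J / 2.0, 1.0 / (b + ss_e / 2.0))
        draws.append((mu, s2_t, s2_e))
    return draws
```

Each iteration cycles once through all the full conditionals, exactly the generic structure Smith demonstrated.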
Garden of Eden

In the early 1990s, researchers found that Gibbs and then Metropolis–Hastings algorithms would crack almost any problem!

A flood of papers followed applying MCMC:
linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991; Wang et al., 1993, 1994)
generalized linear mixed models (Albert and Chib, 1993)
mixture models (Tanner and Wong, 1987; Diebolt and X., 1990, 1994; Escobar and West, 1993)
changepoint analysis (Carlin et al., 1992)
point processes (Grenander and Møller, 1994)
&tc
genomics (Stephens and Smith, 1993; Lawrence et al., 1993; Churchill, 1995; Geyer and Thompson, 1995)
ecology (George and X, 1992; Dupuis, 1995)
variable selection in regression (George and McCulloch, 1993)
spatial statistics (Raftery and Banfield, 1991)
longitudinal studies (Lange et al., 1992)
&tc
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992, relied on MCMC methods for ML
estimation
Smith and Roberts, 1993, discussed convergence diagnoses and
applications, incl. mixtures, for Gibbs and Metropolis–Hastings
Besag and Green, 1993, stated the desiderata for convergence
and connected MCMC with auxiliary and antithetic variables
Tierney, 1994, laid out all of the assumptions needed to
analyze the Markov chains and then developed their
properties, in particular convergence of ergodic averages and
central limit theorems
Liu, Wong and Kong, 1994–95, analyzed the covariance
structure of Gibbs sampling, and were able to formally
establish the validity of Rao-Blackwellization in Gibbs
sampling
Mengersen and Tweedie, 1996, set the tone for the study of
the speed of convergence of MCMC algorithms to the target
distribution
Gilks, Clayton and Spiegelhalter, 1993
&tc...
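A minimal illustration of the Rao-Blackwellization result: in a Gibbs sampler for a bivariate normal with correlation ρ, the conditional expectation E[X | Y] = ρY is available in closed form, and averaging these conditional expectations gives a lower-variance estimator of E[X] than averaging the X draws themselves. The target, chain length, and replication count below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.7, 500, 200
cond_sd = np.sqrt(1 - rho**2)

naive, rb = [], []
for _ in range(reps):
    x, y = 0.0, 0.0
    xs, cond_means = [], []
    for _ in range(n):
        x = rng.normal(rho * y, cond_sd)   # X | Y ~ N(rho*Y, 1 - rho^2)
        y = rng.normal(rho * x, cond_sd)   # Y | X ~ N(rho*X, 1 - rho^2)
        xs.append(x)
        cond_means.append(rho * y)         # closed-form E[X | Y]
    naive.append(np.mean(xs))              # plain ergodic average
    rb.append(np.mean(cond_means))         # Rao-Blackwellized average
print(np.var(naive), np.var(rb))           # RB variance should be smaller
```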
Convergence diagnoses
Can we really tell when a complicated Markov chain has
reached equilibrium? Frankly, I doubt it.
[Clifford, 1993]
Explosion of methods
Gelman and Rubin (1991)
Besag and Green (1992)
Geyer (1992)
Raftery and Lewis (1992)
Cowles and Carlin (1996) coda
Brooks and Roberts (1998)
&tc
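As one concrete example from this list, the Gelman and Rubin diagnostic compares within-chain and between-chain variability across parallel chains; a sketch of the standard potential scale reduction factor, applied to made-up synthetic chains:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for m parallel chains of length n."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()         # within-chain variance
    B = n * chain_means.var(ddof=1)               # between-chain variance
    var_plus = (n - 1) / n * W + B / n            # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(2)
mixed = rng.normal(0, 1, (4, 1000))               # 4 chains on the same target
stuck = mixed + np.array([0, 0, 0, 5])[:, None]   # one chain far off target
print(gelman_rubin(mixed))    # close to 1: no evidence against convergence
print(gelman_rubin(stuck))    # well above 1: chains disagree
```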
After the Revolution
Particle systems
Particles, again
Iterating importance sampling is about as old as Monte Carlo
methods themselves!
[Hammersley and Morton, 1954; Rosenbluth and Rosenbluth, 1955]
Found in the molecular simulation literature of the 1950s, with
self-avoiding random walks, and in signal processing
[Marshall, 1965; Handschin and Mayne, 1969]
Use of the term “particle” dates back to Kitagawa (1996), and Carpenter
et al. (1997) coined the term “particle filter”.
Bootstrap filter and sequential Monte Carlo
Gordon, Salmond and Smith (1993) introduced the bootstrap filter
which, while formally connected with importance sampling,
involves past simulations and possible MCMC steps (Gilks and
Berzuini, 2001).
Sequential imputation was developed in Kong, Liu and Wong
(1994), while Liu and Chen (1995) first formally pointed out the
importance of resampling in “sequential Monte Carlo”, a term they
coined.
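A minimal sketch of the bootstrap filter on a toy linear-Gaussian state-space model: propagate particles through the dynamics, weight by the likelihood, resample. The model, parameter values, and particle count are illustrative assumptions, and this plain multinomial-resampling version ignores the later refinements mentioned above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy state-space model: x_t = 0.9 x_{t-1} + w_t,  y_t = x_t + v_t
T, N = 100, 1000
phi, sw, sv = 0.9, 0.5, 0.5
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t-1] + rng.normal(0, sw)
y = x + rng.normal(0, sv, T)

# Bootstrap filter: sample from the dynamics, weight by p(y_t | x_t), resample
particles = rng.normal(0, 1, N)
est = np.zeros(T)
for t in range(T):
    particles = phi * particles + rng.normal(0, sw, N)   # propagate
    logw = -0.5 * ((y[t] - particles) / sv) ** 2         # likelihood weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.sum(w * particles)                       # filtering mean
    idx = rng.choice(N, N, p=w)                          # multinomial resampling
    particles = particles[idx]

print(np.mean((est - x) ** 2))   # filtering MSE, below the observation noise sv^2
```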
pMC versus pMCMC
Recycling of past simulations is legitimate to build better
importance sampling functions, as in population Monte Carlo
[Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007]
Recent synthesis by Andrieu, Doucet, and Holenstein (2010),
using particles to build an evolving MCMC kernel p̂θ(y1:T) in
state-space models p(x1:T)p(y1:T | x1:T), along with Andrieu
and Roberts’ (2009) use of approximations in MCMC
acceptance steps
[Kennedy and Kuti, 1985]
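The population Monte Carlo idea can be caricatured in a few lines: iterate importance sampling and let the weighted sample relocate the proposal. The target, initial proposal, and the mean-only adaptation rule below are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Target: N(3, 1). Start with a poorly located proposal N(mu, tau^2)
# and let iterated importance sampling move it onto the target.
def log_target(x):
    return -0.5 * (x - 3.0) ** 2

mu, tau, N = -2.0, 2.0, 2000
for it in range(10):
    xs = rng.normal(mu, tau, N)                       # sample the population
    logw = log_target(xs) + 0.5 * ((xs - mu) / tau) ** 2 + np.log(tau)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                      # normalized importance weights
    mu = np.sum(w * xs)                               # adapt proposal to weighted mean
print(mu)   # close to the target mean, 3
```

A full population Monte Carlo scheme would also adapt the proposal scale (or use a mixture of proposals); only the location is adapted here to keep the sketch short.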
Reversible jump
Generally considered as the second Revolution.
Formalisation of a Markov chain moving across
models and parameter spaces allows for the
Bayesian processing of a wide variety of models
and underlies the success of Bayesian model choice.
Definition of a proper balance condition on cross-model Markov
kernels gives a generic setup for exploring variable-dimension
spaces, even when the number of models under comparison is
infinite.
[Green, 1995]
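A toy version of a reversible jump sampler, moving between a fixed model M1: x ~ N(0, 1) and a one-parameter model M2: x ~ N(µ, 1) with µ ~ N(0, A); since the dimension gap is filled directly by the birth proposal q, the Jacobian is 1. Data, priors, and proposal choices are all made-up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Data actually drawn from N(1, 1): model 2 (nonzero mean) should win
x = rng.normal(1.0, 1.0, 30)
A = 10.0                                  # prior variance of mu under model 2

def loglik(mu):                           # log-likelihood, common to both models
    return -0.5 * np.sum((x - mu) ** 2)

def logprior(mu):                         # mu ~ N(0, A) under model 2
    return -0.5 * mu ** 2 / A - 0.5 * np.log(A)

# Birth proposal centered at the sample mean (a convenient choice)
q_mu, q_sd = x.mean(), 0.5
def logq(mu):
    return -0.5 * ((mu - q_mu) / q_sd) ** 2 - np.log(q_sd)

k, mu = 1, 0.0                            # start in model 1 (mu fixed at 0)
visits = {1: 0, 2: 0}
for it in range(5000):
    if rng.uniform() < 0.5:               # between-model (birth/death) move
        if k == 1:                        # birth: propose mu and jump up
            mu_new = rng.normal(q_mu, q_sd)
            log_alpha = (loglik(mu_new) + logprior(mu_new)
                         - loglik(0.0) - logq(mu_new))
            if np.log(rng.uniform()) < log_alpha:
                k, mu = 2, mu_new
        else:                             # death: drop mu and jump down
            log_alpha = loglik(0.0) + logq(mu) - loglik(mu) - logprior(mu)
            if np.log(rng.uniform()) < log_alpha:
                k, mu = 1, 0.0
    elif k == 2:                          # within-model random-walk update of mu
        mu_prop = mu + rng.normal(0, 0.2)
        if np.log(rng.uniform()) < (loglik(mu_prop) + logprior(mu_prop)
                                    - loglik(mu) - logprior(mu)):
            mu = mu_prop
    visits[k] += 1
print(visits[2] / 5000)   # posterior probability of model 2, near 1 here
```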
Perfect sampling
Seminal paper of Propp and Wilson (1996) showed how to use
MCMC methods to produce an exact (or perfect) simulation from
the target.
Outburst of papers, particularly from Jesper Møller and coauthors,
but the excitement somehow dried out [except in dedicated areas]
as construction of perfect samplers is hard and coalescence times
very high...
[Møller and Waagepetersen, 2003]
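Propp and Wilson's coupling-from-the-past construction can be demonstrated on a small finite chain: run one copy of the chain from every starting state, from further and further in the past, reusing the same randomness, until all copies coalesce by time 0; the state they agree on is then an exact draw from the stationary distribution. The transition matrix below is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(6)

# A small ergodic transition matrix; rows index the current state
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
cum = P.cumsum(axis=1)

def step(states, u):
    """Advance every chain one step using the SAME uniform u (common randomness)."""
    return np.array([np.searchsorted(cum[s], u) for s in states])

def cftp():
    T, us = 1, []
    while True:
        # extend the randomness further into the past, reusing existing draws
        us = list(rng.uniform(size=T - len(us))) + us
        states = np.arange(3)                   # one chain per starting state
        for u in us:                            # run from time -T up to time 0
            states = step(states, u)
        if np.all(states == states[0]):         # coalescence: exact draw
            return states[0]
        T *= 2                                  # no coalescence: go further back

draws = np.array([cftp() for _ in range(2000)])
pi = np.linalg.matrix_power(P, 50)[0]           # stationary distribution of P
print(np.bincount(draws, minlength=3) / 2000, pi)
```

Note the two essentials that make the draw exact: past randomness is reused (only new, earlier draws are prepended) and the chain always stops at time 0, never "forward until coalescence".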
Envoi
To be continued...
...standing on the shoulders of giants