1. Bayesian Case Studies, week 1
Robin J. Ryder
7 January 2013
2. About this course
Two aims:
1. Implement computational algorithms
2. Analyse real datasets
6 × 3 hours.
E-mail: ryder@ceremade.dauphine.fr. Office B627.
Evaluation: written-up analysis of a dataset, to hand in by end of
March. The project topic will be given in February.
3. Exponential family
A family of distributions (=a model) is an exponential family if the
density can be written as
fX(x | θ) = h(x) exp[η(θ) · T(x) − A(θ)]
where h, η, T and A are known functions.
Then T(x) is a sufficient statistic. For iid x1, …, xn, ∑i T(xi) is a
sufficient statistic for the sample: it encapsulates all the
information about the parameters contained in the data. The
posterior depends on the sample only through the sufficient
statistic.
η(θ) is called the natural parameter.
A(θ) is the log-partition function, the log of the normalizing factor.
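For example (a standard illustration, not specific to these slides), the Poisson(λ) model is an exponential family:

fX(x | λ) = (1/x!) exp[x log λ − λ]

so that h(x) = 1/x!, η(λ) = log λ, T(x) = x and A(λ) = λ. For an iid sample, ∑i xi is then a sufficient statistic.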
4. Conjugate prior
A family of distributions is a conjugate prior for a given model if
the posterior belongs to the same family of distributions.
This is mostly a computational advantage.
If the model is an exponential family, then a conjugate prior exists.
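For example, the Gamma family is conjugate for the Poisson model: if λ ~ Gamma(a, b) a priori and x1, …, xn are iid Poisson(λ), the posterior is Gamma(a + ∑xi, b + n). A minimal Python/SciPy sketch, with made-up hyperparameters and data:

```python
import numpy as np
from scipy import stats

# Gamma-Poisson conjugacy: prior lambda ~ Gamma(a, b) (rate b),
# iid Poisson data => posterior Gamma(a + sum(x), b + n).
a, b = 2.0, 1.0                   # hypothetical prior hyperparameters
x = np.array([3, 1, 4, 2, 0])     # hypothetical iid Poisson observations

a_post, b_post = a + x.sum(), b + len(x)
posterior = stats.gamma(a=a_post, scale=1.0 / b_post)

print(posterior.mean())           # posterior mean: (a + sum x) / (b + n)
```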
5. Jeffreys’ prior
Jeffreys’ prior, often used as a default uninformative prior, is
invariant under reparameterization. In the one-dimensional case, it
is defined as

π(θ) ∝ √I(θ)

where I(θ) is the Fisher information, defined in terms of the
log-likelihood:

I(θ) = E_{X|θ}[(∂/∂θ log f(X | θ))²] = −E_{X|θ}[∂²/∂θ² log f(X | θ)]

(under certain regularity conditions)
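For example, for the Poisson(λ) model, −∂²/∂λ² log f(x | λ) = x/λ², so I(λ) = E[X | λ]/λ² = 1/λ, and Jeffreys’ prior is π(λ) ∝ λ^(−1/2).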
6. Jeffreys’ prior (contd)
Jeffreys’ prior may be improper, which means that it integrates to
infinity and is therefore not a probability distribution.
This is not an issue as long as the corresponding posterior is
proper. This point should always be checked.
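Continuing the Poisson example: π(λ) ∝ λ^(−1/2) integrates to infinity, but combining it with the likelihood of an iid sample x1, …, xn gives a posterior proportional to λ^(∑xi − 1/2) e^(−nλ), i.e. a Gamma(∑xi + 1/2, n) distribution, which is proper as soon as n ≥ 1.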
7. Data: Ship accidents
The dataset ShipAccidents includes data on accidents of 40
classes of ships. Each row corresponds to one class. Each class of
ship is defined by 3 attributes: type of ship (5 levels), period
of construction (4 levels) and period of operation (2 levels).
For each class, we are given the cumulative number of
months in operation and the cumulative number of incidents,
which we expect to follow a Poisson distribution.
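As an illustration of what a first, deliberately oversimplified analysis might look like, the sketch below pools all classes and assumes a single accident rate λ per month in service, so that incidents_i ~ Poisson(λ · months_i); conjugacy then carries over with exposures. The file name and column names are assumptions, not the actual format of the dataset.

```python
import pandas as pd
from scipy import stats

# Pooled Gamma-Poisson model: incidents_i ~ Poisson(lambda * months_i),
# prior lambda ~ Gamma(a, b); posterior is Gamma(a + sum incidents,
# b + sum months). File and column names are assumed, not actual.
ships = pd.read_csv("ShipAccidents.csv")

a, b = 0.5, 0.0   # Jeffreys-type prior pi(lambda) propto lambda^(-1/2)
a_post = a + ships["incidents"].sum()   # shape: a + total incidents
b_post = b + ships["service"].sum()     # rate: b + total months in service

posterior = stats.gamma(a=a_post, scale=1.0 / b_post)
print(posterior.interval(0.95))         # 95% credible interval for lambda
```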
8. ABC
Approximate Bayesian Computation is a computational method to
draw approximate samples from a posterior distribution in cases
where the likelihood is intractable, but where it is easy to simulate
new datasets.
Given observed data Dobs and a prior π(θ), we wish to sample θ
from the posterior, which is proportional to π(θ)L(Dobs | θ).
The non-approximate version of the algorithm is:
1. Simulate θ from the prior π.
2. Simulate a new dataset Dsim from the model, with parameter θ.
3. If Dobs = Dsim, then accept θ; else reject θ.
4. Repeat until we get a large enough sample of θ’s.
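As a sketch, here is the exact-match version in Python for a toy model of my choosing (not from the course): iid Poisson(θ) observations with an Exponential(1) prior on θ. Exact matching is only workable because the data are discrete and the dataset is tiny; even so, the acceptance rate is very small.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs = np.array([2, 0, 3])               # hypothetical observed dataset

samples = []
while len(samples) < 200:
    theta = rng.exponential(1.0)          # 1. simulate theta from the prior
    d_sim = rng.poisson(theta, size=d_obs.size)  # 2. simulate a dataset
    if np.array_equal(d_sim, d_obs):      # 3. accept iff D_sim == D_obs
        samples.append(theta)             # 4. repeat until enough draws

print(np.mean(samples))                   # approximate posterior mean
```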
9. ABC (contd)
It is clear that this algorithm gives samples which follow exactly
the posterior distribution, but the acceptance probability at step 3
is very small, making the algorithm very slow. Instead, an
approximate version is used, introducing a distance d on
datasets and a tolerance parameter ε:
1. Simulate θ from the prior π.
2. Simulate a new dataset Dsim from the model, with parameter θ.
3. If d(Dobs, Dsim) < ε, then accept θ; else reject θ.
4. Repeat until we get a large enough sample of θ’s.
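A sketch of the approximate sampler for the same toy Poisson model as above; anticipating the next slide, the distance is computed on the sample mean, which is a sufficient statistic for this model.

```python
import numpy as np

def abc_sample(d_obs, eps, n_samples, rng):
    """Rejection ABC with tolerance eps for the toy Poisson model."""
    s_obs = d_obs.mean()                  # summary statistic of the data
    samples = []
    while len(samples) < n_samples:
        theta = rng.exponential(1.0)      # 1. simulate theta from the prior
        d_sim = rng.poisson(theta, size=d_obs.size)  # 2. simulate a dataset
        if abs(d_sim.mean() - s_obs) < eps:   # 3. accept if d(...) < eps
            samples.append(theta)             # 4. repeat
    return np.array(samples)
```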
10. ABC (contd)
In the limit ε → 0, this algorithm is exact.
In practice, the distance is usually computed on a summary
statistic of the data. Ideally, the summary statistic is sufficient,
thus incurring no loss of information.
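For instance, with the hypothetical data above, the output of the sketch can be checked against the exact posterior, which under the Exponential(1) prior is Gamma(∑xi + 1, n + 1):

```python
rng = np.random.default_rng(1)
post = abc_sample(np.array([2, 0, 3]), eps=0.1, n_samples=1000, rng=rng)
print(post.mean())   # close to the exact posterior mean 6/4 = 1.5
```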