Reading "Bayesian measures of model complexity and fit"
1. Introduction
Complexity
Forms for pD
Diagnostics for fit
Model comparison criterion
Examples
Conclusion
Bayesian measures of model complexity
and fit
by D. J. Spiegelhalter, N. G. Best, B. P. Carlin and A. van der
Linde, 2002
presented by Ilaria Masiani
TSI-EuroBayes student
Université Paris Dauphine
Reading seminar on Classics, October 21, 2013
Ilaria Masiani
October 21, 2013
Presentation of the paper
Bayesian measures of model complexity and fit, by David J. Spiegelhalter, Nicola G. Best, Bradley P. Carlin and Angelika van der Linde
Published in 2002 in the Journal of the Royal Statistical Society, Series B, vol. 64, part 4, pp. 583-639
Outline
1. Introduction
2. Complexity of a Bayesian model
3. Forms for pD
4. Diagnostics for fit
5. Model comparison criterion
6. Examples
7. Conclusion
Introduction
Model comparison rests on two quantities:
a measure of fit (e.g. the deviance statistic)
a measure of complexity (e.g. the number of free parameters in the model)
=⇒ trade-off between these two quantities
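Some of the usual model comparison criteria:
Akaike information criterion: $\mathrm{AIC} = -2\log\{p(y \mid \hat\theta)\} + 2p$
Bayesian information criterion: $\mathrm{BIC} = -2\log\{p(y \mid \hat\theta)\} + p\log(n)$
where $\hat\theta$ is the maximum likelihood estimate, $p$ the number of parameters and $n$ the number of observations.
The problem: both require knowing $p$, which is sometimes not clearly defined, e.g. in complex hierarchical models.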
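As a rough illustration of the two criteria above, the following Python sketch evaluates AIC and BIC for an i.i.d. Gaussian model fitted by maximum likelihood. The data values and the function names (`gaussian_max_loglik`, `aic`, `bic`) are hypothetical and chosen only for this example. Here p = 2 (mean and variance) is unambiguous, which is precisely what no longer holds in hierarchical models.

```python
import math

def gaussian_max_loglik(y):
    """Maximised log-likelihood of an i.i.d. N(mu, sigma^2) model."""
    n = len(y)
    mu_hat = sum(y) / n
    sigma2_hat = sum((v - mu_hat) ** 2 for v in y) / n  # MLE of the variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2_hat) + 1)

def aic(loglik, p):
    """AIC = -2 log p(y | theta_hat) + 2p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """BIC = -2 log p(y | theta_hat) + p log(n)."""
    return -2.0 * loglik + p * math.log(n)

y = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8]  # hypothetical data
p = 2                                          # free parameters: mu and sigma^2
loglik = gaussian_max_loglik(y)
aic_value = aic(loglik, p)
bic_value = bic(loglik, p, len(y))
print(aic_value, bic_value)
```

Both criteria share the same fit term and differ only in the penalty, so BIC exceeds AIC as soon as log(n) > 2.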
=⇒ This paper suggests Bayesian measures of complexity and fit that can be combined to compare complex models.
Complexity of a Bayesian model
Complexity reflects the ’difficulty in estimation’.
Measure of complexity may depend on:
prior information
observed data
True model
’All models are wrong, but some are useful’
Box (1976)
$p_t(Y)$: ’true’ distribution of unobserved future data $Y$
$\theta_t$: ’pseudotrue’ parameter value
$p(Y \mid \theta_t)$: likelihood specified by $\theta_t$
Residual information
residual information in data $y$ conditional on $\theta$: $-2\log\{p(y \mid \theta)\}$, up to a multiplicative constant (Kullback and Leibler, 1951)
estimator $\tilde\theta(y)$ of $\theta_t$
excess of the true over the estimated residual information:
$d_\Theta\{y, \theta_t, \tilde\theta(y)\} = -2\log\{p(y \mid \theta_t)\} + 2\log[p\{y \mid \tilde\theta(y)\}]$
Bayesian measure of model complexity
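unknown $\theta_t$ replaced by a random variable $\theta$
$d_\Theta\{y, \theta, \tilde\theta(y)\}$ estimated by its posterior expectation w.r.t. $p(\theta \mid y)$:
$p_D\{y, \Theta, \tilde\theta(y)\} = E_{\theta \mid y}[d_\Theta\{y, \theta, \tilde\theta(y)\}]$
$= E_{\theta \mid y}[-2\log\{p(y \mid \theta)\}] + 2\log[p\{y \mid \tilde\theta(y)\}]$
$p_D$ is proposed as the effective number of parameters w.r.t. a model with focus $\Theta$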
Effective number of parameters
typically $\tilde\theta(y) = E(\theta \mid y) = \bar\theta$
$f(y)$: fully specified standardizing term, a function of the data alone
Then
Definition
$p_D = \overline{D(\theta)} - D(\bar\theta)$   (1)
where
$D(\theta) = -2\log\{p(y \mid \theta)\} + 2\log\{f(y)\}$
is the ’Bayesian deviance’.
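Definition (1) can be sketched numerically as follows. This is a minimal illustration, not the authors' code: it assumes a normal mean model with known variance and a conjugate normal prior, takes f(y) = 1, and uses independent draws from the exact posterior as a stand-in for MCMC output. The data, prior settings and names (`deviance`, `p_d`, `exact_p_d`) are all hypothetical.

```python
import math
import random

def deviance(theta, y, sigma2):
    """Bayesian deviance D(theta) = -2 log p(y | theta), taking f(y) = 1."""
    n = len(y)
    rss = sum((v - theta) ** 2 for v in y)
    return n * math.log(2 * math.pi * sigma2) + rss / sigma2

random.seed(1)
y = [1.2, 0.4, 2.1, 1.7, 0.9, 1.5, 0.3, 1.1, 1.8, 1.0]  # hypothetical data
sigma2 = 1.0          # known sampling variance (assumption)
m0, s02 = 0.0, 1.0    # prior theta ~ N(m0, s02) (assumption)

# Conjugate posterior: theta | y ~ N(post_mean, post_var)
n = len(y)
post_var = 1.0 / (n / sigma2 + 1.0 / s02)
post_mean = post_var * (sum(y) / sigma2 + m0 / s02)

# Stand-in for MCMC output: independent draws from the exact posterior
draws = [random.gauss(post_mean, math.sqrt(post_var)) for _ in range(20000)]

mean_deviance = sum(deviance(t, y, sigma2) for t in draws) / len(draws)  # D-bar
theta_bar = sum(draws) / len(draws)                                      # posterior mean
p_d = mean_deviance - deviance(theta_bar, y, sigma2)                     # definition (1)

exact_p_d = (n / sigma2) * post_var   # closed form for this particular model
print(round(p_d, 3), round(exact_p_d, 3))
```

In this model the prior carries as much information as one observation, so the single parameter counts for slightly less than one effective parameter.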
Observations on pD
1. (1) can be rewritten as $\overline{D(\theta)} = D(\bar\theta) + p_D$ =⇒ $\overline{D(\theta)}$ is a measure of ’adequacy’
2. $p_D$ depends on: the data, the choice of focus $\Theta$, the prior information and the choice of $\tilde\theta(y)$ =⇒ lack of invariance to transformations
3. using $\tilde\theta(y) = E(\theta \mid y)$, $p_D \ge 0$ for any likelihood that is log-concave in $\theta$ (Jensen’s inequality) =⇒ negative $p_D$s indicate conflict between prior and data
4. $p_D$ is easily calculated after an MCMC run
Forms for pD
pD for approximately normal likelihoods
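Negligible prior information
Assume $\theta \mid y \sim N(\hat\theta, (-L_{\hat\theta})^{-1})$, where $L_{\hat\theta}$ is the matrix of second derivatives of the log-likelihood at the MLE $\hat\theta$. Then, expanding $D(\theta)$ around $\hat\theta$:
$D(\theta) \approx D(\hat\theta) - (\theta - \hat\theta)^T L_{\hat\theta} (\theta - \hat\theta)$
$\approx D(\hat\theta) + \chi^2_p$
=⇒
$p_D = E_{\theta \mid y}\{D(\theta)\} - D(\hat\theta) \approx p$   (2)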
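Result (2) can be checked by simulation in a case where the normal approximation is exact. The sketch below assumes p = 3 independent group means with known variance and flat priors, so that the posterior is centred at the MLE. The data and all names are hypothetical.

```python
import math
import random

random.seed(2)
sigma2 = 1.0  # known sampling variance (assumption)
# Hypothetical data: p = 3 groups, each with its own mean theta_j
groups = [
    [0.8, 1.4, 1.1, 0.6, 1.3],
    [2.9, 3.4, 3.1, 2.7],
    [-0.5, 0.2, -0.1, 0.4, -0.3, 0.1],
]

def deviance(thetas):
    """-2 log-likelihood, up to an additive constant that cancels in pD."""
    return sum((v - t) ** 2 for g, t in zip(groups, thetas) for v in g) / sigma2

# With a flat prior, theta_j | y ~ N(ybar_j, sigma2 / n_j), centred at the MLE
mle = [sum(g) / len(g) for g in groups]
sds = [math.sqrt(sigma2 / len(g)) for g in groups]

n_draws = 5000
draws = [[random.gauss(m, s) for m, s in zip(mle, sds)] for _ in range(n_draws)]

mean_deviance = sum(deviance(d) for d in draws) / n_draws
p_d = mean_deviance - deviance(mle)   # E{D(theta)} - D(theta_hat), as in (2)
print(round(p_d, 2))
```

Each group mean contributes one chi-squared degree of freedom to the expected deviance, so the estimate should settle near 3, up to Monte Carlo error.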
pD for normal likelihoods
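General hierarchical normal model (known variance)
$y \sim N(A_1\theta, C_1)$
$\theta \sim N(A_2\phi, C_2)$
Then $\theta \mid y$ is normal with mean $\bar\theta = Vb$ and covariance $V$, with $V^{-1} = A_1^T C_1^{-1} A_1 + C_2^{-1}$ and $b = A_1^T C_1^{-1} y + C_2^{-1} A_2 \phi$.
=⇒
$p_D = \mathrm{tr}(-LV)$
where $-L = A_1^T C_1^{-1} A_1$ is the Fisher information.
In this case, $p_D$ is invariant to affine transformations of $\theta$.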
In normal models:
$\hat y = Hy$, with $H$ the hat matrix (which projects the data onto the fitted values) =⇒ $H = A_1 V A_1^T C_1^{-1}$
Then
$p_D = \mathrm{tr}(H)$
$\mathrm{tr}(H)$ = sum of leverages (influence of each observation on its fitted value)
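To make the trace formula concrete, the sketch below works through the simplest special case, assuming A1 = I, C1 = sigma2 I and C2 = tau2 I (one mean per observation, shrunk towards a common known prior mean). The hat matrix is then diagonal and its trace has a closed form. The function name `effective_parameters` and the numerical settings are hypothetical.

```python
def effective_parameters(n, sigma2, tau2):
    """tr(H) for y_i ~ N(theta_i, sigma2), theta_i ~ N(phi, tau2), i = 1..n.

    Special case A1 = I, C1 = sigma2 * I, C2 = tau2 * I, so that
    V = (1/sigma2 + 1/tau2)^(-1) * I and H = A1 V A1^T C1^(-1) is diagonal.
    """
    v = 1.0 / (1.0 / sigma2 + 1.0 / tau2)   # posterior variance of each theta_i
    leverage = v / sigma2                   # diagonal entry of H
    return n * leverage                     # trace = sum of leverages

# Tight, moderate and vague priors on the n = 10 means
for tau2 in (0.01, 1.0, 100.0):
    print(tau2, round(effective_parameters(10, 1.0, tau2), 3))
```

A tight prior pulls every mean to the common value and leaves almost no effective parameters, while a vague prior lets the count approach n.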
Conjugate normal-gamma model (unknown precision $\tau$)
$y \sim N(A_1\theta, \tau^{-1}C_1)$
$\theta \sim N(A_2\phi, \tau^{-1}C_2)$
$p_D = \mathrm{tr}(H) + q(\bar\theta)(\bar\tau - \hat\tau) - n\{\overline{\log(\tau)} - \log(\hat\tau)\}$
where $q(\theta) = (y - A_1\theta)^T C_1^{-1} (y - A_1\theta)$ and bars denote posterior means.
It can be shown that for large $n$ the choice of parameterization of $\tau$ will make little difference to $p_D$.
pD for exponential family likelihoods
One-parameter exponential family
Definition
Assume $p$ groups of observations, where the $n_i$ observations in group $i$ all have the same distribution.
For the $j$th observation in the $i$th group:
$\log\{p(y_{ij} \mid \theta_i, \phi)\} = w_i\{y_{ij}\theta_i - b(\theta_i)\}/\phi + c(y_{ij}, \phi)$
where
$\mu_i = E(Y_{ij} \mid \theta_i, \phi) = b'(\theta_i)$
$V(Y_{ij} \mid \theta_i, \phi) = b''(\theta_i)\phi/w_i$
$w_i$ constant.
If $\Theta$ is the focus and $\bar b_i = E_{\theta_i \mid y}\{b(\theta_i)\}$, then the contribution of the $i$th group to the effective number of parameters is:
$p^{\Theta}_{D_i} = 2 n_i w_i \{\bar b_i - b(\bar\theta_i)\}/\phi$
=⇒ lack of invariance of $p_D$ to reparametrization
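The lack of invariance can be seen in a small simulation. The sketch below assumes a single Poisson count (so n_i = w_i = phi = 1 and b(theta) = exp(theta)) and a Gamma posterior for the mean. The count, the Gamma parameters and all variable names are hypothetical. It computes p_D once with the mean mu as focus and once with the canonical parameter theta = log(mu) as focus.

```python
import math
import random

random.seed(3)
y = 5.0                    # one hypothetical Poisson count (n_i = w_i = phi = 1)
shape, rate = 5.5, 1.5     # assumed Gamma posterior for the mean mu

def deviance(mu):
    """Saturated Poisson deviance for a single count y with mean mu."""
    return 2.0 * (y * math.log(y / mu) - (y - mu))

# random.gammavariate takes (shape, scale), so scale = 1 / rate
mu_draws = [random.gammavariate(shape, 1.0 / rate) for _ in range(20000)]
theta_draws = [math.log(m) for m in mu_draws]   # canonical parameter theta = log(mu)

mean_deviance = sum(deviance(m) for m in mu_draws) / len(mu_draws)
mu_bar = sum(mu_draws) / len(mu_draws)
theta_bar = sum(theta_draws) / len(theta_draws)

p_d_mean = mean_deviance - deviance(mu_bar)                     # plug in E(mu | y)
p_d_canonical = mean_deviance - deviance(math.exp(theta_bar))   # plug in E(theta | y)

# Formula above with b(theta) = exp(theta): 2 * {b_bar - b(theta_bar)}
p_d_formula = 2.0 * (mu_bar - math.exp(theta_bar))
print(round(p_d_mean, 3), round(p_d_canonical, 3), round(p_d_formula, 3))
```

The canonical value matches the group formula, while the mean parameterization gives a different number from the same posterior draws.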
Diagnostics for fit
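Sampling theory diagnostics for lack of Bayesian fit
$E_{\theta \mid y}\{D(\theta)\} = \overline{D(\theta)} = \bar D$: measure of fit or ’adequacy’
If the model is true:
$E_Y(\bar D) = E_Y[E_{\theta \mid y}\{D(\theta)\}]$
$= E_\theta\left(E_{Y \mid \theta}\left[-2\log\frac{p(Y \mid \theta)}{p\{Y \mid \hat\theta(Y)\}}\right]\right)$
$\approx E_\theta[E_{Y \mid \theta}(\chi^2_p)]$
$= E_\theta(p) = p$
For the one-parameter exponential family $p = n$, then
$E_Y(\bar D) \approx n$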
Model comparison criterion
Model comparison: the problem
$Y_{rep}$ = independent replicate data set
$L(Y, \tilde\theta)$ = loss in assigning to data $Y$ a probability $p(Y \mid \tilde\theta)$
$L\{y, \tilde\theta(y)\}$ = ’apparent’ loss in repredicting the observed $y$
$E_{Y_{rep} \mid \theta_t}[L\{Y_{rep}, \tilde\theta(y)\}] = L\{y, \tilde\theta(y)\} + c_\Theta\{y, \theta_t, \tilde\theta(y)\}$
where $c_\Theta$ is the ’optimism’ associated with the estimator $\tilde\theta(y)$ (Efron, 1986)
Assuming $L(Y, \tilde\theta) = -2\log\{p(Y \mid \tilde\theta)\}$, to estimate $c_\Theta$:
1. Classical approach: attempts to estimate the sampling expectation of $c_\Theta$
2. Bayesian approach: direct calculation of the posterior expectation of $c_\Theta$
Classical criteria for model comparison
Expected optimism: $\pi(\theta_t) = E_{Y \mid \theta_t}[c_\Theta\{Y, \theta_t, \tilde\theta(Y)\}]$
All criteria for model comparison are based on minimizing
$\hat E_{Y_{rep} \mid \theta_t}[L\{Y_{rep}, \tilde\theta(y)\}] = L\{y, \tilde\theta(y)\} + \hat\pi(\theta_t)$
Efron (1986) derives $\pi(\theta_t)$ for the log-loss function: $\hat\pi_E(\theta_t) \approx 2p$
This can be read as a plug-in estimate of fit plus twice the effective number of parameters in the model
Bayesian criteria for model comparison
AIM: identify models that best explain the observed data, but with the expectation that they minimize uncertainty about observations generated in the same way
Deviance information criterion (DIC)
Definition
$\mathrm{DIC} = D(\bar\theta) + 2p_D$
$= \bar D + p_D$
Classical estimate of fit + twice the effective number of parameters
Also a Bayesian measure of fit, penalized by complexity $p_D$
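The two equivalent forms of DIC can be computed from saved MCMC output as in the sketch below. The helper `dic_summary` and the deviance values are hypothetical and serve only to show the arithmetic.

```python
def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return (D_bar, p_D, DIC) from posterior deviance draws and D(theta_bar)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)    # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean             # effective number of parameters
    dic = deviance_at_posterior_mean + 2.0 * p_d         # DIC = D(theta_bar) + 2 pD
    return d_bar, p_d, dic

# Hypothetical deviance values, as they might be saved along an MCMC run
deviance_draws = [104.2, 101.7, 103.9, 102.5, 105.1, 102.8]
d_bar, p_d, dic = dic_summary(deviance_draws, deviance_at_posterior_mean=100.4)
print(d_bar, p_d, dic)
```

When comparing models, the same standardizing term f(y) has to be used in every deviance so that the DIC values are on a common scale.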
DIC and AIC
Akaike information criterion =⇒ $\mathrm{AIC} = 2p - 2\log\{p(y \mid \hat\theta)\}$, with $\hat\theta$ = MLE
From result (2): $p_D \approx p$ in models with negligible prior information =⇒ $\mathrm{DIC} \approx 2p + D(\bar\theta)$
Examples
Spatial distribution of lip cancer in Scotland
Data on the rates of lip cancer in 56 districts in Scotland (Clayton and Kaldor, 1987; Breslow and Clayton, 1993)
$y_i$: observed number of cases in county $i$
$E_i$: expected number of cases in county $i$
$A_i$: list, for each county $i$, of its $n_i$ adjacent counties
$y_i \sim \mathrm{Pois}(\exp\{\theta_i\}E_i)$
$\exp\{\theta_i\}$: underlying true area-specific relative risk of lip cancer
Candidate models for $\theta_i$
Model 1: $\theta_i = \alpha_0$ (pooled)
Model 2: $\theta_i = \alpha_0 + \gamma_i$ (exchangeable random effect)
Model 3: $\theta_i = \alpha_0 + \delta_i$ (spatial random effect)
Model 4: $\theta_i = \alpha_0 + \gamma_i + \delta_i$ (exchangeable + spatial effects)
Model 5: $\theta_i = \alpha_i$ (saturated)
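Priors
$\alpha_0$: improper uniform prior
$\alpha_i$ ($i = 1, \ldots, 56$): normal priors with large variance
$\gamma_i \sim N(0, \lambda_\gamma^{-1})$
$\delta_i \mid \delta_{\setminus i} \sim N\left(\frac{1}{n_i}\sum_{j \in A_i}\delta_j, \frac{1}{n_i\lambda_\delta}\right)$ with $\sum_{i=1}^{56}\delta_i = 0$: conditional autoregressive prior (Besag, 1974)
$\lambda_\gamma, \lambda_\delta \sim \mathrm{Gamma}(0.5, 0.0005)$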
Saturated deviance
$D(\theta) = 2\sum_i [y_i \log\{y_i/(\exp(\theta_i)E_i)\} - \{y_i - \exp(\theta_i)E_i\}]$
(McCullagh and Nelder, 1989, p. 34)
obtained by taking as standardizing factor:
$-2\log\{f(y)\} = -2\sum_i \log\{p(y_i \mid \hat\theta_i)\} = 208.0$
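The saturated deviance above translates directly into code. The sketch below is a plain-Python version under the usual convention that y log y = 0 when y = 0. The function name and the three-area data set are hypothetical and are not the Scottish lip cancer data.

```python
import math

def saturated_deviance(y, expected, theta):
    """D(theta) = 2 * sum_i [y_i log(y_i / mu_i) - (y_i - mu_i)], mu_i = exp(theta_i) E_i."""
    total = 0.0
    for y_i, e_i, theta_i in zip(y, expected, theta):
        mu_i = math.exp(theta_i) * e_i
        # y log y is taken as 0 when y = 0 (its limiting value)
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += log_term - (y_i - mu_i)
    return 2.0 * total

# Hypothetical counts and expected counts for three areas
y = [9, 0, 4]
expected = [4.5, 1.2, 4.0]
theta_pooled = [0.0, 0.0, 0.0]   # relative risk 1 everywhere
print(saturated_deviance(y, expected, theta_pooled))
```

The deviance is zero when every fitted mean equals its observed count, which is the saturated fit used as the standardizing term.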
Results
For each model, two independent MCMC chains were run in WinBUGS for 15000 iterations each, with a burn-in of 5000 iterations.
Deviance summaries were computed using three alternative parameterizations (mean, canonical, median).
Deviance calculations
$\bar D$: mean of the posterior samples of the saturated deviance
$D(\bar\mu)$: obtained by plugging the posterior mean of $\mu_i = \exp(\theta_i)E_i$ into the saturated deviance
$D(\bar\theta)$: obtained by plugging the posterior means of $\alpha_0, \alpha_i, \gamma_i, \delta_i$ into the linear predictor $\theta_i$
$D(\mathrm{med})$: obtained by plugging the posterior median of $\theta_i$ into the saturated deviance
Observations on the pD results
From result (2): pD ≈ p
pooled model 1: pD = 1.0
saturated model 5: pD from 52.8 to 55.9
models 3-4 with spatial random effects: pD around 31
model 2 with only exchangeable random effects: pD around 43
Comparison of DIC
DIC is subject to Monte Carlo sampling error (it is a function of stochastic quantities)
Either of models 3 or 4 is superior to the others
Models 2 and 5 are superior to model 1
Absolute measure of fit: compare $\bar{D}$ with n = 56
All models (except pooled model 1) have an adequate overall fit to the data ⇒ comparison essentially based on the pD values
Six-cities study
Subset of data from the six-cities study: a longitudinal study of the health effects of air pollution (Fitzmaurice and Laird, 1993)
yij: repeated binary measurement of the wheezing status of child i at time j (1, yes; 0, no), i = 1, ..., I, j = 1, ..., J
I = 537 children living in Steubenville, Ohio
J = 4 time points
aij: age of child i in years at measurement point j (7, 8, 9, 10 years)
si: smoking status of child i's mother (1, yes; 0, no)
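Conditional response model
$Y_{ij} \sim \mathrm{Bernoulli}(p_{ij})$
$p_{ij} = \Pr(Y_{ij} = 1) = g^{-1}(\mu_{ij})$
$\mu_{ij} = \beta_0 + \beta_1 z_{ij1} + \beta_2 z_{ij2} + \beta_3 z_{ij3} + b_i$
$z_{ijk} = x_{ijk} - \bar{x}_{..k}, \quad k = 1, 2, 3$
$x_{ij1} = a_{ij}, \quad x_{ij2} = s_i, \quad x_{ij3} = a_{ij} s_i$
$b_i$ individual-specific random effects: $b_i \sim N(0, \lambda^{-1})$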
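The construction of the centred covariates and the linear predictor can be sketched as below. The function names, the two-child data set and the coefficient values are hypothetical and serve only to show how zijk and µij are assembled; they are not estimates from the six-cities data.

```python
def centred_covariates(age, smoke):
    """Build z[i][j] = (z_ij1, z_ij2, z_ij3), centred at the grand means.

    age[i][j] is a_ij and smoke[i] is s_i; the raw covariates are
    x_ij1 = a_ij, x_ij2 = s_i, x_ij3 = a_ij * s_i.
    """
    n_children, n_times = len(age), len(age[0])
    x = [[(age[i][j], smoke[i], age[i][j] * smoke[i]) for j in range(n_times)]
         for i in range(n_children)]
    n_obs = n_children * n_times
    x_bar = [sum(x[i][j][k] for i in range(n_children) for j in range(n_times)) / n_obs
             for k in range(3)]
    z = [[tuple(x[i][j][k] - x_bar[k] for k in range(3)) for j in range(n_times)]
         for i in range(n_children)]
    return z, x_bar


def linear_predictor(z_ij, beta, b_i):
    """mu_ij = beta_0 + beta_1 z_ij1 + beta_2 z_ij2 + beta_3 z_ij3 + b_i."""
    return beta[0] + sum(b * z for b, z in zip(beta[1:], z_ij)) + b_i


# Hypothetical data: two children observed at ages 7 to 10.
age = [[7, 8, 9, 10], [7, 8, 9, 10]]
smoke = [1, 0]
z, x_bar = centred_covariates(age, smoke)
mu_00 = linear_predictor(z[0][0], beta=[-2.0, 0.1, 0.4, 0.05], b_i=0.5)
print(x_bar, z[0][0], mu_00)
```

Centring at the grand means makes β0 the linear predictor of a child with average covariates and bi = 0.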
Model choice: link function g(·)
Model 1: $g(p_{ij}) = \mathrm{logit}(p_{ij}) = \log\{p_{ij} / (1 - p_{ij})\}$
Model 2: $g(p_{ij}) = \mathrm{probit}(p_{ij}) = \Phi^{-1}(p_{ij})$
Model 3: $g(p_{ij}) = \mathrm{cloglog}(p_{ij}) = \log\{-\log(1 - p_{ij})\}$
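A small sketch of the three link functions and their inverses may help when comparing the models, since only g(·) changes between them. The dictionary `LINKS` and the helper `prob_from_predictor` are illustrative names, not part of the original analysis. The probit link uses `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse

# Each entry maps a link name to (g, g inverse).
LINKS = {
    "logit": (lambda p: math.log(p / (1.0 - p)),
              lambda m: 1.0 / (1.0 + math.exp(-m))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda p: math.log(-math.log(1.0 - p)),
                lambda m: 1.0 - math.exp(-math.exp(m))),
}


def prob_from_predictor(mu, link):
    """p_ij = g^{-1}(mu_ij) for the chosen link."""
    _, inverse = LINKS[link]
    return inverse(mu)


for name in LINKS:
    print(name, round(prob_from_predictor(0.0, name), 4))
```

At µij = 0 the logit and probit links both give pij = 0.5, while the asymmetric complementary log-log link gives 1 − exp(−1).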
Priors and deviance form
βk: flat priors
λ ∼ Gamma(0.001, 0.001)
$$D = -2\sum_{i,j} \{ y_{ij} \log(p_{ij}) + (1 - y_{ij}) \log(1 - p_{ij}) \}$$
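The Bernoulli deviance above is straightforward to evaluate for a given set of fitted probabilities. The following minimal sketch uses a hypothetical function name and toy data; in practice it would be evaluated at every posterior draw to obtain D̄.

```python
import math


def bernoulli_deviance(y, p):
    """D = -2 * sum_{i,j} { y_ij log(p_ij) + (1 - y_ij) log(1 - p_ij) }.

    y[i][j] is 0 or 1 and p[i][j] lies strictly between 0 and 1.
    """
    total = 0.0
    for y_i, p_i in zip(y, p):
        for y_ij, p_ij in zip(y_i, p_i):
            total += y_ij * math.log(p_ij) + (1 - y_ij) * math.log(1.0 - p_ij)
    return -2.0 * total


# Hypothetical data: two children, two time points, all fitted p_ij = 0.5.
y = [[1, 0], [0, 0]]
p = [[0.5, 0.5], [0.5, 0.5]]
d = bernoulli_deviance(y, p)
print(d)
```

With every pij equal to 0.5 each observation contributes −2 log(0.5), so the deviance here is 8 log 2 regardless of the observed values.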
Results
Gibbs sampler run for 5000 iterations, with a burn-in of 1000 iterations
Deviance summaries for canonical and mean parameterizations.
Conclusion
pD may not be invariant to the chosen parameterization
Similarities to frequentist measures, but based on expectations with respect to the parameters in place of sampling expectations
DIC viewed as a Bayesian analogue of AIC, with a similar justification but wider applicability
Involves Monte Carlo sampling and negligible analytic work
References
McCullagh, P. and Nelder, J. Generalized Linear Models. 2nd edn. London: Chapman and Hall, 1989.
Besag, J. Spatial interaction and the statistical analysis of lattice systems. J. R. Statist. Soc., series B, 36, 192-236, 1974.
Clayton, D.G. and Kaldor, J. Empirical Bayes estimates of age-standardised relative risk for use in disease mapping. Biometrics, 43, 671-681, 1987.
Efron, B. How biased is the apparent error rate of a prediction rule? J. Am. Statist. Ass., 81, 461-470, 1986.
Fitzmaurice, G. and Laird, N. A likelihood-based method for analysing longitudinal binary responses. Biometrika, 80, 141-151, 1993.
Kullback, S. and Leibler, R.A. On information and sufficiency. Ann. Math. Statist., 22, 79-86, 1951.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. Bayesian measures of model complexity and fit. J. Royal Statistical Society, series B, vol. 64, Part 4, pp. 583-639, 2002.