Minghui Conference Cross-Validation Talk
Multilevel Models | Decision-Theoretic Model Assessment Framework | Data and Model | Results
Challenges with the Use of Cross-validation for
Comparing Structured Models
Wei Wang
joint work with Andrew Gelman
Department of Statistics, Columbia University
April 13, 2013
Overview
1. Multilevel Models
2. Decision-Theoretic Model Assessment Framework
3. Data and Model
4. Results
Bayesian Interpretation of Multilevel Models
Multilevel models have long been proposed to handle data with
group structure, e.g., a longitudinal study with multiple observations
for each participant, or a national survey with various demographic
and geographic variables.
From a Bayesian point of view, multilevel modeling partially pools
the estimates through a prior, as opposed to analyzing each group
separately (no pooling) or analyzing the data as if there were no
group structure (complete pooling).
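The three pooling strategies can be illustrated with a minimal sketch for binary data in a few groups. This is not the model used in the talk: it assumes a Beta prior centred on the pooled proportion, and `prior_strength` is a hypothetical pseudo-count chosen only for illustration.

```python
def pooled_estimates(successes, trials, prior_strength=10.0):
    """Return (no_pooling, complete_pooling, partial_pooling) proportion estimates.

    prior_strength is a hypothetical pseudo-count that controls the shrinkage.
    """
    complete = sum(successes) / sum(trials)  # one estimate, group structure ignored
    no_pool = [y / n for y, n in zip(successes, trials)]  # one estimate per group
    # Beta(a, b) prior centred on the pooled proportion (an illustrative assumption).
    a = prior_strength * complete
    b = prior_strength * (1.0 - complete)
    # Posterior mean: a weighted average of the group estimate and the pooled estimate.
    partial = [(y + a) / (n + a + b) for y, n in zip(successes, trials)]
    return no_pool, complete, partial


successes = [1, 12, 45]
trials = [2, 30, 100]
no_pool, complete, partial = pooled_estimates(successes, trials)
for n, raw, shrunk in zip(trials, no_pool, partial):
    print(f"n={n:3d}  no pooling={raw:.3f}  partial={shrunk:.3f}  complete={complete:.3f}")
```

Under these assumptions, each partially pooled estimate lies between the group's own proportion and the pooled proportion, and small groups are pulled toward the pooled value more strongly than large ones.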
Multilevel Models for Deeply Nested Data Structure
Our substantive interest is survey data with deeply nested
structures resulting from various categorical
demographic-geographic variables, e.g., state, income, education,
ethnicity, etc.
One typical conundrum is how many interactions between those
demographic-geographic variables to include in the model.
Three Prototypes of Models
In the simple case of two predictors, the three prototype models are
shown below. The response $y_{ij}$ is binary.
Complete Pooling model
\[ E[y_{ij}] = g^{-1}(\mu_{ij}), \qquad \mu_{ij} = \mu_0 + a_i + b_j \]
No Pooling model
\[ E[y_{ij}] = g^{-1}(\mu_{ij}), \qquad \mu_{ij} = \mu_0 + a_i + b_j + r_{ij} \]
Partial Pooling model
\[ E[y_{ij}] = g^{-1}(\mu_{ij}), \qquad \mu_{ij} = \mu_0 + a_i + b_j + \gamma_{ij}, \qquad \gamma_{ij} \sim \Phi(\cdot) \]
True model, Pseudo-true model and Actual Belief model
We assume there is a true underlying model $p_t(\cdot)$ that generates
the observations, both those available and those in the future.
While acknowledging that the true distribution is never
accessible, some researchers propose basing the discussion on a
sufficiently rich Actual Belief Model, which supposedly fully reflects
the uncertainty of future data (Bernardo and Smith 1994).
M-closed, M-completed and M-open views
In the M-closed view, it is assumed that the true model is included
in an enumerable collection of models, and the Actual Belief
Model is the Bayesian Model Averaging predictive distribution.
In the M-completed view, the Actual Belief Model $p(\tilde{y} \mid D, M)$ is
considered to be the best available description of the uncertainty
of future data.
In the M-open view, the correct specification of the Actual Belief
Model is avoided, and the strategy is to generate Monte Carlo
samples from it, for example through sample re-use methods.
A Decision-Theoretic Framework
We define a loss function $l(\tilde{y}, a_M)$, the loss incurred
by our inferential action $a_M$, based on a model $M$, in the face of
a future observation $\tilde{y}$.
Then the predictive loss from our inferential action $a_M$ is
\[ L_p(p_t, M, D, l) = E_{p_t(\tilde{y})}\, l(\tilde{y}, a_M) = \int l(\tilde{y}, a_M)\, p_t(\tilde{y})\, d\tilde{y} \]
It is often convenient and theoretically desirable to use the whole
posterior predictive distribution as $a_M$ and the log loss as $l(\cdot, \cdot)$:
\[ L_{\text{pred}}(p_t, M, D) = E_{p_t}\left[ -\log p(\tilde{y} \mid D, M) \right] = -\int p_t(\tilde{y}) \log p(\tilde{y} \mid D, M)\, d\tilde{y} \]
Decision-Theoretic Framework Cont'd
For the Model Selection task, from a pool of candidate models
$\{M_k : k \in K\}$, we should select the model that minimizes the
expected predictive loss.
\[ \min_{M_k : k \in K} \; -\int p_t(\tilde{y}) \log p(\tilde{y} \mid D, M_k)\, d\tilde{y} \]
For the Model Assessment task of a particular model $M$, we look at
the Kullback-Leibler divergence between the true model and the
posterior predictive distribution. We call it the predictive error.
\[
\begin{aligned}
\mathrm{Err}(p_t, M, D) &= -\int p_t(\tilde{y}) \log p(\tilde{y} \mid D, M)\, d\tilde{y} + \int p_t(\tilde{y}) \log p_t(\tilde{y})\, d\tilde{y} \\
&= \mathrm{KL}\left( p_t(\cdot) \,\|\, p(\cdot \mid D, M) \right)
\end{aligned}
\]
Estimating Expected Predictive Loss
The central obstacle to obtaining the Expected Predictive Loss is
that we do not know the true distribution $p_t(\cdot)$.
An M-closed or M-completed view substitutes a reference
distribution for the true distribution.
From an M-open view, plugging in the available sample gives the
Training Loss, which is biased downward because the sample is used
twice: once to fit the model and once to evaluate it.
\[ L_{\text{training}}(M, D) = -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid D, M) \]
There are two approaches to obtaining an approximately unbiased
estimate of the Predictive Loss: Bias Correction, which leads to
various Information Criteria, and Held-out Practices, which lead to
Leave-one-out Cross Validation and k-fold Cross Validation.
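The downward bias of the training loss can be checked exactly in a toy setting. The sketch below assumes a single cell with n binary observations and true proportion 0.4, and a smoothed plug-in estimate (y + 1) / (n + 2); both the setting and the estimator are illustrative assumptions, not taken from the talk. It averages the training loss and the true predictive loss over all possible samples.

```python
import math


def binom_pmf(y, n, p):
    """Probability of y successes in n Bernoulli(p) trials."""
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def log_loss(q, p_hat):
    """Average log loss when outcomes occur with frequency q and the model predicts p_hat."""
    return -(q * math.log(p_hat) + (1.0 - q) * math.log(1.0 - p_hat))


def expected_losses(n, p_true):
    """Exact expectations, over y ~ Binomial(n, p_true), of training and predictive loss."""
    exp_training = 0.0
    exp_predictive = 0.0
    for y in range(n + 1):
        weight = binom_pmf(y, n, p_true)
        p_hat = (y + 1.0) / (n + 2.0)  # smoothed plug-in estimate (an assumption)
        exp_training += weight * log_loss(y / n, p_hat)      # evaluated on the sample itself
        exp_predictive += weight * log_loss(p_true, p_hat)   # evaluated on future data
    return exp_training, exp_predictive


training, predictive = expected_losses(n=20, p_true=0.4)
print(f"expected training loss:   {training:.4f}")
print(f"expected predictive loss: {predictive:.4f}")
```

In this setting the expected training loss falls below the expected predictive loss, which is the gap that information criteria and held-out methods try to correct.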
Estimation Methods
There is a long list of variants of Information Criteria:
AIC, BIC, DIC, TIC, NIC, WAIC, etc.
LOO Cross Validation has been shown to be asymptotically
equivalent to AIC/WAIC, but its computational burden is huge.
The Importance Sampling method, which approximates LOO without
refitting the model, introduces a new problem: the reliability of
the importance weights.
We use the computationally convenient k-fold cross validation,
in which the data set is randomly partitioned into k parts, and
in each fold one part is used as the testing set while the rest
serve as the training set.
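The random partition described above can be sketched as follows. The function name `k_fold_splits` and the fixed seed are illustrative choices, not part of the talk.

```python
import random


def k_fold_splits(n_obs, k, seed=0):
    """Yield (train_indices, test_indices) pairs for a random k-fold partition."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)       # random assignment of observations
    folds = [indices[i::k] for i in range(k)]  # k parts of near-equal size
    for i, test in enumerate(folds):
        # Every part except the i-th goes into the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_obs=10, k=3))
for train, test in splits:
    print(f"test={sorted(test)}  train size={len(train)}")
```

Each observation appears in exactly one testing set, so every data point is predicted once by a model that was not fit to it.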
k-fold Cross Validation
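Then the k-fold Cross Validation estimate of the Predictive Loss
is given by
\[ L_{CV}(M, D) = -\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in \text{test}_k} \log p(y_i \mid D^{k}, M) = -\frac{1}{N} \sum_{i=1}^{N} \log p(y_i \mid D^{(i)}, M) \]
where $D^{k}$ is the training data for fold $k$ and $D^{(i)}$ is the
training data for the fold whose testing set contains observation $i$.
To estimate the Predictive Error, we still need an estimate of the
entropy of the true distribution. We can use the training loss of
the saturated model as a surrogate.
\[ -\int p_t(\tilde{y}) \log p_t(\tilde{y})\, d\tilde{y} \approx -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid D, M_{\text{saturated}}) \]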
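A minimal sketch of the entropy surrogate for binary data grouped into cells. It assumes that the saturated model predicts the observed proportion in each cell, which is one natural reading of "saturated" here rather than something the slides state; the function names are hypothetical.

```python
import math


def xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def saturated_training_loss(cells):
    """Training loss of a saturated model for binary data grouped into cells.

    cells: list of (y, n) pairs giving positive responses and respondents per cell.
    Assumes the saturated model predicts the observed proportion y / n in each cell.
    """
    total = sum(n for _, n in cells)
    loss = 0.0
    for y, n in cells:
        p = y / n
        loss -= xlogy(y, p) + xlogy(n - y, 1.0 - p)
    return loss / total


def estimated_predictive_error(cv_loss, cells):
    """Cross-validation loss minus the surrogate for the entropy of the true distribution."""
    return cv_loss - saturated_training_loss(cells)


cells = [(2, 5), (30, 40)]
print(f"entropy surrogate: {saturated_training_loss(cells):.5f}")
```

Subtracting this surrogate from a cross-validation loss gives the estimated predictive error; the `xlogy` helper keeps cells with all-zero or all-one responses from producing `log(0)`.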
Data Set
Cooperative Congressional Election Survey 2006
N=30,000
71 social and political response outcomes
Deeply nested demographic variables, e.g., state, income, education,
ethnicity, gender, etc.
Data Set Cont'd
Figure: A sample of the questions in CCES 2006 survey.
Model Setup
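For demonstration, we only consider two demographic variables,
state and income, together with their interaction. The responses
are all yes-no binary outcomes.
Complete Pooling
\[ \pi_{j_1 j_2} = \operatorname{logit}^{-1}\left( \beta^{\text{stt}}_{j_1} + \beta^{\text{inc}}_{j_2} \right) \]
No Pooling
\[ \pi_{j_1 j_2} = \operatorname{logit}^{-1}\left( \beta^{\text{stt}}_{j_1} + \beta^{\text{inc}}_{j_2} + \beta^{\text{stt*inc}}_{j_1 j_2} \right) \]
Partial Pooling
\[ \pi_{j_1 j_2} = \operatorname{logit}^{-1}\left( \beta^{\text{stt}}_{j_1} + \beta^{\text{inc}}_{j_2} + \beta^{\text{stt*inc}}_{j_1 j_2} \right), \qquad \beta^{\text{stt*inc}}_{j_1 j_2} \sim \Phi(\cdot) \]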
k-fold Cross Validation Estimate
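Due to computational constraints, we use the Maximum A
Posteriori plug-in estimate instead of the full Bayesian estimate.
\[ p(\tilde{y} \mid D, M) \approx p(\tilde{y} \mid \hat{\pi}_{ij}(D), M) \]
Then, under the aforementioned setup, the Cross Validation
estimate of the Predictive Loss is
\[
\begin{aligned}
L_{CV}(M, D) &= -\frac{1}{N} \sum_{k=1}^{K} \sum_{l \in \text{test}_k} \log p(y_l \mid D^{k}, M) \\
&= -\frac{1}{N} \sum_{k=1}^{K} \sum_{i,j} \left[ y_{ij}^{\text{test}_k} \log \hat{\pi}_{ij}(D^{\text{train}_k}) + \left( n_{ij}^{\text{test}_k} - y_{ij}^{\text{test}_k} \right) \log\left( 1 - \hat{\pi}_{ij}(D^{\text{train}_k}) \right) \right] \\
&= -\frac{1}{N} \sum_{i,j} \sum_{k=1}^{K} \left[ \log \hat{\pi}_{ij}(D^{\text{train}_k})\, y_{ij}^{\text{test}_k} + \log\left( 1 - \hat{\pi}_{ij}(D^{\text{train}_k}) \right) \left( n_{ij}^{\text{test}_k} - y_{ij}^{\text{test}_k} \right) \right] \\
&\approx -\frac{1}{N} \sum_{i,j} \left[ \log \hat{\pi}_{ij}(D^{\text{train}})\, y_{ij} + \log\left( 1 - \hat{\pi}_{ij}(D^{\text{train}}) \right) \left( n_{ij} - y_{ij} \right) \right] \\
&= -\sum_{i,j} \frac{n_{ij}}{N} \left[ \log \hat{\pi}_{ij}(D^{\text{train}})\, \tilde{\pi}_{ij} + \log\left( 1 - \hat{\pi}_{ij}(D^{\text{train}}) \right) \left( 1 - \tilde{\pi}_{ij} \right) \right]
\end{aligned}
\]
Here $y_{ij}$ and $n_{ij}$ are the number of positive responses and the
number of respondents in cell $(i, j)$, and $\tilde{\pi}_{ij} = y_{ij} / n_{ij}$
is the observed cell proportion. The fourth line drops the fold index,
treating the fold-specific estimates as a single training-set
estimate $\hat{\pi}_{ij}(D^{\text{train}})$.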
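The cell-level form of the cross-validation loss (the second line of the derivation) can be computed directly from per-fold counts. In this sketch the input layout, a list of folds each holding `(y_test, n_test, pi_hat)` triples, is a hypothetical convention, and every `pi_hat` is assumed to lie strictly between 0 and 1.

```python
import math


def cv_log_loss(folds):
    """k-fold CV estimate of the predictive loss for binary data grouped into cells.

    folds: list of folds; each fold is a list of (y_test, n_test, pi_hat) triples,
    one per cell, where pi_hat is the plug-in estimate from that fold's training data.
    """
    total_n = sum(n for fold in folds for _, n, _ in fold)
    loss = 0.0
    for fold in folds:
        for y, n, pi_hat in fold:
            # y positive and (n - y) negative test responses, scored by the training estimate.
            loss -= y * math.log(pi_hat) + (n - y) * math.log(1.0 - pi_hat)
    return loss / total_n


# Two folds, two cells each (made-up counts and estimates, for illustration only).
folds = [
    [(3, 8, 0.42), (10, 12, 0.80)],
    [(4, 9, 0.40), (9, 11, 0.83)],
]
print(f"CV loss: {cv_log_loss(folds):.5f}")
```

When the estimate for a cell is the same in every fold, summing over folds gives the same value as using the cell totals, which is the simplification made in the later lines of the derivation.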
Calibration of Improvement
Suppose we have only one cell, with true proportion 0.4. The good
model gives a posterior estimate of the log proportion of roughly
log(0.41), and the lesser model gives an estimate of log(0.44) or
log(0.38).
Then the Predictive Loss under the good model is
−[0.4 * log(0.41) + 0.6 * log(0.59)] = 0.67322, and under the two
lesser models it is −[0.4 * log(0.44) + 0.6 * log(0.56)] = 0.67628 and
−[0.4 * log(0.38) + 0.6 * log(0.62)] = 0.67386. The improvement in
Predictive Loss is therefore between 0.0006 and 0.003.
Also, the lower bound is given by
−[0.4 * log(0.4) + 0.6 * log(0.6)] = 0.67301, so the Predictive Error
of the good model is about 0.0002.
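The arithmetic in this calibration example can be reproduced with a few lines. The helper name `bernoulli_log_loss` is an illustrative choice; the inputs are the proportions quoted above.

```python
import math


def bernoulli_log_loss(p_true, p_model):
    """Expected log loss of predicting p_model when the true proportion is p_true."""
    return -(p_true * math.log(p_model) + (1.0 - p_true) * math.log(1.0 - p_model))


p_true = 0.4
lower_bound = bernoulli_log_loss(p_true, p_true)  # entropy of the true distribution
print(f"lower bound: {lower_bound:.5f}")
for p_model in (0.41, 0.44, 0.38):
    loss = bernoulli_log_loss(p_true, p_model)
    print(f"estimate {p_model:.2f}: loss {loss:.5f}, error {loss - lower_bound:.5f}")
```

The printed error column is the Predictive Error of each estimate, that is, its loss minus the lower bound.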
Cross Validation Results on All Outcomes
[Plot: Estimated Predictive Error (y-axis) for each response (x-axis, ordered by the lower bound) under the complete pooling, partial pooling, and no pooling models.]
Figure: Measure of fit (Estimated Predictive Error) for all response outcomes
in the CCES 2006 survey data. Responses are ordered by the lower bound
(training loss of the saturated model). The No Pooling model gives a very bad
fit, while Partial Pooling generally attains a lower Predictive Error than
Complete Pooling, but the differences seem small.
Compare Partial Pooling and Complete Pooling
In the previous figure, No Pooling is clearly doing very badly,
but the differences between Partial Pooling and Complete
Pooling seem small. We need to calibrate them further.
The summary of the differences between Partial Pooling and
Complete Pooling over all the outcomes is
      Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
-0.0003405   0.0001821   0.0003827   0.0006041   0.0005630   0.0053770
We can see that the improvement in terms of the Predictive Loss
indeed corresponds to some meaningful improvement in
prediction accuracy.
Simulations Based on Real Data
We want to explore how the structure of the multilevel models
affects the dynamics of the performance of different models.
Specifically, we are interested in the total sample size and in how
balanced the cells are in terms of cell size.
We generated simulated data sets based on the real data set: we
use the estimates from the multilevel model fit to the real data
set and enlarge the total sample size by factors of 2, 3, and 4,
either keeping the original (highly unequal) relative proportions
of the different cells or making the proportions roughly equal.
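A minimal sketch of how such augmented data sets could be generated, assuming the fitted model supplies one probability per cell. The function names, the integer `factor`, and the way the balanced sizes are rounded are illustrative assumptions; the talk does not describe its simulation code.

```python
import random


def augmented_cell_sizes(cell_sizes, factor, balanced=False):
    """Cell sizes for a simulated data set `factor` times as large as the original."""
    if not balanced:
        # Keep the original (possibly highly unequal) relative proportions.
        return [factor * n for n in cell_sizes]
    # Spread the enlarged total as evenly as possible across the cells.
    total = factor * sum(cell_sizes)
    base, extra = divmod(total, len(cell_sizes))
    return [base + (1 if extra > i else 0) for i in range(len(cell_sizes))]


def simulate_cell_counts(cell_probs, cell_sizes, seed=0):
    """Draw the number of positive responses in each cell from Binomial(n, pi)."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(n) if pi > rng.random())
            for pi, n in zip(cell_probs, cell_sizes)]


original_sizes = [5, 50, 500]   # made-up cell sizes
fitted_probs = [0.3, 0.5, 0.6]  # made-up stand-ins for the fitted cell estimates
for balanced in (False, True):
    sizes = augmented_cell_sizes(original_sizes, factor=2, balanced=balanced)
    counts = simulate_cell_counts(fitted_probs, sizes)
    print(f"balanced={balanced}: sizes={sizes}, positive responses={counts}")
```

Both designs produce the same total sample size, so any difference in the resulting Predictive Error can be attributed to how the observations are spread across cells.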
Simulation Results: Total Sample Size
[Three plots: Estimated Predictive Error (y-axis) for each response (x-axis, ordered by the lower bound) under the complete pooling, partial pooling, and no pooling models.]
Figure: Estimated Predictive Error of all response outcomes for
"augmented" data sets.
Simulation Results: Total Sample Size on House Rep Vote
[Plot: Estimated Predictive Error (y-axis) against sample size (x-axis, 50,000 to 200,000) for the complete pooling, partial pooling, and no pooling models.]
Figure: Predictive Error of the three models as sample size grows. The
outcome under consideration is the Republican vote in the House election.
Simulation Results: Balancedness of the Structure
[Plot: Estimated Predictive Error (y-axis) for each response (x-axis, ordered by the lower bound) under the complete pooling, partial pooling, and no pooling models.]
Figure: Measure of fit (Predictive Error) for all responses, ordered by lower
bound. The data set is simulated from the real data set and has the same total
sample size as the real data set, but with all demographic-geographic cells
kept balanced.
Conclusions
Cross-validation is not a very sensitive instrument for comparing
multilevel models.
Careful calibrations are needed for better understanding of the
results.
We also explored how different aspects of the data set structure
affect the margin of improvement.