GHME 2013 Conference
Session: DisMod-MR workshop
Date: June 18, 2013
Presenter: Hannah Peterson
Institute: Institute for Health Metrics and Evaluation (IHME), University of Washington
9. Potential experimental frameworks
• Data collection
o Ideal
o Impractical
• Simulation
o Impossible to know true data distribution
• Out-of-sample cross validation
o Do not have to choose distribution
14. Out-of-sample predictive validity
• Randomly select 25% of data to use as “test data”
• Fit the remaining 75% of data (“training data”)
• Use fit to calculate statistics for test data
15. Out-of-sample predictive validity
• Randomly select 25% of data to use as “test data”
• Fit the remaining 75% of data (“training data”)
• Use fit to calculate statistics for test data
• For each distribution
• For 1000 test-train splits
• For each disease data set
18. Results
Percent of wins (%)
Distribution        Bias   MAE    PC     Total
Normal              22.1   20.6   34.6   25.7
Lognormal           29.7   13.0   36.5   26.4
Binomial            26.3   48.3    1.9   25.5
Negative-binomial   21.9   18.1   27.1   22.4
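The Total column looks like the simple average of the Bias, MAE, and PC columns. The slide does not say so; this is an inference from the numbers. A small Python check under that assumption (the function name is made up for illustration):

```python
# Percent of wins (Bias, MAE, PC, Total), copied from the results table.
wins = {
    "Normal": (22.1, 20.6, 34.6, 25.7),
    "Lognormal": (29.7, 13.0, 36.5, 26.4),
    "Binomial": (26.3, 48.3, 1.9, 25.5),
    "Negative-binomial": (21.9, 18.1, 27.1, 22.4),
}

def mean_of_metrics(row):
    """Average the three per-metric win percentages (ASSUMED definition of Total)."""
    bias, mae, pc, _total = row
    return (bias + mae + pc) / 3

# Largest gap between the assumed average and the reported Total.
max_gap = max(abs(mean_of_metrics(row) - row[3]) for row in wins.values())

# Distribution with the highest reported Total.
best_overall = max(wins, key=lambda name: wins[name][3])
```

Every gap is below 0.1 percentage points, which is consistent with the table having been rounded to one decimal place.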
19. Conclusions
• Choice of distribution doesn’t greatly influence results
• Best overall performance: lognormal distribution
o Contingent on method to adjust data whose value is 0
• Further investigate when each distribution performs best
o Dependent on number of covariates, priors, amount of data?
Global Burden of Disease Study 2010 (GBD)
- huge endeavor to measure health loss from diseases, injuries, and risk factors using the Disability-Adjusted Life Year (DALY)
- coarsely described in this 18-step process
- I am just going to focus on a small subsection, the calculation of DALYs for injuries and diseases
- further narrow the focus to the calculation of YLDs
figure: Murray, Ezzati, et al. 2013. “GBD 2010: design, definitions, and metrics”. The Lancet. 380(9859):2063-2066.
- YLDs measure morbidity, or years lived in less than full health
- the YLD calculation needs age-specific prevalence estimates; for GBD, this means
--- for 291 outcomes
--- for 2 sexes
--- for 187 countries
--- for 3 years
- however, prevalence data are often less than ideal
- examples: all available data in Western Europe for the GBD 2010 Study
--- sparse (fungal diseases)
--- noisy (lower back pain)
--- sparse and noisy (cannabis dependence data)
- to calculate age-specific prevalence, GBD used a tool called DisMod-MR
- DisMod-MR is designed to address missing data and inconsistency
--- uses epidemiologic data and covariate data to calculate the age-specific prevalence based on a negative-binomial distribution
--- assumes all epidemiological data follow a negative-binomial distribution
- is it really the best distribution to model the epidemiologic data?
figure: Vos, Flaxman, et al. 2013. “Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990-2010: a systematic analysis for the Global Burden of Disease Study 2010”. The Lancet. 380(9859):2163-2196.
Normal
μ = mean
σ = standard deviation
- mathematically convenient
- PROBLEM: allows negative estimates of prevalence, which are physiologically impossible
Negative-binomial
N = individuals tested
x = tested positive
p = probability
- discrete model
- transformation yields an overdispersion parameter which allows the standard deviation to vary
Lognormal
μ = mean
σ = standard deviation
- bounds estimates at 0
- PROBLEM: doesn’t allow prevalence to be 0
--- can’t take the log of 0
- changed values of 0 to be 1 observation
- another option would be to use an offset lognormal distribution
- but somehow, have to work around estimates of 0
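One possible reading of "changed values of 0 to be 1 observation": a study of N people with zero cases is treated as if it had one case, so the prevalence becomes 1/N and its log is defined. The Python sketch below assumes that reading and assumes the sample size is known for each data point; the function names are hypothetical and not taken from DisMod-MR.

```python
import math

def adjust_zero_prevalence(prevalence, sample_size):
    """Replace a prevalence of 0 with one positive observation out of sample_size.

    ASSUMPTION: "1 observation" means 1 case among the people tested.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if prevalence == 0:
        return 1.0 / sample_size
    return prevalence

def log_prevalence(prevalence, sample_size):
    """Log of the (possibly adjusted) prevalence, as a lognormal model needs."""
    return math.log(adjust_zero_prevalence(prevalence, sample_size))

# A study of 200 people with no cases is treated as prevalence 1/200.
adjusted = adjust_zero_prevalence(0.0, 200)
```

The adjusted value depends on the sample size, so the same observed 0 is pulled further from zero in small studies than in large ones. This is one reason the lognormal results are contingent on the adjustment method.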
Binomial
- which Dr. Flaxman already discussed
- discrete model
N = individuals tested
x = tested positive
p = probability
Negative-binomial
N = individuals tested
x = tested positive
p = probability
- discrete model
- transformation yields an overdispersion parameter which allows the standard deviation to vary
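To illustrate how an overdispersion parameter lets the spread vary, here is a Python sketch using a common mean/overdispersion form of the negative binomial, with mean mu and variance mu + mu^2/delta. Whether DisMod-MR uses exactly this parameterization is an assumption; the names mu and delta and both functions are illustrative only.

```python
import math

def neg_binomial_log_pmf(k, mu, delta):
    """Log-probability of count k for mean mu and overdispersion delta."""
    return (
        math.lgamma(k + delta) - math.lgamma(delta) - math.lgamma(k + 1)
        + delta * math.log(delta / (delta + mu))
        + k * math.log(mu / (delta + mu))
    )

def mean_and_variance(mu, delta, k_max=2000):
    """Sum the pmf over 0..k_max to recover the mean and variance numerically."""
    probs = [math.exp(neg_binomial_log_pmf(k, mu, delta)) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance

# Same mean, different overdispersion: a small delta gives a much wider spread.
mean_wide, var_wide = mean_and_variance(mu=5.0, delta=2.0)
mean_tight, var_tight = mean_and_variance(mu=5.0, delta=2000.0)
```

With a large delta the variance approaches the mean, as in a Poisson model. With a small delta the variance grows well beyond the mean while the mean itself stays fixed.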
Several ways to test which distribution is the best
- ideal: data collection
--- actually go to country (region??) and measure age-specific prevalence
--- expensive, impractical
- simulation
--- great for testing, not for validation
--- problem: have to choose from what distribution the simulated data/measurements come
------ this is what we’re testing
------ simulation can show whatever you want
------ impossible to know from what distribution the measurements come
- out-of-sample cross validation
--- way to evaluate and compare distributions
--- shows how model performs in real life
------ can test out-of-sample predictive validity
------ don’t have to choose data distribution
--- concerns
------ unstable with sparse data
--------- not just the epidemiologic data
--------- also covariates and priors
This experiment
- 57 different disease data sets
--- met inclusion criteria of more than 4 prevalence points in Western Europe
--- not a birth condition, meaning prevalence data is only at age 0
- restricted to Western Europe
To explain out-of-sample cross validation, used an example from GBD 2010: fungal diseases
Randomly select 25% of data to withhold as test data
- test data used to evaluate results
Test data is withheld from DisMod-MR
And the remaining data is fit
From the fit, estimates are compared to the test data
- this comparison of the estimate to the test data is where the statistics are calculated
- the same test-train splits are fit for each of the distributions so we can make a comparison
- process repeated 1000 times with different test-train splits
- repeated for 57 different disease data sets
--- met inclusion criteria of more than 4 prevalence points in Western Europe
--- not a birth condition, meaning prevalence data is only at age 0
57 disease/injury conditions met these criteria
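The repeated test-train procedure can be sketched in Python as below. This is an outline under stated assumptions, not the DisMod-MR code: `split_indices`, `mean_fitter`, and `run_experiment` are hypothetical names, and `mean_fitter` is a toy stand-in for fitting DisMod-MR with a given distribution. The split is seeded by its number so that every distribution is fit to the same training data and scored on the same test data.

```python
import random

def split_indices(n_rows, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) for one random split of n_rows data points."""
    rng = random.Random(seed)  # seeded so the split can be reused across distributions
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = max(1, round(test_fraction * n_rows))
    return sorted(indices[n_test:]), sorted(indices[:n_test])

def mean_fitter(train_values):
    """HYPOTHETICAL stand-in for a model fit: predicts the training mean everywhere."""
    mean = sum(train_values) / len(train_values)
    return lambda row: mean

def run_experiment(datasets, fitters, n_splits=1000):
    """Fit every distribution on every split of every data set."""
    results = []
    for name, values in datasets.items():
        for split in range(n_splits):
            train_idx, test_idx = split_indices(len(values), seed=split)
            train = [values[i] for i in train_idx]
            held_out = [values[i] for i in test_idx]
            for dist, fit in fitters.items():
                predict = fit(train)  # the test data is never shown to the fit
                predictions = [predict(i) for i in test_idx]
                results.append((name, split, dist, predictions, held_out))
    return results

# Toy usage: one made-up data set, two "distributions", three splits.
toy_data = {"toy disease": [0.01, 0.02, 0.02, 0.03, 0.05, 0.04, 0.06, 0.08]}
toy_fitters = {"normal": mean_fitter, "lognormal": mean_fitter}
results = run_experiment(toy_data, toy_fitters, n_splits=3)
```

Placing the distribution loop innermost is a design choice: it guarantees that differences between distributions come from the likelihood and not from the luck of the split.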
Metrics that capture different aspects of model performance
Want a model that is precise, accurate, well-calibrated
- precise (bias)
--- measures average difference between the test data and prediction
- accurate (median absolute error, MAE)
--- measure of overall error
--- many small errors create one large number
--- sensitive to mean and scale
--- less sensitive to outliers
- calibrated (percent coverage, PC)
--- calibrated, meaning that our estimates are in the correct range of values
------ if we aim for 95% uncertainty, we expect 95% of the uncertainty intervals to contain the observation
------ more than that and the model is underconfident (intervals too wide)
------ less than that and the model is overconfident (intervals too narrow)
--- percent of time the uncertainty interval of the prediction contains the observation
--- sensitive to discrete distributions
To determine which distribution performed the best, counted the winner for each disease data set and split
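A minimal Python sketch of the three metrics and of picking a winner for one data set and split. Several details are assumptions not stated in the notes: bias is taken as prediction minus observation, the winner on bias and MAE is the value closest to 0, and the winner on coverage is the value closest to 95%. All function names are illustrative.

```python
import statistics

def bias(observed, predicted):
    """Mean of (prediction - observation); the sign convention is an ASSUMPTION."""
    return statistics.mean([p - o for p, o in zip(predicted, observed)])

def median_absolute_error(observed, predicted):
    """Median of |prediction - observation|; robust to a few large misses."""
    return statistics.median([abs(p - o) for p, o in zip(predicted, observed)])

def percent_coverage(observed, lower, upper):
    """Percent of observations that fall inside their uncertainty interval."""
    hits = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return 100.0 * hits / len(observed)

def winner(scores, target=0.0):
    """Distribution whose score is closest to target (ties go to the first listed)."""
    return min(scores, key=lambda dist: abs(scores[dist] - target))

# Toy held-out data and predictions (made-up numbers).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.40]
lower = [0.05, 0.19, 0.31, 0.35]
upper = [0.15, 0.25, 0.40, 0.45]

toy_bias = bias(observed, predicted)
toy_mae = median_absolute_error(observed, predicted)
toy_pc = percent_coverage(observed, lower, upper)

# Coverage winner: closest to the nominal 95%.
pc_winner = winner({"Normal": 90.0, "Lognormal": 96.0, "Binomial": 70.0}, target=95.0)
```

Counting a winner per data set and split, then reporting the percent of wins, gives the kind of table shown in the results. It hides how large the differences are, which is why the raw magnitudes are worth checking separately.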
- for different metrics, different distributions are superior
--- makes sense, since each distribution has its strengths and weaknesses
--- smallest bias: lognormal
--- minimum MAE: binomial
--- closest percent coverage: lognormal
- concern about reporting most frequent wins and not raw numbers:
--- differences are small
------ bias, ten-thousandths (E-4), average bias is negative binomial
------ MAE, hundredths
- overall winner: lognormal
- previously saw, distribution choice doesn’t greatly influence DisMod-MR’s estimates of age-specific prevalence
- results differ by metric
- Best overall performance: lognormal distribution
--- STRESS: contingent on method to adjust data whose value is 0
- Further investigate when each distribution performs best
--- Dependent on number of covariates, priors, amount of data?
DisMod-MR is robust in that choice of distribution for epidemiological values does not greatly influence estimates, but one distribution performs the best most frequently