This document summarises the development of a new predictive risk model. It describes predicting future costs and emergency admissions using data from inpatient, A&E, outpatient and GP databases. It shows the trade-off between predictive accuracy and the number of patients flagged as risk-score cut-offs are varied: higher cut-offs improve positive predictive value (PPV) but reduce sensitivity. The model achieved good discrimination, with an ROC C statistic of 0.78.
4–10. WHAT I’M GOING TO TALK ABOUT
• What to predict?
– Costs
– Future admissions
– Something else
• How to assess performance of the model?
– Avoid “over-fitting”
– Positive predictive value (PPV)
– Sensitivity, receiver operating characteristic (ROC) C statistic, etc.
• What databases to use?
– Inpatient
– A&E
– Outpatient
– GP electronic medical records
– Social care information
• What variables to use?
– Demographics
– Prior utilization (frequency/recency)
– Prior cost
– Diagnostic history
– Test results (GP electronic medical records)
– Other items (missed appointments, unplanned A&E follow-up visits, etc.)
• Who is in the denominator?
– Patients with prior emergency admission (PARR)
– Patients with any HES history (inpatient, A&E, outpatient)
– All registered patients
• Can you apply “national” models to local data?
– Do you have to develop your own local model,
– or can you use coefficients from a national model?
• What interventions work best for high-risk patients?
11. WHAT I’M GOING TO TALK ABOUT
It really doesn’t matter very much
12. WHAT I’M GOING TO TALK ABOUT
I really have no idea
13–16. OUR CONTINUING WORK
• Data from a convenience sample of five PCT areas
‒ Cornwall, Croydon, Kent, Newham, Redbridge
• Used four data sets
‒ SUS inpatient
‒ SUS A&E
‒ SUS outpatient
‒ GP electronic medical records
• Data are for the period August 2007 – September 2010
‒ Looked back 2 years
‒ Predicted admissions in the next 12 months, with a 2-month data lag
• Modeling limited to patients aged 18-95
20–21. HOW TO ASSESS PERFORMANCE OF THE MODEL
• Avoid “over-fitting”
– Develop the model with a 50% sample
– Test the coefficients on the other half
• Remember there are trade-offs between accuracy (as measured by PPV) and the number of patients “flagged” (as measured by sensitivity)
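The split-half check above can be sketched in a few lines. This is a toy illustration with synthetic data and a deliberately crude cut-off "model": the admission probabilities, sample size, and cut-off rule are all invented for the sketch, not taken from the study.

```python
import random

random.seed(0)

# Synthetic patients: (prior emergency admissions, admitted in next 12 months).
# Admission probability rising with prior admissions is an assumption made
# purely to give the sketch something to predict.
patients = []
for _ in range(2000):
    prior = random.choice([0, 0, 0, 1, 1, 2, 3, 5])
    admitted = random.random() < min(0.05 + 0.15 * prior, 0.9)
    patients.append((prior, admitted))

# Split-half validation: develop the model on 50% of patients,
# then report performance only on the untouched other half.
random.shuffle(patients)
half = len(patients) // 2
develop, test = patients[:half], patients[half:]

def ppv_and_flagged(sample, cutoff):
    """PPV and number flagged when flagging patients at or above a cut-off."""
    flagged = [admitted for prior, admitted in sample if prior >= cutoff]
    if not flagged:
        return 0.0, 0
    return sum(flagged) / len(flagged), len(flagged)

# "Fit" on the development half: pick the cut-off with the best PPV
# among cut-offs that still flag at least 50 patients.
best_cutoff = max(
    (c for c in range(1, 6) if ppv_and_flagged(develop, c)[1] >= 50),
    key=lambda c: ppv_and_flagged(develop, c)[0],
)

# Evaluate on the held-out half only, so the reported PPV is not over-fitted.
test_ppv, n_flagged = ppv_and_flagged(test, best_cutoff)
print(f"cut-off={best_cutoff}, held-out PPV={test_ppv:.2f}, flagged={n_flagged}")
```

The same structure applies when the "model" is a regression: estimate coefficients on one half, then compute PPV and sensitivity from the scores those coefficients produce on the other half.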
22. TRADE-OFF BETWEEN ACCURACY AND NUMBER OF CASES FLAGGED
Counts are cumulative at each cut-off level (e.g. the 5-10 row counts all patients at or above a risk score of 5).
Risk Score   Flagged     True Pos   PPV     Sensitivity
1-5          1,836,099   94,692     0.052   1.000
5-10         463,346     61,498     0.133   0.649
10-15        181,910     39,986     0.220   0.422
15-20        101,346     28,697     0.283   0.303
20-25        64,821      21,601     0.333   0.228
25-30        44,142      16,672     0.378   0.176
30-35        31,653      13,196     0.417   0.139
35-40        23,360      10,516     0.450   0.111
40-45        17,747      8,494      0.479   0.090
45-50        13,564      6,921      0.510   0.073
50-55        10,545      5,669      0.538   0.060
55-60        8,157       4,581      0.562   0.048
60-65        6,360       3,735      0.587   0.039
65-70        4,911       3,034      0.618   0.032
70-75        3,806       2,453      0.645   0.026
75-80        2,885       1,921      0.666   0.020
80-85        2,124       1,478      0.696   0.016
85-90        1,567       1,114      0.711   0.012
90-95        1,022       754        0.738   0.008
95-100       567         437        0.771   0.005
Top 1%       18,363      8,722      0.475   0.092
Top 5%       91,837      26,991     0.294   0.285
ROC C Statistic: 0.780
These are the results for the “full model”:
- Inpatient
- A&E
- Outpatient
- GP electronic medical records
Comparable results were obtained for the other models:
- IP
- IP + A&E
- IP + A&E + OP
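The PPV and sensitivity columns follow mechanically from the counts of flagged patients and true positives at each cut-off. A minimal sketch with invented scores and outcomes (not the study data):

```python
# Each patient: (risk_score, had_emergency_admission_next_year).
# Toy data; the table above is built from ~1.8 million real patients.
patients = [
    (82, True), (67, True), (55, False), (48, True), (41, False),
    (33, True), (28, False), (22, False), (17, True), (12, False),
    (9, False), (7, True), (5, False), (3, False), (2, False),
]

total_positives = sum(1 for _, admitted in patients if admitted)

# Cumulative PPV and sensitivity at each risk-score cut-off, as in the table:
# raising the cut-off improves PPV but captures fewer of the true positives.
for cutoff in (10, 30, 50):
    flagged = [(s, a) for s, a in patients if s >= cutoff]
    true_pos = sum(1 for _, a in flagged if a)
    ppv = true_pos / len(flagged)
    sensitivity = true_pos / total_positives
    print(f"cut-off {cutoff:>2}: flagged={len(flagged):>2} "
          f"PPV={ppv:.2f} sensitivity={sensitivity:.2f}")
```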
33. HOW TO ASSESS PERFORMANCE OF THE MODEL
• For most users, accuracy is likely to be most important (this is not a screening test to identify some dread disease)
– From a “business case” perspective, it is important not to target patients who will not have a future admission
43–49. WHAT DATABASES TO USE?
• HES (or SUS) – easy to obtain, relatively standardised
– Inpatient – fairly rich, reasonably accurate, valuable diagnosis/procedure info
– A&E – not so standardised; diagnosis info limited; some procedure info
– Outpatient – very limited (visit volume, missed attendances, specialty type)
• GP electronic medical records
– Difficult to obtain (historically)
– Difficult to link (historically)
– Difficult to use – Read-code nightmares
– Not standardised
• Social care information
– Difficult to obtain (historically)
– Difficult to link (historically)
– Difficult to use
– Not standardised
Key questions for each data set:
• How good is the data set?
• How hard is the data set to use?
• Does it improve case finding/accuracy?
51–63. WHAT DATABASES TO USE? IMPROVING CASE FINDING/ACCURACY
• Additional data sets do improve case finding
– Especially A&E and GP data
– But these improvements are modest
▪ 23% with full data sets at the risk score 50+ cut-off
▪ 31% with full data sets at the risk score 30+ cut-off
• There is no loss in predictive accuracy with the inclusion of additional data sets
• Improved case finding is greatest for lower-risk patients using GP data
• But it is still difficult to identify patients early in any cycle of emergency admissions (that first emergency admission)
64–65. WHAT DATABASES TO USE? IMPROVING CASE FINDING/ACCURACY
Proportion of patients with no emergency admissions in the prior 2 years, by risk-score threshold and data set:
Risk Score Threshold   IP Data   IP+A&E Data   IP+A&E+OP Data   IP+A&E+OP+GP Data
Risk Score 50+         0.3%      1.2%          2.3%             3.2%
Risk Score 30+         2.7%      4.4%          6.3%             12.4%
Top 1%                 1.5%      2.9%          4.2%             6.5%
Top 5%                 25.9%     26.4%         26.7%            30.8%
Risk Score Cut-Off Level = 17
67–72. WHAT VARIABLES TO USE?
• Demographics
– Age
– Gender
– IMD (GP practice)
– Months registered at current GP practice
• Inpatient data
– Number of emergency admissions for various periods in the prior two years
– Number of elective admissions for various periods in the prior year
– Any day case/night attendance in the prior year
– History of 16 (chronic) diagnostic conditions in the prior two years
– Charlson Index
• A&E data
– Number of A&E visits for various periods in the prior two years
– A&E procedures performed for various periods in the prior two years
– Unplanned follow-up A&E visits for various periods in the prior two years
• Outpatient data
– Number of outpatient visits for various periods in the prior two years
– Number of outpatient visits missed for various periods in the prior two years
• GP electronic medical records
– Number of long-term conditions, specific diagnostic conditions, QOF registries
– Drug prescription history
– BMI, current smoker
– HbA1c, high blood pressure, glomerular filtration rate
– Number of GP visits
– Increase in GP visits in the last 12 months
– Number of phone consults in the last 90 days
Note that we did not use cost as a variable:
- It can be difficult to obtain and apply
- It added little, if any, predictive power
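The Charlson Index listed among the inpatient variables is a weighted count of comorbidities. A sketch using an illustrative subset of the classic Charlson weights; real implementations map ICD-10 codes to these condition groups, and the condition names here are simplified:

```python
# Illustrative subset of the classic Charlson comorbidity weights
# (weights 1, 2, 3 and 6 in the original scheme).
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "any_malignancy": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumour": 6,
}

def charlson_index(conditions):
    """Sum the weights of the patient's recorded comorbidities."""
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in conditions)

# A patient with heart failure, diabetes and renal disease scores 1+1+2 = 4.
print(charlson_index(
    ["congestive_heart_failure", "diabetes", "moderate_severe_renal_disease"]
))  # → 4
```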
73. WHAT VARIABLES TO USE?
A full list of the variables used and their definitions will be available imminently on the Nuffield website.
75. WHO IS IN THE DENOMINATOR?
• Patients with prior emergency admissions [PARR]
• Patients with any HES history [inpatient, A&E, outpatient]
• All registered patients
76. WHO IS IN THE DENOMINATOR?
Denominator                                       Flagged   True Pos   PPV
Risk Score Cut-Off 50+
  Had emergency admission prior 12 months (PARR)   5,622     3,339     0.594
  Any SUS record last 2 years                      8,273     4,532     0.548
  GP registry population, July 2009                9,892     5,172     0.523
Risk Score Cut-Off 30+
  Had emergency admission prior 12 months (PARR)  19,058     8,818     0.463
  Any SUS record last 2 years                     24,833    10,671     0.430
  GP registry population, July 2009               26,304    11,011     0.419
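The PPV column is simply True Pos / Flagged, so the table can be checked directly; for example, the 50+ rows:

```python
# PPV = true positives / patients flagged, using the counts in the table above.
rows = {
    "PARR, 50+": (5_622, 3_339),
    "Any SUS record, 50+": (8_273, 4_532),
    "GP registry, 50+": (9_892, 5_172),
}

for name, (flagged, true_pos) in rows.items():
    print(f"{name}: PPV = {true_pos / flagged:.3f}")
# PARR, 50+: PPV = 0.594
# Any SUS record, 50+: PPV = 0.548
# GP registry, 50+: PPV = 0.523
```

Widening the denominator flags more patients but dilutes PPV, which is the trade-off the table illustrates.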
78. CAN YOU APPLY “NATIONAL” MODELS TO LOCAL DATA?
• For each of the five sites:
– We created an individual site model
– We created a model using the other four sites, and then applied the 4-site coefficients to the local data
– We then compared case finding and accuracy
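The leave-one-site-out design described above can be sketched as follows. Everything here is synthetic (site risk levels, sample sizes), and a simple prior-admissions cut-off stands in for the study's regression models:

```python
import random

random.seed(1)

SITES = ["Cornwall", "Croydon", "Kent", "Newham", "Redbridge"]

# Synthetic per-site data: (prior emergency admissions, admitted next year).
# A base risk varying by site is an assumption to make the sites differ.
def make_site(base_risk, n=1000):
    records = []
    for _ in range(n):
        prior = random.choice([0, 0, 1, 2, 4])
        admitted = random.random() < min(base_risk + 0.1 * prior, 0.9)
        records.append((prior, admitted))
    return records

data = {site: make_site(0.05 + 0.02 * i) for i, site in enumerate(SITES)}

def ppv_at(records, cutoff):
    flagged = [admitted for prior, admitted in records if prior >= cutoff]
    return sum(flagged) / len(flagged) if flagged else 0.0

# Toy "model": the prior-admissions cut-off with the best PPV on training
# data (standing in for fitted regression coefficients).
def fit_cutoff(records):
    return max(range(1, 5), key=lambda c: ppv_at(records, c))

# Leave-one-site-out comparison: a model fitted locally vs. one fitted on
# the other four sites and applied to the held-out site's data.
for site in SITES:
    local_model = fit_cutoff(data[site])
    pooled = [r for other in SITES if other != site for r in data[other]]
    four_site_model = fit_cutoff(pooled)
    print(f"{site}: local PPV={ppv_at(data[site], local_model):.2f}, "
          f"four-site PPV={ppv_at(data[site], four_site_model):.2f}")
```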
87. WHAT INTERVENTIONS WORK BEST FOR HIGH RISK PATIENTS?
Well, actually I have plenty of ideas…
- just no strong evidence on much of it
88–92. WHAT TO DO
• Model development
– Predict risks of expensive things you think you can do something about
– Don’t stress too much about which databases to use (but more are likely better)
– Recognize the trade-offs between model accuracy and sensitivity
• Intervention design
– Design the intervention after the risk model has been developed
– Use data from model development to help design the intervention
– Recognize you are probably going to need more information
– Get the incentives right
• Intervention implementation
– Roll it out in at least quasi-experimental mode
– Track “dosage” levels (who does what to whom and how)
– Avoid enrollment criteria “leakage”
– Evaluate the impact of the intervention as rigorously as possible