SlideShare une entreprise Scribd logo
1  sur  38
New Machine Learning Algorithms for
Electronic Health Records (EHRs)
David Page
Chair, Department of Biostatistics & Bioinformatics
Duke University
Acknowledgements
NIH BD2K, NLM, NIGMS, FDA
Office of Generic Drugs
Center for Predictive
Computational Phenotyping
Ross Kleiman
Charles Kuang
Xiayuan Huang
Wei Zhang
Collin Engstrom
Scott Hebbring
Paul Bennett
Yujia Bao
Jon Badger
Robert Elston
Guillerme Rosa
Aubrey Barnard
Michael Caldwell
Jesse Davis
Eric Lantz
Jie Liu
Houssam Nassif
Peggy Peissig
Vitor Santos Costa
Humberto Vidaillet
Jeremy Weiss
Becca Willett
Wisconsin Genomics Initiative
The Electronic Health Record (EHR)
ID Year of
Birth
Gender
P1 3.22.1963 M
ID Date Diagnosis Sign/Symptom
P1 6.2.1990 PVC Palpitations
Demographics
Diagnoses
The Electronic Health Record (EHR)
ID Date Diagnosis Symptoms
P1 2011.06.
02
Atrial
fibrillation
Dizzy,
discomfort
Demographics
Diagnoses
ID Date Diagnosis Sign/Symptom
P1 7.3.1997 Elevated BP
ID Year of
Birth
Gender
P1 3.22.1963 M
The Electronic Health Record (EHR)
ID Date Diagnosis Symptoms
P1 2011.06.
02
Atrial
fibrillation
Dizzy,
discomfort
Demographics
Diagnoses
ID Date Diagnosis Symptoms
P1 2011.06.
02
Atrial
fibrillation
Dizzy,
discomfort
ID Date Diagnosis Sign/Symptom
P1 9.1.1998 Atrial
Fibrillation
Shortness of
Breath, Panic
ID Year of
Birth
Gender
P1 3.22.1963 M
EHR Data in a Data Warehouse
Patient ID Gender Birthdate
P1 M 3/22/1963
Patient ID Date Physician Symptoms Diagnosis
P1 1/1/2001 Smith palpitations hypoglycemic
P1 2/1/2001 Jones fever, aches influenza
Patient ID Date Lab Test Result
P1 1/1/2001 blood glucose 42
P1 1/9/2001 blood glucose 45
Patient ID Date Observation Result
P1 1/1/2001 Height 5'11
P2 1/9/2001 BMI 34.5
Patient ID
Date
Prescribed Date Filled Physician Medication Dose Duration
P1 5/17/1998 5/18/1998 Jones Prilosec 10mg 3 months
Alternative View of EHR Data
Sample Input Data Set for Typical Machine Learning
Patient Gender Age Hypertension
within last
year
... Average
LDL last 5
years
Statin MI in
next 5
years
P1 F 32 No ... 120 No No
P2 F 45 Yes ... 154 No No
P3 M 24 No ... 136 No No
P4 M 58 Yes ... 210 No Yes
... ... ... ... ... ... ... ...
Predictive Personalized
Medicine
Personalized
Treatment
Individual
Patient
G + C + E
Predictive Model
for Disease
Susceptibility
& Treatment
ResponseState-of-the-Art
Machine
Learning
Genetic,
Clinical,
&
Environmental
Data
Wisconsin Genomics Initiative (WGI), 2008
Marshfield Clinic EMR
•Marshfield Clinic
−Health system in North
Central Wisconsin
•1.5M Patient Records
spanning 40 years
−Demographics
−Diagnoses (ICD-9)
−Labs
−Procedures
−Vitals
10
ROC Curves for Afib-related Predictions
(Davis et al., 2009; Lantz et al., 2012)
Figure 1. ROC Curves for our four target prediction tasks: predicting AF/F onset and, given AF/F onset, predicting
each of stroke, 1-year mortality, and 3-year mortality, all using only the roughly 25,000 features extracted from the
Motivation and Next Steps
• If we can predict atrial fibrillation accurately, in
advance, then we can intervene to prevent it in some
cases
−aspirin or stronger blood thinner
−more rest, less stress
−beta blocker to control BP, etc.
• Some other prediction tasks we’ve addressed:
− Myocardial infarction (heart attack)
− Warfarin dosing
− Age related macular degeneration
− Breast cancer (from mammograms and genetics)
• How accurately can everything be predicted?
• How far in advance?
High-Throughput ML (Kleiman et al.)
Predicting Every ICD Diagnosis Code at the Press of a Button
Simulated Prospective Study Results
14
Remember this View of EHR Data
Adverse Drug Events (ADEs)
Warfarin
Cox2 inhibitor
ACE inhibitor
Heart Attack
Angioedema
Bleeding
… …
OMOP Ground Truth and Marshfield Data
Drug Prescription Timeline from EHR (Becca Willett)
Drug Risk Windows (OMOP)
Multiple Self-Controlled Case Series
(Simpson et al., 2013)
MSCCS Details
MSCCS for Numeric Response (Kuang et al., KDD 2016)
Electronic Health Records (EHRs)
Person 1
200
160
Properties
Longitudinal
Diabetes
Drug 1
30
150
120
Drug 2
60
Observational
Applications
Age
Adverse Drug
Person 2
Bleeding
90 104 100
Drug 2
Age
Reaction (ADR)
discovery
Computational
Drug
Repositioning
(CDR)
Time-Varying Baseline (Kuang et al., IJCAI 2016)
Person 1
Diabetes
Insulin Insulin
Age
30 45 60
Time-Varying Baseline: add regularization to minimize change
in proximal, consecutive tij values:
yij | xij = tij + β
⊤xij + ϵij, ∼ N(0,σ2).ϵij
Last term penalizes consecutive baseline values
that are close together in time but far apart in
numerical value.
Last Year: Recovery of Known Glucose Lowering Agents
INDX CODE DRUG NAME SCORE COUNT INDX CODE DRUG NAME SCORE COUNT
1 4485 HUMALOG -11.786 124 1 4802 INSULIN 47 635
2 7470 PIOGLITAZONE HCL -10.220 3075 2 8316 REZULIN 50 120
3 8437 ROSIGLITAZONE MALEATE -9.731 1019 3 824 AVANDIA 59 449
4 4837 INSULN ASP PRT/INSULIN ASPART -9.658 258 4 416 AMARYL 65 503
5 6382 NEEDLES INSULIN DISPOSABLE -9.464 2827 5 5226 LANTUS 66 33
6 4171 GLUCOTROL XL -8.117 2853 6 5789 METFORMIN HYDROCHLORIDE 75 10
7 4106 GLIMEPIRIDE -7.940 3384 7 4485 HUMALOG 81 63
8 160 ACTOS -7.721 1125 8 4132 GLUCOPHAGE 86 1813
9 824 AVANDIA -6.802 1239 9 4811 INSULIN NPH 88 19
10 9152 SYRING W-NDL DISP INSUL 0.5ML -6.623 4186 10 144 ACTIGALL 90 34
11 4132 GLUCOPHAGE -6.322 6736 11 1389 CAL 90 45
12 4184 GLYBURIDE -6.021 8879 12 4171 GLUCOTROL XL 90 701
13 4170 GLUCOTROL -5.721 1259 13 9155 SYRNG W-NDL DISP INSUL 0.333ML 95 29
14 4208 GLYNASE -5.670 591 14 4116 GLUCAGON 97 121
15 416 AMARYL -5.599 2240 15 6652 NOVOLOG 98 51
16 4107 GLIPIZIDE -5.563 9993 16 160 ACTOS 106 480
17 844 AXID -4.682 189 17 6646 NOVOFINE 31 106 31
18 2830 DILTIAZEM -4.297 1021 18 4813 INSULIN NPL/INSULIN LISPRO 108 118
19 4806 INSULIN GLARGINE HUM.REC.ANLOG -4.175 4213 19 8437 ROSIGLITAZONE MALEATE 109 332
20 5787 METFORMIN HCL -4.147 19584 20 4170 GLUCOTROL 113 641
21 2824 DILAUDID -4.076 39 21 9889 URSODIOL 113 123
22 5786 METFORMIN -3.890 3838 22 5052 KAY CIEL 114 23
23 7731 PRAVACHOL -3.532 1700 23 4118 GLUCAGON HUMAN RECOMBINANT 115 227
24 1760 CELEXA -3.517 1473 24 2521 DARVOCET-N 116 11
25 4497 HUM INSULIN NPH/REG INSULIN HM -3.501 1829 25 7470 PIOGLITAZONE HCL 121 705
26 9889 URSODIOL -3.132 376 26 5786 METFORMIN 125 2149
27 4813 INSULIN NPL/INSULIN LISPRO -2.972 623 27 10366 ZINC SULFATE 130 34
28 4133 GLUCOPHAGE XR -2.845 765 28 4500 HUMULIN 135 33
29 6445 NEURONTIN -2.615 1418 29 4172 GLUCOVANCE 136 115
30 6656 NPH HUMAN INSULIN ISOPHANE -2.500 2874 30 7471 PIOGLITAZONE HCL/METFORMIN HCL 136 16
31 9379 THIAMINE HCL -2.383 341 31 6382 NEEDLES INSULIN DISPOSABLE 137 649
32 1636 CARDURA -2.198 1079 32 4184 GLYBURIDE 144 1354
33 1218 BLOOD SUGAR DIAGNOSTIC DRUM -2.073 2593 33 4208 GLYNASE 145 115
34 8025 PROZAC -2.037 1525 34 4210 GLYSET 148 7
35 8316 REZULIN -1.895 444 35 4163 GLUCOSE 159 1778
36 9136 SYRINGE & NEEDLE INSULIN 1 ML -1.885 3542 36 5977 MINIPRESS 163 19
37 4802 INSULIN -1.812 1526 37 7946 PROPANTHELINE 163 6
38 7674 POTASSIUM CHLORIDE -1.779 9842 38 1602 CARBOCAINE 182 63
39 4804 INSULIN ASPART -1.752 2476 39 1305 BUDEPRION SR 185 9
40 1200 BLOOD-GLUCOSE METER -1.719 5289 40 6657 NPH INSULIN 185 43
Time-Varying Baseline for Binary Response (Kuang et al.,
KDD 2017)
• Response variable: heart attack occurrence on a
particular day (binary)
• Features: drug exposure statuses on a particular day
(binary) + baseline occurrence rates
OMOP Ground Truth and Marshfield Data
Set
Experimental Results
Thus Far
• Charles Kuang: time-varying patient-specific baseline into
linear regression (KDD 2016), time-varying baseline into
Poisson regression (KDD 2017)
• Yujia Bao (with Becca Willett): patient-specific baseline into
point process models (MLHC 2017)
• Wei Zhang: patient-specific baseline into deep neural nets
• Sinong Geng and Charles Kuang: patient-specific baseline
into graphical models (temporal Poisson square root
models), ICML 2018
• Can we bring time-varying baselines into deep neural
networks or random forests? May need other types of
regularization, EM rather than convex optimization.
Major Challenge: Evaluation
• Real-world data: limited causal ground truth
– Just 9 true drug-event pair ADEs in OMOP
– Are we now overfitting OMOP ground truth?
– Some criticism of each ground truth available
• Synthetic data: self-fulfilling prophecy?
• Theory: very strong assumptions, proofs usually
about guaranteed correct causal inferences
Probably Approximately Correct (PAC)
learning [Valiant, CACM 1984]
• Consider a class C of possible target concepts defined over a set of
instances X of length n, and a learner L using hypothesis space H
• C is PAC learnable by L using H if, for all
c∈ C
probability distributions D over X
ε such that 0 < ε < 0.5
δ such that 0 < δ < 0.5
• learner L will, with probability at least (1-δ), output a hypothesis h ∈
H such that errorD(h) ≤ ε in time that is polynomial in
1/ε
1/δ
n
size(c)
Proposal: PAC Causal Inference
• Randomized Turing Machines M1 and M2 generate strings over A
• M2 sees string generated by M1 before generating its own
• Training protocol P governs what else learner L may be told and
what part of full training set it sees
• learner L then sees a test set T drawn entirely from either M1 or M2
(chosen uniformly), and must accurately identify the source of T
• Family F of pairs <M1, M2> is PPACC inferable under protocol P by
learner L if for all 0 < ε < 0.5, given time, training data, and test data
polynomial in 1/ε and any other relevant parameters of P, learner L
will, with probability at least (1-ε), correctly identify the source of the
test set T
• Intuition: M1 and M2 might differ only by one causal association
• Since M2 is a TM, it can learn; positive (negative) result for PPACC
inference is negative (positive) result for adversarial training (GAN)
Concrete Example
• Let M1 and M2 be causal models (Bayesian networks where edges
capture direct causation) over same set of variables
• M1 and M2 identical except that M2 drops one edge, from A to B
(definition allows hardest conditional probability settings)
• Different protocols P let us inherit different existing results
• Easy if A binary, random, uniform; influence of A on B in M1 is big
• If P provides to L the shared structure of M1 and M2 and leaves
open only the question of an edge from A to B, then for some
shared structures we can employ do-calculus using the test data
• If P properly constrains the shared structure of M1 and M2 including
the size of the largest separator-set, we can use the PC algorithm
on the test data only
A C
B
A C
B
Summary of Other Results So Far
• Positive result for self-controlled case series (SCCS) if EHR data
generated by piecewise-constant conditional intensity model
(PCIM) with different baseline risk of event (intensity) for each
patient, the only influences are of drugs on events, drugs and drug
effects are all independent of one another, and effects big enough
• Open: more complex dependencies, time-varying baseline,
baseline regularization
Summary of Other Results So Far
• Positive result for causal models where causal variable G is known
and has no parents (e.g., genetic variation such as FGXS
Premutation), and we want to know if there is any causal effect
representable by a PAC-learnable model class on observed features
• Practical utility: Movaghar et al., 2019 (Science Advances) showing
FGXS Premutation has clinical effect
• Requires ”Agnostic” variant like Agnostic PAC learning
• Can be extended to allow causal variable D with parents if in addition
we can partition remaining variables into ancestors and non-
ancestors of D (e.g., by date of measurement in EHR data), Pr(D) is
linear (e.g., linear propensity model for D), and protocol P also lets us
sample from IPW distribution based on learned approximation to this
model
Another Application
• My Health Insurance just switched me from a Brand to Generic drug
• Is this Generic safe and effective?
• In other words, is there a combination of clinical variables that occurs
more in patients after taking Generic than Brand?
• Generic vs. Brand should eliminate confounding by indication
• Using only switchers should also control for patient-specific
confounders, even if unmeasured or completely unconsidered
• The same patients are on brand, then generic, for the same indication
• The problem: in practice, time-varying confounding is rampant
• Example: generic drug becomes available in 2005; electronic
transmission of prescriptions starts in 2005: looks like generic drug
causes electronic transmission of prescription!
Two Possible Approaches
• Suppose everyone on the drug was on Brand for all 2004, then
switched to Generic for all 2005
• Run Supervised ML
• Patient is a Positive Example while on Generic (2005)
• Same Patient is a Negative Example while on Brand (2004)
• Take controls not on drug at all, and use to offset time-varying effects
• Patient is a Positive Example during 2004
• Patient is a Negative Example during 2005
• Alternative approach: Use controls to learn to predict 2005 vs. 2004
(temporal propensity), use IPW to weight brand and generic patients
• PPACC Inference model says the first approach needs four times as
much data and will fail at lower event rates than the second approach
• Synthetic data further supports this message
Conclusion
• Wide variety of clinical events can be
accurately predicted by ML from EHR, but…
• Models tempt us to wrong causal inferences
• Adding to ML latent patient-specific and time-
varying baselines greatly improves this
• We don’t demand perfection in prediction,
don’t demand it in causal inference, so…
• Need a PAC-like theory of causal inference
even more than of prediction (ground truth)

Contenu connexe

Tendances

The role of cilostazol for the treatment of
The role of cilostazol for the treatment ofThe role of cilostazol for the treatment of
The role of cilostazol for the treatment ofuvcd
 
Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...
Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...
Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...Health Innovation Wessex
 
Antiplatelets and elderly patients
Antiplatelets and elderly patientsAntiplatelets and elderly patients
Antiplatelets and elderly patientsA.Salam Sharif
 
Myeloid Cells and Cardiovascular Health
Myeloid Cells and Cardiovascular HealthMyeloid Cells and Cardiovascular Health
Myeloid Cells and Cardiovascular HealthInsideScientific
 
Emgerency Malpractice Claims
Emgerency Malpractice ClaimsEmgerency Malpractice Claims
Emgerency Malpractice ClaimsSun Yai-Cheng
 
Stettler Windecker Meta Analysis 18k 10 20 07 V04
Stettler Windecker Meta Analysis 18k 10 20 07 V04Stettler Windecker Meta Analysis 18k 10 20 07 V04
Stettler Windecker Meta Analysis 18k 10 20 07 V04Jeff Budden
 

Tendances (7)

The role of cilostazol for the treatment of
The role of cilostazol for the treatment ofThe role of cilostazol for the treatment of
The role of cilostazol for the treatment of
 
Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...
Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...
Detect Perfect Protect: care of patients with Atrial Fibrillation across Wess...
 
Antiplatelets and elderly patients
Antiplatelets and elderly patientsAntiplatelets and elderly patients
Antiplatelets and elderly patients
 
Myeloid Cells and Cardiovascular Health
Myeloid Cells and Cardiovascular HealthMyeloid Cells and Cardiovascular Health
Myeloid Cells and Cardiovascular Health
 
Strive Teleconf Presentation March22 2006
Strive Teleconf Presentation March22 2006Strive Teleconf Presentation March22 2006
Strive Teleconf Presentation March22 2006
 
Emgerency Malpractice Claims
Emgerency Malpractice ClaimsEmgerency Malpractice Claims
Emgerency Malpractice Claims
 
Stettler Windecker Meta Analysis 18k 10 20 07 V04
Stettler Windecker Meta Analysis 18k 10 20 07 V04Stettler Windecker Meta Analysis 18k 10 20 07 V04
Stettler Windecker Meta Analysis 18k 10 20 07 V04
 

Similaire à 2019 Triangle Machine Learning Day - Machine Learning from De-Identified Coded Electronic Health Records (EHRs) - David Page, September 20, 2019

Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...
Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...
Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...PharmaSuccess
 
Meaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchMeaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchNolan Nichols
 
fhir and loinc
fhir and loincfhir and loinc
fhir and loincDevDays
 
Metabolomics in the 21st century - perspective
Metabolomics in the 21st century - perspectiveMetabolomics in the 21st century - perspective
Metabolomics in the 21st century - perspectiveDinesh Barupal
 
Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent Endeavor...
Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent 	 Endeavor...Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent 	 Endeavor...
Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent Endeavor...MedicineAndFamily
 
Kshivets O. Lung Cancer Surgery
Kshivets O. Lung Cancer SurgeryKshivets O. Lung Cancer Surgery
Kshivets O. Lung Cancer SurgeryOleg Kshivets
 
Elsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision MedicineElsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision MedicineRising Media Ltd.
 
Using a Quality Database in Private Practice
Using a Quality Database in Private PracticeUsing a Quality Database in Private Practice
Using a Quality Database in Private PracticeSpectrummedGrp
 
Plenary presentation saturday 11 7_dr. lucie bruijn
Plenary presentation  saturday 11 7_dr. lucie bruijnPlenary presentation  saturday 11 7_dr. lucie bruijn
Plenary presentation saturday 11 7_dr. lucie bruijnThe ALS Association
 
Professor Chris Williams et al - Healthcare condition monitoring using ICU data
Professor Chris Williams et al - Healthcare condition monitoring using ICU dataProfessor Chris Williams et al - Healthcare condition monitoring using ICU data
Professor Chris Williams et al - Healthcare condition monitoring using ICU dataBayes Nets meetup London
 
10 rencontres biomédicale LIR Faiez Zannad
10 rencontres biomédicale LIR Faiez Zannad10 rencontres biomédicale LIR Faiez Zannad
10 rencontres biomédicale LIR Faiez ZannadAssociation LIR
 
Maximising the value of routine NHS Data - Innovation Show
Maximising the value of routine NHS Data - Innovation ShowMaximising the value of routine NHS Data - Innovation Show
Maximising the value of routine NHS Data - Innovation ShowInnovation Agency
 
La mia esperienza innovativa è... la perfusione degli organi
La mia esperienza innovativa è... la perfusione degli organiLa mia esperienza innovativa è... la perfusione degli organi
La mia esperienza innovativa è... la perfusione degli organiNetwork Trapianti
 
Weaning from mechanical ventilation
Weaning from mechanical ventilationWeaning from mechanical ventilation
Weaning from mechanical ventilationSucharita Ray
 
2010 smg training_cardiff_day1_session1(2 of 3) altman
2010 smg training_cardiff_day1_session1(2 of 3) altman2010 smg training_cardiff_day1_session1(2 of 3) altman
2010 smg training_cardiff_day1_session1(2 of 3) altmanrgveroniki
 
Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)
Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)
Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)Ernesto Diaz
 
NetBioSIG2013-KEYNOTE Esti Yeger-Lotem
NetBioSIG2013-KEYNOTE Esti Yeger-LotemNetBioSIG2013-KEYNOTE Esti Yeger-Lotem
NetBioSIG2013-KEYNOTE Esti Yeger-LotemAlexander Pico
 
Rutger Ploeg - The Netherlands - Tuesday 29 - Use of Perfusion Machines or ...
Rutger Ploeg  - The Netherlands  - Tuesday 29 - Use of Perfusion Machines or ...Rutger Ploeg  - The Netherlands  - Tuesday 29 - Use of Perfusion Machines or ...
Rutger Ploeg - The Netherlands - Tuesday 29 - Use of Perfusion Machines or ...incucai_isodp
 

Similaire à 2019 Triangle Machine Learning Day - Machine Learning from De-Identified Coded Electronic Health Records (EHRs) - David Page, September 20, 2019 (20)

Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...
Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...
Recherche clinique en cardiologie interventionnelle - Gilles MONTALESCOT - Re...
 
Meaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchMeaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine research
 
fhir and loinc
fhir and loincfhir and loinc
fhir and loinc
 
Metabolomics in the 21st century - perspective
Metabolomics in the 21st century - perspectiveMetabolomics in the 21st century - perspective
Metabolomics in the 21st century - perspective
 
Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent Endeavor...
Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent 	 Endeavor...Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent 	 Endeavor...
Endeavor IV-A Randomized Comparison of a Zotarolimus-Eluting Stent Endeavor...
 
RNA (gene expression) analysis of Prostate cancers and non-cancerous tissues t
RNA (gene expression) analysis of Prostate cancers and non-cancerous tissues tRNA (gene expression) analysis of Prostate cancers and non-cancerous tissues t
RNA (gene expression) analysis of Prostate cancers and non-cancerous tissues t
 
Kshivets O. Lung Cancer Surgery
Kshivets O. Lung Cancer SurgeryKshivets O. Lung Cancer Surgery
Kshivets O. Lung Cancer Surgery
 
Elsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision MedicineElsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision Medicine
 
Using a Quality Database in Private Practice
Using a Quality Database in Private PracticeUsing a Quality Database in Private Practice
Using a Quality Database in Private Practice
 
Plenary presentation saturday 11 7_dr. lucie bruijn
Plenary presentation  saturday 11 7_dr. lucie bruijnPlenary presentation  saturday 11 7_dr. lucie bruijn
Plenary presentation saturday 11 7_dr. lucie bruijn
 
Photic2014
Photic2014Photic2014
Photic2014
 
Professor Chris Williams et al - Healthcare condition monitoring using ICU data
Professor Chris Williams et al - Healthcare condition monitoring using ICU dataProfessor Chris Williams et al - Healthcare condition monitoring using ICU data
Professor Chris Williams et al - Healthcare condition monitoring using ICU data
 
10 rencontres biomédicale LIR Faiez Zannad
10 rencontres biomédicale LIR Faiez Zannad10 rencontres biomédicale LIR Faiez Zannad
10 rencontres biomédicale LIR Faiez Zannad
 
Maximising the value of routine NHS Data - Innovation Show
Maximising the value of routine NHS Data - Innovation ShowMaximising the value of routine NHS Data - Innovation Show
Maximising the value of routine NHS Data - Innovation Show
 
La mia esperienza innovativa è... la perfusione degli organi
La mia esperienza innovativa è... la perfusione degli organiLa mia esperienza innovativa è... la perfusione degli organi
La mia esperienza innovativa è... la perfusione degli organi
 
Weaning from mechanical ventilation
Weaning from mechanical ventilationWeaning from mechanical ventilation
Weaning from mechanical ventilation
 
2010 smg training_cardiff_day1_session1(2 of 3) altman
2010 smg training_cardiff_day1_session1(2 of 3) altman2010 smg training_cardiff_day1_session1(2 of 3) altman
2010 smg training_cardiff_day1_session1(2 of 3) altman
 
Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)
Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)
Centros evaluacion de Riesgo Cardio Vascular en Mexico (rev 170912)
 
NetBioSIG2013-KEYNOTE Esti Yeger-Lotem
NetBioSIG2013-KEYNOTE Esti Yeger-LotemNetBioSIG2013-KEYNOTE Esti Yeger-Lotem
NetBioSIG2013-KEYNOTE Esti Yeger-Lotem
 
Rutger Ploeg - The Netherlands - Tuesday 29 - Use of Perfusion Machines or ...
Rutger Ploeg  - The Netherlands  - Tuesday 29 - Use of Perfusion Machines or ...Rutger Ploeg  - The Netherlands  - Tuesday 29 - Use of Perfusion Machines or ...
Rutger Ploeg - The Netherlands - Tuesday 29 - Use of Perfusion Machines or ...
 

Plus de The Statistical and Applied Mathematical Sciences Institute

Plus de The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Dernier

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 

Dernier (20)

Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 

2019 Triangle Machine Learning Day - Machine Learning from De-Identified Coded Electronic Health Records (EHRs) - David Page, September 20, 2019

  • 1. New Machine Learning Algorithms for Electronic Health Records (EHRs) David Page Chair, Department of Biostatistics & Bioinformatics Duke University
  • 2. Acknowledgements NIH BD2K, NLM, NIGMS, FDA Office of Generic Drugs Center for Predictive Computational Phenotyping Ross Kleiman Charles Kuang Xiayuan Huang Wei Zhang Collin Engstrom Scott Hebbring Paul Bennett Yujia Bao Jon Badger Robert Elston Guillerme Rosa Aubrey Barnard Michael Caldwell Jesse Davis Eric Lantz Jie Liu Houssam Nassif Peggy Peissig Vitor Santos Costa Humberto Vidaillet Jeremy Weiss Becca Willett Wisconsin Genomics Initiative
  • 3. The Electronic Health Record (EHR) ID Year of Birth Gender P1 3.22.1963 M ID Date Diagnosis Sign/Symptom P1 6.2.1990 PVC Palpitations Demographics Diagnoses
  • 4. The Electronic Health Record (EHR) ID Date Diagnosis Symptoms P1 2011.06. 02 Atrial fibrillation Dizzy, discomfort Demographics Diagnoses ID Date Diagnosis Sign/Symptom P1 7.3.1997 Elevated BP ID Year of Birth Gender P1 3.22.1963 M
  • 5. The Electronic Health Record (EHR) ID Date Diagnosis Symptoms P1 2011.06. 02 Atrial fibrillation Dizzy, discomfort Demographics Diagnoses ID Date Diagnosis Symptoms P1 2011.06. 02 Atrial fibrillation Dizzy, discomfort ID Date Diagnosis Sign/Symptom P1 9.1.1998 Atrial Fibrillation Shortness of Breath, Panic ID Year of Birth Gender P1 3.22.1963 M
  • 6. EHR Data in a Data Warehouse Patient ID Gender Birthdate P1 M 3/22/1963 Patient ID Date Physician Symptoms Diagnosis P1 1/1/2001 Smith palpitations hypoglycemic P1 2/1/2001 Jones fever, aches influenza Patient ID Date Lab Test Result P1 1/1/2001 blood glucose 42 P1 1/9/2001 blood glucose 45 Patient ID Date Observation Result P1 1/1/2001 Height 5'11 P2 1/9/2001 BMI 34.5 Patient ID Date Prescribed Date Filled Physician Medication Dose Duration P1 5/17/1998 5/18/1998 Jones Prilosec 10mg 3 months
  • 8. Sample Input Data Set for Typical Machine Learning Patient Gender Age Hypertension within last year ... Average LDL last 5 years Statin MI in next 5 years P1 F 32 No ... 120 No No P2 F 45 Yes ... 154 No No P3 M 24 No ... 136 No No P4 M 58 Yes ... 210 No Yes ... ... ... ... ... ... ... ...
  • 9. Predictive Personalized Medicine Personalized Treatment Individual Patient G + C + E Predictive Model for Disease Susceptibility & Treatment ResponseState-of-the-Art Machine Learning Genetic, Clinical, & Environmental Data Wisconsin Genomics Initiative (WGI), 2008
  • 10. Marshfield Clinic EMR •Marshfield Clinic −Health system in North Central Wisconsin •1.5M Patient Records spanning 40 years −Demographics −Diagnoses (ICD-9) −Labs −Procedures −Vitals 10
  • 11. ROC Curves for Afib-related Predictions (Davis et al., 2009; Lantz et al., 2012) Figure 1. ROC Curves for our four target prediction tasks: predicting AF/F onset and, given AF/F onset, predicting each of stroke, 1-year mortality, and 3-year mortality, all using only the roughly 25,000 features extracted from the
  • 12. Motivation and Next Steps • If we can predict atrial fibrillation accurately, in advance, then we can intervene to prevent it in some cases −aspirin or stronger blood thinner −more rest, less stress −beta blocker to control BP, etc. • Some other prediction tasks we’ve addressed: − Myocardial infarction (heart attack) − Warfarin dosing − Age related macular degeneration − Breast cancer (from mammograms and genetics) • How accurately can everything be predicted? • How far in advance?
  • 13. High-Throughput ML (Kleiman et al.) Predicting Every ICD Diagnosis Code at the Press of a Button
  • 15. Remember this View of EHR Data
  • 16. Adverse Drug Events (ADEs) Warfarin Cox2 inhibitor ACE inhibitor Heart Attack Angioedema Bleeding … …
  • 17. OMOP Ground Truth and Marshfield Data
  • 18. Drug Prescription Timeline from EHR (Becca Willett)
  • 20. Multiple Self-Controlled Case Series (Simpson et al., 2013)
  • 22. MSCCS for Numeric Response (Kuang et al., KDD 2016) Electronic Health Records (EHRs) Person 1 200 160 Properties Longitudinal Diabetes Drug 1 30 150 120 Drug 2 60 Observational Applications Age Adverse Drug Person 2 Bleeding 90 104 100 Drug 2 Age Reaction (ADR) discovery Computational Drug Repositioning (CDR)
  • 23. Time-Varying Baseline (Kuang et al., IJCAI 2016) Person 1 Diabetes Insulin Insulin Age 30 45 60 Time-Varying Baseline: add regularization to minimize change in proximal, consecutive tij values: yij | xij = tij + β ⊤xij + ϵij, ∼ N(0,σ2).ϵij Last term penalizes consecutive baseline values that are close together in time but far apart in numerical value.
  • 24. Last Year: Recovery of Known Glucose Lowering Agents INDX CODE DRUG NAME SCORE COUNT INDX CODE DRUG NAME SCORE COUNT 1 4485 HUMALOG -11.786 124 1 4802 INSULIN 47 635 2 7470 PIOGLITAZONE HCL -10.220 3075 2 8316 REZULIN 50 120 3 8437 ROSIGLITAZONE MALEATE -9.731 1019 3 824 AVANDIA 59 449 4 4837 INSULN ASP PRT/INSULIN ASPART -9.658 258 4 416 AMARYL 65 503 5 6382 NEEDLES INSULIN DISPOSABLE -9.464 2827 5 5226 LANTUS 66 33 6 4171 GLUCOTROL XL -8.117 2853 6 5789 METFORMIN HYDROCHLORIDE 75 10 7 4106 GLIMEPIRIDE -7.940 3384 7 4485 HUMALOG 81 63 8 160 ACTOS -7.721 1125 8 4132 GLUCOPHAGE 86 1813 9 824 AVANDIA -6.802 1239 9 4811 INSULIN NPH 88 19 10 9152 SYRING W-NDL DISP INSUL 0.5ML -6.623 4186 10 144 ACTIGALL 90 34 11 4132 GLUCOPHAGE -6.322 6736 11 1389 CAL 90 45 12 4184 GLYBURIDE -6.021 8879 12 4171 GLUCOTROL XL 90 701 13 4170 GLUCOTROL -5.721 1259 13 9155 SYRNG W-NDL DISP INSUL 0.333ML 95 29 14 4208 GLYNASE -5.670 591 14 4116 GLUCAGON 97 121 15 416 AMARYL -5.599 2240 15 6652 NOVOLOG 98 51 16 4107 GLIPIZIDE -5.563 9993 16 160 ACTOS 106 480 17 844 AXID -4.682 189 17 6646 NOVOFINE 31 106 31 18 2830 DILTIAZEM -4.297 1021 18 4813 INSULIN NPL/INSULIN LISPRO 108 118 19 4806 INSULIN GLARGINE HUM.REC.ANLOG -4.175 4213 19 8437 ROSIGLITAZONE MALEATE 109 332 20 5787 METFORMIN HCL -4.147 19584 20 4170 GLUCOTROL 113 641 21 2824 DILAUDID -4.076 39 21 9889 URSODIOL 113 123 22 5786 METFORMIN -3.890 3838 22 5052 KAY CIEL 114 23 23 7731 PRAVACHOL -3.532 1700 23 4118 GLUCAGON HUMAN RECOMBINANT 115 227 24 1760 CELEXA -3.517 1473 24 2521 DARVOCET-N 116 11 25 4497 HUM INSULIN NPH/REG INSULIN HM -3.501 1829 25 7470 PIOGLITAZONE HCL 121 705 26 9889 URSODIOL -3.132 376 26 5786 METFORMIN 125 2149 27 4813 INSULIN NPL/INSULIN LISPRO -2.972 623 27 10366 ZINC SULFATE 130 34 28 4133 GLUCOPHAGE XR -2.845 765 28 4500 HUMULIN 135 33 29 6445 NEURONTIN -2.615 1418 29 4172 GLUCOVANCE 136 115 30 6656 NPH HUMAN INSULIN ISOPHANE -2.500 2874 30 7471 PIOGLITAZONE HCL/METFORMIN HCL 136 16 31 9379 THIAMINE HCL -2.383 341 31 6382 NEEDLES INSULIN DISPOSABLE 137 649 32 1636 CARDURA -2.198 1079 32 4184 GLYBURIDE 144 1354 33 1218 BLOOD SUGAR DIAGNOSTIC DRUM -2.073 2593 33 4208 GLYNASE 145 115 34 8025 PROZAC -2.037 1525 34 4210 GLYSET 148 7 35 8316 REZULIN -1.895 444 35 4163 GLUCOSE 159 1778 36 9136 SYRINGE & NEEDLE INSULIN 1 ML -1.885 3542 36 5977 MINIPRESS 163 19 37 4802 INSULIN -1.812 1526 37 7946 PROPANTHELINE 163 6 38 7674 POTASSIUM CHLORIDE -1.779 9842 38 1602 CARBOCAINE 182 63 39 4804 INSULIN ASPART -1.752 2476 39 1305 BUDEPRION SR 185 9 40 1200 BLOOD-GLUCOSE METER -1.719 5289 40 6657 NPH INSULIN 185 43
  • 25. Time-Varying Baseline for Binary Response (Kuang et al., KDD 2017) • Response variable: heart attack occurrence on a particular day (binary) • Features: drug exposure statuses on a particular day (binary) + baseline occurrence rates
  • 26. OMOP Ground Truth and Marshfield Data Set
  • 28. Thus Far • Charles Kuang: time-varying patient-specific baseline into linear regression (KDD 2016), time-varying baseline into Poisson regression (KDD 2017) • Yujia Bao (with Becca Willett): patient-specific baseline into point process models (MLHC 2017) • Wei Zhang: patient-specific baseline into deep neural nets • Sinong Geng and Charles Kuang: patient-specific baseline into graphical models (temporal Poisson square root models), ICML 2018 • Can we bring time-varying baselines into deep neural networks or random forests? May need other types of regularization, EM rather than convex optimization.
  • 29. Major Challenge: Evaluation • Real-world data: limited causal ground truth – Just 9 true drug-event pair ADEs in OMOP – Are we now overfitting OMOP ground truth? – Some criticism of each ground truth available • Synthetic data: self-fulfilling prophecy? • Theory: very strong assumptions, proofs usually about guaranteed correct causal inferences
  • 30. Probably Approximately Correct (PAC) learning [Valiant, CACM 1984] • Consider a class C of possible target concepts defined over a set of instances X of length n, and a learner L using hypothesis space H • C is PAC learnable by L using H if, for all c∈ C probability distributions D over X ε such that 0 < ε < 0.5 δ such that 0 < δ < 0.5 • learner L will, with probability at least (1-δ), output a hypothesis h ∈ H such that errorD(h) ≤ ε in time that is polynomial in 1/ε 1/δ n size(c)
  • 31. Proposal: PAC Causal Inference • Randomized Turing Machines M1 and M2 generate strings over A • M2 sees string generated by M1 before generating its own • Training protocol P governs what else learner L may be told and what part of full training set it sees • learner L then sees a test set T drawn entirely from either M1 or M2 (chosen uniformly), and must accurately identify the source of T • Family F of pairs <M1, M2> is PPACC inferable under protocol P by learner L if for all 0 < ε < 0.5, given time, training data, and test data polynomial in 1/ε and any other relevant parameters of P, learner L will, with probability at least (1-ε), correctly identify the source of the test set T • Intuition: M1 and M2 might differ only by one causal association • Since M2 is a TM, it can learn; positive (negative) result for PPACC inference is negative (positive) result for adversarial training (GAN)
  • 32. Concrete Example • Let M1 and M2 be causal models (Bayesian networks where edges capture direct causation) over same set of variables • M1 and M2 identical except that M2 drops one edge, from A to B (definition allows hardest conditional probability settings) • Different protocols P let us inherit different existing results • Easy if A binary, random, uniform; influence of A on B in M1 is big • If P provides to L the shared structure of M1 and M2 and leaves open only the question of an edge from A to B, then for some shared structures we can employ do-calculus using the test data • If P properly constrains the shared structure of M1 and M2 including the size of the largest separator-set, we can use the PC algorithm on the test data only A C B A C B
  • 33. Summary of Other Results So Far • Positive result for self-controlled case series (SCCS) if EHR data generated by piecewise-constant conditional intensity model (PCIM) with different baseline risk of event (intensity) for each patient, the only influences are of drugs on events, drugs and drug effects are all independent of one another, and effects big enough • Open: more complex dependencies, time-varying baseline, baseline regularization
  • 34. Summary of Other Results So Far • Positive result for causal models where causal variable G is known and has no parents (e.g., genetic variation such as FGXS Premutation), and we want to know if there is any causal effect representable by a PAC-learnable model class on observed features • Practical utility: Movaghar et al., 2019 (Science Advances) showing FGXS Premutation has clinical effect • Requires ”Agnostic” variant like Agnostic PAC learning • Can be extended to allow causal variable D with parents if in addition we can partition remaining variables into ancestors and non- ancestors of D (e.g., by date of measurement in EHR data), Pr(D) is linear (e.g., linear propensity model for D), and protocol P also lets us sample from IPW distribution based on learned approximation to this model
  • 35. Another Application • My Health Insurance just switched me from a Brand to Generic drug • Is this Generic safe and effective? • In other words, is there a combination of clinical variables that occurs more in patients after taking Generic than Brand? • Generic vs. Brand should eliminate confounding by indication • Using only switchers should also control for patient-specific confounders, even if unmeasured or completely unconsidered • The same patients are on brand, then generic, for the same indication • The problem: in practice, time-varying confounding is rampant • Example: generic drug becomes available in 2005; electronic transmission of prescriptions starts in 2005: looks like generic drug causes electronic transmission of prescription!
  • 36. Two Possible Approaches • Suppose everyone on the drug was on Brand for all 2004, then switched to Generic for all 2005 • Run Supervised ML • Patient is a Positive Example while on Generic (2005) • Same Patient is a Negative Example while on Brand (2004) • Take controls not on drug at all, and use to offset time-varying effects • Patient is a Positive Example during 2004 • Patient is a Negative Example during 2005 • Alternative approach: Use controls to learn to predict 2005 vs. 2004 (temporal propensity), use IPW to weight brand and generic patients • PPACC Inference model says the first approach needs four times as much data and will fail at lower event rates than the second approach • Synthetic data further supports this message
  • 37.
  • 38. Conclusion • Wide variety of clinical events can be accurately predicted by ML from EHR, but… • Models tempt us to wrong causal inferences • Adding to ML latent patient-specific and time- varying baselines greatly improves this • We don’t demand perfection in prediction, don’t demand it in causal inference, so… • Need a PAC-like theory of causal inference even more than of prediction (ground truth)