1. Electronic phenotyping with APHRODITE and the Observational Health Data Sciences and Informatics (OHDSI) data network
S16: Papers - EHR-based Phenotyping
2017 Joint Summits on Translational Science, March 28th, 2017
JUAN M. BANDA, PHD 1; YONI HALPERN, PHD 2; DAVID SONTAG, PHD 3; NIGAM H. SHAH, PHD 1
1 Stanford Center for Biomedical Informatics Research, Stanford, CA;
2 New York Univ., New York, NY;
3 MIT, Cambridge, MA
2. DISCLOSURE:
SPEAKER DISCLOSES THAT HE HAS NO RELATIONSHIPS WITH COMMERCIAL
INTERESTS.
3. Motivation
- Phenotyping is the first step in almost every biomedical study, yet in many cases it is very complicated
- Popular modern approaches are rule-based, and such definitions take a long time to develop
- Can we build a statistical model that provides performance comparable to a consensus definition and requires less development effort? … It turns out we can! (Agarwal V, et al. and Halpern Y, et al.)
- Can we port these two approaches to the Observational Health Data Sciences and Informatics (OHDSI) tech stack?
Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016;23(6):1166-1173.
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA. 2016;23(4):731-740.
4. Automated PHenotype Routine for Observational
Definition, Identification, Training and Evaluation
(APHRODITE)
- Framework is designed to enable phenotyping via supervised (or semi-
supervised) learning of phenotype models
- Reads patient data from the OHDSI/OMOP CDM version 5
- Combines two recently published phenotyping approaches:
- XPRESS (Agarwal V, et al.)
- Anchor learning (Halpern Y, et al.)
- In addition, to enable sharing and reproducibility of the underlying phenotype 'recipes', APHRODITE allows sharing of either the trained model or the configuration settings and anchor selections across multiple sites
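The silver-standard labeling idea behind this workflow can be sketched in a few lines. This is an illustrative Python sketch, not APHRODITE's actual implementation (which is an R package reading from the OMOP CDM); the keywords and records below are made up:

```python
# Illustrative sketch of noisy ("silver standard") labeling for phenotyping.
# Keyword hits stand in for the imperfect labeling function; a classifier
# trained on these labels would then use the full patient feature set.
KEYWORDS = {"myocardial infarction", "old myocardial infarction"}

def silver_label(record_text: str) -> int:
    """Noisy label: 1 if any phenotype keyword appears in the record text."""
    text = record_text.lower()
    return int(any(kw in text for kw in KEYWORDS))

records = [
    "old myocardial infarction noted in 2014",
    "routine visit, no cardiac history",
]
silver_labels = [silver_label(r) for r in records]  # noisy training labels
```

In the real pipeline, the features fed to the classifier come from the structured and text data in the CDM, not from the keywords themselves.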
5. 140+ collaborators
16 countries
Common CDM and vocabulary
The patient network includes 84 databases, both clinical and claims, totaling over 650 million patients
13. How do we know ‘noisy labeling’ works?
Using the evaluation of Agarwal V., et al.:
- Selected one phenotype from each of the Electronic Medical Records and Genomics (eMERGE) and Observational Medical Outcomes Partnership (OMOP) initiatives:
- Myocardial infarction
- Type 2 diabetes mellitus
- Created manually reviewed gold standard of cases and controls
Myocardial Infarction (MI):
| Source                  | Cases | Controls | Accuracy | Recall | PPV  |
| OMOP / PheKB definition | 94    | 94       | 0.87     | 0.91   | 0.84 |
| XPRESS noisy labels     | 94    | 94       | 0.85     | 0.93   | 0.80 |
| APHRODITE noisy labels  | 94    | 94       | 0.94     | 0.87   | 1.00 |

Type 2 Diabetes Mellitus (T2DM):
| Source                  | Cases | Controls | Accuracy | Recall | PPV  |
| OMOP / PheKB definition | 152   | 152      | 0.92     | 0.88   | 0.96 |
| XPRESS noisy labels     | 152   | 152      | 0.89     | 0.99   | 0.81 |
| APHRODITE noisy labels  | 152   | 152      | 0.91     | 0.98   | 0.87 |
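For reference, the accuracy, recall, and PPV columns are standard confusion-matrix quantities. A minimal sketch of how they are computed; the counts below are hypothetical illustrations, not the study's data:

```python
def phenotype_metrics(tp, fp, tn, fn):
    """Accuracy, recall (sensitivity), and PPV (precision) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # fraction labeled correctly
    recall = tp / (tp + fn)                     # fraction of true cases found
    ppv = tp / (tp + fp)                        # fraction of flagged cases that are real
    return accuracy, recall, ppv

# Hypothetical counts for a 94-case / 94-control evaluation set
acc, rec, ppv = phenotype_metrics(tp=82, fp=4, tn=90, fn=12)
```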
14. Building phenotype models with APHRODITE
Using the following keywords:

| Myocardial Infarction (MI)              | Type 2 diabetes mellitus (T2DM)                      |
| Old myocardial infarction               | Type 2 diabetes mellitus with hyperosmolar coma      |
| True posterior myocardial infarction    | Type 2 diabetes mellitus                             |
| Myocardial infarction with complication | Pre-existing type 2 diabetes mellitus                |
| Myocardial infarction in recovery phase | Type 2 diabetes mellitus with multiple complications |
| Microinfarct of heart                   | Type 2 diabetes mellitus in non-obese                |

Performance of noisily labeled trained models:
|           |        |        | MI   |        |      | T2DM |        |      |
| Source    | Cases  | Cont.  | Acc. | Recall | PPV  | Acc. | Recall | PPV  |
| XPRESS    | 750    | 750    | 0.86 | 0.89   | 0.84 | 0.88 | 0.89   | 0.87 |
| APHRODITE | 750    | 750    | 0.90 | 0.92   | 0.89 | 0.89 | 0.92   | 0.87 |
| APHRODITE | 1,500  | 1,500  | 0.90 | 0.93   | 0.90 | 0.91 | 0.93   | 0.88 |
| APHRODITE | 10,000 | 10,000 | 0.91 | 0.93   | 0.91 | 0.92 | 0.94   | 0.89 |

Performance on the gold-standard test set:
|                    | MI    |       |      |        |      | T2DM  |       |      |        |      |
| Source             | Cases | Cont. | Acc. | Recall | PPV  | Cases | Cont. | Acc. | Recall | PPV  |
| OMOP/PheKB         | 94    | 94    | 0.87 | 0.91   | 0.84 | 152   | 152   | 0.92 | 0.88   | 0.96 |
| XPRESS             | 94    | 94    | 0.89 | 0.93   | 0.86 | 152   | 152   | 0.89 | 0.88   | 0.90 |
| APHRODITE (750)    | 94    | 94    | 0.91 | 0.93   | 0.90 | 152   | 152   | 0.91 | 0.95   | 0.88 |
| APHRODITE (1,500)  | 94    | 94    | 0.92 | 0.93   | 0.91 | 152   | 152   | 0.92 | 0.95   | 0.89 |
| APHRODITE (10,000) | 94    | 94    | 0.92 | 0.94   | 0.91 | 152   | 152   | 0.93 | 0.96   | 0.89 |
25. Conclusions
We have successfully implemented the framework proposed by Agarwal et al. and the anchor learning framework by Halpern et al. to build and refine phenotype models
We have demonstrated that it is possible to identify anchors during the model-building process to generate a better-labeled training set, which leads to a better-performing model than using keywords alone for noisy labeling
Our main contribution is the APHRODITE package, which allows for the
potential redistribution of locally validated phenotype models as well as
the sharing of the workflows for learning phenotype models at multiple
sites of the OHDSI data network
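The anchor idea referenced above can be illustrated with a toy sketch: anchors are high-precision indicator terms whose presence noisily labels a patient as a case, and which are censored from the feature set so the model learns surrogate signals. The terms below are invented for illustration and are not the study's actual anchors:

```python
# Toy sketch of anchor-based noisy labeling (in the spirit of Halpern et al.).
# ANCHORS are hypothetical high-precision indicator terms, not real anchors.
ANCHORS = {"type 2 diabetes mellitus", "t2dm"}

def anchor_label(patient_terms):
    """Noisy label: 1 if any anchor term appears in the patient's record."""
    return int(bool(ANCHORS & {t.lower() for t in patient_terms}))

def censor_anchors(patient_terms):
    """Drop anchor terms so the classifier must learn surrogate features."""
    return [t for t in patient_terms if t.lower() not in ANCHORS]
```

Censoring is the key design choice: if the anchors stayed in the feature vector, the model would trivially memorize the labeling rule instead of learning a generalizable phenotype signature.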
28. Why do noisy labels work?
- Learning theory: the generalization error of a model trained on data with inaccurate labels can be made small
- A model trained on m observations with accurate labels has the same generalization error as a model trained on m/(1 − 2τ)² observations with noisy labels
- Model the inaccuracies of automatic labeling as random classification noise with error rate τ
- Unless τ is 0.5, having more observations will help
- The cost of getting additional observations is negligible
- Less cumbersome (no long lists of rules required)
- Nearly unsupervised (minimal user input)
- Multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics
- All our solutions are open-source
Manual review:
Each record was voted a case (or control) if two clinicians agreed, and a third clinician approved
We ensure that the potential cases and controls found in the noisy-labeled sets are completely disjoint from this manually reviewed evaluation set.
Potential question: why is the case:control ratio 50:50 rather than the real-world prevalence (e.g., ~14% cases for diabetes)?
With over 20,000 features (on average), the regular APHRODITE models, with and without anchors, perform the best, though their performance is very close to that of the models that exclude the visit data. For the phenotypes we present, this indicates that most of the coded data is not particularly useful for making phenotype assignments. It is also evident that removing the text features (observations) results in nearly a 15% drop in accuracy, demonstrating that access to the unstructured portions of the medical record is crucial for training phenotype models with imperfectly labeled data.