1. Electronic phenotyping with APHRODITE and the Observational Health Data Sciences and Informatics (OHDSI) data network
S16: Papers - EHR-based Phenotyping
2017 Joint Summits on Translational Science, March 28th, 2017
JUAN M. BANDA, PHD 1; YONI HALPERN, PHD 2; DAVID SONTAG, PHD 3; NIGAM H. SHAH, PHD 1
1 Stanford Center for Biomedical Informatics Research, Stanford, CA;
2 New York Univ., New York, NY;
3 MIT, Cambridge, MA
2. DISCLOSURE:
SPEAKER DISCLOSES THAT HE HAS NO RELATIONSHIPS WITH COMMERCIAL
INTERESTS.
3. Motivation
- Phenotyping is the first step in almost every biomedical study, yet in many cases it is very complicated
- Popular modern approaches are rule-based, and such definitions take a long time to develop
- Can we build a statistical model that provides performance comparable to a consensus definition and requires less development effort? … It turns out we can! (Agarwal V, et al. and Halpern Y, et al.)
- Can we port these two approaches to the Observational Health Data Sciences and Informatics (OHDSI) tech stack?
Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016;23(6):1166-1173.
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA. 2016;23(4):731-740.
4. Automated PHenotype Routine for Observational
Definition, Identification, Training and Evaluation
(APHRODITE)
- Framework is designed to enable phenotyping via supervised (or semi-
supervised) learning of phenotype models
- Reads patient data from the OHDSI/OMOP CDM version 5
- Combines two recently published phenotyping approaches:
- XPRESS (Agarwal V, et al.)
- Anchor learning (Halpern Y, et al.)
- In addition, to enable sharing and reproducibility of the underlying phenotype 'recipes', APHRODITE allows sharing of either the trained model or the configuration settings and anchor selections across multiple sites
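The silver-standard labeling idea behind this workflow can be sketched in a few lines. This is an illustrative Python sketch, not APHRODITE's actual implementation (which is an R package reading from the OMOP CDM); the keywords and records below are made up:

```python
# Illustrative sketch of noisy ("silver standard") labeling for phenotyping.
# Keyword hits stand in for the imperfect labeling function; a classifier
# trained on these labels would then use the full patient feature set.
KEYWORDS = {"myocardial infarction", "old myocardial infarction"}

def silver_label(record_text: str) -> int:
    """Noisy label: 1 if any phenotype keyword appears in the record text."""
    text = record_text.lower()
    return int(any(kw in text for kw in KEYWORDS))

records = [
    "old myocardial infarction noted in 2014",
    "routine visit, no cardiac history",
]
silver_labels = [silver_label(r) for r in records]  # noisy training labels
```

In the real pipeline, the features fed to the classifier come from the structured and text data in the CDM, not from the keywords themselves.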
5. 140+ collaborators
16 countries
Common CDM and vocabulary
The patient network includes 84 databases, both clinical and claims, totaling over 650 million patients
13. How do we know ‘noisy labeling’ works?
Using the evaluation of Agarwal V., et al.:
- Selected one phenotype from each of the Electronic Medical Records and Genomics (eMERGE) and Observational Medical Outcomes Partnership (OMOP) initiatives:
- Myocardial infarction
- Type 2 diabetes mellitus
- Created manually reviewed gold standard of cases and controls
Myocardial Infarction (MI):
| Source                  | Cases | Controls | Accuracy | Recall | PPV  |
| OMOP / PheKB definition | 94    | 94       | 0.87     | 0.91   | 0.84 |
| XPRESS noisy labels     | 94    | 94       | 0.85     | 0.93   | 0.80 |
| APHRODITE noisy labels  | 94    | 94       | 0.94     | 0.87   | 1.00 |

Type 2 Diabetes Mellitus (T2DM):
| Source                  | Cases | Controls | Accuracy | Recall | PPV  |
| OMOP / PheKB definition | 152   | 152      | 0.92     | 0.88   | 0.96 |
| XPRESS noisy labels     | 152   | 152      | 0.89     | 0.99   | 0.81 |
| APHRODITE noisy labels  | 152   | 152      | 0.91     | 0.98   | 0.87 |
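For reference, the accuracy, recall, and PPV columns are standard confusion-matrix quantities. A minimal sketch of how they are computed; the counts below are hypothetical illustrations, not the study's data:

```python
def phenotype_metrics(tp, fp, tn, fn):
    """Accuracy, recall (sensitivity), and PPV (precision) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # fraction labeled correctly
    recall = tp / (tp + fn)                     # fraction of true cases found
    ppv = tp / (tp + fp)                        # fraction of flagged cases that are real
    return accuracy, recall, ppv

# Hypothetical counts for a 94-case / 94-control evaluation set
acc, rec, ppv = phenotype_metrics(tp=82, fp=4, tn=90, fn=12)
```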
14. Building phenotype models with APHRODITE
Using the following keywords:

| Myocardial Infarction (MI)              | Type 2 diabetes mellitus (T2DM)                      |
| Old myocardial infarction               | Type 2 diabetes mellitus with hyperosmolar coma      |
| True posterior myocardial infarction    | Type 2 diabetes mellitus                             |
| Myocardial infarction with complication | Pre-existing type 2 diabetes mellitus                |
| Myocardial infarction in recovery phase | Type 2 diabetes mellitus with multiple complications |
| Microinfarct of heart                   | Type 2 diabetes mellitus in non-obese                |

Performance of noisily labeled trained models:
|           |        |        | MI   |        |      | T2DM |        |      |
| Source    | Cases  | Cont.  | Acc. | Recall | PPV  | Acc. | Recall | PPV  |
| XPRESS    | 750    | 750    | 0.86 | 0.89   | 0.84 | 0.88 | 0.89   | 0.87 |
| APHRODITE | 750    | 750    | 0.90 | 0.92   | 0.89 | 0.89 | 0.92   | 0.87 |
| APHRODITE | 1,500  | 1,500  | 0.90 | 0.93   | 0.90 | 0.91 | 0.93   | 0.88 |
| APHRODITE | 10,000 | 10,000 | 0.91 | 0.93   | 0.91 | 0.92 | 0.94   | 0.89 |

Performance on the gold-standard test set:
|                    | MI    |       |      |        |      | T2DM  |       |      |        |      |
| Source             | Cases | Cont. | Acc. | Recall | PPV  | Cases | Cont. | Acc. | Recall | PPV  |
| OMOP/PheKB         | 94    | 94    | 0.87 | 0.91   | 0.84 | 152   | 152   | 0.92 | 0.88   | 0.96 |
| XPRESS             | 94    | 94    | 0.89 | 0.93   | 0.86 | 152   | 152   | 0.89 | 0.88   | 0.90 |
| APHRODITE (750)    | 94    | 94    | 0.91 | 0.93   | 0.90 | 152   | 152   | 0.91 | 0.95   | 0.88 |
| APHRODITE (1,500)  | 94    | 94    | 0.92 | 0.93   | 0.91 | 152   | 152   | 0.92 | 0.95   | 0.89 |
| APHRODITE (10,000) | 94    | 94    | 0.92 | 0.94   | 0.91 | 152   | 152   | 0.93 | 0.96   | 0.89 |
25. Conclusions
We have successfully implemented the framework proposed by Agarwal et al. and the anchor learning framework by Halpern et al. to build and refine phenotype models
We have demonstrated that it is possible to identify anchors during the model-building process to generate a better-labeled training set, which leads to a better-performing model than using keywords alone for noisy labeling
Our main contribution is the APHRODITE package, which allows for the
potential redistribution of locally validated phenotype models as well as
the sharing of the workflows for learning phenotype models at multiple
sites of the OHDSI data network
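The anchor idea referenced above can be illustrated with a toy sketch: anchors are high-precision indicator terms whose presence noisily labels a patient as a case, and which are censored from the feature set so the model learns surrogate signals. The terms below are invented for illustration and are not the study's actual anchors:

```python
# Toy sketch of anchor-based noisy labeling (in the spirit of Halpern et al.).
# ANCHORS are hypothetical high-precision indicator terms, not real anchors.
ANCHORS = {"type 2 diabetes mellitus", "t2dm"}

def anchor_label(patient_terms):
    """Noisy label: 1 if any anchor term appears in the patient's record."""
    return int(bool(ANCHORS & {t.lower() for t in patient_terms}))

def censor_anchors(patient_terms):
    """Drop anchor terms so the classifier must learn surrogate features."""
    return [t for t in patient_terms if t.lower() not in ANCHORS]
```

Censoring is the key design choice: if the anchors stayed in the feature vector, the model would trivially memorize the labeling rule instead of learning a generalizable phenotype signature.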
28. Why do noisy labels work?
- Learning theory: the generalization error of a model trained on data with inaccurate labels can be made small
- A model trained on m observations with accurate labels has the same generalization error as a model trained on m/(1 − 2τ)² observations with noisy labels
- Model the inaccuracies of automatic labeling as random classification noise with error rate τ
- Unless τ is 0.5, having more observations will help
- The cost of getting additional observations is negligible
- Less cumbersome (no long lists of rules required)
- Nearly unsupervised (minimal user input)
- Multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics
- All our solutions are open-source
Manual review:
Each record was voted a case (or control) if two clinicians agreed, and a third clinician approved
We ensure that the potential cases and controls found in the noisy-labeled sets are completely disjoint from this manually reviewed evaluation set.
Potential question: why is the case:control ratio 50:50 rather than the real-world prevalence (e.g., ~14% cases for diabetes)?
With over 20,000 features (on average), the regular APHRODITE models, with and without anchors, perform the best, though their performance is very close to that of the models that exclude the visit data. For the phenotypes we present, this indicates that most of the coded data is not particularly useful for making phenotype assignments. It is also evident that removing the text features (observations) results in nearly a 15% drop in accuracy, demonstrating that access to the unstructured portions of the medical record is crucial for training phenotype models with imperfectly labeled data.