SlideShare une entreprise Scribd logo
1  sur  29
Electronic phenotyping with
APHRODITE and the Observational
Health Sciences and Informatics
(OHDSI) data network
S16: Papers - EHR-based Phenotyping
2017 Joint Summits on Translational Science,
March 28th, 2017
J U A N M . B A N D A , P H D 1 , Y O N I H A L P E R N , P H D 2 , D AV I D S O N T A G , P H D 3 , N I G A M
H . S H A H , P H D 1
1 S T A N F O R D C E N T E R F O R B I O M E D I C A L I N F O R M A T I C S R E S E A R C H , S T A N F O R D , C A ;
2 N E W Y O R K U N I V . , N E W Y O R K , N Y ;
3 M I T , C A M B R I D G E , M A
DISCLOSURE:
SPEAKER DISCLOSES THAT HE HAS NO RELATIONSHIPS WITH COMMERCIAL
INTERESTS.
1
Electronic phenotyping with APHRODITE and the
Observational Health Sciences and Informatics
(OHDSI) data network
Motivation
- Phenotyping is the beginning step to almost every biomedical study, yet
in many cases is very complicated
- Popular modern approaches are rule-based which take very long to
develop
- Can we build a statistical model which provides performance
comparable to a consensus definition and requires less development
effort? …. Turns out we can! (Agarwal V, et al. and Halpern Y, et al.)
- Can we port these two approaches to the Observational Health
Sciences and Informatics (OHDSI) tech stack?
2
Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA 2016. 23 (4): 731-740
Automated PHenotype Routine for Observational
Definition, Identification, Training and Evaluation
(APHRODITE)
- Framework is designed to enable phenotyping via supervised (or semi-
supervised) learning of phenotype models
- Reads patient data from the OHDSI/OMOP CDM version 5
- Combines two recently published phenotyping approaches:
- XPRESS (Agarwal V, et al.)
- Achor learning (Halpern Y, et al.)
- In addition, to enable sharing and reproducibility of the underlying
phenotype 'recipes', APHRODITE allows sharing of either the trained
model or sharing of the configuration settings and anchor selections
across multiple sites
Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA 2016. 23 (4): 731-740
3
4
+140 collaborators
16 countries
Common CDM and
vocabulary
The patient network available includes 84 databases, both clinical and claims,
totaling over 650 million patients
OHDSI /OMOP CDMv5
APHRODITE Overview
5
Step 1 – Build Keyword List
Function call:
wordLists <- buildKeywordList(conn, aphrodite_concept_name, cdmSchema, dbms)
Generated keywords for “Myocardial Infarction”
Vocabulary v5
6
Step 2 – Find Cases and Controls
CDMv5
Function call:
casesANDcontrolspatient_ids_df<- getdPatientCohort(conn, dbms,
as.character(keywordList_FF$V3),as.character(ignoreList_FF$V3), cdmSchema,nCases,nControls)
7
Step 3 – Extract Patient Data
CDMv5
Function calls:
dataFcases <- getPatientData(conn, dbms, cases, as.character(ignoreList_FF$V3), flag, cdmSchema)
dataFcontrols <- getPatientData(conn, dbms, controls, as.character(ignoreList_FF$V3), flag, cdmSchema)
8
Step 4 – Create Feature Vector
Function calls:
fv_all<-buildFeatureVector(flag, dataFcases,dataFcontrols)
fv_full_data <- combineFeatureVectors(flag, data.frame(cases), controls, fv_all, outcomeName)
9
Step 5 – Build Model
Function calls:
model_predictors <- buildModel(flag, fv_full_data, outcomeName, folder)
model<-model_predictors$model
predictorsNames<-model_predictors$predictorsNames
10
Step 6 – Get Evaluation Results
11
How do we know ‘noisy labeling’ works?
Using the evaluation of Agarwal V., et al.:
- Selected one phenotype each from defined by the Electronic Medical
Records and Genomics (eMERGE) and the Observational Medical
Outcomes Partnership (OMOP) initiatives:
- Myocardial infarction
- Type 2 diabetes mellitus
- Created manually reviewed gold standard of cases and controls
Agarwal V., et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173
Cases Controls Accuracy Recall PPV Cases Controls Accuracy Recall PPV
Source Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM)
OMOP / PheKB
Definition 94 94 0.87 0.91 0.84 152 152 0.92 0.88 0.96
XPRESS Noisy
labels 94 94 0.85 0.93 0.8 152 152 0.89 0.99 0.81
APHRODITE
Noisy labels 94 94 0.94 0.87 1.00 152 152 0.91 0.98 0.87
12
Building phenotype models with APHRODITE
Using the following keywords
Performance of noisly labeled trained models
Performance of gold-standard test set
Myocardial Infarction (MI) Type 2 diabetes mellitus (T2DM)
Old myocardial infarction Type 2 diabetes mellitus with hyperosmolar coma
True posterior myocardial infarction Type 2 diabetes mellitus
Myocardial infarction with complication Pre-existing type 2 diabetes mellitus
Myocardial infarction in recovery phase
Type 2 diabetes mellitus with multiple
complications
Microinfarct of heart Type 2 diabetes mellitus in non-obese
Cases Cont. Acc. Recall PPV Acc. Recall PPV
Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM)
XPRESS 750 750 0.86 0.89 0.84 0.88 0.89 0.87
APHRODITE 750 750 0.9 0.92 0.89 0.89 0.92 0.87
APHRODITE 1,500 1,500 0.9 0.93 0.9 0.91 0.93 0.88
APHRODITE 10,000 10,000 0.91 0.93 0.91 0.92 0.94 0.89
Cases Cont. Acc. Recall PPV Cases Cont. Acc. Recall PPV
Source Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM)
OMOP/PheKB 94 94 0.87 0.91 0.84 152 152 0.92 0.88 0.96
XPRESS 94 94 0.89 0.93 0.86 152 152 0.89 0.88 0.9
APHRODITE
(750) 94 94 0.91 0.93 0.90 152 152 0.91 0.95 0.88
APHRODITE
(1,500) 94 94 0.92 0.93 0.91 152 152 0.92 0.95 0.89
APHRODITE
(10,000) 94 94 0.92 0.94 0.91 152 152 0.93 0.96 0.89
13
15
Top 15 features for T2DM and MI
Type 2 diabetes mellitus (T2DM) Myocardial Infarction (MI)
Rank Feature Concept Name Concept Vocabulary Rank Feature Concept Name
Concept
Vocabulary
1 drugEx:1503297 Metformin RxNorm 1 obs:40398391 Hypertensive disease SNOMED
2 obs: 433736 Obesity SNOMED 2 obs:4243768 Auscultation SNOMED
3 obs:16890009 Insulin measurement SNOMED 3 obs:4342792 Infarction NDFRT
4 obs:434005 Morbid Obesity SNOMED 4 obs:4191705
Coronary heart disease
review SNOMED
5 lab:4149519:NA Glucose measurement SNOMED 5 obs:4167217
Family history of clinical
finding SNOMED
6 drugEx:253182 Regular Insulin, Human RxNorm 6 obs:4282140 Chewable tablet SNOMED
7 drugEx:1112807 Aspirin RxNorm 7 drugEx:1112807 Aspirin RxNorm
8 lab: 4144235:NA
Glucose measurement,
blood SNOMED 8 obs:4325480 Aspirin NDFRT
9 obs:45889912 Hemoglobin SNOMED 9 lab:3002030 Lymphocytes % LOINC
10 obs:45889669 Insulin CPT4 10 lab:3014576 Chloride serum/plasma LOINC
11 lab:3004410 Hemoglobin A1c (Glycated) LOINC 11 obs:4329041 Pain SNOMED
12 obs: 40398391 Hypertensive disease SNOMED 12 lab:40789893 Potassium | Bld-Ser-Plas LOINC
13 obs:1503297 Metformin RxNorm 13 obs:433595 Edema SNOMED
14 obs:4325480 Aspirin NDFRT 14 visit:77670 Chest Pain SNOMED
15 obs:40301556 Past medical history SNOMED 15 obs:1126658 Hydromorphone RxNorm
14
Using Anchors with APHRODITE:
Step 1 – Build Keyword List - Anchors
Function call:
wordLists <- buildKeywordList(conn, aphrodite_concept_name, cdmSchema, dbms)
Generated keywords for “Myocardial Infarction”
15
Step 2 – Find Anchors
CDM
v5 Function call:
anchor_list <- getAnchors(conn, dbms, cdmSchema, cases, controls, as.character(ignoreList_FF$V3),
studyName, outcomeName, flag, numAnchors=50)
16
Step 3 – Curate Anchors
Manual Step
17
Step 4 – Find Cases and Controls - Anchors
CDMv5
Function call:
Anchored_casesANDcontrolspatient_ids_df<- getPatientCohort_w_Anchors(conn, dbms,
as.character(keywordList_FF$V3),as.character(ignoreList_FF$V3),
cdmSchema,nCases,nControls, flag)
18
Step 5 – Extract Patient Data - Anchors
CDMv5
Function calls:
dataFcases <- getPatientData(conn, dbms, cases, as.character(ignoreList_FF$V3), flag, cdmSchema)
dataFcontrols <- getPatientData(conn, dbms, controls, as.character(ignoreList_FF$V3), flag, cdmSchema)
19
Step 6 – Create Feature Vector - Anchors
Function calls:
fv_all<-buildFeatureVector(flag, dataFcases,dataFcontrols)
fv_full_data <- combineFeatureVectors(flag, data.frame(cases), controls, fv_all, outcomeName)
20
Step 7 – Build Model - Anchors
Function calls:
model_predictors <- buildModel(flag, fv_full_data, outcomeName, folder)
model<-model_predictors$model
predictorsNames<-model_predictors$predictorsNames
21
Anchors sugested for MI and T2DM
Myocardial Infarction (MI) Type 2 diabetes mellitus (T2DM)
Source
importance /
rank
concept_name Source
Importance /
rank
concept_name
obs 1.9036 1 Renal function lab 0.5637 1
Serum HDL/non-HDL cholesterol
ratio measurement
obs 1.8554 2 Every eight hours lab 0.5466 2 Glucose measurement
obs 1.7831 3 Chest CT obs 0.5328 3 Lipid panel
obs 1.7108 4 Cataract obs 0.5156 4 Asthma
obs 1.6386 5 Hypercholesterolemia obs 0.4984 5 Diabetes mellitus
obs 1.5663 6 Osteopenia lab 0.4778 6 Hyaline casts
obs 1.4699 7 Palpitations obs 0.4675 7 Pulmonary edema
obs 1.3735 8 Commode obs 0.4366 8 Obesity
obs 1.3012 9
Tightness sensation
quality
obs 0.4228 9 Hypoglycemia
obs 1.253 10 Afternoon obs 0.3987 10 Metformin
obs 1.012 11 Deficiency obs 0.3816 11 Duplex
drugEx 0.8916 12 ferrous sulfate obs 0.3712 12 Palpitations
drugEx 0.7711 13 Dobutamine visit 0.3541 13 Obesity
obs 0.7229 14 Fibrovascular lab 0.3334 14 Glucose
lab 0.5542 15
Sodium [Moles/volume] in
Blood
lab 0.3059 15 Hemoglobin A1c
drugEx 0.4819 16 clopidogrel obs 0.2784 16 Subclavicular approach
lab 0.3855 17 Lactic acid measurement obs 0.2303 17 Colonoscopy
obs 0.3133 18 Pressure ulcer obs 0.2131 18 Cholesterol
drugEx 0.1928 19 zolpidem drugEx 0.1925 19 Insulin, Regular, Human
obs 0.0964 20
Aspirin 81 MG Enteric
Coated Tablet
drugEx 0.0206 20 Oxycodone
22
APHRODITE anchored phenotype performance
Cases Cont. Acc. Recall PPV Cases Cont. Acc. Recall PPV
Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM)
OMOP/PheKB 94 94 0.87 0.91 0.84 152 152 0.92 0.88 0.96
XPRESS 94 94 0.89 0.93 0.86 152 152 0.89 0.99 0.9
APHRODITE 94 94 0.91 0.93 0.9 152 152 0.91 0.98 0.88
APHRODITE
(Anchors)
94 94 0.92 0.97 0.89 152 152 0.92 0.95 0.9
23
Conclusions
We have successfully implemented the framework proposed by Agarwal
et al. and the Anchor learning framework by Halpern et al. to build and
refine phenotype models
We have demonstrated that it is possible identify anchors during the
model building process to generate a better labeled training set which
leads to a better performing model than just using keyword for noisy
labeling
Our main contribution is the APHRODITE package, which allows for the
potential redistribution of locally validated phenotype models as well as
the sharing of the workflows for learning phenotype models at multiple
sites of the OHDSI data network
Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173
Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA 2016. 23 (4): 731-740
24
Acknowledgements
Stanford
- Nigam Shah
- Vibhu Agarwal
- Shah lab
New York University
- Yoni Halpern
MIT
- David Sontag
OHDSI Community
25
QUESTIONS?
Download Aphrodite:
https://github.com/OHDSI/Aphrodite
Learn more:
http://tinyurl.com/use-aphrodite
25
@drjmbanda
28
Why do noisy labels work
Learning theory - Generalization error of a model trained using
data with inaccurate labels can be made small
 Model trained using m observations with accurate labels has the
same gen error as a model trained using m/(1 – 2τ) 2 observations
with noisy labels.
 Model inaccuracies in automatic labeling as random classification
noise given by error rate τ
 Unless τ is 0.5, having more observations will help
The cost of getting additional observations is negligible
29
Performance results for data removal experiments
Cases Cont. Acc. Recall PPV Cases Cont. Acc. Recall PPV
Source Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM)
APHRODITE 94 94 0.91 0.93 0.9 152 152 0.91 0.98 0.88
Observations
removed
94 94 0.75 0.84 0.78 152 152 0.67 0.76 0.71
Labs removed 94 94 0.87 0.85 0.82 152 152 0.69 0.78 0.72
Drugs removed 94 94 0.85 0.85 0.82 152 152 0.83 0.9 0.84
Visits removed 94 94 0.89 0.88 0.86 152 152 0.86 0.91 0.86
APHRODITE w
Anchors
94 94 0.92 0.97 0.89 152 152 0.92 0.95 0.9
Observations
removed
94 94 0.77 0.89 0.79 152 152 0.7 0.77 0.73
Labs removed 94 94 0.89 0.88 0.81 152 152 0.71 0.79 0.75
Drugs removed 94 94 0.86 0.87 0.84 152 152 0.86 0.91 0.85
Visits removed 94 94 0.91 0.9 0.87 152 152 0.88 0.9 0.84

Contenu connexe

En vedette

Gaceta de Ciudades Sustentables - Centro Mario Molina
Gaceta de Ciudades Sustentables - Centro Mario MolinaGaceta de Ciudades Sustentables - Centro Mario Molina
Gaceta de Ciudades Sustentables - Centro Mario MolinaMonica Lafon
 
«Журнал "Мій Бізнес"» № 19, березень 2017
«Журнал "Мій Бізнес"» № 19, березень 2017«Журнал "Мій Бізнес"» № 19, березень 2017
«Журнал "Мій Бізнес"» № 19, березень 2017DIXI Group
 
How Can PHP Web Development Benefits to My Business?
How Can PHP Web Development Benefits to My Business?How Can PHP Web Development Benefits to My Business?
How Can PHP Web Development Benefits to My Business?iFuturz
 
Reporte Diario Bursátil 28 Marzo 2017
 Reporte Diario Bursátil  28 Marzo 2017 Reporte Diario Bursátil  28 Marzo 2017
Reporte Diario Bursátil 28 Marzo 2017Grupo Coril
 
Modern Indian School, Nepal, Improvement: Making the Best School.. Better
Modern Indian School, Nepal, Improvement: Making the Best School.. BetterModern Indian School, Nepal, Improvement: Making the Best School.. Better
Modern Indian School, Nepal, Improvement: Making the Best School.. Betteramulya123
 

En vedette (6)

Gaceta de Ciudades Sustentables - Centro Mario Molina
Gaceta de Ciudades Sustentables - Centro Mario MolinaGaceta de Ciudades Sustentables - Centro Mario Molina
Gaceta de Ciudades Sustentables - Centro Mario Molina
 
«Журнал "Мій Бізнес"» № 19, березень 2017
«Журнал "Мій Бізнес"» № 19, березень 2017«Журнал "Мій Бізнес"» № 19, березень 2017
«Журнал "Мій Бізнес"» № 19, березень 2017
 
How Can PHP Web Development Benefits to My Business?
How Can PHP Web Development Benefits to My Business?How Can PHP Web Development Benefits to My Business?
How Can PHP Web Development Benefits to My Business?
 
Reporte Diario Bursátil 28 Marzo 2017
 Reporte Diario Bursátil  28 Marzo 2017 Reporte Diario Bursátil  28 Marzo 2017
Reporte Diario Bursátil 28 Marzo 2017
 
Pregunta marzo 2017 - Inversiones Comunidad de Madrid
Pregunta marzo 2017 - Inversiones Comunidad de MadridPregunta marzo 2017 - Inversiones Comunidad de Madrid
Pregunta marzo 2017 - Inversiones Comunidad de Madrid
 
Modern Indian School, Nepal, Improvement: Making the Best School.. Better
Modern Indian School, Nepal, Improvement: Making the Best School.. BetterModern Indian School, Nepal, Improvement: Making the Best School.. Better
Modern Indian School, Nepal, Improvement: Making the Best School.. Better
 

Similaire à ELECTRONIC PHENOTYPING WITH APHRODITE

Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.
Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.
Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.IRJET Journal
 
Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...
Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...
Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...University of Malaya
 
Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...
Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...
Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...IRJET Journal
 
MS (and NMR) data standards in Metabolomics why, how and some caveats
MS (and NMR) data standards in Metabolomics why, how and some caveatsMS (and NMR) data standards in Metabolomics why, how and some caveats
MS (and NMR) data standards in Metabolomics why, how and some caveatsSteffen Neumann
 
1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docx1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docxaulasnilda
 
Heart Failure Prediction using Different MachineLearning Techniques
Heart Failure Prediction using Different MachineLearning TechniquesHeart Failure Prediction using Different MachineLearning Techniques
Heart Failure Prediction using Different MachineLearning TechniquesIRJET Journal
 
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...IRJET Journal
 
Parkinson’s Disease Detection By MachineLearning Using SVM
Parkinson’s Disease Detection By MachineLearning Using SVMParkinson’s Disease Detection By MachineLearning Using SVM
Parkinson’s Disease Detection By MachineLearning Using SVMIRJET Journal
 
Heart Disease Prediction using Machine Learning Algorithms
Heart Disease Prediction using Machine Learning AlgorithmsHeart Disease Prediction using Machine Learning Algorithms
Heart Disease Prediction using Machine Learning AlgorithmsIRJET Journal
 
Elsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision MedicineElsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision MedicineRising Media Ltd.
 
[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...Dongmin Choi
 
Heart disease classification using Random Forest
Heart disease classification using Random ForestHeart disease classification using Random Forest
Heart disease classification using Random ForestIRJET Journal
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsValery Tkachenko
 
HEART DISEASE PREDICTION RANDOM FOREST ALGORITHMS
HEART DISEASE PREDICTION RANDOM FOREST ALGORITHMSHEART DISEASE PREDICTION RANDOM FOREST ALGORITHMS
HEART DISEASE PREDICTION RANDOM FOREST ALGORITHMSIRJET Journal
 
Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...
Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...
Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...Nick Brown
 
Meaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchMeaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchNolan Nichols
 
Chronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine LearningChronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine LearningIJCSIS Research Publications
 
PREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUES
PREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUESPREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUES
PREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUESIRJET Journal
 
Data mining in pharmacovigilance
Data mining in pharmacovigilanceData mining in pharmacovigilance
Data mining in pharmacovigilanceBhaswat Chakraborty
 

Similaire à ELECTRONIC PHENOTYPING WITH APHRODITE (20)

Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.
Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.
Prediction of Heart Disease Using Machine Learning and Deep Learning Techniques.
 
Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...
Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...
Data Science in Healthcare -The University Malaya Medical Centre Breast Cance...
 
Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...
Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...
Comparative Analysis of Heart Disease Prediction Models: Unveiling the Most A...
 
MS (and NMR) data standards in Metabolomics why, how and some caveats
MS (and NMR) data standards in Metabolomics why, how and some caveatsMS (and NMR) data standards in Metabolomics why, how and some caveats
MS (and NMR) data standards in Metabolomics why, how and some caveats
 
1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docx1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docx
 
Heart Failure Prediction using Different MachineLearning Techniques
Heart Failure Prediction using Different MachineLearning TechniquesHeart Failure Prediction using Different MachineLearning Techniques
Heart Failure Prediction using Different MachineLearning Techniques
 
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H...
 
Parkinson’s Disease Detection By MachineLearning Using SVM
Parkinson’s Disease Detection By MachineLearning Using SVMParkinson’s Disease Detection By MachineLearning Using SVM
Parkinson’s Disease Detection By MachineLearning Using SVM
 
Heart Disease Prediction using Machine Learning Algorithms
Heart Disease Prediction using Machine Learning AlgorithmsHeart Disease Prediction using Machine Learning Algorithms
Heart Disease Prediction using Machine Learning Algorithms
 
14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c
 
Elsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision MedicineElsevier Medical Graph – mit Machine Learning zu Precision Medicine
Elsevier Medical Graph – mit Machine Learning zu Precision Medicine
 
[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...[Review] High-performance medicine: the convergence of human and artificial i...
[Review] High-performance medicine: the convergence of human and artificial i...
 
Heart disease classification using Random Forest
Heart disease classification using Random ForestHeart disease classification using Random Forest
Heart disease classification using Random Forest
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpoints
 
HEART DISEASE PREDICTION RANDOM FOREST ALGORITHMS
HEART DISEASE PREDICTION RANDOM FOREST ALGORITHMSHEART DISEASE PREDICTION RANDOM FOREST ALGORITHMS
HEART DISEASE PREDICTION RANDOM FOREST ALGORITHMS
 
Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...
Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...
Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...
 
Meaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchMeaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine research
 
Chronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine LearningChronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine Learning
 
PREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUES
PREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUESPREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUES
PREDICTION OF DIABETES (SUGAR) USING MACHINE LEARNING TECHNIQUES
 
Data mining in pharmacovigilance
Data mining in pharmacovigilanceData mining in pharmacovigilance
Data mining in pharmacovigilance
 

Dernier

Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 

Dernier (20)

Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 

ELECTRONIC PHENOTYPING WITH APHRODITE

  • 1. Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network S16: Papers - EHR-based Phenotyping 2017 Joint Summits on Translational Science, March 28th, 2017 J U A N M . B A N D A , P H D 1 , Y O N I H A L P E R N , P H D 2 , D AV I D S O N T A G , P H D 3 , N I G A M H . S H A H , P H D 1 1 S T A N F O R D C E N T E R F O R B I O M E D I C A L I N F O R M A T I C S R E S E A R C H , S T A N F O R D , C A ; 2 N E W Y O R K U N I V . , N E W Y O R K , N Y ; 3 M I T , C A M B R I D G E , M A
  • 2. DISCLOSURE: SPEAKER DISCLOSES THAT HE HAS NO RELATIONSHIPS WITH COMMERCIAL INTERESTS. 1 Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network
  • 3. Motivation - Phenotyping is the beginning step to almost every biomedical study, yet in many cases is very complicated - Popular modern approaches are rule-based which take very long to develop - Can we build a statistical model which provides performance comparable to a consensus definition and requires less development effort? …. Turns out we can! (Agarwal V, et al. and Halpern Y, et al.) - Can we port these two approaches to the Observational Health Sciences and Informatics (OHDSI) tech stack? 2 Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173 Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA 2016. 23 (4): 731-740
  • 4. Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) - Framework is designed to enable phenotyping via supervised (or semi- supervised) learning of phenotype models - Reads patient data from the OHDSI/OMOP CDM version 5 - Combines two recently published phenotyping approaches: - XPRESS (Agarwal V, et al.) - Achor learning (Halpern Y, et al.) - In addition, to enable sharing and reproducibility of the underlying phenotype 'recipes', APHRODITE allows sharing of either the trained model or sharing of the configuration settings and anchor selections across multiple sites Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173 Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA 2016. 23 (4): 731-740 3
  • 5. 4 +140 collaborators 16 countries Common CDM and vocabulary The patient network available includes 84 databases, both clinical and claims, totaling over 650 million patients
  • 7. Step 1 – Build Keyword List Function call: wordLists <- buildKeywordList(conn, aphrodite_concept_name, cdmSchema, dbms) Generated keywords for “Myocardial Infarction” Vocabulary v5 6
  • 8. Step 2 – Find Cases and Controls CDMv5 Function call: casesANDcontrolspatient_ids_df<- getdPatientCohort(conn, dbms, as.character(keywordList_FF$V3),as.character(ignoreList_FF$V3), cdmSchema,nCases,nControls) 7
  • 9. Step 3 – Extract Patient Data CDMv5 Function calls: dataFcases <- getPatientData(conn, dbms, cases, as.character(ignoreList_FF$V3), flag, cdmSchema) dataFcontrols <- getPatientData(conn, dbms, controls, as.character(ignoreList_FF$V3), flag, cdmSchema) 8
  • 10. Step 4 – Create Feature Vector Function calls: fv_all<-buildFeatureVector(flag, dataFcases,dataFcontrols) fv_full_data <- combineFeatureVectors(flag, data.frame(cases), controls, fv_all, outcomeName) 9
  • 11. Step 5 – Build Model Function calls: model_predictors <- buildModel(flag, fv_full_data, outcomeName, folder) model<-model_predictors$model predictorsNames<-model_predictors$predictorsNames 10
  • 12. Step 6 – Get Evaluation Results 11
  • 13. How do we know ‘noisy labeling’ works? Using the evaluation of Agarwal V., et al.: - Selected one phenotype each from defined by the Electronic Medical Records and Genomics (eMERGE) and the Observational Medical Outcomes Partnership (OMOP) initiatives: - Myocardial infarction - Type 2 diabetes mellitus - Created manually reviewed gold standard of cases and controls Agarwal V., et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173 Cases Controls Accuracy Recall PPV Cases Controls Accuracy Recall PPV Source Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM) OMOP / PheKB Definition 94 94 0.87 0.91 0.84 152 152 0.92 0.88 0.96 XPRESS Noisy labels 94 94 0.85 0.93 0.8 152 152 0.89 0.99 0.81 APHRODITE Noisy labels 94 94 0.94 0.87 1.00 152 152 0.91 0.98 0.87 12
  • 14. Building phenotype models with APHRODITE Using the following keywords Performance of noisly labeled trained models Performance of gold-standard test set Myocardial Infarction (MI) Type 2 diabetes mellitus (T2DM) Old myocardial infarction Type 2 diabetes mellitus with hyperosmolar coma True posterior myocardial infarction Type 2 diabetes mellitus Myocardial infarction with complication Pre-existing type 2 diabetes mellitus Myocardial infarction in recovery phase Type 2 diabetes mellitus with multiple complications Microinfarct of heart Type 2 diabetes mellitus in non-obese Cases Cont. Acc. Recall PPV Acc. Recall PPV Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM) XPRESS 750 750 0.86 0.89 0.84 0.88 0.89 0.87 APHRODITE 750 750 0.9 0.92 0.89 0.89 0.92 0.87 APHRODITE 1,500 1,500 0.9 0.93 0.9 0.91 0.93 0.88 APHRODITE 10,000 10,000 0.91 0.93 0.91 0.92 0.94 0.89 Cases Cont. Acc. Recall PPV Cases Cont. Acc. Recall PPV Source Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM) OMOP/PheKB 94 94 0.87 0.91 0.84 152 152 0.92 0.88 0.96 XPRESS 94 94 0.89 0.93 0.86 152 152 0.89 0.88 0.9 APHRODITE (750) 94 94 0.91 0.93 0.90 152 152 0.91 0.95 0.88 APHRODITE (1,500) 94 94 0.92 0.93 0.91 152 152 0.92 0.95 0.89 APHRODITE (10,000) 94 94 0.92 0.94 0.91 152 152 0.93 0.96 0.89 13
  • 15. 15 Top 15 features for T2DM and MI Type 2 diabetes mellitus (T2DM) Myocardial Infarction (MI) Rank Feature Concept Name Concept Vocabulary Rank Feature Concept Name Concept Vocabulary 1 drugEx:1503297 Metformin RxNorm 1 obs:40398391 Hypertensive disease SNOMED 2 obs: 433736 Obesity SNOMED 2 obs:4243768 Auscultation SNOMED 3 obs:16890009 Insulin measurement SNOMED 3 obs:4342792 Infarction NDFRT 4 obs:434005 Morbid Obesity SNOMED 4 obs:4191705 Coronary heart disease review SNOMED 5 lab:4149519:NA Glucose measurement SNOMED 5 obs:4167217 Family history of clinical finding SNOMED 6 drugEx:253182 Regular Insulin, Human RxNorm 6 obs:4282140 Chewable tablet SNOMED 7 drugEx:1112807 Aspirin RxNorm 7 drugEx:1112807 Aspirin RxNorm 8 lab: 4144235:NA Glucose measurement, blood SNOMED 8 obs:4325480 Aspirin NDFRT 9 obs:45889912 Hemoglobin SNOMED 9 lab:3002030 Lymphocytes % LOINC 10 obs:45889669 Insulin CPT4 10 lab:3014576 Chloride serum/plasma LOINC 11 lab:3004410 Hemoglobin A1c (Glycated) LOINC 11 obs:4329041 Pain SNOMED 12 obs: 40398391 Hypertensive disease SNOMED 12 lab:40789893 Potassium | Bld-Ser-Plas LOINC 13 obs:1503297 Metformin RxNorm 13 obs:433595 Edema SNOMED 14 obs:4325480 Aspirin NDFRT 14 visit:77670 Chest Pain SNOMED 15 obs:40301556 Past medical history SNOMED 15 obs:1126658 Hydromorphone RxNorm 14
  • 16. Using Anchors with APHRODITE: Step 1 – Build Keyword List - Anchors Function call: wordLists <- buildKeywordList(conn, aphrodite_concept_name, cdmSchema, dbms) Generated keywords for “Myocardial Infarction” 15
  • 17. Step 2 – Find Anchors CDM v5 Function call: anchor_list <- getAnchors(conn, dbms, cdmSchema, cases, controls, as.character(ignoreList_FF$V3), studyName, outcomeName, flag, numAnchors=50) 16
  • 18. Step 3 – Curate Anchors Manual Step 17
  • 19. Step 4 – Find Cases and Controls - Anchors CDMv5 Function call: Anchored_casesANDcontrolspatient_ids_df<- getPatientCohort_w_Anchors(conn, dbms, as.character(keywordList_FF$V3),as.character(ignoreList_FF$V3), cdmSchema,nCases,nControls, flag) 18
  • 20. Step 5 – Extract Patient Data - Anchors CDMv5 Function calls: dataFcases <- getPatientData(conn, dbms, cases, as.character(ignoreList_FF$V3), flag, cdmSchema) dataFcontrols <- getPatientData(conn, dbms, controls, as.character(ignoreList_FF$V3), flag, cdmSchema) 19
  • 21. Step 6 – Create Feature Vector - Anchors Function calls: fv_all<-buildFeatureVector(flag, dataFcases,dataFcontrols) fv_full_data <- combineFeatureVectors(flag, data.frame(cases), controls, fv_all, outcomeName) 20
  • 22. Step 7 – Build Model - Anchors Function calls: model_predictors <- buildModel(flag, fv_full_data, outcomeName, folder) model<-model_predictors$model predictorsNames<-model_predictors$predictorsNames 21
  • 23. Anchors sugested for MI and T2DM Myocardial Infarction (MI) Type 2 diabetes mellitus (T2DM) Source importance / rank concept_name Source Importance / rank concept_name obs 1.9036 1 Renal function lab 0.5637 1 Serum HDL/non-HDL cholesterol ratio measurement obs 1.8554 2 Every eight hours lab 0.5466 2 Glucose measurement obs 1.7831 3 Chest CT obs 0.5328 3 Lipid panel obs 1.7108 4 Cataract obs 0.5156 4 Asthma obs 1.6386 5 Hypercholesterolemia obs 0.4984 5 Diabetes mellitus obs 1.5663 6 Osteopenia lab 0.4778 6 Hyaline casts obs 1.4699 7 Palpitations obs 0.4675 7 Pulmonary edema obs 1.3735 8 Commode obs 0.4366 8 Obesity obs 1.3012 9 Tightness sensation quality obs 0.4228 9 Hypoglycemia obs 1.253 10 Afternoon obs 0.3987 10 Metformin obs 1.012 11 Deficiency obs 0.3816 11 Duplex drugEx 0.8916 12 ferrous sulfate obs 0.3712 12 Palpitations drugEx 0.7711 13 Dobutamine visit 0.3541 13 Obesity obs 0.7229 14 Fibrovascular lab 0.3334 14 Glucose lab 0.5542 15 Sodium [Moles/volume] in Blood lab 0.3059 15 Hemoglobin A1c drugEx 0.4819 16 clopidogrel obs 0.2784 16 Subclavicular approach lab 0.3855 17 Lactic acid measurement obs 0.2303 17 Colonoscopy obs 0.3133 18 Pressure ulcer obs 0.2131 18 Cholesterol drugEx 0.1928 19 zolpidem drugEx 0.1925 19 Insulin, Regular, Human obs 0.0964 20 Aspirin 81 MG Enteric Coated Tablet drugEx 0.0206 20 Oxycodone 22
  • 24. APHRODITE anchored phenotype performance Cases Cont. Acc. Recall PPV Cases Cont. Acc. Recall PPV Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM) OMOP/PheKB 94 94 0.87 0.91 0.84 152 152 0.92 0.88 0.96 XPRESS 94 94 0.89 0.93 0.86 152 152 0.89 0.99 0.9 APHRODITE 94 94 0.91 0.93 0.9 152 152 0.91 0.98 0.88 APHRODITE (Anchors) 94 94 0.92 0.97 0.89 152 152 0.92 0.95 0.9 23
  • 25. Conclusions We have successfully implemented the framework proposed by Agarwal et al. and the Anchor learning framework by Halpern et al. to build and refine phenotype models We have demonstrated that it is possible identify anchors during the model building process to generate a better labeled training set which leads to a better performing model than just using keyword for noisy labeling Our main contribution is the APHRODITE package, which allows for the potential redistribution of locally validated phenotype models as well as the sharing of the workflows for learning phenotype models at multiple sites of the OHDSI data network Agarwal V, et al. Learning statistical models of phenotypes using noisy labeled training data. JAMIA. 2016. 23 (6), 1166-1173 Halpern Y, et al. Electronic medical record phenotyping using the anchor and learn framework. JAMIA 2016. 23 (4): 731-740 24
  • 26. Acknowledgements Stanford - Nigam Shah - Vibhu Agarwal - Shah lab New York University - Yoni Halpern MIT - David Sontag OHDSI Community 25
  • 28. 28 Why do noisy labels work Learning theory - Generalization error of a model trained using data with inaccurate labels can be made small  Model trained using m observations with accurate labels has the same gen error as a model trained using m/(1 – 2τ) 2 observations with noisy labels.  Model inaccuracies in automatic labeling as random classification noise given by error rate τ  Unless τ is 0.5, having more observations will help The cost of getting additional observations is negligible
  • 29. 29 Performance results for data removal experiments Cases Cont. Acc. Recall PPV Cases Cont. Acc. Recall PPV Source Myocardial Infarction (MI) Type 2 Diabetes Mellitus (T2DM) APHRODITE 94 94 0.91 0.93 0.9 152 152 0.91 0.98 0.88 Observations removed 94 94 0.75 0.84 0.78 152 152 0.67 0.76 0.71 Labs removed 94 94 0.87 0.85 0.82 152 152 0.69 0.78 0.72 Drugs removed 94 94 0.85 0.85 0.82 152 152 0.83 0.9 0.84 Visits removed 94 94 0.89 0.88 0.86 152 152 0.86 0.91 0.86 APHRODITE w Anchors 94 94 0.92 0.97 0.89 152 152 0.92 0.95 0.9 Observations removed 94 94 0.77 0.89 0.79 152 152 0.7 0.77 0.73 Labs removed 94 94 0.89 0.88 0.81 152 152 0.71 0.79 0.75 Drugs removed 94 94 0.86 0.87 0.84 152 152 0.86 0.91 0.85 Visits removed 94 94 0.91 0.9 0.87 152 152 0.88 0.9 0.84

Notes de l'éditeur

  1. Less cumbersome: (not requiring long lists of rules) Nearly unsupervised (minimial user input).
  2. - Multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics - All our solutions are open-source
  3. Manual review: Each record was voted a case (or control) if two clinicians agreed, and a third clinician approved We ensure that the potential cases and controls found in the noisy-labeled sets are completely disjoint from this manually reviewed evaluation set.
  4. Potential question: You will get a question here about why is the case:control 50:50 and not what you would have in real life. (e.g. ~14% cases for diabetes)
  5. 140 collaborators in 16 countries
  6. Not surprising. Learning theory - Generalization error of a model trained using data with inaccurate labels can be made small Model trained using m observations with accurate labels has the same gen error as a model trained using m/(1 – 2τ) 2 observations with noisy labels. Model inaccuracies in automatic labeling as random classification noise given by error rate τ Unless τ is 0.5, having more observations will help The cost of getting additional observations is negligible
  7. With over 20,000 features (on average) the regular APHRODITE models with and without anchors perform the best, but with very close performance to the models that exclude the visits data. For the phenotypes we present, this indicates that most of the coded data is not particularly useful in making the phenotype assignments. It is also quite evident that removing the text features (observations) results in nearly a 15% drop in accuracy demonstrating that access to the unstructured portions of the medical record is crucial for the success of training phenotype models with imperfectly labeled training data.