Lorenzo Rossi, PhD
Data Scientist
City of Hope National Medical Center
DataCon LA, August 2019
Best Practices for Prototyping
Machine Learning Models for
Healthcare
Machine learning in healthcare is growing fast,
but best practices are not well established yet
Towards Guidelines for ML in Health (8.2018, Stanford)
Motivations for ML in Healthcare
1. Lots of information about patients, but not enough time for clinicians
to process it
2. Physicians spend too much time typing information about patients
during encounters
3. Overwhelming amount of false alerts (e.g. in ICU)
Topics
1. The electronic health record (EHR)
2. Cohort definition
3. Data quality
4. Training - testing split
5. Performance metrics and reporting
6. Survival analysis
Data preparation covers the first of these topics: the EHR, cohort definition, and data quality.
1. The Electronic Health Record (EHR)
EHR data are very heterogeneous
• Laboratory tests [multidimensional time series]
• Vitals [multidimensional time series]
• Diagnoses [text, codes]
• Medications [text, codes, numeric]
• X-rays, CT scans, EKGs, … [2D/3D images, time series, …]
• Notes [text]
Time is a key aspect of EHR data
[Figure: timelines of labs, vitals, notes, … for patients p01, p02, p03]
Temporal resolution varies a lot
• ICU patient [minutes]
• Hospital patient [hours]
• Outpatient [weeks]
Events hospitals want to predict from EHR data
• Unplanned 30-day readmission
• Length of stay
• Mortality
• Sepsis
• ICU admission
• Surgical complications
Goals: improve capacity, optimize decisions
Consider only binary prediction tasks for simplicity
The prediction algorithm gives a score from 0 to 1
– e.g. close to 1 → high risk of readmission within 30 days
There is a trade-off between falsely detected and missed targets
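The trade-off above can be sketched by sweeping a decision threshold over the risk scores; all patient scores and labels below are toy values, purely illustrative.

```python
import numpy as np

def confusion_at_threshold(scores, labels, threshold):
    """Count false positives and false negatives when scores >= threshold
    are flagged positive (e.g. 'high risk of 30-day readmission')."""
    predicted = scores >= threshold
    fp = int(np.sum(predicted & (labels == 0)))   # falsely detected
    fn = int(np.sum(~predicted & (labels == 1)))  # missed targets
    return fp, fn

# Toy risk scores for 8 patients (label 1 = readmitted within 30 days)
scores = np.array([0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.05])
labels = np.array([1,    1,    0,    1,    0,    0,    1,    0])

for t in (0.25, 0.50, 0.75):
    fp, fn = confusion_at_threshold(scores, labels, t)
    print(f"threshold={t:.2f}  falsely detected={fp}  missed targets={fn}")
```

Raising the threshold reduces false alarms but misses more true targets; no single threshold eliminates both error types.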
2. Cohort Definition
Cohort: individuals “who experienced a particular event during a specific period of time”
Given the prediction task, select a clinically relevant cohort
E.g. for surgical complication prediction: patients who had one or more surgeries between 2011 and 2018.
A. Pick the records of a subset of patients
B. Pick a prediction time for each patient. Records after the prediction time are discarded
[Figure: timelines of labs, vitals, notes, … for patients p01, p02, p03, truncated at each prediction time]
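Step B can be sketched with pandas; the table layout and column names (`patient_id`, `timestamp`, `prediction_time`) are assumptions for illustration, not a fixed EHR schema.

```python
import pandas as pd

# Toy EHR records: one row per observation (lab, vital, note, ...)
records = pd.DataFrame({
    "patient_id": ["p01", "p01", "p02", "p02", "p03"],
    "timestamp": pd.to_datetime(
        ["2017-03-01", "2018-06-15", "2016-11-20", "2017-01-05", "2018-02-10"]),
    "event": ["lab", "vital", "lab", "note", "lab"],
})

# One prediction time per patient (step B); values are illustrative
prediction_times = pd.DataFrame({
    "patient_id": ["p01", "p02", "p03"],
    "prediction_time": pd.to_datetime(["2018-01-01", "2017-01-01", "2018-12-31"]),
})

# Keep only records observed at or before each patient's prediction time
merged = records.merge(prediction_times, on="patient_id")
cohort_records = merged[merged["timestamp"] <= merged["prediction_time"]]
print(cohort_records[["patient_id", "timestamp", "event"]])
```

Everything after a patient's prediction time is dropped, so the model never sees information from the 'future' relative to the moment the prediction would be made.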
3. Data Quality
[Image source: SalesForce]
EHR data are challenging in many different ways
Example: the most common non-numeric entries for lab values in a legacy EHR system
• pending
• “>60”
• see note
• not done
• “<2”
• normal
• “1+”
• “2 to 5”
• “<250”
• “<0.1”
Example: discrepancies in dates of death between hospital records and Social Security records (~4.8% of shared patients)
Anomalies vs. Outliers
Outlier: a legitimate data point far away from the mean/median of the distribution
Anomaly: an illegitimate data point generated by a process different from the one generating the rest of the data
Domain knowledge is needed to differentiate them
E.g. albumin level in blood. Normal range: 3.4 – 5.4 g/dL; µ = 3.5, σ = 0.65 over the cohort.
ρ = -1 → anomaly (treat as a missing value)
ρ = 1 → possibly an outlier (clinically relevant)
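A minimal sketch of the albumin example: readings outside a physically possible range are treated as anomalies (set to missing), while extreme but plausible readings are kept as outliers. The plausibility bounds below are illustrative assumptions, not clinical guidance.

```python
import numpy as np

def clean_albumin(values, plausible_low=0.0, plausible_high=12.0):
    """Set physically impossible readings (anomalies) to NaN;
    keep extreme-but-plausible readings (outliers) as-is."""
    values = np.asarray(values, dtype=float)
    anomaly = (values < plausible_low) | (values > plausible_high)
    cleaned = np.where(anomaly, np.nan, values)  # anomalies -> missing values
    return cleaned, anomaly

# -1.0 g/dL is impossible (anomaly); 1.0 g/dL is a real, clinically alarming value
readings = [3.5, 4.1, -1.0, 1.0]
cleaned, anomaly = clean_albumin(readings)
print(cleaned)  # the anomaly becomes NaN; the outlier 1.0 survives
```

The key point is that the two cases get opposite treatment: the anomaly is removed so it cannot distort the model, while the outlier is preserved because it may carry the strongest clinical signal.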
4. Training - Testing Split
Guidelines
• Machine learning models are evaluated on their ability to make predictions on new (unseen) data
• Split train (cross-validation) and test sets based on a temporal criterion
– e.g. no records in the train set dated after the prediction dates in the test set
– random splits, even if stratified, could feed the model records virtually from the ‘future’
• In retrospective studies, also avoid records of the same patients appearing in both train and test sets
– the model could just learn to recognize patients
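Both guidelines can be combined in one split, sketched below with pandas; the column names (`patient_id`, `prediction_time`) and the cutoff date are illustrative assumptions.

```python
import pandas as pd

def temporal_patient_split(examples, cutoff):
    """Split one-row-per-prediction examples so that (a) all test
    prediction dates fall on/after the cutoff while train dates fall
    before it, and (b) no patient appears in both sets."""
    train = examples[examples["prediction_time"] < cutoff]
    test = examples[examples["prediction_time"] >= cutoff]
    # Drop from train any patient who also appears in the test set,
    # so the model cannot simply learn to recognize patients
    train = train[~train["patient_id"].isin(test["patient_id"])]
    return train, test

examples = pd.DataFrame({
    "patient_id": ["p01", "p02", "p03", "p01", "p04"],
    "prediction_time": pd.to_datetime(
        ["2016-05-01", "2016-09-01", "2017-02-01", "2018-03-01", "2018-07-01"]),
})
train, test = temporal_patient_split(examples, pd.Timestamp("2018-01-01"))
print(train["patient_id"].tolist())  # ['p02', 'p03'] (p01 dropped: also in test)
print(test["patient_id"].tolist())   # ['p01', 'p04']
```

Note that p01's 2016 example is sacrificed from the train set: losing a few examples is the price of guaranteeing the model is evaluated on genuinely unseen patients and unseen time.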
5. Performance Metrics and Reporting
Background
Generally highly imbalanced problems:
15% unplanned 30 day readmissions
< 10% sepsis cases
< 1% 30 day mortality
Types of Performance Metrics
1. Measure trade-offs
– (ROC) AUC
– average precision / PR AUC
2. Measure error rate at specific decision point
– false positive, false negative rates
– precision, recall
– F1
– accuracy
Types of Performance Metrics (II)
1. Measure trade-offs
– AUC, average precision / PR AUC,
– good for global performance characterization and (intra)-
model comparisons
2. Measure error rate at a specific decision point
– false positives, false negatives, …, precision, recall
– possibly good for interpretation of specific clinical costs and
benefits
Don’t use accuracy unless dataset is balanced
ROC AUC can be misleading too
ROC AUC (1 year) > ROC AUC (5 years), but PR AUC (1 year) < PR AUC (5 years)! The latter prediction task is easier.
[Avati, Ng et al., Countdown Regression: Sharp and Calibrated Survival Predictions. arXiv, 2018]
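The effect is easy to reproduce on synthetic data: with the same score distributions, ROC AUC barely changes as positives become rare, while average precision collapses. The distributions and sample sizes below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

def simulate(n_pos, n_neg):
    """Same class-conditional score distributions, varying prevalence."""
    scores = np.r_[rng.normal(1.5, 1.0, n_pos),   # positives score higher
                   rng.normal(0.0, 1.0, n_neg)]
    labels = np.r_[np.ones(n_pos), np.zeros(n_neg)]
    return labels, scores

results = {}
for n_pos, n_neg in [(5000, 5000), (100, 9900)]:   # 50% vs 1% prevalence
    y, s = simulate(n_pos, n_neg)
    results[n_pos] = (roc_auc_score(y, s), average_precision_score(y, s))
    print(f"prevalence {n_pos / (n_pos + n_neg):>4.0%}: "
          f"ROC AUC = {results[n_pos][0]:.2f}, "
          f"average precision = {results[n_pos][1]:.2f}")
```

ROC AUC is insensitive to prevalence because it conditions on each class separately; average precision reflects how many of the flagged patients are actually positive, which is what an imbalanced clinical task feels like in practice.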
Performance should be reported with both types of metrics
• 1 or 2 metrics for trade-off evaluation
– ROC AUC
– average precision
• 1 metric for performance at a clinically meaningful decision point
– e.g. recall @ 90% precision
+ Comparison with a known benchmark (baseline)
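"Recall @ 90% precision" can be read off the precision-recall curve, as sketched below; the toy scores are illustrative, and in practice the operating point would be chosen on a validation set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def recall_at_precision(y_true, scores, min_precision=0.9):
    """Highest recall achievable while keeping precision >= min_precision."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    feasible = recall[precision >= min_precision]
    return float(feasible.max())

# Toy labels and model scores for 10 patients
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 0, 1, 0])
scores = np.array([0.99, 0.95, 0.90, 0.85, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10])
print(recall_at_precision(y_true, scores, 0.9))
```

Reporting the pair (trade-off metric, decision-point metric) tells the clinician both how good the model is overall and what it delivers at the alert threshold they would actually use.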
+ Comparison with a known benchmark (baseline)
Metrics in Stanford 2017 paper on mortality
prediction: AUC, average precision, recall @ 90%
Benchmarks
Main paper [Google, Nature, 2018] only reports deep
learning results with no benchmark comparison
Comparison only in supplemental online file (not on
Nature paper): deep learning only 1-2% better than
logistic regression benchmark
Plot scales can be deceiving [undisclosed
vendor, 2017]!
The same TP and FP plots, rescaled
6. Survival Analysis
(Recap) B. Pick a prediction time for each patient; records after the prediction time are discarded
C. Plot survival curves
• Consider binary classification tasks
– the event of interest (e.g. death) either happens or does not happen before the censoring time
• Survival curve: the distribution of time to event and time to censoring
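A survival curve can be estimated with the standard Kaplan-Meier product-limit formula; the minimal implementation below (and the toy follow-up times) are for illustration, since in practice a library such as lifelines would typically be used.

```python
import numpy as np

def kaplan_meier(times, observed):
    """Minimal Kaplan-Meier estimator. `times`: time to event or to
    censoring per patient; `observed`: 1 if the event (e.g. death)
    occurred at that time, 0 if the patient was censored."""
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=int)
    curve = []
    surv = 1.0
    for t in np.unique(times[observed == 1]):
        at_risk = np.sum(times >= t)                    # still under observation
        deaths = np.sum((times == t) & (observed == 1))
        surv *= 1.0 - deaths / at_risk                  # product-limit update
        curve.append((t, surv))
    return curve

# Toy follow-up data: 6 patients, two censored (observed = 0)
times    = [2, 3, 3, 5, 8, 10]
observed = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, observed):
    print(f"t={t:>5}  S(t)={s:.3f}")
```

Censored patients still count in the at-risk denominator until their censoring time, which is exactly why different prediction-time selections (which change who is censored when) reshape the survival profile of the same cohort.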
Different selections of prediction times lead to different survival profiles over the same cohort
Example: a high percentage of patients deceased within 30 days. The model is trained to distinguish mostly between relatively healthy and moribund patients → performance overestimate
Final Remarks
• Outliers should not be treated like anomalies
• Split train (CV) and test sets temporally
• Metrics:
– ROC AUC alone could be misleading
– Precision-Recall curve often more useful than ROC
– Compare with meaningful benchmarks
• Performance possibly overestimated for cohorts with
unrealistic survival curves
Thank You!
Twitter: @LorenzoARossi
Supplemental Material
Example: ROC Curve
Very high detection rate,
but also high false alarm rate
Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi

Contenu connexe

Tendances

EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
IJDKP
 
When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...
Effective Health Care Program
 
Therapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-TantsyuraTherapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Vadim Tantsyura
 

Tendances (20)

Fallacies indrayan
Fallacies indrayanFallacies indrayan
Fallacies indrayan
 
Knowledge discovery in medicine
Knowledge discovery in medicineKnowledge discovery in medicine
Knowledge discovery in medicine
 
Sample size and power calculations
Sample size and power calculationsSample size and power calculations
Sample size and power calculations
 
Scientific Studies Reporting Guidelines
Scientific Studies Reporting GuidelinesScientific Studies Reporting Guidelines
Scientific Studies Reporting Guidelines
 
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION ...
 
Trends in clinical research and career gd 09_may20
Trends in clinical research and career gd 09_may20Trends in clinical research and career gd 09_may20
Trends in clinical research and career gd 09_may20
 
Searching for Evidence
Searching for EvidenceSearching for Evidence
Searching for Evidence
 
Amsterdam 11.06.2008
Amsterdam 11.06.2008Amsterdam 11.06.2008
Amsterdam 11.06.2008
 
To Cochrane or not: that's the question
To Cochrane or not: that's the questionTo Cochrane or not: that's the question
To Cochrane or not: that's the question
 
When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...When to Select Observational Studies as Evidence for Comparative Effectivenes...
When to Select Observational Studies as Evidence for Comparative Effectivenes...
 
Therapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-TantsyuraTherapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
Therapeutic_Innovation_&_Regulatory_Science-2015-Tantsyura
 
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in HealthcareDay 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in Healthcare
 
How to conduct meta analysis
How to conduct meta analysisHow to conduct meta analysis
How to conduct meta analysis
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
 
Research methodology and biostatistics
Research methodology and biostatisticsResearch methodology and biostatistics
Research methodology and biostatistics
 
lecture C
lecture Clecture C
lecture C
 
Common statistical pitfalls in basic science research
Common statistical pitfalls in basic science researchCommon statistical pitfalls in basic science research
Common statistical pitfalls in basic science research
 
297 vickers
297 vickers297 vickers
297 vickers
 
297 vickers
297 vickers297 vickers
297 vickers
 
Malmo 11.11.2008
Malmo 11.11.2008Malmo 11.11.2008
Malmo 11.11.2008
 

Similaire à Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi

bio equivalence studies
bio equivalence studiesbio equivalence studies
bio equivalence studies
RamyaP53
 
Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015
Victor Brunka
 
Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016
Praveen Chand
 
Extrapolation of time-to-event data
Extrapolation of time-to-event dataExtrapolation of time-to-event data
Extrapolation of time-to-event data
Sheily Kamra
 
Data-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk LearningData-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk Learning
Po-Hsiang (Barnett) Chiu
 

Similaire à Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi (20)

Final_Presentation.pptx
Final_Presentation.pptxFinal_Presentation.pptx
Final_Presentation.pptx
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLP
 
Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015
 
Data analysis ( Bio-statistic )
Data analysis ( Bio-statistic )Data analysis ( Bio-statistic )
Data analysis ( Bio-statistic )
 
statistics introduction.ppt
statistics introduction.pptstatistics introduction.ppt
statistics introduction.ppt
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
bio equivalence studies
bio equivalence studiesbio equivalence studies
bio equivalence studies
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
 
Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015Automated Abstracting - NCRA San Antonio 2015
Automated Abstracting - NCRA San Antonio 2015
 
Statistics for DP Biology IA
Statistics for DP Biology IAStatistics for DP Biology IA
Statistics for DP Biology IA
 
Biological variation as an uncertainty component
Biological variation as an uncertainty componentBiological variation as an uncertainty component
Biological variation as an uncertainty component
 
In tech quality-control_in_clinical_laboratories
In tech quality-control_in_clinical_laboratoriesIn tech quality-control_in_clinical_laboratories
In tech quality-control_in_clinical_laboratories
 
Quality control clia
Quality control cliaQuality control clia
Quality control clia
 
First in man tokyo
First in man tokyoFirst in man tokyo
First in man tokyo
 
Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016
 
ICU SCORES.pptx
ICU SCORES.pptxICU SCORES.pptx
ICU SCORES.pptx
 
Clinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-StatisticiansClinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-Statisticians
 
Extrapolation of time-to-event data
Extrapolation of time-to-event dataExtrapolation of time-to-event data
Extrapolation of time-to-event data
 
Data-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk LearningData-driven Disease Phenotyping and Bulk Learning
Data-driven Disease Phenotyping and Bulk Learning
 
Sample size calculation
Sample size calculationSample size calculation
Sample size calculation
 

Plus de Data Con LA

Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA
 

Plus de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for Healthcare by Lorenzo Rossi

  • 1. Lorenzo Rossi, PhD Data Scientist City of Hope National Medical Center DataCon LA, August 2019 Best Practices for Prototyping Machine Learning Models for Healthcare
  • 2.
  • 3. Machine learning in healthcare is growing fast, but best practices are not well established yet Towards Guidelines for ML in Health (8.2018, Stanford)
  • 4. Motivations for ML in Healthcare 1. Lots of information about patients, but not enough time for clinicians to process it 2. Physicians spend too much time typing information about patients during encounters 3. Overwhelming amount of false alerts (e.g. in ICU)
  • 5. Topics 1. The electronic health record (EHR) 2. Cohort definition 3. Data quality 4. Training - testing split 5. Performance metrics and reporting 6. Survival analysis
  • 6. Topics 1. The electronic health record (EHR) 2. Cohort definition 3. Data quality 4. Training - testing split 5. Performance metrics and reporting 6. Survival curves Data preparation
  • 7. 1. The Electronic Health Record (EHR)
  • 8. • Laboratory tests • Vitals • Diagnoses • Medications • X-rays, CT scans, EKGs, … • Notes EHR data are very heterogeneous
  • 9. • Laboratory tests [multi dimensional time series] • Vitals [multi dimensional time series] • Diagnoses [text, codes] • Medications [text, codes, numeric] • X-rays, CT scans, EKGs,… [2D - 3D images, time series, ..] • Notes [text] EHR data are very heterogeneous
  • 10. • labs • vitals • notes • … Time is a key aspect of EHR data p01 p02 p03 time
  • 11. • labs • vitals • notes • … Time is a key aspect of EHR data p01 p02 p03 Temporal resolution varies a lot • ICU patient [minutes] • Hospital patient [hours] • Outpatient [weeks] time
  • 12. • Unplanned 30 day readmission • Length of stay • Mortality • Sepsis • ICU admission • Surgical complications Events hospitals want to predict from EHR data
  • 13. • Unplanned 30 day readmission • Length of stay • Mortality • Sepsis • ICU admission • Surgical complications Events hospitals want to predict from EHR data Improve capacity
  • 14. • Unplanned 30 day readmission • Length of stay • Mortality • Sepsis • ICU admission • Surgical complications Events hospitals want to predict from EHR data Improve capacity Optimize decisions
  • 15. Consider only binary prediction tasks for simplicity Prediction algorithm gives score from 0 to 1 – E.g. close to 1 → high risk of readmission within 30 days 0 / 1
  • 16. Consider only binary prediction tasks for simplicity Prediction algorithm gives score from 0 to 1 – E.g. close to 1 → high risk of readmission within 30 days Trade-off between falsely detected and missed targets 0 / 1
  • 18. Individuals “who experienced particular event during specific period of time” Cohort
  • 19. Individuals “who experienced particular event during specific period of time” Given prediction task, select clinically relevant cohort E.g. for surgery complication prediction, patients who had one or more surgeries between 2011 and 2018. Cohort
  • 20. A. Pick records of subset of patients • labs • vitals • notes • …p01 p02 p03 time
  • 21. B. Pick a prediction time for each patients. Records after prediction time are discarded • labs • vitals • notes • …p01 p02 p03 time
  • 22. B. Pick a prediction time for each patients. Records after prediction time are discarded • labs • vitals • notes • …p01 p02 p03 time
• 23. 3. Data Quality [Image source: Salesforce]
  • 24. EHR data challenging in many different ways
• 25. Example: most common non-numeric entries for lab values in a legacy EHR system • pending • “>60” • see note • not done • “<2” • normal • “1+” • “2 to 5” • “<250” • “<0.1”
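One hypothetical cleaning policy for such entries is sketched below; the function name and the rules (e.g. mapping “>60” to its boundary value, everything non-numeric to missing) are illustrative choices, not the deck's actual pipeline.

```python
import re

def parse_lab_value(raw):
    """Best-effort numeric parse of a free-text lab entry.

    One possible policy (illustrative only):
      - plain numbers                      -> float
      - censored values like '<2' or '>60' -> the boundary value
      - anything else ('pending', 'see note', '1+', '2 to 5') -> None
    """
    match = re.fullmatch(r"[<>]?\s*(\d+(?:\.\d+)?)", str(raw).strip())
    return float(match.group(1)) if match else None

for raw in ["7.2", ">60", "<0.1", "1+", "pending"]:
    print(repr(raw), "->", parse_lab_value(raw))
```

Whether a boundary value, a sentinel, or a missing-value marker is appropriate for censored entries like “<2” depends on the downstream model and should be decided with domain experts.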
  • 26. Example: discrepancies in dates of death between hospital records and Social Security (~ 4.8 % of shared patients)
• 28. Distinguish between anomalies and outliers. Outlier: a legitimate data point far from the mean/median of the distribution. Anomaly: an illegitimate data point generated by a process different from the one producing the rest of the data. Domain knowledge is needed to tell them apart
• 29. Example: albumin level in blood. Normal range: 3.4 – 5.4 g/dL; over the cohort µ = 3.5, σ = 0.65
• 31. ρ = -1 → anomaly: negative albumin is impossible (treat as missing value)
• 33. ρ = 1 → possibly an outlier: extreme but legitimate (clinically relevant)
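The albumin example can be sketched as a simple rule: values outside what is physiologically possible are anomalies (set to missing), while extreme-but-possible values are flagged as outliers and kept. The cutoffs below are illustrative only, not clinical guidance.

```python
# Albumin example from the slides: normal range 3.4-5.4 g/dL,
# cohort mean 3.5, std 0.65. Rules and cutoffs are illustrative.
MEAN, STD = 3.5, 0.65

def classify_albumin(value):
    if value <= 0:                    # negative albumin cannot occur
        return "anomaly"              # illegitimate -> treat as missing
    if abs(value - MEAN) > 3 * STD:   # extreme but physiologically possible
        return "outlier"              # legitimate -> keep, clinically relevant
    return "normal"

print(classify_albumin(-1.0))  # anomaly (e.g. a sentinel value)
print(classify_albumin(1.0))   # outlier: |1.0 - 3.5| = 2.5 > 3 * 0.65
print(classify_albumin(4.0))   # normal
```

In a real pipeline the impossibility check and the deviation threshold would both be set with clinicians, since what counts as “impossible” is lab-specific.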
  • 34. 4. Training - Testing Split
• 36. Guidelines • Machine learning models are evaluated on their ability to make predictions on new (unseen) data • Split train (cross-validation) and test sets on temporal criteria – e.g. no records in the train set after the prediction dates in the test set – random splits, even if stratified, could let the model train on records virtually from the ‘future’ • In retrospective studies, also avoid records of the same patients across train and test – the model could just learn to recognize patients
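A minimal sketch of both guidelines, assuming toy `(patient_id, prediction_date)` records (all names and dates below are made up): split on a date cutoff, and keep every record of a given patient on the same side.

```python
from datetime import date

# Toy (patient_id, prediction_date) records; all values are made up.
records = [
    ("p01", date(2016, 3, 1)), ("p01", date(2018, 7, 9)),
    ("p02", date(2015, 5, 4)), ("p03", date(2019, 1, 2)),
    ("p04", date(2017, 11, 20)),
]

def temporal_patient_split(records, cutoff):
    """Assign each *patient* (not each record) to train or test.

    A patient goes to train only if all their prediction dates fall
    before the cutoff, so (a) no training record comes from the test
    period and (b) no patient appears on both sides.
    """
    latest = {}
    for pid, d in records:
        latest[pid] = max(latest.get(pid, d), d)
    train_ids = {pid for pid, d in latest.items() if d < cutoff}
    train = [r for r in records if r[0] in train_ids]
    test = [r for r in records if r[0] not in train_ids]
    return train, test

train, test = temporal_patient_split(records, cutoff=date(2018, 1, 1))
print(sorted({pid for pid, _ in train}), sorted({pid for pid, _ in test}))
```

Here p01 has a 2018 record, so both of p01's records land in the test side: the split is per patient, never per record.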
  • 37. 5. Performance Metrics and Reporting
• 38. Background: these are generally highly imbalanced problems – 15% unplanned 30 day readmissions – <10% sepsis cases – <1% 30 day mortality
  • 39. Types of Performance Metrics 1. Measure trade-offs – (ROC) AUC – average precision / PR AUC 2. Measure error rate at specific decision point – false positive, false negative rates – precision, recall – F1 – accuracy
• 40. Types of Performance Metrics (II) 1. Measure trade-offs – AUC, average precision / PR AUC – good for global performance characterization and (intra-)model comparisons 2. Measure error rate at a specific decision point – false positives, false negatives, …, precision, recall – possibly good for interpreting specific clinical costs and benefits
  • 41. Don’t use accuracy unless dataset is balanced
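A quick illustration of why: with 1% positives (roughly the 30 day mortality rate cited above), a model that always predicts “no event” still looks excellent by accuracy while detecting nothing. The counts below are synthetic.

```python
# 1% positives: a model that always predicts "no event" still gets
# 99% accuracy while catching zero events.
labels = [1] * 10 + [0] * 990
predictions = [0] * 1000

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / 10

print(f"accuracy = {accuracy:.2%}, recall = {recall:.0%}")
```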
  • 42. ROC AUC can be misleading too
  • 44. ROC AUC can be misleading (II) [Avati, Ng et al., Countdown Regression: Sharp and Calibrated Survival Predictions. ArXiv, 2018]
• 45. ROC AUC (1 year) > ROC AUC (5 years), but PR AUC (1 year) < PR AUC (5 years): the latter prediction task is in fact easier! [Avati, Ng et al., Countdown Regression: Sharp and Calibrated Survival Predictions. ArXiv, 2018]
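The same effect is easy to reproduce with synthetic scores (all numbers below are made up): on a rare-event task, ROC AUC can look excellent while average precision stays poor, because ROC AUC is insensitive to the large pool of true negatives.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# Synthetic rare-event data: 10 events among 1000 cases. The model
# ranks events above ~96% of non-events (high ROC AUC), but 40
# non-events outrank every event (poor precision).
y_true = [1] * 10 + [0] * 990
y_score = [0.8] * 10 + [0.9] * 40 + [0.1] * 950

print("ROC AUC:", round(roc_auc_score(y_true, y_score), 3))            # ~0.96
print("PR AUC :", round(average_precision_score(y_true, y_score), 3))  # 0.2
```

An alert built on these scores would fire 4 false alarms for every true event at the usable operating point, which the ROC AUC alone never reveals.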
  • 46. Performance should be reported with both types of metrics • 1 or 2 metrics for trade-off evaluation – ROC AUC – average precision • 1 metric for performance at clinically meaningful decision point – e.g. recall @ 90% precision
  • 47. Performance should be reported with both types of metrics • 1 or 2 metrics for trade-off evaluation – ROC AUC – average precision • 1 metric for performance at clinically meaningful decision point – e.g. recall @ 90% precision + Comparison with a known benchmark (baseline)
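The decision-point metric can be computed from scikit-learn's precision-recall curve; a sketch with toy labels and scores (made up):

```python
from sklearn.metrics import precision_recall_curve

def recall_at_precision(y_true, y_score, min_precision=0.90):
    """Highest recall achievable while keeping precision >= min_precision."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return max(r for p, r in zip(precision, recall) if p >= min_precision)

# Toy labels and scores (made up): the top 3 scores are all events.
y_true = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
y_score = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.30, 0.20, 0.10, 0.05]

print("recall @ 90% precision:", recall_at_precision(y_true, y_score))
```

On this toy data the model can recover 3 of the 4 events (recall 0.75) before precision drops below 90%.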
  • 48. Metrics in Stanford 2017 paper on mortality prediction: AUC, average precision, recall @ 90%
• 50. The main paper [Google, Nature, 2018] reports only deep learning results, with no benchmark comparison
• 51. The comparison appears only in the supplemental online file (not in the Nature paper): deep learning is only 1–2% better than the logistic regression benchmark
• 52. Plot scales can be deceiving! [undisclosed vendor, 2017]
  • 53. Same TP, FP plots rescaled
• 55. B. Pick a prediction time for each patient; records after the prediction time are discarded [timeline diagram: labs, vitals, notes, … for patients p01–p03]
• 56. C. Plot survival curves • Consider binary classification tasks – the event of interest (e.g. death) either happens or does not happen before the censoring time • Survival curve: the distribution of time to event and time to censoring
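Survival curves with censoring are usually estimated with the Kaplan-Meier method; a minimal pure-Python sketch follows (toy follow-up times, not real patient data).

```python
def kaplan_meier(times, events):
    """Minimal Kaplan-Meier estimator (illustrative sketch).

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) occurred at that time,
             0 if the patient was censored then
    Returns [(time, survival_probability)] at each event time.
    """
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths:
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= sum(1 for ti in times if ti == t)  # events + censored leave
    return curve

# Toy cohort: follow-up in days, 1 = died, 0 = censored (made up).
times = [5, 10, 10, 20, 30, 30]
events = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
```

The curve drops only at event times; censored patients just shrink the risk set. In practice one would typically use a survival library (e.g. lifelines' `KaplanMeierFitter`) rather than hand-rolling this.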
• 57. Different selections of prediction times lead to different survival profiles over the same cohort
• 59. Example: a high percentage of patients deceased within 30 days. The model is trained mostly to distinguish relatively healthy from moribund patients → performance overestimate
• 60. Final Remarks • Outliers should not be treated like anomalies • Split train (CV) and test sets temporally • Metrics: – ROC AUC alone can be misleading – the precision-recall curve is often more useful than ROC – compare with meaningful benchmarks • Performance is possibly overestimated for cohorts with unrealistic survival curves
  • 64. Example: ROC Curve Very high detection rate, but also high false alarm rate