Str-AI-ght to heaven?
Pitfalls for clinical decision support based on AI
Ben Van Calster
Department Development and Regeneration and EPI-centre, KU Leuven
Department Biomedical Data Sciences, LUMC Leiden
Research Ethics Committee, UZ Leuven
ben.vancalster@kuleuven.be; @BenVanCalster
ISUOG World Congress, 16 October 2021
Disclaimer
• Talk last year: “a plea for good methodology”
• This talk builds on that, in the context of AI and machine learning
• There is a lot of hype surrounding AI/ML. It may have potential, but we had better start getting real!
https://lawtomated.com/enough-with-the-a-i-hype-and-why/
Lawtomated
Do not celebrate too early…
Copyright Bas Czerwinski / Getty Images
Julian Alaphilippe, Liège-Bastogne-Liège (Oct. 4th, 2020)
Real winner: Primož Roglič
Deep learning on medical images
Topol. Nat Med 2019;25:44-56. Zhu et al. Front Neurol 2019;10:869.
Titano et al. Nat Med 2018;24:1337-41; Nam et al. Radiology 2019;290:218-28; Ehteshami Bejnordi et al. JAMA 2017;318:2199-210;
Esteva et al. Nature 2017;542:115-8; De Fauw et al. Nat Med 2018;24:1342-50; Raman et al. Eye 2019;33:97-109.
Machine Learning for ‘EHR’ data
Rajkomar et al. Npj Digit Med 2018;1:18.
Rose. JAMA Netw Open 2018;1:e181404.
Reason for popularity?
“Very complex machine learning algorithms are highly flexible,
and hence find relationships we could not see before.
Therefore we make better predictions and better decisions.”
→ Guaranteed success!
Right?
Pitfalls for “predictive analytics”
 1. Poor methodology
 2. Lack of evidence
 3. Considerable heterogeneity
 4. (Financial) conflicts of interest
 5. Actual implementation in clinical practice
1. Methodology matters, not impact factors
Altman DG. BMJ 1994;308:283-284.
Van Calster et al, J Clin Epidemiol, in press.
Altman. BMJ 1994.
Our own frustration paper. JCE 2021.
‘Predictive analytics’: covid-19
Wynants et al. BMJ 2020;369:m1328.
The review found more than 1 paper a day (!)
Results not trustworthy for 97% of the 231 models
Median sample size: 338
Non-representative sample: 42%
Representativeness unclear: 25%
Data analysis problematic: 94%
No model validation at all: 22%
Predictive analytics for covid-19
Wynants et al. BMJ 2020;369:m1328
Deep learning models for covid-19 diagnosis using CT or X-ray
- No discussion of target population or setting
- Control group (without covid-19):
 Images from pediatric population
 Images from a different country
 Images from different time periods
 Barely defined, e.g. ‘healthy persons’
- Images from online repository, without further information
- Often no demographic description at all (not even age or sex!)
Covid-19 deep learning: deep failure!
Roberts et al. Nat Mach Intell 2021;3:199-217.
Public covid-19 X-ray datasets
Santa Cruz et al. Med Image Analysis 2021;74:102225.
Complex algorithms are data hungry
So you dream of having a Porsche?
If you cannot (or don’t want to) pay for it, you may get this...
This also holds for predictive analytics. A fancier model? More expensive.
Currency: GOOD data.
Measurement and data quality
Missing values: the tricky importance of the invisible
Measurement: timing and procedure matter
Outcome: quality labels are key (see e.g. deep learning on medical images)
Beam & Kohane. JAMA 2018;319:1317-1318.
2. Wanted: evidence
• Kleinrouweler (AJOG 2016): 263 models in obstetrics
• Only 23 of these (9%) had been externally validated…
Other examples of model overload:
• 1060 models predicting outcomes after CVD (1990-2015) (Wessler et al, 2017)
• 363 models predicting CVD (Damen et al, 2016)
• 231 models related to Covid-19 (Wynants et al, 2020), and counting!
• 116 models to diagnose ovarian malignancy (Kaijser et al, 2014)
Wessler et al. Diagn Progn Res 2017;1:20. Damen et al. BMJ 2016;353:i2416. Wynants et al. BMJ 2020;369:m1328.
Kleinrouweler et al. AJOG 2016;214:79-90. Kaijser et al. Hum Reprod Update 2014;20:229-62.
Smartphone apps for skin lesions
Freeman et al. BMJ 2020;368:m127
• 9 validation studies covering 6 apps
• 1132 lesions in total (average 126 per study)
• Methodological quality was poor
o Selective inclusions (non-representative)
o Images were taken and selected by clinicians
o Lots of unusable images
Scarce and poor evidence
Radiology AI
Van Leeuwen et al. Eur Radiol 2021;31:3797-3804
• 64/100: no evidence
• 18/100: evidence of diagnostic performance
• 18/100: evidence of potential impact
• Half of the studies were independent, the other half had conflicts of interest
3. Expect (a lot of) heterogeneity
• Changes in care over time
• Differences in care between healthcare systems
• Differences in populations between practices/hospitals/regions
• Differences in hardware, software, and measurement procedures
• Differences in performance between patient subgroups (cf. fairness)
Futoma et al. Lancet Digit Health 2020;2:e489-e492.
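One way to make such heterogeneity visible is to report performance per hospital or subgroup rather than as a single pooled number. The sketch below is only an illustration: the function name, the 0.5 threshold, and the toy records for hospitals "A" and "B" are all hypothetical and not taken from any cited study.

```python
from collections import defaultdict


def sensitivity_specificity_by_group(records, threshold=0.5):
    """Sensitivity and specificity per group.

    records: iterable of (group, predicted_risk, outcome) with outcome 0 or 1.
    A patient is flagged when predicted_risk >= threshold (assumed cut-off).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, risk, outcome in records:
        flagged = risk >= threshold
        if outcome == 1:
            counts[group]["tp" if flagged else "fn"] += 1
        else:
            counts[group]["fp" if flagged else "tn"] += 1

    result = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        result[group] = {
            # None when a group has no events (or no non-events) to evaluate.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return result


# Hypothetical toy data: the same model applied in two hospitals.
toy_records = [
    ("A", 0.8, 1), ("A", 0.7, 1), ("A", 0.2, 0), ("A", 0.3, 0),
    ("B", 0.4, 1), ("B", 0.9, 1), ("B", 0.6, 0), ("B", 0.1, 0),
]

per_hospital = sensitivity_specificity_by_group(toy_records)
for hospital, metrics in sorted(per_hospital.items()):
    print(hospital, metrics)
```

In this toy example the model looks perfect in hospital A and no better than a coin flip in hospital B, which a pooled estimate would hide.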
https://www.unite.ai/andrew-ng-criticizes-the-culture-of-overfitting-in-machine-learning/.
https://www.youtube.com/watch?v=Gbnep6RJinQ
Procedural heterogeneity
Agniel et al. BMJ 2018;360:k1479.
Hardware/software
Badgeley et al. npj Digit Med 2019;2:31.
Deep learning was better at predicting scanner model and brand (AUC ≥ 0.98) than at predicting hip fracture (AUC 0.78)
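For readers less familiar with the AUC: it is the probability that a randomly chosen case with the outcome receives a higher score than a randomly chosen case without it. A minimal sketch of that definition follows. The function name and the toy scores are hypothetical and are not data from Badgeley et al.; they only illustrate how the same kind of calculation can be run for the clinical target and for a nuisance target such as the scanner.

```python
from itertools import product


def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(positives, negatives)
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data for eight images.
fracture = [1, 1, 1, 0, 0, 0, 0, 1]
fracture_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.6, 0.1, 0.3]

scanner_b = [1, 1, 1, 1, 0, 0, 0, 0]
scanner_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.2, 0.65, 0.1]

print(auc(fracture_scores, fracture))   # 0.75
print(auc(scanner_scores, scanner_b))   # 0.9375
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration but would be replaced by a rank-based calculation on real datasets.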
Where do DL datasets come from anyway?
Kaushal et al. JAMA 2020;324:1212-1213.
Implications?
Van Calster et al. BMC Med 2019;17:230.
THERE IS NO SUCH THING AS A ‘VALIDATED’ MODEL
DL research (Sep 2021)
Perkonigg et al. Nat Comm 2021;12:5678.
4. Proprietary datasets and models
Van Calster et al. JAMIA 2019;26:1651-1654.
https://hai.stanford.edu/news/flying-dark-hospital-ai-tools-arent-well-documented.
Not necessarily bad in principle: financial resources are needed
But it may hamper openness, availability, independent validation
COVID review: companies often did not respond, yet claimed that their model was used on thousands of patients
Google’s Dermatology Assist (CE label)
https://www.bbc.com/news/technology-57157566.
May 18th, 2021
Google’s Dermatology Assist (CE label)
https://www.statnews.com/2021/06/02/machine-learning-ai-methodology-research-flaws/.
Roxana Daneshjou (Stanford):
- No evaluation on external dataset.
- Insufficient variation in skin types.
- Outcome rarely based on biopsy.
- “I haven't seen data that makes me feel comfortable with putting this in the hands of patients or physicians.”
External validation of the Epic Sepsis Model
Wong et al. JAMA Intern Med 2021;181:1065-1070.
Model: penalized logistic regression with 80 variables
Data: 3 healthcare organizations, 2013-2015
AUC according to internal documentation: 0.78-0.83
Validation: 1 academic center, 2018-2019
AUC 0.63, calibration poor (risks way too high)
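"Risks way too high" is a statement about calibration: the predicted risks add up to more events than were actually observed. A rough way to check this is the observed/expected (O:E) ratio plus a small table of mean predicted risk against observed event rate per risk group. The sketch below assumes hypothetical function names and invented toy numbers; it is not the analysis of Wong et al.

```python
def observed_expected_ratio(predicted_risks, outcomes):
    """Observed number of events divided by the sum of predicted risks."""
    if len(predicted_risks) != len(outcomes) or not outcomes:
        raise ValueError("need equally long, non-empty inputs")
    return sum(outcomes) / sum(predicted_risks)


def calibration_table(predicted_risks, outcomes, n_groups=2):
    """(mean predicted risk, observed event rate) per risk group, lowest risks first."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    pairs = sorted(zip(predicted_risks, outcomes))
    size = len(pairs) // n_groups
    if size == 0:
        raise ValueError("more groups than patients")
    rows = []
    for g in range(n_groups):
        # The last group also takes any leftover patients.
        chunk = pairs[g * size:] if g == n_groups - 1 else pairs[g * size:(g + 1) * size]
        mean_predicted = sum(risk for risk, _ in chunk) / len(chunk)
        observed_rate = sum(outcome for _, outcome in chunk) / len(chunk)
        rows.append((mean_predicted, observed_rate))
    return rows


# Hypothetical toy validation data: ten patients, three events.
risks = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.7, 0.8, 0.8]
events = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0]

oe = observed_expected_ratio(risks, events)
table = calibration_table(risks, events, n_groups=2)

print(round(oe, 3))  # 0.625: fewer events observed than predicted
for mean_predicted, observed_rate in table:
    print(round(mean_predicted, 2), round(observed_rate, 2))
```

An O:E ratio below 1, as in this toy example, means the model overestimates risk on average; the table shows in which risk range the overestimation occurs.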
5. Actual implementation
Logistical/practical issues to fit model in clinical workflow
Psychological issues regarding model use by healthcare staff
Medicolegal: Who is responsible when prediction is wrong?
https://www.statnews.com/2020/03/09/can-you-sue-artificial-intelligence-algorithm-for-malpractice/
Panch et al. npj Digit Med 2019;2:77.
Lack of evidence revisited: impact?
Clinical impact studies: scarce, difficult
Clinical decision support is a complex intervention (Kappen et al, 2018)
Endpoints of impact studies?
- Process-related: ‘easy’, but intermediate
- Long-term patient outcomes: difficult, lower effect sizes expected
Kappen et al. Diagn Progn Res 2018.
So, does medical AI ‘work’?
We still often don’t know!
Trust jeopardized by
- poor methodology
- lack of evidence
- lack of openness.
It may have potential if done well and evidence is gathered.
The AI community and academia often shoot themselves in the foot, which is a pity.
Academia: wrong incentives (publish or perish)!
Companies: financial conflicts of interest!
That’s (not) all folks…
https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/.
https://spectrum.ieee.org/deep-learning-computational-cost
Thompson et al. IEEE Spectrum 2021.
Hao. MIT Technology Review 2019.

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI

  • 1. Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI. Ben Van Calster; Department Development and Regeneration and EPI-centre, KU Leuven; Department Biomedical Data Sciences, LUMC Leiden; Research Ethics Committee, UZ Leuven. ben.vancalster@kuleuven.be; @BenVanCalster. ISUOG World Congress, 16 October 2021.
  • 2. Disclaimer. Talk last year: “a plea for good methodology”. This talk builds on that, in the context of AI and machine learning. There is a lot of hype surrounding AI/ML. It may have potential, but we had better start to get real! Source: https://lawtomated.com/enough-with-the-a-i-hype-and-why/ (Lawtomated)
  • 3. Do not celebrate too early… Julian Alaphilippe, Liège-Bastogne-Liège (Oct. 4th, 2020). Real winner: Primož Roglič. Image copyright Bas Czerwinski / Getty Images.
  • 4. Deep learning on medical images. Topol. Nat Med 2019;25:44-56. Zhu et al. Front Neurol 2019;10:869. Titano et al. Nat Med 2018;24:1337-41. Nam et al. Radiology 2019;290:218-28. Ehteshami Bejnordi et al. JAMA 2017;318:2199-210. Esteva et al. Nature 2017;542:115-8. De Fauw et al. Nat Med 2018;24:1342-50. Raman et al. Eye 2019;33:97-109.
  • 5. Machine learning for ‘EHR’ data. Rajkomar et al. npj Digit Med 2018;1:18. Rose. JAMA Netw Open 2018;1:e181404.
  • 6. Reason for popularity? “Very complex machine learning algorithms are highly flexible, and hence find relationships we could not see before. Therefore we make better predictions and better decisions.” → Guaranteed success! Right?
  • 7. Pitfalls for “predictive analytics”: (1) Poor methodology; (2) Lack of evidence; (3) Considerable heterogeneity; (4) (Financial) conflicts of interest; (5) Actual implementation in clinical practice.
  • 8. 1. Methodology matters, not impact factors. Altman DG. BMJ 1994;308:283-284. Van Calster et al. J Clin Epidemiol 2021, in press (our own “frustration paper”).
  • 9. ‘Predictive analytics’: covid-19. Wynants et al. BMJ 2020;369:m1328. The review found more than 1 paper a day (!). Results not trustworthy for 97% of the 231 models. Median sample size: 338. Non-representative sample: 42%. Representativity unclear: 25%. Data analysis problematic: 94%. No model validation at all: 22%.
  • 10. Predictive analytics for covid-19. Wynants et al. BMJ 2020;369:m1328. Deep learning models for covid-19 diagnosis using CT or X-ray (RX): no discussion of target population or setting; control group (without covid-19) consisting of images from a pediatric population, images from a different country, images from different time periods, or barely defined, e.g. ‘healthy persons’; images from an online repository, without further information; often no demographic description at all (not even age or sex!).
  • 11. Covid-19 deep learning: deep failure! Roberts et al. Nat Mach Intell 2021;3:199-217.
  • 12. Public covid-19 X-ray (RX) datasets. Santa Cruz et al. Med Image Analysis 2021;74:102225.
  • 13. Complex algorithms are data hungry. So you dream of having a Porsche? If you cannot (or don’t want to) pay for it, you may get this... This also holds for predictive analytics. A fancier model? More expensive. Currency: GOOD data.
  • 14. Measurement and data quality. Missing values: the tricky importance of the invisible. Measurement: timing and procedure matter. Outcome: quality labels are key (see e.g. deep learning on medical images). Beam & Kohane. JAMA 2018;319:1317-1318.
  • 15. 2. Wanted: evidence. Kleinrouweler (AJOG 2016): 263 models in obstetrics, of which only 23 (9%) had been externally validated… Other examples of model overload: 1060 models predicting outcomes after CVD (1990-2015) (Wessler et al, 2017); 363 models predicting CVD (Damen et al, 2016); 231 models related to covid-19 (Wynants et al, 2020), and counting!; 116 models to diagnose ovarian malignancy (Kaijser et al, 2014). Wessler et al. Diagn Progn Res 2017;1:20. Damen et al. BMJ 2016;353:i2416. Wynants et al. BMJ 2020;369:m1328. Kleinrouweler et al. AJOG 2016;214:79-90. Kaijser et al. Hum Reprod Update 2014;20:229-62.
  • 16. Smartphone apps for skin lesions. Freeman et al. BMJ 2020;368:m127. 9 validation studies covering 6 apps; 1132 lesions in total (average 126 per study). Methodological quality was poor: selective inclusions (non-representative); images were taken and selected by clinicians; lots of unusable images. Conclusion: scarce and poor evidence.
  • 17. Radiology AI. Van Leeuwen et al. Eur Radiol 2021;31:3797-3804. 64/100: no evidence. 18/100: evidence of diagnostic performance. 18/100: evidence of potential impact. Half of the studies were independent, the other half had conflicts of interest.
  • 18. 3. Expect (a lot of) heterogeneity. Changes in care over time; differences in care between healthcare systems; differences in populations between practices/hospitals/regions; differences in hardware, software, and measurement procedures; differences in performance between patient subgroups (cf. fairness). Futoma et al. Lancet Digit Health 2020;2:e489-e492.
  • 20. Procedural heterogeneity. Agniel et al. BMJ 2018;360:k1479.
  • 21. Hardware/software. Badgeley et al. npj Digit Med 2019;2:31. Deep learning was better at predicting scanner model and brand (AUC ≥ 0.98) than at predicting hip fracture (AUC 0.78).
  • 22. Where do DL datasets come from anyway? Kaushal et al. JAMA 2020;324:1212-1213.
  • 23. Implications? Van Calster et al. BMC Med 2019;17:230. THERE IS NO SUCH THING AS A ‘VALIDATED’ MODEL.
  • 24. DL research (Sep 2021). Perkonigg et al. Nat Comm 2021;12:5678.
  • 25. 4. Proprietary datasets and models. Van Calster et al. JAMIA 2019;26:1651-1654. https://hai.stanford.edu/news/flying-dark-hospital-ai-tools-arent-well-documented Not necessarily bad in principle: financial resources are needed. But it may hamper openness, availability, and independent validation. Covid-19 review: companies often did not respond, but claimed that the model was used on thousands of patients.
  • 26. Google’s Dermatology Assist (CE label). https://www.bbc.com/news/technology-57157566 (May 18th, 2021)
  • 27. Google’s Dermatology Assist (CE label). https://www.statnews.com/2021/06/02/machine-learning-ai-methodology-research-flaws/ Roxana Daneshjou (Stanford): no evaluation on an external dataset; insufficient variation in skin types; outcome rarely based on biopsy. “I haven't seen data that makes me feel comfortable with putting this in the hands of patients or physicians.”
  • 28. External validation of the Epic sepsis model. Wong et al. JAMA Intern Med 2021;181:1065-1070. Model: penalized logistic regression with 80 variables. Development data: 3 healthcare organizations, 2013-2015. AUC according to internal documentation: 0.78-0.83. External validation: 1 academic center, 2018-2019. Result: AUC 0.63, poor calibration (predicted risks far too high).
  • 29. 5. Actual implementation. Logistical/practical issues to fit the model into the clinical workflow. Psychological issues regarding model use by healthcare staff. Medicolegal: who is responsible when a prediction is wrong? https://www.statnews.com/2020/03/09/can-you-sue-artificial-intelligence-algorithm-for-malpractice/ Panch et al. npj Digit Med 2019;2:77.
  • 30. Lack of evidence revisited: impact? Clinical impact studies: scarce, difficult. Clinical decision support is a complex intervention (Kappen et al, 2018). Endpoints of impact studies? Process-related: ‘easy’, but intermediate. Long-term patient outcomes: difficult, lower effect sizes expected. Kappen et al. Diagn Progn Res 2018.
  • 31. So, does medical AI ‘work’? We still often don’t know! Trust is jeopardized by poor methodology, lack of evidence, and lack of openness. It may have potential if done well and if evidence is gathered. The AI community and academia often shoot themselves in the foot, which is a pity. Academia: wrong incentives (publish or perish)! Companies: financial conflicts of interest!
  • 32. That’s (not) all folks… https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/ https://spectrum.ieee.org/deep-learning-computational-cost Thompson et al. IEEE Spectrum 2021. Hao. MIT Technology Review 2019.
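
Slides 21 and 28 report performance as an AUC (discrimination) and as calibration (whether predicted risks are too high or too low). The following is a minimal Python sketch of how these two quantities could be computed when checking a model on external data. It is not the analysis used by Badgeley et al. or Wong et al.; the function names (`auc`, `observed_expected_ratio`, `calibration_table`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Concordance: chance that a random event has a higher predicted
    risk than a random non-event (ties count as one half)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                wins += 1.0
            elif p_event == p_non:
                wins += 0.5
    return wins / (len(events) * len(non_events))


def observed_expected_ratio(y_true: Sequence[int],
                            y_prob: Sequence[float]) -> float:
    """Observed events divided by the sum of predicted risks.
    A value below 1 means risks are overestimated on average."""
    expected = sum(y_prob)
    if expected == 0:
        raise ValueError("sum of predicted risks is zero")
    return sum(y_true) / expected


def calibration_table(y_true: Sequence[int], y_prob: Sequence[float],
                      n_groups: int = 2) -> List[Tuple[float, float, int]]:
    """Sort patients by predicted risk, split them into groups of
    near-equal size, and return (mean predicted risk, observed event
    rate, group size) for each non-empty group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


if __name__ == "__main__":
    # Hypothetical toy data, NOT taken from any study cited in the talk.
    outcomes = [0, 0, 1, 0, 0, 1, 0, 1]
    risks = [0.30, 0.45, 0.50, 0.55, 0.60, 0.70, 0.80, 0.90]
    print("AUC:", auc(outcomes, risks))
    print("O/E:", observed_expected_ratio(outcomes, risks))
    for mean_pred, observed, size in calibration_table(outcomes, risks):
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={size}")
```

A model can rank patients reasonably well (acceptable AUC) and still give risks that are systematically too high (O/E well below 1), which is why both aspects are reported separately in validation studies. With real data, far larger samples and confidence intervals would be needed before drawing any conclusion.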