Leveraging Machine Learning Techniques
Predictive Analytics for Knowledge Discovery in Radiology
Barbaros Selnur Erdal, PhD
Luciano M.S Prevedello, M.D. MPH
Kevin Mader, PhD
Joshy Cyriac
Bram Stieltjes, M.D. PhD
Materials
a. Slides
i. http://bit.ly/2zK0qFm
b. KNIME + Workflows + Data (zip file to extract)
i. Not required for the lab computers but needed at home (Windows 7 and above)
1. https://www.dropbox.com/s/3fcjvr0lfxfzmgd/knime_3.4.1.zip?dl=0
ii. Just the workflows (for Mac and Linux; requires KNIME, free from knime.com)
1. https://www.dropbox.com/s/rjp5qmb56q9fjfr/PredictiveAnalytics.knar?dl=0
c. Kaggle Competition
i. http://bit.ly/2zJMVps
Learning Objectives (from RSNA Abstract)
1. Review the basic principles of predictive analytics.
2. Be exposed to some of the existing validation methodologies to test predictive models.
3. Understand how to incorporate radiology data sources (PACS, RIS, etc.) into predictive modeling.
4. Learn how to interpret results and make visualizations.
Outline
• Introduction / Starting KNIME (Kevin)
• Why are ML and Predictive Analytics important? (Luciano)
• Framework Overview (Kevin)
• Value Prop, Decision, ML Task
• Data Sources
• Collecting Data - Preprocessing
• Collecting Data from PACS
• Features
• Data Wrangling
• Building Models
• From Double-Blind to Competitions
• Conclusion / Outlook (Luciano/Selnur)
Python might be good, but ...
• Devices, Sessions, Graphs, Ops, Context Managers?
• tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5)))
• Code gets messy very quickly
• Poor variable names
• Minimal documentation
• Custom functions / scattered .py
• Multiple library versions
KNIME + Workflows
• Medical workflows are complicated and involve a large number of steps
• We want transparent, reproducible pipelines for running analysis in
research and production settings
Should I learn KNIME?
• Supports
• Matlab, R, Python scripts
• Java code snippets
• Writing your own plugins (Eclipse)
• Natural Language Processing
• Image Processing (full ImageJ / FIJI support, ImgLib2 integration)
• Machine Learning Models (WEKA, scikit-learn, Decision Trees, PMML)
• Deep Learning (DL4J, Keras model import, and full Keras support coming)
• JavaScript Visualization
• Report Generation
• Excel Input / Output
• Database connectivity
Notes
• Please do not save the workflows, since tomorrow's class needs them unchanged
• You will need to change the path in
Why Predictive Analytics?
Benefits of Being Analytical
• Guide through turbulent times
• Improve decision making: know what is working and what is not
• Manage risks
• Improve Quality
• Cut cost – increase efficiency
• Anticipate change – competitive advantage
Is now the right time?
• YES!
• Shift from fee-for-service to value-based care
• Analytics: a guide through turbulent times
(Figure: value vs. cost, labeled Manage risks, Improve decision making, Improve quality, Cut cost, Increase efficiency)
Appointment No-Shows
https://www.kaggle.com/joniarroba/noshowappointments/
• 80% accuracy based on Age, Time of Day, Disease
Indian Health Statistics
https://www.kaggle.com/rajanand/key-indicators-of-annual-health-survey
• Chronic disease likelihood based on region and age
• Establish better baseline statistics for patients to be more efficient with diagnostics
Canvas (machinelearningcanvas.com / louis@dorard.me)
Move important incoming emails to a dedicated section at the top of the inbox
We want to be able to answer the question “Is this email important?” before the user gets a chance to see the email
• Input: email
• Output: “Important” (positive class) or “Regular”
-> Binary Classification
Make it easier for users of an email client to identify new important emails in their inbox, by automatically detecting them and making them more visible in the inbox (this detection must happen before the user sees the email)
The objective is that users spend less time in their inbox and reply to important emails more quickly
• Previous email messages (as mbox files or in another type of database)
• Address book
• Calendar
• Explicit labelling: users can manually label emails as important or not by clicking on an icon next to each email’s subject
• Implicit labelling: heuristics based on user behavior after getting the email (e.g. replying fast, deleting without reading, etc.)
Every time we receive an email addressed to our user that starts a new thread (otherwise the importance is the same as that of the thread)
We aim to deliver the email rapidly to the right section of the inbox, within 2 s
Use the last 3 months of emails for testing and the 12 months before that for training. We make the PI option available to the user if…
• Cost < baseline heuristic (e.g. “if sender in address book then important”): FP costs 1, FN costs 3
• No more than 1 error per X emails
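The cost criterion above compares the model's misclassification cost with a simple baseline rule. The following is a minimal Python sketch of that comparison. It assumes boolean labels (True = "Important"), the FP = 1 / FN = 3 weights from the canvas, and made-up test emails. The function names are hypothetical.

```python
# Hypothetical sketch: compare a model against the baseline heuristic using
# the canvas cost weights (false positive = 1, false negative = 3).
FP_COST = 1
FN_COST = 3


def weighted_cost(y_true, y_pred, fp_cost=FP_COST, fn_cost=FN_COST):
    """Total misclassification cost; True means 'Important' (positive class)."""
    cost = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted and not actual:
            cost += fp_cost  # a regular email was promoted to the top
        elif actual and not predicted:
            cost += fn_cost  # an important email was missed
    return cost


def baseline_heuristic(sender, address_book):
    """Baseline rule from the canvas: 'if sender in address book then important'."""
    return sender in address_book


def should_offer_pi(y_true, model_pred, baseline_pred):
    """Offer Priority Inbox only if the model is cheaper than the baseline."""
    return weighted_cost(y_true, model_pred) < weighted_cost(y_true, baseline_pred)


# Usage with made-up labels for five test emails
address_book = {"alice@example.org", "bob@example.org"}
senders = ["alice@example.org", "news@example.com", "carol@example.org",
           "bob@example.org", "promo@example.com"]
y_true = [True, False, True, False, False]
baseline_pred = [baseline_heuristic(s, address_book) for s in senders]
model_pred = [True, False, True, True, False]
```

Weighting false negatives three times as heavily as false positives means a model that misses fewer important emails can beat the baseline even if it promotes a few extra regular ones.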
One model per user, initially built on the last 12 months of email data, that we update…
• When an error is signaled by the user via manual labelling
• Every 5 minutes by adding new data from implicit labelling, if any
Per week:
• Ratio: # errors explicitly signaled by user / # emails received
• Same with errors seen via implicit labelling
• Average time taken to reply to important emails
• Total time spent in the inbox
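The weekly monitoring ratios reduce to plain arithmetic. The following is a hypothetical sketch. It assumes the error and email counts are already tallied per week, and that reply times are available as (received, replied) timestamp pairs.

```python
from datetime import datetime


def error_ratio(errors_signaled, emails_received):
    """Weekly ratio of user-signaled errors to emails received (0 if no email)."""
    if emails_received == 0:
        return 0.0
    return errors_signaled / emails_received


def average_reply_hours(pairs):
    """Mean hours between receiving and replying, for (received, replied) pairs."""
    if not pairs:
        return None  # no important emails answered this week
    total_seconds = sum((replied - received).total_seconds() for received, replied in pairs)
    return total_seconds / len(pairs) / 3600


# Usage: two important emails answered after 1 h and 3 h
week_pairs = [
    (datetime(2017, 1, 2, 9), datetime(2017, 1, 2, 10)),
    (datetime(2017, 1, 2, 9), datetime(2017, 1, 2, 12)),
]
```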
Priority Inbox (PI), Louis Dorard, Jan. 2017
• Content features: subject, body, attachments, size
• Social features: based on info about the sender (e.g. in address book?), previous interactions, contextual (e.g. upcoming meeting with sender)
• Email labels (typically assigned via manual rules defined by the user)
Our Goal
1. NSCLC patients have a large number of scans over the course of their visits to a hospital.
a. We want to predict which scans a patient will have after diagnosis to try to minimize the number of required visits.
b. Go from a collection of metadata from over 60K scans to a model
Value Proposition
• We want to schedule radiologists better at our Lung Cancer Center so that we have faster turnaround times at lower cost
Decision
• How many scans do we expect will need to be read?
• How many radiologists need to be on duty in a given week?
ML Task
• Given a patient's history and diagnosis, predict the number of future scans
• Input:
• Patient History
• Patient Information
• Output:
• Number of CT scans needed post diagnosis
Data Sources
1. This course
a. CSV File of Scans with DICOM Headers
b. CSV File extracted from Tumor Board
2. Your own hospital
a. PACS
b. Tumor Board
c. Other Interesting Sources
i. RIS Reports
ii. Pathology Reports
Reading in CSV Files
Collecting Data
• This Course
• Data is already prepared we just have to join it
• At your hospital
• Take list of patients from Tumor Board
• Find all scans for each patient in PACS
• Extract DICOM Header as Table
Collecting Data (PACS)
• Beyond this course
• DCMTK
• Python
• https://github.com/joshy/pypacscrawler
• RCC42C - Joshy Cyriac - Open Source Tools for Rapidly Indexing, Searching, and Processing Image Data from the PACS
Collecting Data
• This Course
• We read in scans from a PACS Output
• We read a list of patients from a Tumor Board Output
• We join the two tables on Patient ID
• We convert strings into dates
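Outside KNIME, the same four steps can be sketched with Python's standard csv and datetime modules: read the scans, read the tumor board list, join on Patient ID, and convert strings to dates. The column names, the sample rows, and the tumor board date format are assumptions. StudyDate uses the DICOM DA format (YYYYMMDD). The sketch also assumes an inner join, so scans of patients who are not on the tumor board list are dropped.

```python
import csv
import io
from datetime import datetime

# Hypothetical extracts: column names and sample values are assumptions.
SCANS_CSV = """PatientID,AccessionNumber,Modality,StudyDate
P001,A100,CT,20150103
P001,A101,PT,20150110
P002,A200,CT,20160220
P003,A300,MR,20160301
"""

TUMOR_BOARD_CSV = """PatientID,DiagnosisDate,Age,Sex
P001,2015-01-15,67,F
P002,2016-02-01,72,M
"""


def read_csv(text):
    """Read CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_patient_id(scans, tumor_board):
    """Inner join: keep only scans whose patient appears on the tumor board list."""
    board_by_id = {row["PatientID"]: row for row in tumor_board}
    joined = []
    for scan in scans:
        board_row = board_by_id.get(scan["PatientID"])
        if board_row is not None:
            joined.append({**scan, **board_row})
    return joined


def convert_dates(rows):
    """Parse DICOM-style StudyDate (YYYYMMDD) and ISO DiagnosisDate strings in place."""
    for row in rows:
        row["StudyDate"] = datetime.strptime(row["StudyDate"], "%Y%m%d").date()
        row["DiagnosisDate"] = datetime.strptime(row["DiagnosisDate"], "%Y-%m-%d").date()
    return rows


table = convert_dates(join_on_patient_id(read_csv(SCANS_CSV), read_csv(TUMOR_BOARD_CSV)))
```

Patient P003 has no tumor board entry, so its scan disappears from the joined table. A left join would keep it with empty diagnosis fields instead.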
Features
• High Value Features
• Number of previous scans (hypothesis: having more scans beforehand means there is less need to scan later)
• Age (older patients will have more complications?)
• Gender (there could be gender differences)
• Interesting Features
• Referring Physician (maybe some physicians order more scans than others)
• Institution (some hospitals might order more scans than others)
• Not useful (identifiers)
• Accession Number, Patient ID, Patient Name
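Next, the joined rows can be turned into one record per patient. Each record holds the high-value features plus the regression target from the ML Task (CT scans after diagnosis). The following is a sketch under these assumptions:

* each row is one study with parsed dates;
* a scan on the diagnosis day counts as post-diagnosis;
* the field names are hypothetical.

Identifiers serve only as the grouping key and never become model inputs.

```python
from datetime import date


def scan(patient_id, modality, study_date, diagnosis_date, age, sex):
    """Build one hypothetical joined row (field names are assumptions)."""
    return {"PatientID": patient_id, "Modality": modality,
            "StudyDate": study_date, "DiagnosisDate": diagnosis_date,
            "Age": age, "Sex": sex}


ROWS = [
    scan("P001", "CT", date(2015, 1, 3), date(2015, 1, 15), "67", "F"),
    scan("P001", "PT", date(2015, 1, 10), date(2015, 1, 15), "67", "F"),
    scan("P001", "CT", date(2015, 3, 1), date(2015, 1, 15), "67", "F"),
    scan("P001", "CT", date(2015, 6, 1), date(2015, 1, 15), "67", "F"),
    scan("P002", "CT", date(2016, 1, 20), date(2016, 2, 1), "72", "M"),
    scan("P002", "MR", date(2016, 4, 1), date(2016, 2, 1), "72", "M"),
]


def build_features(rows):
    """One record per patient: features plus the regression target.

    PatientID is used only as the dictionary key; no identifier
    (Patient ID, Patient Name, Accession Number) becomes a feature.
    """
    patients = {}
    for row in rows:
        record = patients.setdefault(row["PatientID"], {
            "previous_scans": 0,   # scans strictly before diagnosis
            "age": int(row["Age"]),
            "sex": row["Sex"],
            "future_ct_scans": 0,  # target: CT scans on or after diagnosis
        })
        if row["StudyDate"] < row["DiagnosisDate"]:
            record["previous_scans"] += 1
        elif row["Modality"] == "CT":
            record["future_ct_scans"] += 1
    return patients


features = build_features(ROWS)
```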
Pivoting / Pivot Tables
1. Many times the data we want isn’t in the right format
a. We have a fully expanded list of scans
b. We want the number of unique studies per patient organized by scan type
c. This requires a number of different operations
Pivoting / Pivot Tables: Groups, Pivots, Aggregation
Other Pivots
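The three pivoting steps (groups, pivots, aggregation) map onto a simple group/pivot/aggregate routine. The following is a minimal sketch. It assumes hypothetical series-level rows of (Patient ID, Modality, Accession Number). The aggregation is a unique count of accession numbers, so several series from one study are counted once.

```python
from collections import defaultdict

# Hypothetical series-level rows: one study (AccessionNumber) may have several series.
SERIES = [
    ("P001", "CT", "A100"),
    ("P001", "CT", "A100"),  # second series of the same CT study
    ("P001", "CT", "A102"),
    ("P001", "PT", "A101"),
    ("P002", "CT", "A200"),
    ("P002", "MR", "A201"),
]


def pivot_unique_count(rows):
    """Group by patient, pivot on modality, aggregate with a unique count of studies."""
    studies = defaultdict(lambda: defaultdict(set))
    modalities = set()
    for patient_id, modality, accession in rows:
        studies[patient_id][modality].add(accession)  # sets drop repeated series
        modalities.add(modality)
    columns = sorted(modalities)
    table = {
        patient_id: {m: len(by_modality.get(m, ())) for m in columns}
        for patient_id, by_modality in studies.items()
    }
    return columns, table


columns, table = pivot_unique_count(SERIES)
```

Missing patient/modality combinations are filled with 0. Those cells mean "no studies", not missing data.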
Offline Evaluations
• Predict the number of scans
• Penalize errors in the number of scans linearly
• Even better
• Predict the number of scans per week
• Penalize by the mismatch in radiologist hours per week
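Both evaluation ideas can be written as small scoring functions. The following is a sketch under these assumptions:

* the linear penalty uses per-patient scan counts;
* the hour mismatch uses weekly scan counts keyed by ISO (year, week);
* HOURS_PER_READ is a hypothetical placeholder, not a figure from the course.

```python
from collections import Counter
from datetime import date

HOURS_PER_READ = 0.25  # hypothetical average radiologist hours per scan


def linear_penalty(actual_counts, predicted_counts):
    """Mean absolute error: every scan predicted wrong costs the same."""
    errors = [abs(a - p) for a, p in zip(actual_counts, predicted_counts)]
    return sum(errors) / len(errors)


def scans_per_week(scan_dates):
    """Count scans per ISO (year, week) key."""
    return Counter(d.isocalendar()[:2] for d in scan_dates)


def hour_mismatch(actual_per_week, predicted_per_week, hours_per_read=HOURS_PER_READ):
    """Sum over weeks of |predicted - actual| radiologist hours."""
    weeks = set(actual_per_week) | set(predicted_per_week)
    return sum(abs(predicted_per_week.get(w, 0) - actual_per_week.get(w, 0)) * hours_per_read
               for w in weeks)


# Usage: two actual scans fall in one week; the model spreads them over two weeks
actual = scans_per_week([date(2017, 11, 27), date(2017, 11, 28)])
predicted = {date(2017, 11, 27).isocalendar()[:2]: 1, date(2017, 12, 4).isocalendar()[:2]: 1}
mismatch = hour_mismatch(actual, predicted)
```

The weekly version penalizes a prediction that gets the total right but the timing wrong. This matches the staffing decision better than a per-patient error does.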
Making Predictions
• As a patient is diagnosed with NSCLC, gather the whole patient history
• Predict the number of scans required in the future to plan better for capacity
Building a Model
1. Models
a. Partitioning
i. Training Data
ii. Validation Data
b. Model Selection
i. Model Representation
c. Scoring
i. Confusion Matrix
ii. R^2
iii. ROC Curve
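For reference, two of these scores can be computed in plain Python: R^2 for regression (such as predicting the number of scans) and a confusion matrix for classification. This is a sketch rather than the KNIME scorer nodes. The ROC curve is left out for brevity.

```python
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def confusion_matrix(y_true, y_pred):
    """Counts of (actual, predicted) label pairs for a classifier."""
    return Counter(zip(y_true, y_pred))
```

An R^2 of 0 means the model is no better than always predicting the mean number of scans. Negative values mean it is worse.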
What is Validation
Exploring (http://playground.tensorflow.org)
Processing Data
● Pipeline / Workflow
○ Data processing should be a clearly defined, transparent workflow
○ Where is the data read from?
○ How can it be combined (Patient ID? KIS ID? Accession Number?)
○ Which fields/columns should be transformed, and how?
○ How can it be reorganized (pivoting)?
○ How can we apply this to any new data and make it clear to people who are unfamiliar with it?
Final Result
Applying to new data
● We have built a model and tested it a bit
● Now we want to apply it to some new data
● We can take the entire workflow and make a ‘meta-node’ out of it (Node of Nodes)
Train, Validation, Test
● We have a training and testing dataset
● We partition the training data into a training set and a validation set
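The next sketch carves a validation set out of the training data. A fixed random seed keeps the split reproducible. The 20% fraction and the seed are arbitrary assumptions. The separate test set (here, the competition's) is never touched.

```python
import random


def partition(rows, validation_fraction=0.2, seed=42):
    """Return (training, validation) lists from a shuffled copy of rows."""
    shuffled = list(rows)                   # never reorder the caller's data
    random.Random(seed).shuffle(shuffled)   # seeded RNG, so the split is repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


training, validation = partition(range(10))
```

Because each row here is one patient, no patient can appear in both sets. With scan-level rows, split on Patient ID instead.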
Saving Predictions on Test (CSV Writer)
● The CSV Writer node exports the table from KNIME to a CSV file.
● The node has to be reconfigured to export the results, so right-click it and select Configure…
● Then click the Browse… button and save the file on the desktop
From Double Blind to Challenges
Our ‘In-Class’ Competition http://bit.ly/2zJMVps
Sign up yourself or
Guest Account
Username: rsna2017
Password: rsna2017
(but you won’t be on the leaderboard, your results will be deleted, and only the first 10 people can use it)
Submit Predictions http://bit.ly/2zJMVps
Select Submit Predictions
Other Models
● Random Forest Regressor (not classifier)
○ Replace both the Learner and the Predictor nodes
https://www.dropbox.com/s/pfgz0z8kt6tbdcw/AdvancedWorkflows.knar?dl=0
Review Important Points
● Clearly defined Goal
○ Predict a category
■ Classification (disease type, high/low risk patients)
○ Predict a number
■ Regression (risk factor, life expectancy, treatment dose, number of scans)
○ Think about workflow integration
■ Predicting the past isn’t helpful
○ What is accuracy?
● Collecting and Organizing Data
○ Pipeline Thinking
○ Finding a representative data-set
● Deciding on a validation strategy
○ Train / Test split
○ Cross-validation
● Evaluating Outcomes / Improving Models
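Cross-validation repeats the train/validation split so that every record is validated exactly once. The following is a minimal k-fold sketch. It assumes a generic fit/predict pair and mean absolute error as the score, and it requires at least k records. The mean-predictor model is a hypothetical stand-in for a real learner such as the Random Forest Regressor.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(x, y, fit, predict, k=5):
    """Return the mean absolute error on each validation fold."""
    scores = []
    for train, val in k_fold_indices(len(y), k):
        model = fit([x[i] for i in train], [y[i] for i in train])
        errors = [abs(predict(model, x[i]) - y[i]) for i in val]
        scores.append(sum(errors) / len(errors))
    return scores


def fit_mean(xs, ys):
    """Hypothetical baseline learner: remember the mean target."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    """The baseline ignores the features and predicts the stored mean."""
    return model
```

The spread of the k scores gives a rough sense of how stable the model is. A single train/test split cannot show this.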
Above and Beyond
● Just the beginning of predictive analytics and visualization
● Here are some other things we can do with the data
○ Timeline
■ Look at the timeline of events that happen to a given patient
○ Different scan types
■ What things are more likely / less likely
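A patient timeline is essentially that patient's events sorted by date. The following sketch expresses each event as days relative to diagnosis. It assumes hypothetical (Patient ID, date, description) tuples and a diagnosis date per patient.

```python
from collections import defaultdict
from datetime import date

# Hypothetical events and diagnosis dates
EVENTS = [
    ("P001", date(2015, 3, 1), "CT"),
    ("P001", date(2015, 1, 3), "CT"),
    ("P001", date(2015, 1, 10), "PT"),
]
DIAGNOSIS = {"P001": date(2015, 1, 15)}


def patient_timelines(events, diagnosis_dates):
    """Map each patient to (days relative to diagnosis, event) pairs in date order."""
    timelines = defaultdict(list)
    for patient_id, event_date, description in events:
        offset = (event_date - diagnosis_dates[patient_id]).days  # negative = before diagnosis
        timelines[patient_id].append((offset, description))
    for entries in timelines.values():
        entries.sort()
    return dict(timelines)


timelines = patient_timelines(EVENTS, DIAGNOSIS)
```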
Visualizing Timelines
Visualizing Data by Scan Type

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 

Dernier (20)

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Discovery in Radiology

  • 1. Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Discovery in Radiology Barbaros Selnur Erdal, PhD Luciano M.S Prevedello, M.D. MPH Kevin Mader, PhD Joshy Cyriac Bram Stieltjes, M.D. PhD
  • 2. Materials a. Slides i. http://bit.ly/2zK0qFm b. KNIME + Workflows + Data (zip file to extract) i. Not required for the lab computers but needed at home (Windows 7 and above) 1. https://www.dropbox.com/s/3fcjvr0lfxfzmgd/knime_3.4.1.zip?dl=0 ii. Just the workflows, for Mac and Linux (requires KNIME, free from knime.com) 1. https://www.dropbox.com/s/rjp5qmb56q9fjfr/PredictiveAnalytics.knar?dl=0 c. Kaggle Competition i. http://bit.ly/2zJMVps
  • 3. Learning Objectives (from RSNA Abstract) 1. Review the basic principles of predictive analytics. 2. Be exposed to some of the existing validation methodologies to test predictive models. 3. Understand how to incorporate radiology data sources (PACS, RIS, etc.) into predictive modeling. 4. Learn how to interpret results and make visualizations.
  • 4. Outline • Introduction / Starting KNIME (Kevin) • Why are ML and Predictive Analytics important? (Luciano) • Framework Overview (Kevin) • Value Prop, Decision, ML Task • Data Sources • Collecting Data - Preprocessing • Collecting Data from PACS • Features • Data Wrangling • Building Models • From Double-Blind to Competitions • Conclusion / Outlook (Luciano/Selnur)
  • 5. Python might be good, but ... • Devices, Sessions, Graphs, Ops, Context Managers? • tf.Session(config = tf.ConfigProto(gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = 0.5))) • Code gets messy very quickly • Poor variable names • Minimal documentation • Custom functions / scattered .py • Multiple library versions
  • 6. KNIME + Workflows • Medical workflows are complicated, involving a large number of steps • We want transparent, reproducible pipelines for running analyses in research and production settings
  • 7. Should I learn KNIME? • Supports • Matlab, R, Python scripts • Java code snippets • Writing your own plugins (Eclipse) • Natural Language Processing • Image Processing (full ImageJ / FIJI support, ImgLib2 integration) • Machine Learning Models (WEKA, scikit-learn, Decision Trees, PMML) • Deep Learning (DL4J, Keras model import, and full Keras support coming) • JavaScript Visualization • Report Generation • Excel Input / Output • Database connectivity
  • 8. Notes • Please do not save workflows since the class tomorrow needs them • You will need to change the path in
  • 10. Benefits of Being Analytical • Guide through turbulent times • Improve decision making - Know what is working vs not • Manage risks • Improve Quality • Cut cost – increase efficiency • Anticipate change – competitive advantage
  • 11. Is now the right time? • YES!!! • Shift from fee-for-service to value-based care • Analytics: guide through turbulent times
  • 12. Value / Cost • Manage risks • Improve decision making • Improve Quality • Cut cost • Increase efficiency
  • 14. Indian Health Statistics https://www.kaggle.com/rajanand/key-indicators-of-annual-health-survey • Chronic disease likelihood based on region, age, • Establish better baseline statistics for patients to be more efficient with diagnostics
  • 16. Move important incoming emails to a dedicated section at the top of the inbox. We want to be able to answer the question “Is this email important?” before the user gets a chance to see the email. • Input: email • Output: “Important” (positive class) or “Regular” -> Binary Classification. Make it easier for users of an email client to identify new important emails in their inbox, by automatically detecting them and making them more visible in the inbox (this detection must happen before the user sees the email). The objective is that users spend less time in their inbox and reply to important emails more quickly. • Previous email messages (as mbox files or in another type of database) • Address book • Calendar • Explicit labelling: users can manually label emails as important or not, by clicking on an icon next to each email’s subject • Implicit labelling: heuristics based on user behavior after getting the email (e.g. replying fast, deleting without reading, etc.). Every time we receive an email addressed to our user that starts a new thread (otherwise the importance is just the same as that of the thread), we aim to rapidly deliver the email in the right section of the inbox, within a 2 s period. Use the last 3 months of emails for testing and the 12 months before that for training. We make the PI option available to the user if… • Cost < baseline heuristic (e.g. “if sender in address book then important”): FP costs 1, FN costs 3 • No more than 1 error per X emails. One model per user, initially built on the last 12 months of email data, that we update… • When an error is signaled by the user via manual labelling • Every 5 minutes by adding new data from implicit labelling, if any. Per week: • Ratio: #errors explicitly signaled by user / #emails received • Same with errors seen via implicit labelling • Average time taken to reply to important emails • Total time spent in the inbox. Priority Inbox (PI), Louis Dorard, Jan. 2017, Iteration 1 • Content features: subject, body, attachments, size • Social features: based on info about the sender (e.g. in address book?), previous interactions, contextual (e.g. upcoming meeting with sender) • Email labels (typically assigned via manual rules defined by the user)
  • 17. Our Goal 1. NSCLC patients have a large number of scans over the course of their visits to a hospital. a. We want to predict which scans a patient will have after diagnosis to try to minimize the number of required visits. b. Go from a collection of metadata from over 60K scans to a model
  • 18. Value Proposition • We want to schedule radiologists better at our Lung Cancer Center so we have faster turnaround times at lower cost
  • 19. Decision • How many scans do we expect will need to be read? • How many radiologists need to be on duty in a given week?
  • 20. ML Task • Given a patient's history and diagnosis, predict the number of future scans • Input: • Patient History • Patient Information • Output: • Number of CT scans needed post-diagnosis
  • 21. Data Sources 1. This course a. CSV File of Scans with DICOM Headers b. CSV File extracted from Tumor Board 2. Your own hospital a. PACS b. Tumor Board c. Other Interesting Sources i. RIS Reports ii. Pathology Reports
  • 22. Reading in CSV Files
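Outside KNIME, the CSV Reader step can be sketched with Python's standard `csv` module. This is a minimal sketch: the column names (`PatientID`, `AccessionNumber`, `StudyDate`, `Modality`) are hypothetical stand-ins for DICOM header columns, and the course files may use different headers.

```python
import csv
import io

# Hypothetical excerpt of a scan table exported from DICOM headers.
# A real workflow would open the course CSV file instead of this in-memory string.
SCANS_CSV = """PatientID,AccessionNumber,StudyDate,Modality
P001,A100,20150103,CT
P001,A101,20150410,CT
P002,A200,20150220,PT
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

scans = read_rows(SCANS_CSV)
print(len(scans), scans[0]["Modality"])
```

For a file on disk, `with open(path, newline="") as f: rows = list(csv.DictReader(f))` does the same thing; every value is read as a string, so dates and counts still need converting.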
  • 23. Collecting Data • This Course • The data is already prepared; we just have to join it • At your hospital • Take the list of patients from the Tumor Board • Find all scans for each patient in PACS • Extract the DICOM Headers as a table
  • 24. Collecting Data (PACS) • Beyond this course • DCMTK • Python • https://github.com/joshy/pypacscrawler • RCC42C - Joshy Cyriac - Open Source Tools for Rapidly Indexing, Searching, and Processing Image Data from the PACS
  • 25. Collecting Data • This Course • We read in scans from a PACS Output • We read a list of patients from a Tumor Board Output • We join the two tables on Patient ID • We convert strings into dates
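The join-and-convert steps on this slide might look like this in plain Python. This is a sketch under stated assumptions: it performs an inner join, so scans of patients who are not on the tumor board list are dropped. The column names and the tumor board date format (`YYYY-MM-DD`) are hypothetical. DICOM `StudyDate` values use the DA format `YYYYMMDD`.

```python
import csv
import io
from datetime import datetime

# Hypothetical PACS export (one row per scan) and tumor board export (one row per patient).
PACS_CSV = """PatientID,AccessionNumber,StudyDate,Modality
P001,A100,20150103,CT
P001,A101,20150410,CT
P002,A200,20150220,PT
P003,A300,20150305,CT
"""
TUMOR_BOARD_CSV = """PatientID,DiagnosisDate
P001,2015-02-01
P002,2015-02-15
"""

def parse_dicom_date(value):
    """DICOM DA values are stored as YYYYMMDD strings."""
    return datetime.strptime(value, "%Y%m%d").date()

def inner_join_on_patient(scans, patients):
    """Keep only scans whose PatientID appears in the tumor board list."""
    by_id = {p["PatientID"]: p for p in patients}
    joined = []
    for scan in scans:
        patient = by_id.get(scan["PatientID"])
        if patient is None:
            continue  # scan belongs to a patient not on the tumor board list
        row = {**scan, **patient}
        # Convert strings into dates so they can be compared later.
        row["StudyDate"] = parse_dicom_date(row["StudyDate"])
        row["DiagnosisDate"] = datetime.strptime(row["DiagnosisDate"], "%Y-%m-%d").date()
        joined.append(row)
    return joined

scans = list(csv.DictReader(io.StringIO(PACS_CSV)))
patients = list(csv.DictReader(io.StringIO(TUMOR_BOARD_CSV)))
table = inner_join_on_patient(scans, patients)
print(len(table))  # P003 has no tumor board entry, so 3 rows remain
```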
  • 26. Features • High Value Features • Number of previous scans (hypothesis that having more scans before means there is less need to scan later) • Age (older patients will have more complications?) • Gender (could be gender differences) • Interesting Features • Referring Physician (maybe some physicians order more scans than others) • Institution (some hospitals might order more scans than others) • No • Accession Number, Patient ID, Patient Name
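The high-value features above can be derived per patient from a joined scan table. This is a minimal sketch with hypothetical rows. It parses the DICOM Age String (AS) format, which is `nnn` followed by `D`, `W`, `M` or `Y`. The identifiers in the "No" list serve only as keys and are never used as features. Counting scans on the diagnosis date as neither "previous" nor "future" is an assumption made here.

```python
from datetime import date

# Hypothetical joined table: one dict per scan, dates already parsed.
rows = [
    {"PatientID": "P001", "PatientAge": "064Y", "PatientSex": "F",
     "StudyDate": date(2015, 1, 3), "DiagnosisDate": date(2015, 2, 1)},
    {"PatientID": "P001", "PatientAge": "064Y", "PatientSex": "F",
     "StudyDate": date(2015, 4, 10), "DiagnosisDate": date(2015, 2, 1)},
    {"PatientID": "P002", "PatientAge": "071Y", "PatientSex": "M",
     "StudyDate": date(2015, 2, 20), "DiagnosisDate": date(2015, 2, 15)},
]

def dicom_age_years(value):
    """Convert a DICOM Age String such as '064Y' or '006M' to years."""
    number, unit = int(value[:3]), value[3]
    units_per_year = {"Y": 1, "M": 12, "W": 365.25 / 7, "D": 365.25}[unit]
    return number / units_per_year

def patient_features(rows):
    """One feature dict per patient; PatientID is only the key, never a feature."""
    features = {}
    for r in rows:
        f = features.setdefault(r["PatientID"], {
            "age": dicom_age_years(r["PatientAge"]),
            "sex": r["PatientSex"],
            "previous_scans": 0,  # feature: scans before diagnosis
            "future_scans": 0,    # target: scans after diagnosis
        })
        if r["StudyDate"] < r["DiagnosisDate"]:
            f["previous_scans"] += 1
        elif r["StudyDate"] > r["DiagnosisDate"]:
            f["future_scans"] += 1
        # Scans on the diagnosis date itself are counted in neither (an assumption).
    return features

print(patient_features(rows))
```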
  • 27. Pivoting / Pivot Tables 1. Many times the data we want isn’t in the right format a. We have a fully expanded list of scans b. We want the number of unique studies per patient organized by scan type c. This requires a number of different operations
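The pivot described above combines several operations: group by patient and scan type, count unique studies, then spread scan types into columns. The sketch below uses hypothetical data. It assumes that each study has one accession number and that the expanded list has one row per series, so a study can appear more than once.

```python
from collections import defaultdict

# Hypothetical fully expanded list: (PatientID, AccessionNumber, Modality), one row per series.
series = [
    ("P001", "A100", "CT"),
    ("P001", "A100", "CT"),  # second series of the same CT study
    ("P001", "A101", "CT"),
    ("P001", "A102", "PT"),
    ("P002", "A200", "MR"),
]

def pivot_unique_studies(rows):
    """Rows: patient; columns: modality; cells: number of distinct accession numbers."""
    studies = defaultdict(set)
    for patient_id, accession, modality in rows:
        studies[(patient_id, modality)].add(accession)  # group by + unique count
    modalities = sorted({m for _, _, m in rows})
    patients = sorted({p for p, _, _ in rows})
    # Missing (patient, modality) combinations become 0 instead of an empty cell.
    table = {p: {m: len(studies.get((p, m), ())) for m in modalities} for p in patients}
    return modalities, table

columns, table = pivot_unique_studies(series)
print(columns)        # ['CT', 'MR', 'PT']
print(table["P001"])  # {'CT': 2, 'MR': 0, 'PT': 1}
```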
  • 38. Features • High Value Features • Number of previous scans (hypothesis that having more scans before means there is less need to scan later) • Age (older patients will have more complications?) • Gender (could be gender differences) • Interesting Features • Referring Physician (maybe some physicians order more scans than others) • Institution (some hospitals might order more scans than others) • No • Accession Number, Patient ID, Patient Name
  • 39. Offline Evaluations • Predict the number of scans • Penalize the wrong number of scans linearly • Even better • Predict the number of scans per week • Penalize by radiologist hour mismatch per week
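The two offline metrics on this slide can be written down directly. Penalizing the wrong number of scans linearly is the mean absolute error. The weekly variant converts the scan mismatch into radiologist hours. The `hours_per_read` value below is a hypothetical placeholder, not a figure from the course, and the week labels are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Linear penalty: every scan too many or too few costs the same."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def weekly_hour_mismatch(actual_by_week, predicted_by_week, hours_per_read=0.25):
    """Sum over weeks of |predicted - actual| scans, converted to radiologist hours.

    hours_per_read is a hypothetical reading time per scan.
    Weeks missing from either dict count as 0 scans.
    """
    weeks = set(actual_by_week) | set(predicted_by_week)
    return sum(
        abs(predicted_by_week.get(w, 0) - actual_by_week.get(w, 0)) * hours_per_read
        for w in weeks
    )

print(mean_absolute_error([3, 5, 2], [4, 5, 0]))  # 1.0
print(weekly_hour_mismatch({"2015-W10": 40, "2015-W11": 36},
                           {"2015-W10": 44, "2015-W11": 30}))  # 2.5
```

The weekly metric matches the stated decision (how many radiologists to schedule) more closely than a per-patient error. Over-predictions and under-predictions are weighted equally here. An asymmetric cost would also be possible.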
  • 40. Making Predictions • When a patient is diagnosed with NSCLC, gather the whole patient history • Predict the number of scans required in the future to plan capacity better
  • 41. Building a Model 1. Models a. Partitioning i. Training Data ii. Validation Data b. Model Selection i. Model Representation c. Scoring i. Confusion Matrix ii. R^2 iii. ROC Curve
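Two of the scoring options listed above are short enough to write by hand. R² (regression, e.g. number of scans) is 1 − SS_res / SS_tot. A confusion matrix (classification, e.g. high/low risk) counts true labels against predicted labels. This is a minimal sketch. The ROC curve is not shown, and the labels are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def confusion_matrix(actual, predicted, labels):
    """Return counts[true_label][predicted_label]."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

print(r_squared([2, 4, 6], [3, 4, 5]))  # 0.75
print(confusion_matrix(["high", "high", "low", "low", "low"],
                       ["high", "low", "low", "low", "high"],
                       labels=["high", "low"]))
```

Predicting the mean of the actual values for every row gives R² = 0. A model that scores below 0 is therefore worse than that trivial baseline.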
  • 44. Processing Data ● Pipeline / Workflow ○ Data processing should be a clearly defined, transparent workflow ○ Where is the data read from? ○ How can it be combined (Patient ID? KIS ID? Accession Number?) ○ Which fields/columns should be transformed, and how? ○ How can it be reorganized (pivoting)? ○ How can we apply this to any new data and make it clear to people unfamiliar with it?
  • 46. Applying to new data ● We have built a model and tested it a bit ● Now we want to apply it to some new data ● We can take the entire workflow and make a ‘meta-node’ out of it (Node of Nodes)
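A KNIME meta-node bundles several nodes into one. In code, the closest analogue is composing the processing steps into one callable, so that new data goes through exactly the same transformations as the training data. This is a sketch under assumptions: the step functions and column names are hypothetical, and `add_age_years` handles only `nnnY` age strings.

```python
# Each function stands in for one KNIME node; all names here are hypothetical.
def drop_identifiers(row):
    """Remove columns that should never reach the model."""
    return {k: v for k, v in row.items() if k not in {"PatientName", "AccessionNumber"}}

def add_age_years(row):
    """Derive a numeric age column (assumes 'nnnY' DICOM age strings)."""
    out = dict(row)
    out["AgeYears"] = int(row["PatientAge"][:3])
    return out

def make_meta_node(*steps):
    """Compose row-level steps into one reusable callable (a 'node of nodes')."""
    def meta_node(rows):
        result = []
        for row in rows:
            for step in steps:
                row = step(row)
            result.append(row)
        return result
    return meta_node

preprocess = make_meta_node(drop_identifiers, add_age_years)
new_rows = [{"PatientID": "P9", "PatientName": "DOE^JANE",
             "AccessionNumber": "A9", "PatientAge": "058Y"}]
print(preprocess(new_rows))
```

Each step returns a new dict and never changes its input in place, so the original table stays available for inspection. This mirrors how each KNIME node keeps its own output table.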
  • 47. Train, Validation, Test ● We have a training and a testing dataset ● We partition the training data into a training set and a validation set
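The partitioning step can be sketched as a reproducible random split. This version assigns whole patients to one side, so that scans of the same patient never appear in both the training and validation sets. That patient-level grouping is an assumption added here; a plain row-level split is the simpler alternative. The 20% fraction and the seed are illustrative.

```python
import random

def split_by_patient(rows, validation_fraction=0.2, seed=42):
    """Partition rows so that all rows of a patient land in the same subset."""
    patient_ids = sorted({r["PatientID"] for r in rows})
    rng = random.Random(seed)  # fixed seed -> reproducible partition
    rng.shuffle(patient_ids)
    n_val = max(1, round(len(patient_ids) * validation_fraction))
    val_ids = set(patient_ids[:n_val])
    train = [r for r in rows if r["PatientID"] not in val_ids]
    val = [r for r in rows if r["PatientID"] in val_ids]
    return train, val

# Hypothetical data: 10 patients with 3 scans each.
rows = [{"PatientID": f"P{i % 10:03d}", "scan": i} for i in range(30)]
train, val = split_by_patient(rows)
print(len(train), len(val))  # 24 6
```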
  • 48. Saving Predictions on Test (CSV Writer) ● The CSV Writer node exports the table from KNIME to a CSV file. ● The node has to be reconfigured in order to export the results, so right-click it and select Configure… ● Then click the Browse… button and save the file on the desktop
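The equivalent of the CSV Writer node in plain Python is a few lines with `csv.writer`. This is a sketch: the column names `PatientID` and `PredictedScans` are hypothetical, and the competition may expect a different submission header.

```python
import csv
import io

def write_predictions(predictions, handle):
    """Write (patient_id, predicted_scans) pairs as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["PatientID", "PredictedScans"])  # hypothetical column names
    for patient_id, n_scans in predictions:
        writer.writerow([patient_id, n_scans])

buffer = io.StringIO()
write_predictions([("P001", 4), ("P002", 7)], buffer)
print(buffer.getvalue())
```

To write to a file on disk instead, pass `open("predictions.csv", "w", newline="")` inside a `with` block. Without `newline=""`, the `csv` module can produce blank lines on Windows.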
  • 49. From Double Blind to Challenges
  • 50. Our ‘In-Class’ Competition http://bit.ly/2zJMVps Sign up yourself or use the Guest Account (Username: rsna2017, Password: rsna2017). With the guest account you won’t be on the leaderboard, your results will be deleted, and only the first 10 people can use it.
  • 52. Other Models ● Random Forest Regressor (not classifier) ○ Replace both the Learner and the Predictor nodes https://www.dropbox.com/s/pfgz0z8kt6tbdcw/AdvancedWorkflows.knar?dl=0
  • 53. Review Important Points ● Clearly defined Goal ○ Predict a category ■ Classification (disease type, high/low risk patients) ○ Predict a number ■ Regression (risk factor, life expectancy, treatment dose, number of scans) ○ Think about workflow integration ■ Predicting the past isn’t helpful ○ What is accuracy? ● Collecting and Organizing Data ○ Pipeline Thinking ○ Finding a representative data-set ● Deciding on a validation strategy ○ Train / Test split ○ Cross-validation ● Evaluating Outcomes / Improving Models
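Cross-validation, the second validation strategy in this list, splits the data into k folds. Each fold serves once as the validation set while the model is trained on the remaining folds. The minimal index generator below does not shuffle. It assumes the rows were shuffled beforehand; if they are ordered, e.g. by date, each fold would otherwise cover one contiguous time range.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

print([len(val) for _, val in k_fold_indices(10, k=3)])  # [4, 3, 3]
```

Averaging the validation score over all k folds gives a less noisy estimate than a single train/test split, at the cost of training k models.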
  • 54. Above and Beyond ● Just the beginning of predictive analytics and visualization ● Here are some other things we can do with the data ○ Timeline ■ Look at the timeline of events that happen to a given patient ○ Different scan types ■ What things are more likely / less likely
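A first step toward the per-patient timeline idea is to sort each patient's scans by date relative to diagnosis. In this sketch the events and diagnosis dates are hypothetical, and negative offsets mean the scan happened before diagnosis.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (PatientID, StudyDate, Modality) events plus a diagnosis date per patient.
events = [
    ("P001", date(2015, 4, 10), "CT"),
    ("P001", date(2015, 1, 3), "CT"),
    ("P001", date(2015, 2, 20), "PT"),
]
diagnosis = {"P001": date(2015, 2, 1)}

def patient_timelines(events, diagnosis):
    """Per patient, (days relative to diagnosis, modality) pairs sorted in time order."""
    timelines = defaultdict(list)
    for patient_id, study_date, modality in events:
        offset = (study_date - diagnosis[patient_id]).days
        timelines[patient_id].append((offset, modality))
    return {p: sorted(items) for p, items in timelines.items()}

print(patient_timelines(events, diagnosis)["P001"])  # [(-29, 'CT'), (19, 'PT'), (68, 'CT')]
```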
  • 56. Visualizing Data by Scan Type