Elsevier Health Analytics
Medical Graph v1
Empowering Knowledge™

Towards
• A map of medicine
• Personalized decision support in a clinical setting

Paul Hellwig
Director Research & Development
p.hellwig@elsevier.com
https://www.linkedin.com/in/paulhellwig
November 2016
Elsevier Health Analytics combines RELX Group's medical and big data analytics expertise

Elsevier
• Publisher and world-leading provider of information solutions
• 6,700 people worldwide, €2.8 billion revenues¹
• >2,200 journals, >25,000 book titles
• ScienceDirect, Scopus, ClinicalKey and Nursing Consult
• Health Analytics team in Berlin

LexisNexis
• Helps industry and government predict and manage risk
• 7,200 people, €2.2 billion revenues¹
• 35 years of experience in managing big data, currently >5 petabytes
• Developed the HPCC² supercomputer platform

¹ 2015  ² High Performance Computing Cluster
Elsevier Health Analytics: Our vision
Trends driving changes in physician–patient interaction…

1. Medical data explosion
• 4,500 tests for genetic disorders available (2013: 3,200; +20% CAGR)
• $1,245 cost to sequence a full genome (10/2014: $5,730)
• 105 mm ECG biosensor: high ECG quality, heart rate, respiration, body temperature, activity, body position; watertight, inductively charged, Bluetooth, continuous data feed

2. Patient empowerment
• PatientsLikeMe has 400,000+ members donating data: 31 million data points covering 2,500+ conditions

3. Information explosion
• 25 million biomedical articles referenced on PubMed
• 1.2 million new biomedical articles per year
…and the real challenge
< 10 minutes¹

¹ Europe; US up to 20 minutes. Ray KN, Chari AV, Engberg J, Bertolet M, Mehrotra A. Disparities in Time Spent Seeking Medical Care in the United States. JAMA Intern Med. 2015;175(12):1983-1986. doi:10.1001/jamainternmed.2015.4468.
Medical Graph – Research Goal A:
Risk prediction: which diseases are you likely to develop within 4 years?
From electronic health record…
…to top risks
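Going from a health record to "top risks" amounts to scoring the patient with one model per target disease and sorting. The sketch below assumes one logistic regression per target (the slides later describe linear models with LASSO); the target codes are used only as labels and every coefficient in `MODELS` is invented for illustration. It also returns each feature's log-odds contribution, since physicians want to see why a risk is high.

```python
import math

# Hypothetical fitted models: one logistic regression per target disease.
# The coefficients are invented for illustration, not taken from the Medical Graph.
MODELS: dict[str, dict] = {
    "I50": {"intercept": -3.0, "coef": {"ICD_I10": 0.8, "ATC_C09AA": 0.3}},
    "I61": {"intercept": -2.5, "coef": {"ICD_I65": 1.1, "ICD_I10": 0.4}},
    "G40": {"intercept": -1.4, "coef": {}},
}


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def top_risks(patient: set[str], models: dict[str, dict], k: int = 3):
    """Score every target model for one patient and return the k highest risks.

    Each result is (target, probability, contributions); contributions lists the
    log-odds added by each of the patient's features, largest first.
    """
    results = []
    for target, model in models.items():
        contributions = {f: b for f, b in model["coef"].items() if f in patient}
        z = model["intercept"] + sum(contributions.values())
        ranked = sorted(contributions.items(), key=lambda fb: fb[1], reverse=True)
        results.append((target, sigmoid(z), ranked))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:k]


if __name__ == "__main__":
    for target, risk, why in top_risks({"ICD_I10", "ICD_I65"}, MODELS):
        print(f"{target}: {risk:.3f} because of {why}")
```

With a linear model the explanation is exact: the probability is a function of the intercept plus the listed contributions and nothing else.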
Medical Graph – Research Goal B:
Map: how are diseases, medications and other data connected?
[Figure: excerpt of the graph with the nodes I65 Occlusion and stenosis of precerebral arteries, G40 Epilepsy, I61 Intracerebral haemorrhage, C71 Malignant neoplasm of brain, and further covariates, linked by has_successor¹ edges; one edge is annotated "odds ratio: 1.12"]
…for 1,600 target diseases

¹ Criteria based on: Jensen et al.: Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications, 2014 Jun 24;5:4022. doi: 10.1038/ncomms5022.
Medical Graph development

Example: model to predict "I50 – Heart failure"
Analysis design: predict 4-year long-term effects, balanced for all covariates
• "PAST": patient is I50-negative in 2009; covariates are taken from 2010:
  • Age
  • Gender
  • Other diseases
  • Medications
  • Other
• "FUTURE", 2011–2014: I50-negative (coded as 0) or I50-positive (coded as 1)
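The analysis design above can be sketched as building one labeled row per at-risk patient. This is a minimal sketch under stated assumptions: input arrives as hypothetical `(patient_id, code, year)` tuples, "I50-negative in the past" is taken to mean no I50 code in 2009 or 2010, and age and gender are left out for brevity.

```python
from collections import defaultdict

# Assumed input format: one (patient_id, code, year) tuple per diagnosis or prescription.
Record = tuple[str, str, int]


def build_design(
    records: list[Record],
    target: str = "ICD_I50",
    past_years: tuple[int, ...] = (2009, 2010),
    covariate_year: int = 2010,
    future_years: range = range(2011, 2015),  # 2011-2014 inclusive
) -> dict[str, tuple[list[str], int]]:
    """Return {patient_id: (covariate codes, label)} for one target disease."""
    codes = defaultdict(set)  # (patient_id, year) -> set of codes
    patients = set()
    for pid, code, year in records:
        patients.add(pid)
        codes[(pid, year)].add(code)

    design = {}
    for pid in sorted(patients):
        # Only patients who are target-negative in the past window are at risk.
        if any(target in codes[(pid, y)] for y in past_years):
            continue
        covariates = sorted(codes[(pid, covariate_year)])
        # Label 1 if the target is coded at least once in the future window, else 0.
        label = int(any(target in codes[(pid, y)] for y in future_years))
        design[pid] = (covariates, label)
    return design


if __name__ == "__main__":
    toy = [
        ("p1", "ICD_I10", 2010), ("p1", "ICD_I50", 2012),
        ("p2", "ICD_I10", 2010),
        ("p3", "ICD_I50", 2009), ("p3", "ICD_I10", 2010),
        ("p4", "ATC_C09AA", 2010), ("p4", "ICD_I50", 2015),
    ]
    print(build_design(toy))
```

In the toy data, p3 is excluded because heart failure is already coded in the past, and p4 is labeled 0 because the 2015 diagnosis falls outside the 4-year window.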
Our observation / feature matrix

• Primary care: visits & diagnoses
• Secondary care: visits, diagnoses & procedures
• Medication: drug prescriptions
• Other data: further cooperations have just started; they will enable analysis of vital and laboratory parameters

Billing data flow: 60+ sickness funds; anonymized

Feature extraction: 3,943 features for 3.8m patients
• 1,623 targets, 2011–2014
• 2,320 covariates, 2010
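A quick size calculation shows why a matrix of this shape strains 32-bit indexing, which is the failure reported for attempt #3. The slides do not say which quantity overflowed, so this sketch only computes two candidates under assumptions: the dense cell count, and the largest average number of non-zero features per patient before a sparse matrix's non-zero count exceeds a signed int32.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def dense_cells(n_patients: int, n_features: int) -> int:
    """Number of cells in a dense patients x features matrix."""
    return n_patients * n_features


def max_avg_nonzeros(n_patients: int) -> int:
    """Largest average number of non-zero features per patient before the
    total count of stored non-zeros no longer fits in a signed int32."""
    return INT32_MAX // n_patients


if __name__ == "__main__":
    n_features = 3943  # 1,623 targets + 2,320 covariates, from the slide
    for n_patients in (100_000, 800_000, 3_800_000):
        cells = dense_cells(n_patients, n_features)
        print(f"{n_patients:>9,} patients: {cells:>14,} cells, "
              f"exceeds int32: {cells > INT32_MAX}, "
              f"max avg non-zeros/patient: {max_avg_nonzeros(n_patients)}")
```

At 3.8m patients the dense matrix has roughly 15 billion cells, and a sparse representation stays within int32 only if patients average at most 565 non-zero features.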
Predictive modeling for ~1,600 target diseases
Multiple attempts – no software is perfect

Attempt #1 (on server)
• Machine learning algorithm: component-wise gradient boosting (mboost); GLM for p-values
• Did it work for the full dataset? Worked for 100k patients. Failure reason: RAM (extensive dataset copying)
• Runtime: ~7 min / target model (on 100k patients)

Attempt #2 (on cluster)
• Machine learning algorithm: logistic regression with LASSO; GLM for p-values
• Did it work for the full dataset? Worked for 138 models. Failure reason: memory leak every 30–40 models
• Runtime: ~8 min / target model (on 3.8m patients)

Attempt #3 (on server)
• Machine learning algorithm: linear gradient boosting (sklearn + xgboost); F-test for p-values
• Did it work for the full dataset? Worked for 800k patients. Failure reason: int32 as index for sparse matrices
• Runtime: ~7 min / target model (on 800k patients)
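A common workaround for a leak like the one in attempt #2 (not one the slides report using) is to fit each target model in a short-lived process, so that whatever memory leaks is returned to the operating system when the process exits. In this sketch `fit_one_target` is a hypothetical placeholder, and the "fork" start method is an assumption that holds on Linux and macOS but not Windows.

```python
import multiprocessing as mp
import os


def fit_one_target(target: str) -> dict:
    """Hypothetical placeholder for training and saving one target-disease model."""
    # ... load the feature matrix, fit the model, write coefficients to disk ...
    return {"target": target, "pid": os.getpid()}


def fit_all(targets: list[str], workers: int = 1) -> list[dict]:
    """Fit every target model in short-lived worker processes.

    maxtasksperchild=1 makes each worker exit after a single model, so memory
    leaked while fitting is released when the process ends.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX system
    with ctx.Pool(processes=workers, maxtasksperchild=1) as pool:
        # chunksize=1 so that one pool task corresponds to exactly one model
        return pool.map(fit_one_target, targets, chunksize=1)


if __name__ == "__main__":
    for result in fit_all(["ICD_I50", "ICD_M54", "ICD_G40"]):
        print(result)
```

The cost is one process start-up per model, which is negligible next to several minutes of fitting per target.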
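Code for model building

mboost (R):
# model 1: component-wise linear boosting
# Target: disease column icd_atc_use_names[i], modeled on all other columns.
# Training rows: all positives (which_one) plus an equally sized random sample
# of negatives (which_zero), drawn without replacement: a class-balanced subsample.
boost_train_ds <- glmboost(as.formula(paste(icd_atc_use_names[i], "~ .")),
                           data = data[ins, ][c(which_one, sample(which_zero, length(which_one), replace = FALSE)), ],
                           family = Binomial(),
                           control = boost_control(mstop = 400, trace = TRUE, center = FALSE))
...

H2O (Python):
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# model 1: GLM with elastic net
model1 = H2OGeneralizedLinearEstimator(model_id=post_col, family='binomial', solver='IRLSM',
                                       alpha=0.99,  # mixing weight: mainly LASSO (1.0 = pure L1)
                                       lambda_search=True, standardize=True, intercept=True)
model1.train(x=index_cols, y=post_col, training_frame=training, validation_frame=val)
...

XGBoost (Python):
import xgboost as xgb

# model 1: linear gradient boosting (gblinear booster)
params = {'silent': 0, 'nthread': 4,
          'eval_metric': ['error', 'map', 'map@' + str(top1percent_train),
                          'map@' + str(top1percent_eval), 'auc'],
          'objective': 'binary:logistic', 'booster': 'gblinear',
          'lambda': 0,   # L2 regularization (ridge): none
          'alpha': 500}  # L1 regularization (LASSO)
quality = {}  # xgb.train stores the per-round evaluation metrics here
# Early stopping tracks the last metric ('auc') on the last evaluation set ('eval').
booster = xgb.train(params, dtrain, num_boost_round=settings.boosting_iterations,
                    evals=[(dtrain, 'train'), (dtest, 'eval')],
                    early_stopping_rounds=10, evals_result=quality)
...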
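The class-balanced subsample in the mboost snippet (all positives plus `sample(which_zero, length(which_one), replace = FALSE)` negatives) can be mirrored in Python as below. Because only a fraction of negatives is kept, the fitted intercept is shifted upward; the second function applies the standard prior correction for logistic models, which the slides do not mention. It is stated on the log-odds scale; mboost's default `Binomial()` is, to our understanding, parametrized as half the log-odds, so check the family's scale before applying it there.

```python
import math
import random


def balanced_indices(labels: list[int], seed: int = 0) -> list[int]:
    """All positive rows plus an equally sized random sample of negative rows,
    drawn without replacement."""
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)
    # random.sample raises ValueError if there are fewer negatives than positives
    return ones + rng.sample(zeros, len(ones))


def corrected_intercept(b0_sample: float, n_neg_total: int, n_neg_kept: int) -> float:
    """Undo the intercept shift from keeping only a fraction r of the negatives:
    sample odds = true odds / r, so the true intercept is b0_sample + log(r)."""
    return b0_sample + math.log(n_neg_kept / n_neg_total)


if __name__ == "__main__":
    y = [0, 1, 0, 0, 1, 0, 0]
    idx = balanced_indices(y)
    print(idx, [y[i] for i in idx])
    print(corrected_intercept(0.0, n_neg_total=5, n_neg_kept=2))
```

Coefficients other than the intercept are unaffected by this kind of outcome-based subsampling in a correctly specified logistic model, which is why the correction touches only the intercept.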
Validate & test
Interesting effects between disease chapters
[Figure: effects between ICD-10 chapters, e.g. "Diseases of the nervous system" and "Neoplasms"]
Medical Graph backend

From last run:
• 2,261 nodes
• 434,995 edges

Edges
Relation | Source | Target | OR | beta | p-value | Number of relations | Proportion of incidents that have the source | Proportion of patients with the source who get the incident | Mean age
has_successor | Intercept | ICD_M54 | 0.2483 | -1.3930 | | | | |
has_successor | AGE | ICD_M54 | 1.0517 | 0.0504 | 0.000000 | | 100.0% | 21.9% |
has_successor | GENDER | ICD_M54 | 0.9944 | -0.0056 | 0.000000 | 82556 | 47.2% | 21.2% | 42
has_successor | ICD_I10 | ICD_M54 | 0.9260 | -0.0768 | 0.000000 | 45013 | 25.8% | 20.4% | 62
has_successor | ICD_H35 | ICD_M54 | 0.9469 | -0.0545 | 0.000000 | 8125 | 4.6% | 19.5% | 62
has_successor | ATC_D01AC | ICD_M54 | 1.0022 | 0.0022 | 0.000000 | 3382 | 1.9% | 17.8% | 47
has_successor | ATC_M01AB | ICD_M54 | 1.2207 | 0.1994 | 0.000000 | 16534 | 9.5% | 17.0% | 52
has_successor | ICD_H26 | ICD_M54 | 0.9420 | -0.0597 | 0.000000 | 7550 | 4.3% | 19.1% | 67
has_successor | ATC_C09AA | ICD_M54 | 0.9603 | -0.0405 | 0.000000 | 16840 | 9.6% | 20.1% | 62
has_successor | ATC_C08CA | ICD_M54 | 0.9299 | -0.0727 | 0.000000 | 9892 | 5.7% | 19.5% | 67
has_successor | ATC_C07BB | ICD_M54 | 1.0031 | 0.0031 | 0.000000 | 2197 | 1.3% | 21.3% | 62
has_successor | ICD_H52 | ICD_M54 | 1.0006 | 0.0006 | 0.000000 | 35331 | 20.2% | 20.5% | 52
has_successor | ATC_M01AE | ICD_M54 | 1.0450 | 0.0440 | 0.000000 | 22808 | 13.0% | 16.4% | 42
has_successor | ICD_H43 | ICD_M54 | 1.0300 | 0.0296 | 0.000000 | 3599 | 2.1% | 20.2% | 62
has_successor | ICD_L85 | ICD_M54 | 0.9362 | -0.0660 | 0.000978 | 1244 | 0.7% | 18.4% | 47
has_successor | ICD_H02 | ICD_M54 | 1.0165 | 0.0164 | 0.000000 | 1734 | 1.0% | 19.8% | 57
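In the edge table the OR column is the exponentiated beta of a logistic model (OR = exp(beta)), so a backend could store just beta and derive the odds ratio. The sketch below checks that relationship on the ICD_M54 rows copied from the table and lists the sources whose odds ratio exceeds 1; the tolerance and the function names are our own choices.

```python
import math

# (source, OR, beta) for target ICD_M54, copied from the edge table.
EDGES_M54 = [
    ("Intercept", 0.2483, -1.3930), ("AGE", 1.0517, 0.0504),
    ("GENDER", 0.9944, -0.0056), ("ICD_I10", 0.9260, -0.0768),
    ("ICD_H35", 0.9469, -0.0545), ("ATC_D01AC", 1.0022, 0.0022),
    ("ATC_M01AB", 1.2207, 0.1994), ("ICD_H26", 0.9420, -0.0597),
    ("ATC_C09AA", 0.9603, -0.0405), ("ATC_C08CA", 0.9299, -0.0727),
    ("ATC_C07BB", 1.0031, 0.0031), ("ICD_H52", 1.0006, 0.0006),
    ("ATC_M01AE", 1.0450, 0.0440), ("ICD_H43", 1.0300, 0.0296),
    ("ICD_L85", 0.9362, -0.0660), ("ICD_H02", 1.0165, 0.0164),
]


def or_matches_beta(odds_ratio: float, beta: float, tol: float = 5e-4) -> bool:
    """True if the odds ratio equals exp(beta) up to the table's 4-digit rounding."""
    return abs(math.exp(beta) - odds_ratio) < tol


def risk_increasing(edges: list[tuple[str, float, float]]) -> list[str]:
    """Sources with an odds ratio above 1 (intercept excluded), strongest first."""
    hits = [(src, o) for src, o, _ in edges if src != "Intercept" and o > 1.0]
    return [src for src, _ in sorted(hits, key=lambda so: so[1], reverse=True)]


if __name__ == "__main__":
    print(all(or_matches_beta(o, b) for _, o, b in EDGES_M54))
    print(risk_increasing(EDGES_M54))
```

The intercept row is the baseline log-odds rather than an effect, which is why it is excluded from the ranking even though exp(beta) is defined for it too.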
Medical Graph frontend

Key Learnings

Key learnings from five years of working with medical data
• Physicians want explanations; otherwise they will not trust the predictions.
• Typical best-in-class classification methods (deep learning, random forest) do not yet deliver explainable models. This won't do.
• Open source tools have failures (as do proprietary tools). Debugging can be a nightmare.
• In practice, you need to save users processing time, not add to it.
• Visualization is key.
• Building a classification model using open source tools is simple. Scaling the input data size is also manageable. Building 1,000+ models is complex.
• Implementing, applying and maintaining a security framework to keep personal health information secure is a substantial effort.
• Feature engineering is not dead. If you want explainable effects, you most probably need linear models, so you need to engineer non-linear effects, e.g. using clusters.
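The last learning, engineering non-linear effects for linear models via clusters, can be sketched as clustering one variable and one-hot encoding cluster membership, so that a linear model gets one coefficient per cluster and can represent, say, a U-shaped age effect. This is an illustration under assumptions: plain 1-D k-means (Lloyd's algorithm) with a deterministic start, and age as the example variable; the slides do not specify the clustering method.

```python
def nearest(v: float, centers: list[float]) -> int:
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda c: abs(v - centers[c]))


def kmeans_1d(values: list[float], k: int, iters: int = 50) -> list[float]:
    """Lloyd's k-means on a single variable (2 <= k <= len(values))."""
    if k < 2 or k > len(values):
        raise ValueError("need 2 <= k <= len(values)")
    s = sorted(values)
    # Deterministic start: evenly spaced order statistics, smallest to largest.
    centers = [float(s[(len(s) - 1) * j // (k - 1)]) for j in range(k)]
    for _ in range(iters):
        groups: list[list[float]] = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centers)].append(v)
        # An empty cluster keeps its previous center.
        new = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new == centers:  # converged
            break
        centers = new
    return centers


def one_hot_clusters(values: list[float], centers: list[float]) -> list[list[int]]:
    """One indicator column per cluster; each row has a single 1."""
    rows = []
    for v in values:
        hit = nearest(v, centers)
        rows.append([1 if c == hit else 0 for c in range(len(centers))])
    return rows


if __name__ == "__main__":
    ages = [20, 22, 25, 60, 62, 65, 88, 90]
    centers = kmeans_1d(ages, 3)
    print(centers)
    print(one_hot_clusters([21, 63, 89], centers))
```

Each indicator column's coefficient then reads directly as "effect of belonging to this age group", which keeps the model explainable.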

CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 

Elsevier Medical Graph – with Machine Learning towards Precision Medicine

  • 1. Elsevier Health Analytics, Medical Graph v1: Empowering Knowledge™. Towards • a map of medicine • personalized decision support in a clinical setting. Paul Hellwig, Director Research & Development, p.hellwig@elsevier.com, https://www.linkedin.com/in/paulhellwig, Nov 2016
  • 2. Elsevier Health Analytics combines RELX Group's medical and big data analytics expertise. Elsevier • Publisher and world-leading provider of information solutions • 6,700 people worldwide, €2.8 billion revenues¹ • >2,200 journals, >25,000 book titles • ScienceDirect, Scopus, ClinicalKey and Nursing Consult • Health Analytics team in Berlin. LexisNexis • Helps industry and government predict and manage risk • 7,200 people, €2.2 billion revenues¹ • 35 years of experience in managing big data, currently >5 petabytes • Developed the HPCC² supercomputer platform. ¹ 2015 ² High Performance Computing Cluster
  • 4. Trends driving changes in physician–patient interaction: 1. Medical data explosion: 4,500 tests for genetic disorders available (2013: 3,200; +20% CAGR); $1,245 cost to sequence a full genome (10/2014: $5,730). 2. Patient empowerment: PatientsLikeMe has 400,000+ members donating data, with 31 million data points covering 2,500+ conditions. 3. Information explosion: 25 million biomedical articles referenced on PubMed; 1.2 million new biomedical articles per year. Also shown: a 105 mm ECG biosensor (high ECG quality; heart rate, respiration, body temperature, activity and body position; watertight, induction-charged, Bluetooth, continuous data feed).
  • 5. …and the real challenge: the same trends as on slide 4 (medical data explosion, patient empowerment, information explosion) meet a physician–patient consultation of < 10 minutes¹. ¹ Europe; US up to 20 minutes: Ray KN, Chari AV, Engberg J, Bertolet M, Mehrotra A. Disparities in Time Spent Seeking Medical Care in the United States. JAMA Intern Med. 2015;175(12):1983-1986. doi:10.1001/jamainternmed.2015.4468.
  • 6. Medical Graph, research goal A: risk prediction. Which diseases are you likely to develop within 4 years? From the electronic health record… …to the top risks.
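Going from one model per target disease to a patient's top risks amounts to ranking that patient's predicted probabilities. A minimal sketch, assuming each target model has already produced one 4-year probability per patient; the function name `top_risks` and the numbers in the example are hypothetical, not Medical Graph output:

```python
from typing import Dict, List, Tuple


def top_risks(risk_by_target: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k target diseases with the highest predicted 4-year risk.

    risk_by_target maps a target code (e.g. an ICD-10 code) to the
    probability predicted by that target's model (hypothetical input).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending probability; ties are broken by code for stable output.
    ranked = sorted(risk_by_target.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Illustrative probabilities only.
patient_risks = {"I50": 0.12, "E11": 0.31, "M54": 0.22, "I10": 0.05}
print(top_risks(patient_risks, k=2))
```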
  • 7. Medical Graph, research goal B: a map. How are diseases, medications and other data connected? Example nodes: I65 Occlusion and stenosis of precerebral arteries; G40 Epilepsy; I61 Intracerebral haemorrhage; C71 Malignant neoplasm of brain; further covariates. The nodes are linked by has_successor¹ edges (e.g. odds ratio: 1.12) …for 1,600 target diseases. ¹ Criteria based on: Jensen et al.: Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications, 2014 Jun 24;5:4022. doi: 10.1038/ncomms5022.
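A has_successor map like this can be held as a list of typed, weighted edges indexed by target. A minimal sketch, assuming an in-memory store; the names `Edge`, `MedicalGraph` and `predecessors` are hypothetical, and the example odds ratios are copied from the backend edge table (target ICD_M54):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class Edge:
    relation: str      # e.g. "has_successor"
    source: str        # earlier code: disease, drug or covariate
    target: str        # later code: target disease
    odds_ratio: float


class MedicalGraph:
    """Minimal in-memory edge store indexed by target node (hypothetical design)."""

    def __init__(self) -> None:
        self._incoming: DefaultDict[str, List[Edge]] = defaultdict(list)

    def add_edge(self, edge: Edge) -> None:
        self._incoming[edge.target].append(edge)

    def predecessors(self, target: str, min_odds_ratio: float = 1.0) -> List[Edge]:
        """Edges into `target` with odds ratio above `min_odds_ratio`, strongest first."""
        hits = [e for e in self._incoming[target] if e.odds_ratio > min_odds_ratio]
        return sorted(hits, key=lambda e: e.odds_ratio, reverse=True)


g = MedicalGraph()
for src, odds in [("ATC_M01AB", 1.2207), ("ATC_M01AE", 1.0450), ("ICD_I10", 0.9260)]:
    g.add_edge(Edge("has_successor", src, "ICD_M54", odds))
print([e.source for e in g.predecessors("ICD_M54")])
```

Indexing by target keeps the typical query ("what raises the risk of this disease?") to a single lookup.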
  • 9. Example: model to predict "I50 – Heart failure". Analysis design: predict 4-year long-term effects, balanced for all covariates. Timeline: 2009 (I50 −) and 2010 form the "past", from which the covariates are taken (age, gender, other diseases, medications, other); 2011–2014 form the "future", in which the outcome is coded as 0 (I50 −) or 1 (I50 +).
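The analysis design splits each patient's history into a covariate window and an outcome window. A minimal sketch, assuming records of the form (patient_id, code, year), covariates taken from 2010 and outcomes from 2011–2014, and (an assumption suggested by the "I50 −" marker) that patients who already carry the target code in 2009 or 2010 are excluded; age, gender and the other covariates are left out for brevity, and `build_examples` is a hypothetical name:

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, int]  # (patient_id, code, year)


def build_examples(
    records: Iterable[Record],
    target: str,
    baseline_years: Tuple[int, ...] = (2009, 2010),
    covariate_year: int = 2010,
    outcome_years: Tuple[int, int] = (2011, 2014),
) -> List[Tuple[str, Set[str], int]]:
    """Return (patient_id, covariate codes, label) for each eligible patient."""
    codes_by_year: Dict[str, Dict[int, Set[str]]] = {}
    for pid, code, year in records:
        codes_by_year.setdefault(pid, {}).setdefault(year, set()).add(code)

    first, last = outcome_years
    examples = []
    for pid in sorted(codes_by_year):
        years = codes_by_year[pid]
        # Assumption: patients with the target at baseline are not incident cases.
        if any(target in years.get(y, set()) for y in baseline_years):
            continue
        covariates = set(years.get(covariate_year, set()))
        # Label 1 (target +) if the target code appears anywhere in the outcome window.
        label = int(any(target in years.get(y, set()) for y in range(first, last + 1)))
        examples.append((pid, covariates, label))
    return examples


records = [("p1", "I10", 2010), ("p1", "I50", 2012),
           ("p2", "E11", 2010), ("p3", "I50", 2009), ("p3", "I50", 2013)]
print(build_examples(records, "I50"))
```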
  • 10. Our observation/feature matrix. Data sources: primary care (visits and diagnoses); secondary care (visits, diagnoses and procedures); medication (drug prescriptions); other data (further cooperations have just started and will enable analysis of vital and laboratory parameters). Billing data flow from 60+ sickness funds; anonymized feature extraction. Result: 3,943 features for 3.8 m patients • 1,623 targets, 2011–2014 • 2,320 covariates, 2010
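With 2,320 covariates and 1,623 targets, each patient row has 3,943 columns, so 3.8 m patients give about 1.5 × 10^10 cells if stored densely, while most entries are zero. A compressed sparse row (CSR) layout stores only the non-zero entries. A minimal sketch, assuming binary indicator features; `to_csr` and the feature names are hypothetical:

```python
from typing import Dict, List, Sequence, Set, Tuple


def to_csr(
    rows: Sequence[Set[str]], feature_names: Sequence[str]
) -> Tuple[List[int], List[int], List[int]]:
    """Encode binary features as CSR arrays (indptr, indices, data)."""
    column: Dict[str, int] = {name: j for j, name in enumerate(feature_names)}
    indptr, indices, data = [0], [], []
    for features in rows:
        cols = sorted(column[f] for f in features)  # raises KeyError on unknown features
        indices.extend(cols)
        data.extend([1] * len(cols))
        indptr.append(len(indices))  # row i spans indices[indptr[i]:indptr[i + 1]]
    return indptr, indices, data


names = ["AGE_65plus", "ICD_I10", "ATC_C09AA", "TARGET_ICD_I50"]
rows = [{"ICD_I10", "ATC_C09AA"}, set(), {"TARGET_ICD_I50", "AGE_65plus"}]
print(to_csr(rows, names))
```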
  • 11. Predictive modeling for ~1,600 target diseases: multiple attempts, since no software is perfect.
    • Attempt #1 (on server): component-wise gradient boosting (mboost), GLM for p-values. Full dataset: no, worked for 100k patients; failure reason: RAM (extensive dataset copying). Runtime ~7 min per target model (on 100k patients).
    • Attempt #2 (on cluster): logistic regression with LASSO, GLM for p-values. Full dataset: no, worked for 138 models; failure reason: memory leak every 30–40 models. Runtime ~8 min per target model (on 3.8 m patients).
    • Attempt #3 (on server): linear gradient boosting (sklearn + xgboost), F-test for p-values. Full dataset: no, worked for 800k patients; failure reason: int32 used as index for sparse matrices. Runtime ~7 min per target model (on 800k patients).
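The third failure reflects a hard limit: a signed 32-bit index cannot address more than 2,147,483,647 stored entries. One generic guard, sketched here as an assumption rather than the team's fix, is to split rows into blocks whose non-zero count stays within that limit before building each sparse matrix; `row_batches` is a hypothetical helper:

```python
from typing import Iterator, Sequence, Tuple

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def row_batches(nnz_per_row: Sequence[int], max_nnz: int = INT32_MAX) -> Iterator[Tuple[int, int]]:
    """Yield half-open row ranges [start, stop) whose total non-zeros stay <= max_nnz."""
    start, running = 0, 0
    for i, nnz in enumerate(nnz_per_row):
        if nnz > max_nnz:
            raise ValueError(f"row {i} alone exceeds max_nnz")
        if running + nnz > max_nnz:
            # Close the current block before this row would overflow it.
            yield start, i
            start, running = i, 0
        running += nnz
    if start < len(nnz_per_row):
        yield start, len(nnz_per_row)


print(list(row_batches([3, 4, 2, 5], max_nnz=7)))
```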
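  • 12. Code for model building (mboost in R; H2O and XGBoost in Python):

```r
# model 1: component-wise linear boosting (mboost)
library(mboost)
# Balance the training set: all positive rows plus an equally sized
# random sample (without replacement) of negative rows.
balanced_rows <- c(which_one, sample(which_zero, length(which_one), replace = FALSE))
boost_train_ds <- glmboost(as.formula(paste(icd_atc_use_names[i], "~ .")),
                           data = data[ins, ][balanced_rows, ],
                           family = Binomial(),
                           center = FALSE,  # centering is an argument of glmboost()
                           control = boost_control(mstop = 400, trace = TRUE))
# ...
```

```python
# model 1: GLM with elastic net (H2O)
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

model1 = H2OGeneralizedLinearEstimator(model_id=post_col,
                                       family='binomial',
                                       solver='IRLSM',
                                       alpha=0.99,  # mainly LASSO
                                       lambda_search=True,
                                       standardize=True,
                                       intercept=True)
model1.train(x=index_cols, y=post_col,
             training_frame=training, validation_frame=val)
# ...
```

```python
# model 1: linear boosting (XGBoost, booster='gblinear')
import xgboost as xgb

params = {'silent': 0,  # newer XGBoost releases use 'verbosity' instead
          'nthread': 4,
          'eval_metric': ['error', 'map',
                          'map@' + str(top1percent_train),
                          'map@' + str(top1percent_eval), 'auc'],
          'objective': 'binary:logistic',
          'booster': 'gblinear',
          'lambda': 0,    # L2 regularization (Ridge): none
          'alpha': 500}   # L1 regularization (LASSO)
quality = {}  # filled by xgb.train with per-round evaluation results
booster = xgb.train(params, dtrain,
                    num_boost_round=settings.boosting_iterations,
                    evals=[(dtrain, 'train'), (dtest, 'eval')],
                    early_stopping_rounds=10,  # monitors the last metric ('auc') on 'eval'
                    evals_result=quality)
# ...
```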
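The R snippet balances each training set by keeping every positive row (`which_one`) and drawing as many negative rows from `which_zero` without replacement. The same undersampling step, sketched in Python with the standard library; `balanced_indices` is a hypothetical name and the example labels are made up:

```python
import random
from typing import List, Optional, Sequence


def balanced_indices(labels: Sequence[int], seed: Optional[int] = None) -> List[int]:
    """All positive rows followed by an equally sized random sample of negative rows."""
    rng = random.Random(seed)
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    if len(zeros) < len(ones):
        raise ValueError("fewer negatives than positives; cannot undersample")
    # Sampling without replacement, like sample(..., replace = FALSE) in R.
    return ones + rng.sample(zeros, len(ones))


labels = [0, 1, 0, 0, 1, 0]
print(balanced_indices(labels, seed=0))
```

Undersampling discards many negative rows per model, which also shrinks memory use; the trade-off is that predicted probabilities no longer reflect the true prevalence without recalibration.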
  • 13. Validate & test: interesting effects between disease chapters (e.g. diseases of the nervous system and neoplasms).
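Comparing effects between disease chapters requires mapping each ICD-10 code to its chapter. A minimal sketch, assuming three-character ICD-10 codes and covering only three WHO chapters (neoplasms C00–D48, nervous system G00–G99, circulatory system I00–I99), with everything else mapped to "other"; the helper names are hypothetical and the edge pairs in the example are illustrative, not results:

```python
from collections import Counter
from typing import Iterable, Tuple

# Subset of ICD-10 chapters (WHO ranges); other codes map to "other".
CHAPTERS = [
    ("C00", "D48", "Neoplasms"),
    ("G00", "G99", "Diseases of the nervous system"),
    ("I00", "I99", "Diseases of the circulatory system"),
]


def chapter(icd_code: str) -> str:
    """Map an ICD-10 code such as 'C71' or 'G40.1' to its chapter name."""
    key = icd_code[:3].upper()  # letter + two digits compare correctly as strings
    for first, last, name in CHAPTERS:
        if first <= key <= last:
            return name
    return "other"


def chapter_pairs(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count (source chapter, target chapter) pairs over (source, target) ICD edges."""
    return Counter((chapter(s), chapter(t)) for s, t in edges)


pairs = chapter_pairs([("C71", "G40"), ("G40", "I61"), ("C71", "G93")])
print(pairs.most_common())
```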
  • 14. Medical Graph backend. From the last run: 2,261 nodes, 434,995 edges. Edges (excerpt for target ICD_M54):

| Relation | Source | Target | OR | beta | p-value | No. of relations | Proportion of incidents with source | Proportion of source with incident | Mean age |
|---|---|---|---|---|---|---|---|---|---|
| has_successor | Intercept | ICD_M54 | 0.2483 | -1.3930 | | | | | |
| has_successor | AGE | ICD_M54 | 1.0517 | 0.0504 | 0.000000 | | 100.0% | 21.9% | |
| has_successor | GENDER | ICD_M54 | 0.9944 | -0.0056 | 0.000000 | 82556 | 47.2% | 21.2% | 42 |
| has_successor | ICD_I10 | ICD_M54 | 0.9260 | -0.0768 | 0.000000 | 45013 | 25.8% | 20.4% | 62 |
| has_successor | ICD_H35 | ICD_M54 | 0.9469 | -0.0545 | 0.000000 | 8125 | 4.6% | 19.5% | 62 |
| has_successor | ATC_D01AC | ICD_M54 | 1.0022 | 0.0022 | 0.000000 | 3382 | 1.9% | 17.8% | 47 |
| has_successor | ATC_M01AB | ICD_M54 | 1.2207 | 0.1994 | 0.000000 | 16534 | 9.5% | 17.0% | 52 |
| has_successor | ICD_H26 | ICD_M54 | 0.9420 | -0.0597 | 0.000000 | 7550 | 4.3% | 19.1% | 67 |
| has_successor | ATC_C09AA | ICD_M54 | 0.9603 | -0.0405 | 0.000000 | 16840 | 9.6% | 20.1% | 62 |
| has_successor | ATC_C08CA | ICD_M54 | 0.9299 | -0.0727 | 0.000000 | 9892 | 5.7% | 19.5% | 67 |
| has_successor | ATC_C07BB | ICD_M54 | 1.0031 | 0.0031 | 0.000000 | 2197 | 1.3% | 21.3% | 62 |
| has_successor | ICD_H52 | ICD_M54 | 1.0006 | 0.0006 | 0.000000 | 35331 | 20.2% | 20.5% | 52 |
| has_successor | ATC_M01AE | ICD_M54 | 1.0450 | 0.0440 | 0.000000 | 22808 | 13.0% | 16.4% | 42 |
| has_successor | ICD_H43 | ICD_M54 | 1.0300 | 0.0296 | 0.000000 | 3599 | 2.1% | 20.2% | 62 |
| has_successor | ICD_L85 | ICD_M54 | 0.9362 | -0.0660 | 0.000978 | 1244 | 0.7% | 18.4% | 47 |
| has_successor | ICD_H02 | ICD_M54 | 1.0165 | 0.0164 | 0.000000 | 1734 | 1.0% | 19.8% | 57 |
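The OR and beta columns are consistent with OR = exp(β), the usual reading of a logistic-regression coefficient (for the intercept, exp(β) is the baseline odds rather than a ratio). A quick check against three rows of the edge table; `odds_ratio` is a hypothetical helper name:

```python
import math


def odds_ratio(beta: float) -> float:
    """Odds ratio for a one-unit increase in a covariate of a logistic model."""
    return math.exp(beta)


# (source, beta, reported OR) copied from the edge table for target ICD_M54.
rows = [("Intercept", -1.3930, 0.2483), ("AGE", 0.0504, 1.0517), ("ATC_M01AB", 0.1994, 1.2207)]
for source, beta, reported in rows:
    print(source, round(odds_ratio(beta), 4), reported)
```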
  • 17. Key learnings from working with medical data for 5 years:
    • Physicians want explanations; otherwise they will not trust the predictions. Typical best-in-class classification methods (deep learning, random forest) do not yet deliver explainable models. This won't do.
    • Open source tools have failures (as do proprietary tools). Debugging can be a nightmare.
    • In practice, you need to save users processing time, not add to it. Visualization is key.
    • Building a classification model using open source tools is simple, and scaling the input data size is manageable. Building 1,000+ models is complex.
    • Implementing, applying and maintaining a security framework to keep personal health information secure is a substantial effort.
    • Feature engineering is not dead. If you want explainable effects, you most probably need linear models, so you need to engineer non-linear effects, e.g. using clusters.
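Engineering non-linear effects for linear models "e.g. using clusters" can be sketched as: cluster a continuous covariate, one-hot encode cluster membership, and give the linear model one coefficient per cluster. This assumes plain k-means (Lloyd's algorithm) in NumPy; the helper names and toy ages are hypothetical, and the slides do not say which clustering method was used:

```python
import numpy as np


def assign(x: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Index of the nearest centre for every row of x."""
    d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns cluster centres of shape (k, n_features)."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(x, centres)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def cluster_onehot(x: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """One-hot cluster membership, usable as extra columns in a linear model."""
    return np.eye(len(centres), dtype=int)[assign(x, centres)]


ages = np.array([[25.0], [27.0], [23.0], [70.0], [72.0], [68.0]])
onehot = cluster_onehot(ages, kmeans(ages, k=2))
print(onehot)
```

A logistic regression can then fit one coefficient per age cluster instead of a single linear age effect, and each coefficient stays directly interpretable.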