Extracting Medical Attributes and finding relations
Sanghamitra Deb
Accenture Technology Laboratory
[Diagram labels: Personalized Medicine; drugs, side effects, ethnicity, dosages, diseases, age group, compounds, gender, interactions, ?, ?, ?]
FDA Drug Labels
It is indicated for treating respiratory disorder caused due to allergy.
For the relief of symptoms of depression.
Evidence supporting efficacy of carbamazepine as an anticonvulsant was derived from active drug-controlled studies that enrolled patients with the following seizure types:
LOTEMAX is a corticosteroid indicated for the treatment of post-operative inflammation and pain following ocular surgery.
FDA Drug Labels: Examples
We present a case of a 10-year-old boy who had
severe relapsing pancreatitis three times in two
months within 3 weeks after starting treatment with
methylphenidate ( ritalin ) due to attention deficit
hyperactivity disorder (adhd).
The boy was generally healthy except for that he
was newly diagnosed with adhd and started the
use of methylphenidate ( ritalin ) for the past
three weeks at a dose, of 30 mg daily.
We believe that the number of persons suffering
from pancreatitis due to the use of ritalin is more
than this published case.
Physicians must pay attention regarding this
possible complication and it should be taken into
consideration in every patient with abdominal
pain who started consuming ritalin.
Meta Data
Dosage: single dose: 240 ml
Drug: methylphenidate
# of vol 30mg
Clinical Trials: Meta Data
Drug      Adverse Effects
Ritalin   pancreatitis, abdominal pain
Tylenol   nausea, upper stomach pain, itching, loss of appetite
Aspirin   rash, gastrointestinal ulcerations, abdominal pain, upset stomach, heartburn
Clinical Trials: Side Effects
Drug-Disease
• Off-label drug uses
• Database completion
• Design of clinical trials
Relationship between metadata
• How does heart disease correlate with gender and age?
• Which universities have the most successful clinical trials for breast cancer?
• How are genes and phenotypes related?
• What dosage of ritalin was most effective in treating ADHD with the fewest side effects?
Problems it Solves
8000 drug-disease treatment relationships from UMLS data
drug_name: 'metipred|methylprednisolone|methylprednisolone preparation|methylprednisolonum|6alpha-methylprednisolone|6-alpha-methylprednisolone preparation|methylprednisolone|pregna-1,4-diene-3,20-dione, 11,17,21-trihydroxy-6-methyl-, (6alpha,11beta)-|(6alpha,11beta)-11,17,21-trihydroxy-6-methylpregna-1,4-diene-3,20-dione|methylprednisolone|meprdl|methylprednisolone|6-methylprednisolone|6 methylprednisolone'
disease_name: 'respiratory distress syndrome, acute|pulmonary capillary leak syndrome|wet lung syndrome|acute respiratory distress syndrome|shock lung|adult respiratory distress syndrome|shock lung|human ards|adult respiratory distress syndrome|wet lung|ards - adult respiratory distress syndrome|acquired respiratory distress syndrome|adult rds|ards|adult respiratory syndrome|a.r.d.s.|danang lung|danang lung|respiratory distress syndrome|adult respiratory distress syndrome, ards|shock lung|respiratory distress syndrome, adult|adult respiratory distress syndrome|vietnam lung|rds|lung, shock|adult hyaline membrane disease|ards, human|adult respiratory distress syndrome|adult hyaline membrane disease|ardss, human|a r d s|adult rds|congestive atelectasis|ards|respiratory distress syndrome|respiratory distress syndrome, adult|adult respiratory distress syndrome'
Training Data
Extract sentences that contain the specific attribute.
POS tag and extract unigrams, bigrams and trigrams centered on nouns.
Extract features: words around nouns (bag of words/word vectors), position of the noun.
Train a machine learning model to predict which unigrams, bigrams or trigrams satisfy the specific relationship, for example the drug-disease treatment relationship.
Map training data to create a balanced positive and negative training set.
Course of Action
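The candidate-extraction step (POS tag, then take unigrams, bigrams and trigrams centered on nouns) can be sketched as follows. This is a minimal illustration, not the code used in the talk. It assumes the tokens are already tagged with Penn Treebank tags (hand-tagged here in place of a real tagger), and it interprets "centered on nouns" as n-grams that end in a noun and contain only nouns and adjectives. The function name noun_ngrams is hypothetical.

```python
def noun_ngrams(tagged_tokens, max_n=3):
    """Return (text, start_index) for 1- to max_n-grams that end in a noun.

    Assumption: every token in the n-gram must be a noun (NN*) or an
    adjective (JJ*). The start index is kept because the position of the
    candidate is later used as a feature.
    """
    candidates = []
    for end, (_, tag) in enumerate(tagged_tokens):
        if not tag.startswith("NN"):
            continue
        for n in range(1, max_n + 1):
            start = end - n + 1
            if start < 0:
                break
            window = tagged_tokens[start:end + 1]
            if not all(t.startswith(("NN", "JJ")) for _, t in window):
                break  # longer windows would contain the same bad token
            candidates.append((" ".join(w for w, _ in window), start))
    return candidates


# Hand-tagged fragment of a lemmatized sentence (tags are an assumption).
tagged = [
    ("maintenance", "NN"), ("therapy", "NN"), ("reduce", "VB"),
    ("the", "DT"), ("frequency", "NN"), ("of", "IN"),
    ("manic", "JJ"), ("episode", "NN"),
]
candidates = noun_ngrams(tagged)
for text, position in candidates:
    print(position, text)
```

This sketch proposes more candidates than a stricter filter would; which n-grams to keep is a design choice that the downstream classifier is meant to resolve.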
Creating Labelled Data
lemmatized_sentence: ['maintenance', 'therapy', 'reduce', 'the', 'frequency', 'of', 'manic', 'episode', 'and', 'diminish', 'the', 'intensity', 'of', 'those', 'episode', 'which', 'may', 'occur', '.']
Several Candidates
Typically one of them is the disease that the drug treats. For every drug we create training data. One line of text produces 5 lines of training data with one true positive.
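A rough sketch of how one sentence might turn into labelled rows: each candidate is marked 1 if it appears in the pipe-separated list of disease synonyms for the drug, and 0 otherwise. The helper name label_candidates and the short synonym string are hypothetical stand-ins for illustration; they are not taken from UMLS or from the talk.

```python
def label_candidates(candidates, disease_synonyms):
    """Label each candidate 1 if it matches a known synonym, else 0.

    disease_synonyms is a pipe-separated string, the same layout as the
    disease_name field in the training data.
    """
    synonyms = {s.strip().lower() for s in disease_synonyms.split("|")}
    return [(c, int(c.lower() in synonyms)) for c in candidates]


# Candidates from one lemmatized sentence.
sentence_candidates = ["maintenance", "therapy", "manic episode",
                       "intensity", "episode"]
# Hypothetical synonym list for the disease the drug treats.
synonym_field = "manic episode|mania|manic state"

rows = label_candidates(sentence_candidates, synonym_field)
for candidate, target in rows:
    print(candidate, target)
```

With these inputs the sentence yields five rows, exactly one of which is a positive example.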
Balancing the Training Data
Since the training data contains a higher percentage of zeros than ones, it is important to balance it before modeling, i.e., to build the model I choose an equal number of zeros and ones.
Candidate      Target  rule-prediction
maintenance    0       1
therapy        0       1
manic episode  1       1
intensity      0       1
episode        0       1
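Balancing by choosing an equal number of zeros and ones can be sketched as random downsampling of the larger class. This is one plausible reading of the balancing step, not necessarily the exact procedure used in the talk. The function name balance and the fixed seed are assumptions made for reproducibility.

```python
import random


def balance(rows, label_index=1, seed=0):
    """Downsample so that positives and negatives are equally frequent."""
    positives = [r for r in rows if r[label_index] == 1]
    negatives = [r for r in rows if r[label_index] == 0]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


# (candidate, target) rows: four negatives and one positive.
labelled_rows = [
    ("maintenance", 0), ("therapy", 0), ("manic episode", 1),
    ("intensity", 0), ("episode", 0),
]
balanced_rows = balance(labelled_rows)
print(balanced_rows)
```

Downsampling discards data; with many sentences per drug this is usually acceptable, but class weighting is an alternative worth considering when positives are scarce.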
Feature Extraction: Word Vectors, Disease Combinations
adhd + manic episode = bipolar disorder
respiratory disorder + allergy = common cold
coronary artery + heart disease = angina pectoris
high blood pressure + lipid = diabetes_management
Extract Features: Initialize vocabulary with pre-trained vectors
gensim: Train word2vec on medical corpus with unigrams, bigrams and trigrams
Produce word vectors
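The "disease combination" idea (adding the vectors of two terms and looking for the nearest remaining term) can be illustrated with plain Python. The three-dimensional vectors below are made up purely to show the arithmetic; they are not output of a trained word2vec model, and the function names are hypothetical. With a real model the same lookup would be done against the trained vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_to_sum(vectors, terms):
    """Add the vectors of `terms` and return the closest other term."""
    query = [sum(parts) for parts in zip(*(vectors[t] for t in terms))]
    others = [w for w in vectors if w not in terms]
    return max(others, key=lambda w: cosine(query, vectors[w]))


# Toy vectors, invented for illustration only.
toy_vectors = {
    "adhd": [1.0, 0.0, 0.0],
    "manic_episode": [0.0, 1.0, 0.0],
    "bipolar_disorder": [0.9, 1.0, 0.1],
    "common_cold": [0.0, 0.1, 1.0],
    "angina_pectoris": [0.1, 0.0, 0.9],
}

result = nearest_to_sum(toy_vectors, ["adhd", "manic_episode"])
print(result)
```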
Pure Python stack: pandas, scikit-learn, gensim, stanford-nlp-parser
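from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

class ItemSelector(BaseEstimator, TransformerMixin):
    # Helper (not part of scikit-learn): selects one column of the input.
    def __init__(self, column):
        self.column = column
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.column]

pipeline = Pipeline([
    ('union', FeatureUnion(
        transformer_list=[
            # Pipeline for getting the position of the disease candidate
            ('position', Pipeline([
                ('selector', ItemSelector(column='candidate')),
                ('vect', DictVectorizer()),
            ])),
            # Pipeline for getting words around candidates
            ('words_around', Pipeline([
                ('selector', ItemSelector(column='words_around')),
                ('count', CountVectorizer()),
            ])),
        ])),
    # Stands in for the slide's ML_library placeholder; any L1-capable classifier works.
    ('clf', LogisticRegression(penalty='l1', solver='liblinear'))])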
Machine Learning Workflow: Pure Python stack
Data Cleaning and Tokenization
Feature Extraction / Candidate Selection
Create Labelled Data
ML: Logistic Regression, …
Hyperparameter Tuning
Calculate Metrics: precision, recall, ROC curve, etc.
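For the metrics step, precision, recall and F1 follow directly from counts of true positives, false positives and false negatives. The sketch below computes them in plain Python so the definitions are visible. The target and prediction lists are illustrative values, not results from the talk, and the function name is hypothetical.

```python
def precision_recall_f1(targets, predictions):
    """Compute precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only.
example_targets = [1, 0, 0, 1, 0, 1]
example_predictions = [1, 0, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(example_targets,
                                            example_predictions)
print(precision, recall, f1)
```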
Results: Examples
drug-name          disease candidate  Candidates  ML
Lithium Carbonate  bipolar disorder   1           1
Lithium Carbonate  individual         1           0
Lithium Carbonate  maintenance        1           0
Lithium Carbonate  manic episode      1           1
Drug                 Candidate     Target  Predict
Silver Sulfadiazine  third degree  0       0
Silver Sulfadiazine  sepsis        0       1
Silver Sulfadiazine  burn          0       1
Silver Sulfadiazine  cream         0       0

Drug                     Candidate        Target  Predict
Diltiazem Hydrochloride  spasm            1       0
Diltiazem Hydrochloride  coronary artery  1       0
Diltiazem Hydrochloride  stable angina    0       0
Diltiazem Hydrochloride  angina           0       0
'silver sulfadiazine cream usp 1 % be a topical antimicrobial drug indicate as a adjunct for the prevention and treatment of wound sepsis in patient with second and third degree burn .'
['Diltiazem', 'hydrochloride', 'tablet', 'USP', 'be', 'indicate', 'for', 'the', 'management', 'of', 'chronic', 'stable', 'angina', 'and', 'angina', 'due', 'to', 'coronary', 'artery', 'spasm', '.']
Cases where it does not work
Exploring Modeling Technique
Method               Precision  Recall  F1    ROC Curve
Logistic Regression  0.95       0.95    0.95  0.92
LR + word2vec        0.94       0.94    0.94  0.9
SVM                  0.96       0.95    0.95  0.92
Random Forest        0.96       0.96    0.96  0.9
Clinical Trials Data
We present a case of a 10-year-old boy who had severe relapsing
pancreatitis three times in two months within 3 weeks after starting treatment
with methylphenidate ( ritalin ) due to attention deficit hyperactivity
disorder (adhd).
The boy was generally healthy except for that he was newly diagnosed with
adhd and started the use of methylphenidate ( ritalin ) for the past three
weeks at a dose, of 30 mg daily.
We believe that the number of persons suffering from pancreatitis due to the
use of ritalin is more than this published case.
Physicians must pay attention regarding this possible complication and it
should be taken into consideration in every patient with abdominal pain who
started consuming ritalin.
Clinical Trials Data: Labelled Data
Data                  Dosage  Drug  Treats Disease  Side Effects  Age  Gender  Ethnicity  Duration
10-year-old           0       0     0               0             1    0       0          0
pancreatitis-ritalin  0       0     0               1             0    0       0          0
adhd-ritalin          0       0     1               0             0    0       0          0
ritalin               0       1     0               0             0    0       0          0
30 mg                 1       0     0               0             0    0       0          0
past three weeks      0       0     0               0             0    0       0          1
boy                   0       0     0               0             0    1       0          0
Creating Labeled Data
Hand label data that contain the specific attribute (~100 examples).
Extract candidates: POS tag and extract unigrams, bigrams and trigrams centered on nouns.
Generate rules: automatic creation of labels that satisfy the 100 hand-labelled data.
This process will create a smaller sample (say 5-10%) of data which can be further crowdsourced for a 100% accurate gold sample.
Rule-based model: with 95% accuracy.
Iterate: repeat the process a few times.
Example of rules:
Dosage:
(1) Sentence contains numbers
(2) Distance between numbers and "mg", "milligrams" < 5 characters
(3) Contains the word "dose"
Age:
(1) Sentence contains numbers
(2) Contains the word "age", "year-old" within 5 words of the candidate
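The example rules for Dosage and Age can be sketched with regular expressions. The source does not say how the individual conditions are combined, so this sketch makes assumptions: for dosage, conditions (1) and (2) are required and (3) is optional via a flag; for age, both conditions are required. The function names, the flag and the default thresholds are hypothetical.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
# "mg" or "milligram(s)" not glued to a preceding letter, so "30mg" matches.
UNIT = re.compile(r"(?<![A-Za-z])(?:mg|milligrams?)\b", re.IGNORECASE)
DOSE_WORD = re.compile(r"\bdose\b", re.IGNORECASE)


def dosage_rule(sentence, max_gap=5, require_dose_word=False):
    """True if a number is followed by a unit fewer than max_gap chars away."""
    if require_dose_word and not DOSE_WORD.search(sentence):
        return False
    for number in NUMBER.finditer(sentence):
        unit = UNIT.search(sentence, number.end())
        if unit and unit.start() - number.end() < max_gap:
            return True
    return False


def age_rule(sentence, candidate, window=5):
    """True if the sentence has a number and an age cue near the candidate."""
    if not NUMBER.search(sentence):
        return False
    tokens = [t.strip(".,;:()") for t in sentence.lower().split()]
    cand = candidate.lower().split()
    for i in range(len(tokens) - len(cand) + 1):
        if tokens[i:i + len(cand)] != cand:
            continue
        nearby = tokens[max(0, i - window):i + len(cand) + window]
        if any(t == "age" or "year-old" in t for t in nearby):
            return True
    return False


dose_sentence = "started the use of methylphenidate at a dose, of 30 mg daily."
age_sentence = ("We present a case of a 10-year-old boy who had severe "
                "relapsing pancreatitis")
print(dosage_rule(dose_sentence, require_dose_word=True))
print(age_rule(age_sentence, "10-year-old"))
print(age_rule(age_sentence, "pancreatitis"))
```

Rules like these are deliberately loose: they are meant to generate candidate labels cheaply, which are then checked against the hand-labelled sample and refined over a few iterations.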
DeepDive: Extracting relationships between entities
PDFs, text files, semi-structured JSON; example: journals available at PubMed and clinicaltrials.gov
Provide examples of data that need to be extracted
Structured data
DeepDive: Prototyping with ddlite
https://github.com/HazyResearch/ddlite
Mind Tagger
Show IPython notebook
• NLP relationship extraction with ML techniques is very successful in the presence of gold labeled data.
• It is very important to invest time and resources towards harvesting good training data.
• There is an enormous amount of data in pharma (clinical trials, laboratory notes, doctors' notes, drug manufacturing documents, …). In order to pursue personalized medicine it is important to centralize this and make joint inferences across all data sets.
Final Remarks
Thank You: We are hiring …
blog: https://medium.com/@sangha_deb
@sangha_deb, sanghamitra.a.deb@accenture.com

More Related Content

Viewers also liked

Big Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and ClinicalBig Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and ClinicalAdrish Sannyasi
 
Clinical Trial Management Systems of next next decade
Clinical Trial Management Systems of next next decadeClinical Trial Management Systems of next next decade
Clinical Trial Management Systems of next next decadeFotis Stathopoulos
 
Using Machine Learning to Automate Clinical Pathways
Using Machine Learning to Automate Clinical PathwaysUsing Machine Learning to Automate Clinical Pathways
Using Machine Learning to Automate Clinical Pathwaysdiannepatricia
 
Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...
Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...
Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...Cirdan
 
Clinical Data Management: Strategies for unregulated data
Clinical Data Management: Strategies for unregulated dataClinical Data Management: Strategies for unregulated data
Clinical Data Management: Strategies for unregulated dataIUPUI
 
Oncology Big Data: A Mirage or Oasis of Clinical Value?
Oncology Big Data:  A Mirage or Oasis of Clinical Value? Oncology Big Data:  A Mirage or Oasis of Clinical Value?
Oncology Big Data: A Mirage or Oasis of Clinical Value? Michael Peters
 
Clinical research and clinical data management - Ikya Global
Clinical research and clinical data management - Ikya GlobalClinical research and clinical data management - Ikya Global
Clinical research and clinical data management - Ikya Globalikya global
 
Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6
Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6
Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6Perficient
 
Deep Learning and Recurrent Neural Networks in the Enterprise
Deep Learning and Recurrent Neural Networks in the EnterpriseDeep Learning and Recurrent Neural Networks in the Enterprise
Deep Learning and Recurrent Neural Networks in the EnterpriseJosh Patterson
 
H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonSri Ambati
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecJosh Patterson
 
Clinical data-management-overview
Clinical data-management-overviewClinical data-management-overview
Clinical data-management-overviewAcri India
 
Medical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareMedical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareNUS-ISS
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...David Talby
 

Viewers also liked (15)

Big Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and ClinicalBig Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and Clinical
 
Clinical Trial Management Systems of next next decade
Clinical Trial Management Systems of next next decadeClinical Trial Management Systems of next next decade
Clinical Trial Management Systems of next next decade
 
Using Machine Learning to Automate Clinical Pathways
Using Machine Learning to Automate Clinical PathwaysUsing Machine Learning to Automate Clinical Pathways
Using Machine Learning to Automate Clinical Pathways
 
Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...
Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...
Malcolm Pradhan on Pathology in Clincial Decision Support and the role of Dee...
 
Clinical Data Management: Strategies for unregulated data
Clinical Data Management: Strategies for unregulated dataClinical Data Management: Strategies for unregulated data
Clinical Data Management: Strategies for unregulated data
 
Oncology Big Data: A Mirage or Oasis of Clinical Value?
Oncology Big Data:  A Mirage or Oasis of Clinical Value? Oncology Big Data:  A Mirage or Oasis of Clinical Value?
Oncology Big Data: A Mirage or Oasis of Clinical Value?
 
Clinical research and clinical data management - Ikya Global
Clinical research and clinical data management - Ikya GlobalClinical research and clinical data management - Ikya Global
Clinical research and clinical data management - Ikya Global
 
Writing Business Plan
Writing Business PlanWriting Business Plan
Writing Business Plan
 
Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6
Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6
Flexible Study Design in Oracle Clinical and Remote Data Capture 4.6
 
Deep Learning and Recurrent Neural Networks in the Enterprise
Deep Learning and Recurrent Neural Networks in the EnterpriseDeep Learning and Recurrent Neural Networks in the Enterprise
Deep Learning and Recurrent Neural Networks in the Enterprise
 
H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in Python
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
Clinical data-management-overview
Clinical data-management-overviewClinical data-management-overview
Clinical data-management-overview
 
Medical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in HealthcareMedical Informatics: Computational Analytics in Healthcare
Medical Informatics: Computational Analytics in Healthcare
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
 

Similar to Extracting medical attributes and finding relations

ASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docx
ASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docxASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docx
ASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docxwrite12
 
Consider the following hypothet-ical scenario and results .docx
Consider the following hypothet-ical scenario and results .docxConsider the following hypothet-ical scenario and results .docx
Consider the following hypothet-ical scenario and results .docxdonnajames55
 
Farmacoepi Course Leiden 0210 Part 2
Farmacoepi Course Leiden 0210   Part 2Farmacoepi Course Leiden 0210   Part 2
Farmacoepi Course Leiden 0210 Part 2RobHeerdink
 
You are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docxYou are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docxodiliagilby
 
You are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docxYou are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docxavaforman16457
 
BCP401 Principles Of Pharmacology.docx
BCP401 Principles Of Pharmacology.docxBCP401 Principles Of Pharmacology.docx
BCP401 Principles Of Pharmacology.docxwrite5
 
Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...
Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...
Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...i3 Health
 
Sample Size: A couple more hints to handle it right using SAS and R
Sample Size: A couple more hints to handle it right using SAS and RSample Size: A couple more hints to handle it right using SAS and R
Sample Size: A couple more hints to handle it right using SAS and RDave Vanz
 
Dr. Rajen Tankshali
Dr. Rajen TankshaliDr. Rajen Tankshali
Dr. Rajen TankshaliBhavyaroy999
 
Measurement and Modeling Issues with Adherence to Pharmacotherapy
Measurement and Modeling Issues with Adherence to PharmacotherapyMeasurement and Modeling Issues with Adherence to Pharmacotherapy
Measurement and Modeling Issues with Adherence to PharmacotherapyM. Christopher Roebuck
 
Clinical Decision Support By Hari
Clinical Decision Support   By HariClinical Decision Support   By Hari
Clinical Decision Support By HariHari Vishwanathan
 
Introduction to medical statistics
Introduction to medical statistics Introduction to medical statistics
Introduction to medical statistics Mohamed Alhelaly
 
Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...
Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...
Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...breastcancerupdatecongress
 
RxpredictPresentation.pdf
RxpredictPresentation.pdfRxpredictPresentation.pdf
RxpredictPresentation.pdfDanikaGupta
 

Similar to Extracting medical attributes and finding relations (20)

ASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docx
ASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docxASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docx
ASU Health Medical Odds Ratio for Lorcaserin Producing Questions.docx
 
Consider the following hypothet-ical scenario and results .docx
Consider the following hypothet-ical scenario and results .docxConsider the following hypothet-ical scenario and results .docx
Consider the following hypothet-ical scenario and results .docx
 
Farmacoepi Course Leiden 0210 Part 2
Farmacoepi Course Leiden 0210   Part 2Farmacoepi Course Leiden 0210   Part 2
Farmacoepi Course Leiden 0210 Part 2
 
Medication Reconciliation
Medication ReconciliationMedication Reconciliation
Medication Reconciliation
 
You are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docxYou are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docx
 
You are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docxYou are an employee at Novartis. The company is currently addres.docx
You are an employee at Novartis. The company is currently addres.docx
 
Can Personalized Medicine Save the Health Care System?
Can Personalized Medicine Save the Health Care System?Can Personalized Medicine Save the Health Care System?
Can Personalized Medicine Save the Health Care System?
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Predictive Medicine
Predictive Medicine Predictive Medicine
Predictive Medicine
 
Displaying your results
Displaying your resultsDisplaying your results
Displaying your results
 
BCP401 Principles Of Pharmacology.docx
BCP401 Principles Of Pharmacology.docxBCP401 Principles Of Pharmacology.docx
BCP401 Principles Of Pharmacology.docx
 
Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...
Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...
Castration-Resistant Prostate Cancer: Implementing New Data and Evolving Stan...
 
Sample Size: A couple more hints to handle it right using SAS and R
Sample Size: A couple more hints to handle it right using SAS and RSample Size: A couple more hints to handle it right using SAS and R
Sample Size: A couple more hints to handle it right using SAS and R
 
Dr. Rajen Tankshali
Dr. Rajen TankshaliDr. Rajen Tankshali
Dr. Rajen Tankshali
 
Measurement and Modeling Issues with Adherence to Pharmacotherapy
Measurement and Modeling Issues with Adherence to PharmacotherapyMeasurement and Modeling Issues with Adherence to Pharmacotherapy
Measurement and Modeling Issues with Adherence to Pharmacotherapy
 
Clinical Decision Support By Hari
Clinical Decision Support   By HariClinical Decision Support   By Hari
Clinical Decision Support By Hari
 
Introduction to medical statistics
Introduction to medical statistics Introduction to medical statistics
Introduction to medical statistics
 
NNT: Number Needed to Treat
NNT: Number Needed to TreatNNT: Number Needed to Treat
NNT: Number Needed to Treat
 
Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...
Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...
Jean Marc Nabholtz : Médicaments biologiques : Critères d’enregistrement et d...
 
RxpredictPresentation.pdf
RxpredictPresentation.pdfRxpredictPresentation.pdf
RxpredictPresentation.pdf
 

More from Sanghamitra Deb

Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureSanghamitra Deb
 
Intro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingIntro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingSanghamitra Deb
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & MetricsSanghamitra Deb
 
Developing Recommendation System to provide a Personalized Learning experienc...
Developing Recommendation System to provide a PersonalizedLearning experienc...Developing Recommendation System to provide a PersonalizedLearning experienc...
Developing Recommendation System to provide a Personalized Learning experienc...Sanghamitra Deb
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningSanghamitra Deb
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsSanghamitra Deb
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsSanghamitra Deb
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Sanghamitra Deb
 

More from Sanghamitra Deb (13)

odsc_2023.pdf
odsc_2023.pdfodsc_2023.pdf
odsc_2023.pdf
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
 
Intro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic ModelingIntro to NLP: Text Categorization and Topic Modeling
Intro to NLP: Text Categorization and Topic Modeling
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
NLP Classifier Models & Metrics
NLP Classifier Models & MetricsNLP Classifier Models & Metrics
NLP Classifier Models & Metrics
 
Developing Recommendation System to provide a Personalized Learning experienc...
Developing Recommendation System to provide a PersonalizedLearning experienc...Developing Recommendation System to provide a PersonalizedLearning experienc...
Developing Recommendation System to provide a Personalized Learning experienc...
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-experts
 
Democratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUsDemocratizing NLP content modeling with transfer learning using GPUs
Democratizing NLP content modeling with transfer learning using GPUs
 
Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.Natural Language Comprehension: Human Machine Collaboration.
Natural Language Comprehension: Human Machine Collaboration.
 

Recently uploaded

home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Extracting medical attributes and finding relations

  • 1. Extracting Medical Attributes and Finding Relations. Sanghamitra Deb, Accenture Technology Laboratory
  • 4. FDA Drug Labels: Examples. (a) "It is indicated for treating respiratory disorder caused due to allergy." (b) "For the relief of symptoms of depression." (c) "Evidence supporting efficacy of carbamazepine as an anticonvulsant was derived from active drug-controlled studies that enrolled patients with the following seizure types:" (d) "LOTEMAX is a corticosteroid indicated for the treatment of post-operative inflammation and pain following ocular surgery."
  • 5. Clinical Trials: Meta Data. We present a case of a 10-year-old boy who had severe relapsing pancreatitis three times in two months within 3 weeks after starting treatment with methylphenidate ( ritalin ) due to attention deficit hyperactivity disorder (adhd). The boy was generally healthy except for that he was newly diagnosed with adhd and started the use of methylphenidate ( ritalin ) for the past three weeks at a dose, of 30 mg daily. We believe that the number of persons suffering from pancreatitis due to the use of ritalin is more than this published case. Physicians must pay attention regarding this possible complication and it should be taken into consideration in every patient with abdominal pain who started consuming ritalin. Meta Data: Dosage: single dose: 240 ml | Drug: methylphenidate | # of vol: 30mg
  • 6. Clinical Trials: Side Effects. We present a case of a 10-year-old boy who had severe relapsing pancreatitis three times in two months within 3 weeks after starting treatment with methylphenidate ( ritalin ) due to attention deficit hyperactivity disorder (adhd). The boy was generally healthy except for that he was newly diagnosed with adhd and started the use of methylphenidate ( ritalin ) for the past three weeks at a dose, of 30 mg daily. We believe that the number of persons suffering from pancreatitis due to the use of ritalin is more than this published case. Physicians must pay attention regarding this possible complication and it should be taken into consideration in every patient with abdominal pain who started consuming ritalin. Drug | Adverse Effects: Ritalin | pancreatitis, abdominal pain; Tylenol | nausea, upper stomach pain, itching, loss of appetite; Aspirin | rash, gastrointestinal ulcerations, abdominal pain, upset stomach, heartburn
  • 7. Problems It Solves. Drug-disease: off-label drug uses; database completion; design of clinical trials. Relationships between metadata: How does heart disease correlate with gender and age? Which universities have the most successful clinical trials for breast cancer? How are genes and phenotypes related? What dosage of Ritalin was most effective in treating ADHD with the fewest side effects?
  • 8. Training Data: 8000 drug-disease treatment relationships from UMLS data. drug_name: 'metipred|methylprednisolone|methylprednisolone preparation|methylprednisolonum|6alpha-methylprednisolone|6-alpha-methylprednisolone preparation|methylprednisolone|pregna-1,4-diene-3,20-dione, 11,17,21-trihydroxy-6-methyl-, (6alpha,11beta)-|(6alpha,11beta)-11,17,21-trihydroxy-6-methylpregna-1,4-diene-3,20-dione|methylprednisolone|meprdl|methylprednisolone|6-methylprednisolone|6 methylprednisolone' disease_name: 'respiratory distress syndrome, acute|pulmonary capillary leak syndrome|wet lung syndrome|acute respiratory distress syndrome|shock lung|adult respiratory distress syndrome|shock lung|human ards|adult respiratory distress syndrome|wet lung|ards - adult respiratory distress syndrome|acquired respiratory distress syndrome|adult rds|ards|adult respiratory syndrome|a.r.d.s.|danang lung|danang lung|respiratory distress syndrome|adult respiratory distress syndrome, ards|shock lung|respiratory distress syndrome, adult|adult respiratory distress syndrome|vietnam lung|rds|lung, shock|adult hyaline membrane disease|ards, human|adult respiratory distress syndrome|adult hyaline membrane disease|ardss, human|a r d s|adult rds|congestive atelectasis|ards|respiratory distress syndrome|respiratory distress syndrome, adult|adult respiratory distress syndrome'
  • 9. Course of Action. Extract sentences that contain the specific attribute. POS tag and extract unigrams, bigrams and trigrams centered on nouns. Extract features: words around nouns (bag of words/word vectors), position of the noun. Train a machine learning model to predict which unigrams, bigrams or trigrams satisfy the specific relationship, for example the drug-disease treatment relationship. Map training data to create a balanced positive and negative training set.
  • 10. Creating Labelled Data. lemmatized_sentence: ['maintenance', 'therapy', 'reduce', 'the', 'frequency', 'of', 'manic', 'episode', 'and', 'diminish', 'the', 'intensity', 'of', 'those', 'episode', 'which', 'may', 'occur', '.'] Several candidates: typically one of them is the disease that the drug treats. For every drug we create training data. One line of the text produces 5 lines of training data with one true positive. Candidate | Target | rule-prediction: maintenance | 0 | 1; therapy | 0 | 1; manic episode | 1 | 1; intensity | 0 | 1; episode | 0 | 1. Balancing the Training Data: since the training data contains a higher percentage of zeros than ones, it is important to balance it before modeling, i.e. to build the model I choose an equal number of zeros and ones.
  • 11. Feature Extraction: Word Vectors, Disease Combinations. adhd + manic episode = bipolar disorder; respiratory disorder + allergy = common cold; coronary artery + heart disease = angina pectoris; high blood pressure + lipid = diabetes_management. Extract features: initialize vocabulary with pre-trained vectors; gensim: train word2vec on medical corpus with unigrams, bigrams and trigrams; produce word vectors.
  • 12. Pure Python stack: pandas, scikit-learn, gensim, stanford-nlp-parser. pipeline = Pipeline([ ('union', FeatureUnion( transformer_list=[ # Pipeline for getting the position of the disease candidate ('position', Pipeline([ ('selector', ItemSelector(column='candidate')), ('vect', DictVectorizer()), ])), # Pipeline for getting words around candidates ('words_around', Pipeline([ ('selector', ItemSelector(column='words_around')), ('count', CountVectorizer()), ])) ])), ('clf', ML_library(penalty='l1'))])
  • 13. Machine Learning Workflow (pure Python stack: pandas, scikit-learn, gensim, stanford-nlp-parser). Data cleaning and tokenization; feature extraction/candidate selection; create labelled data; ML: logistic regression, …; hyperparameter tuning; calculate metrics: precision, recall, ROC curve, etc.
  • 14. Results: Examples. drug-name | disease candidate | Candidates | ML: Lithium Carbonate | bipolar disorder | 1 | 1; Lithium Carbonate | individual | 1 | 0; Lithium Carbonate | maintenance | 1 | 0; Lithium Carbonate | manic episode | 1 | 1
  • 15. Cases where it does not work. Drug | Candidate | Target | Predict: Silver Sulfadiazine | third degree | 0 | 0; Silver Sulfadiazine | sepsis | 0 | 1; Silver Sulfadiazine | burn | 0 | 1; Silver Sulfadiazine | cream | 0 | 0. Lemmatized sentence: 'silver sulfadiazine cream usp 1 % be a topical antimicrobial drug indicate as a adjunct for the prevention and treatment of wound sepsis in patient with second and third degree burn .' Drug | Candidate | Target | Predict: Diltiazem Hydrochloride | spasm | 1 | 0; Diltiazem Hydrochloride | coronary artery | 1 | 0; Diltiazem Hydrochloride | stable angina | 0 | 0; Diltiazem Hydrochloride | angina | 0 | 0. Lemmatized sentence: ['Diltiazem', 'hydrochloride', 'tablet', 'USP', 'be', 'indicate', 'for', 'the', 'management', 'of', 'chronic', 'stable', 'angina', 'and', 'angina', 'due', 'to', 'coronary', 'artery', 'spasm', '.']
  • 16. Exploring Modeling Technique. Method | Precision | Recall | F1 | ROC Curve: Logistic Regression | 0.95 | 0.95 | 0.95 | 0.92; LR + word2vec | 0.94 | 0.94 | 0.94 | 0.9; SVM | 0.96 | 0.95 | 0.95 | 0.92; Random Forest | 0.96 | 0.96 | 0.96 | 0.9
  • 17. Clinical Trials Data We present a case of a 10-year-old boy who had severe relapsing pancreatitis three times in two months within 3 weeks after starting treatment with methylphenidate ( ritalin ) due to attention deficit hyperactivity disorder (adhd). The boy was generally healthy except for that he was newly diagnosed with adhd and started the use of methylphenidate ( ritalin ) for the past three weeks at a dose, of 30 mg daily. We believe that the number of persons suffering from pancreatitis due to the use of ritalin is more than this published case. Physicians must pay attention regarding this possible complication and it should be taken into consideration in every patient with abdominal pain who started consuming ritalin.
  • 18. Clinical Trials Data: Labelled Data. Data | Dosage | Drug | Treats Disease | Side Effects | Age | Gender | Ethnicity | duration: 10-year-old | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0; pancreatitis-ritalin | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0; adhd-ritalin | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0; ritalin | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0; 30 mg | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0; past three weeks | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1; boy | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
  • 19. Clinical Trials Data: Labelled Data Exist. Data | Dosage | Drug | Treats Disease | Side Effects | Age | Gender | Ethnicity | duration: 10-year-old | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0; pancreatitis-ritalin | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0; adhd-ritalin | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0; ritalin | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0; 30 mg | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0; past three weeks | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1; boy | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
  • 20. Creating Labeled Data. Hand-label data that contain the specific attribute (~100 examples). Extract candidates: POS tag and extract unigrams, bigrams and trigrams centered on nouns. Generate rules: automatic creation of labels that satisfy the 100 hand-labelled data. Rule-based model: with 95% accuracy. This process will create a smaller sample (say 5-10%) of data which can be further crowdsourced for a 100% accurate gold sample. Iterate: repeat the process a few times.
  • 21. Example of rules. Dosage: (1) Sentence contains numbers; (2) Distance between numbers and "mg", "milligrams" < 5 characters; (3) Contains the word "dose". Age: (1) Sentence contains numbers; (2) Contains the word "age", "year-old" within 5 words of the candidate.
  • 22. DeepDive: Extracting relationships between entities. PDFs, text files, semi-structured JSON; example: journals available at PubMed and ClinicalTrials.gov. Provide examples of data that need to be extracted. Structured data.
  • 23. Deepdive: Prototyping with ddlite https://github.com/HazyResearch/ddlite
  • 26. Final Remarks. • NLP relationship extraction with ML techniques is very successful in the presence of gold-labeled data. • It is very important to invest time and resources towards harvesting good training data. • There is an enormous amount of data in pharma (clinical trials, laboratory notes, doctors' notes, drug manufacturing documents, …). In order to pursue personalized medicine it is important to centralize this and make joint inferences across all data sets.
  • 27. Thank You: We are hiring … blog: https://medium.com/@sangha_deb @sangha_deb,sanghamitra.a.deb@accenture.com
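
The candidate-extraction step on slides 9 and 20 (n-grams "centered on nouns") could look roughly like the sketch below. It is only one reading of the slide: it assumes the tokens are already POS-tagged with Penn Treebank tags (assigned by hand here, not by a real tagger), and it interprets "centered on nouns" as n-grams that end in a noun and contain only nouns and adjectives. The function name `noun_candidates` is hypothetical.

```python
def noun_candidates(tagged, max_n=3):
    """Return unigram/bigram/trigram candidates that end in a noun.

    `tagged` is a list of (token, POS tag) pairs. Assumption: a candidate
    may contain only nouns (NN*) and adjectives (JJ*).
    """
    candidates = []
    for end, (_, tag) in enumerate(tagged):
        if not tag.startswith("NN"):
            continue
        for n in range(1, max_n + 1):
            start = end - n + 1
            if start < 0:
                break
            span = tagged[start:end + 1]
            if not all(t.startswith(("NN", "JJ")) for _, t in span):
                break  # a longer span would contain the same bad token
            phrase = " ".join(word for word, _ in span)
            if phrase not in candidates:
                candidates.append(phrase)
    return candidates


# First part of the lemmatized sentence from slide 10; tags are hand-assigned.
tagged = [
    ("maintenance", "NN"), ("therapy", "NN"), ("reduce", "VB"),
    ("the", "DT"), ("frequency", "NN"), ("of", "IN"),
    ("manic", "JJ"), ("episode", "NN"),
]
candidates = noun_candidates(tagged)
```

With a real tagger the candidate list will differ; the deck's own list on slide 10 suggests additional filtering that is not shown in the slides.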
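
The balancing step described on slide 10 (choosing an equal number of zeros and ones) can be sketched with plain downsampling. This is a minimal illustration, not the author's code: the `balance` helper, the `target` key and the fixed seed are assumptions made for the example.

```python
import random


def balance(rows, label_key="target", seed=0):
    """Downsample the larger class so both labels appear equally often."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)  # fixed seed only to keep the example repeatable
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


# The five candidates from slide 10, with their target labels.
rows = [
    {"candidate": "maintenance", "target": 0},
    {"candidate": "therapy", "target": 0},
    {"candidate": "manic episode", "target": 1},
    {"candidate": "intensity", "target": 0},
    {"candidate": "episode", "target": 0},
]
balanced = balance(rows)
```

Downsampling discards negatives; with 8000 drug-disease pairs that may be acceptable, but class weighting would be an alternative the slides do not discuss.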
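
The labelling rules on slide 21 translate fairly directly into regular expressions. The sketch below is an assumption-laden reading: the slide does not say whether the rules are combined with AND or OR, so `is_dosage_sentence` requires all three, and the age rule treats the candidate as a single whitespace-delimited token. All function names are hypothetical.

```python
import re

# A digit followed by fewer than 5 non-digit characters and then a unit.
NUMBER_NEAR_UNIT = re.compile(r"\d[^\d]{0,4}?(?:mg|milligrams)\b", re.IGNORECASE)


def dosage_rules(sentence):
    """Report which of the three dosage rules fire for a sentence."""
    return {
        "has_number": bool(re.search(r"\d", sentence)),
        "number_near_unit": bool(NUMBER_NEAR_UNIT.search(sentence)),
        "mentions_dose": bool(re.search(r"\bdose\b", sentence, re.IGNORECASE)),
    }


def is_dosage_sentence(sentence):
    # Assumption: all rules must hold; the slide does not specify this.
    return all(dosage_rules(sentence).values())


def age_rule(sentence, candidate, window=5):
    """True if the sentence has a number and 'age'/'year-old' is near the candidate."""
    if not re.search(r"\d", sentence):
        return False
    tokens = [t.strip(".,()") for t in sentence.lower().split()]
    for i, token in enumerate(tokens):
        if token != candidate.lower():
            continue
        nearby = tokens[max(0, i - window): i + window + 1]
        if any(t == "age" or "year-old" in t for t in nearby):
            return True
    return False


dose_sentence = ("The boy started the use of methylphenidate ( ritalin ) "
                 "for the past three weeks at a dose, of 30 mg daily.")
age_sentence = ("We present a case of a 10-year-old boy who had severe "
                "relapsing pancreatitis")
```

Rules like these are noisy by design: they produce the automatic labels that slide 20 proposes to refine by iteration and crowdsourcing.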