Automated Health Responses
Austin Powell
3rd Annual Regional Python Conference August 16 - 19, 2018 | San Francisco, CA
About Me
Position:
• Healthcare data scientist
Tools:
• Python (any library that is useful), R,
PySpark, SQL, Julia
• A well-worded email
Hobbies:
• Backpacking
• Long distance runner
• Eating a lot after running
Introduction
● Motivation of project from
healthcare perspective
● Data
● Scope
● Current result
Introduction
Why healthcare data science is so difficult
Privacy
• Privacy is paramount
• Data is very difficult to anonymize
Data is the new gold
• Healthcare data is extremely valuable
• Models trained on sandboxed data are very valuable and it
is unclear to what extent they can be reverse-engineered
Healthcare Data Is Messy:
• EMR (Electronic Medical Records) data are notoriously
complicated and messy
• Not standardized
• Domain and specialty heavy
Problem Statement
Getting from B -> A
● B: Be able to have a system such as Google Smart Reply or a health provider community forum
● …
● …
● …
● …
● …
● A: Use open-sourced data to demonstrate intelligent, contextually relevant health-type responses
Where to get text data for discussions
about your health with medical
professionals that will not violate
anyone’s privacy?
The Data
Reddit: AskDocs
The Data
Summary statistics:
• Data gathered from 2014 to 2018
• % of posts by Reddit certified medical professionals: 49.0%
• ~ 30k Threads
• ~ 106k posts (after cleaning)
• ~ 26.3k Users
• ~ 16.3k Medical Professionals
Notes about data:
• Conversations show more intent than is typical of a normal forum or 1-to-1 conversation (especially on Reddit)
• Signs of more male presence, especially in male-specific problems
• Surprisingly tame for Reddit!
What are they posting about?
Modeling
Scope
▪ Descriptive vs prescriptive:
● Framing the problem for the “patient” as opposed to trying to solve it for them
▪ Addressing specifics within the problem:
● Number of speakers
● Change in topic
● Outcome
Structuring for ML
For each thread:
○ Thread start becomes the initial query. The first-level responses that follow become the available responses to the initial query
○ E.g.
▪ User 1: initial question
▪ User 2: response
▪ User 3: response
▪ User 2: response
▪ User 3: response
▪ User 1: response
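The structuring step above can be sketched roughly as follows. This is a minimal illustration, not the code used for the talk: the `Post` fields and the `thread_to_pairs` helper are hypothetical names, and it assumes each post records the id of the post it replies to.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Post:
    """One post in a thread; the field names are hypothetical."""
    post_id: str
    parent_id: Optional[str]  # None for the post that starts the thread
    author: str
    text: str


def thread_to_pairs(posts: List[Post]) -> List[Tuple[str, str]]:
    """Pair the thread-starting post with each of its first-level replies."""
    root = next(p for p in posts if p.parent_id is None)
    return [(root.text, p.text) for p in posts if p.parent_id == root.post_id]


thread = [
    Post("t1", None, "User1", "initial question"),
    Post("c1", "t1", "User2", "response A"),
    Post("c2", "t1", "User3", "response B"),
    Post("c3", "c1", "User1", "follow-up to A"),  # second level, skipped here
]
pairs = thread_to_pairs(thread)
```

Deeper replies could be paired with their own parents in the same way, which is one plausible route to a larger set of query/response pairs.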
Looking at Turns
● 96k query/response pairs
Turns
● How dynamic a thread/conversation is
● Mean: 4.5
● Median: 4.0
● Min: 2
● Max: 106
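The slide does not say how a turn is counted. As one possible reading, the sketch below treats a turn as a run of consecutive posts by the same speaker and summarizes the counts with the standard library; the `count_turns` helper and the sample threads are made up for illustration.

```python
from statistics import mean, median
from typing import List


def count_turns(speakers: List[str]) -> int:
    """Count turns as runs of consecutive posts by the same speaker."""
    turns = 0
    previous = None
    for speaker in speakers:
        if speaker != previous:
            turns += 1
            previous = speaker
    return turns


# Hypothetical threads, each given as the ordered list of post authors.
threads = [
    ["User1", "User2"],
    ["User1", "User2", "User2", "User1"],
    ["User1", "User2", "User3", "User2", "User3", "User1"],
]
turn_counts = [count_turns(t) for t in threads]
summary = {
    "mean": mean(turn_counts),
    "median": median(turn_counts),
    "min": min(turn_counts),
    "max": max(turn_counts),
}
```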
Query:
Alright. Give me a bit. Any
food you can't eat? Egg
products?
Possible Answer 4:
Thanks It's an evolving situation. It seems that it takes awhile for
everyone to get on the same page regarding his medications and
test results. Apparently the positive staph results they got could
have very well been caused by poor procedure. It is all the little
things that are adding up, like the meals, or coming in and asking
a bedridden 93 year old incontinent man, suffering from
dementia, if he needs to get up and go to the rest room and then
leaving when he fails to answer. I'm trying to find a patient
advocate or something now. Supposed to meet with a 'social
worker' as well, but I don't know when.
Possible Answer 1:
I agree with @stephaniecaseys that it is a skin tag. Also not a doc, but I
had mine removed at the dermatologist. It was very quick. It was
messing with my bra strap. They can grow bigger as well, especially if
you have a tendency to mess with it. It will swell.
Possible Answer 3:
You're not the first person to have that happen to. Happens to me just
about every time I'm out in the bush. Unless you are in an area with
known cases of Lyme disease, I doubt it is of any concern.
Possible Answer 2:
I can eat pretty much anything
Sample exchanges about a cold
Tagging Medical Entities with UMLS
• 'persistent dry cough', 'C0857172', 1.0, {'Sign or Symptom'}
• 'dextromethorphan', 'C0011816', {'Clinical Drug', 'Organic Chemical', 'Pharmacologic Substance'}
• 'benzonatate', 'C0053229', 1.0, {'Organic Chemical', 'Pharmacologic Substance'}
• 'hydrocodone', 'C0020264', 1.0, {'Laboratory Procedure', 'Organic Chemical', 'Pharmacologic Substance'}
• 'temporary', 'C3245481', 1.0, {'Body Location or Region', 'Intellectual Product'}
• 'counter', 'C1705206', 1.0, {'Health Care Activity', 'Medical Device'}
• 'inhaler', 'C0021461', 1.0, {'Medical Device', 'Organism Function'}
• 'throat', 'C0031354', 1.0, {'Body Part, Organ, or Organ Component', 'Body Substance', 'Intellectual Product', 'Pharmacologic Substance'}
• 'injury', 'C3263723', 1.0, {'Injury or Poisoning'}
• 'throat', 'C0031354', 1.0, {'Body Part, Organ, or Organ Component', 'Body Substance', 'Intellectual Product', 'Pharmacologic Substance'}
• 'ginger', 'C0939895', 1.0, {'Immunologic Factor', 'Inorganic Chemical', 'Organic Chemical', 'Pharmacologic Substance'}
• …
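The tool that produced these tags is not named on the slide. To show the shape of the output (term, concept id, similarity, semantic types), here is a toy sketch that assumes a tiny in-memory lexicon in place of a real UMLS matcher; `LEXICON` and `tag_entities` are hypothetical, and only exact matches are handled.

```python
import re
from typing import Dict, List, Set, Tuple

# Toy lexicon: term -> (CUI, semantic types), copied from the sample output above.
LEXICON: Dict[str, Tuple[str, Set[str]]] = {
    "persistent dry cough": ("C0857172", {"Sign or Symptom"}),
    "benzonatate": ("C0053229", {"Organic Chemical", "Pharmacologic Substance"}),
    "injury": ("C3263723", {"Injury or Poisoning"}),
}


def tag_entities(text: str) -> List[Tuple[str, str, float, Set[str]]]:
    """Return (term, CUI, similarity, semantic types) for exact lexicon matches."""
    lowered = text.lower()
    found = []
    for term, (cui, semtypes) in LEXICON.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            # Exact string matches get similarity 1.0 in this sketch.
            found.append((match.start(), term, cui, 1.0, semtypes))
    # Report matches in the order they appear in the text.
    return [entry[1:] for entry in sorted(found, key=lambda e: e[0])]


tags = tag_entities("I was given benzonatate for a persistent dry cough.")
```

A real matcher would also need approximate matching and disambiguation; tags such as 'temporary' and 'counter' in the list above show how noisy unfiltered matches can be.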
Modeling
Modeling Considerations
● Generative modeling is cool, but has disadvantages:
○ Requires lots of data
○ May not be precise enough for a health context
○ Pretty far away from being able to augment responses
Keras Blog: Seq2Seq
Cryptic response with generative
approach:
Q: Husband deteriorating before my eyes,
doctors at a loss, no one will help; Reddit
docs, I need you.
A: I don't think this is a single pain is not
a doctor but I have a similar symptoms
and the story
Response Retrieval Approach
● Easier to measure and optimize than the generative approach
● Easier for a business user to apply rules toward a directed outcome
● Allows a clinician to automate follow-up questions that they might often ask in email/phone/visit exchanges
Changes from Original Paper: The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
Query: I’ve had pain in my knee after running. Why?
Response:
● Non-relevant answer 1
● Non-relevant answer 2
● Relevant answer 1
● Non-relevant answer 3
● Relevant answer 2
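One common way to build such candidate lists for response retrieval is to mix the true response with distractors drawn from other threads. The sketch below assumes that setup; `build_candidates` and the sample pairs are hypothetical, and the talk's own sampling procedure is not described on the slide.

```python
import random
from typing import List, Tuple


def build_candidates(
    pairs: List[Tuple[str, str]],
    index: int,
    n_candidates: int,
    rng: random.Random,
) -> Tuple[str, List[str], int]:
    """Return (query, shuffled candidates, position of the true response)."""
    query, true_response = pairs[index]
    # Distractors come from other queries, so other relevant answers
    # to the same query are never used as negatives.
    pool = [resp for q, resp in pairs if q != query]
    distractors = rng.sample(pool, n_candidates - 1)
    candidates = distractors + [true_response]
    rng.shuffle(candidates)
    return query, candidates, candidates.index(true_response)


pairs = [
    ("Knee pain after running?", "Could be runner's knee."),
    ("Rash on my arm", "Looks like contact dermatitis."),
    ("Dry cough for a week", "Try a humidifier."),
    ("Headache every morning", "How long has it lasted?"),
]
query, candidates, answer = build_candidates(pairs, 0, 3, random.Random(0))
```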
Dual Encoder
● Embeddings are initialized with pre-trained Common Crawl GloVe 840B, 300d vectors and allowed to train on new utterances & responses
● 1 hidden layer per encoder: one encoder for context, one for response
● Different choices can be made for the output, but it is often a binary probability for each of up to 10 possible responses
Image credit: The Ubuntu Dialogue Corpus: A Large Dataset for Research in
Unstructured Multi-Turn Dialogue Systems
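The cited Ubuntu Dialogue Corpus paper scores a context/response pair with a bilinear form, sigmoid(c^T M r + b), where c and r are the two encodings. Whether this talk's model kept that exact output is not stated, so the following is only a sketch of that scoring step in plain Python; the encodings are assumed to be given, and `score` is a hypothetical helper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def score(context: Vector, response: Vector, M: Matrix, bias: float = 0.0) -> float:
    """Probability that `response` follows `context`: sigmoid(c^T M r + b)."""
    # Project the response through M, then take the dot product with the context.
    projected = [sum(m_ij * r_j for m_ij, r_j in zip(row, response)) for row in M]
    logit = sum(c_i * p_i for c_i, p_i in zip(context, projected)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-d encodings; a real model would produce these with an LSTM or GRU.
identity = [[1.0, 0.0], [0.0, 1.0]]
context_vec = [1.0, 2.0]
good = score(context_vec, [1.0, 2.0], identity)    # aligned with the context
bad = score(context_vec, [-1.0, -2.0], identity)   # opposed to the context
```

In training, M and the encoder weights would be learned jointly so that true responses score near 1 and distractors near 0.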
How Well Is Our Dual Encoder Doing?
Method/Metric   Expected recall due to chance   TF-IDF   LSTM    GRU
1 in 2 R@1      50.0%                           64.0%    91.1%   92.1%
1 in 5 R@1      20.0%                           54.5%    75.5%   79.6%
1 in 10 R@1     10.0%                           48.7%    65.1%   68.7%
The above metrics seek to quantify how good the models are at retrieving the relevant responses. Recall@k is a standard metric for response retrieval models.
E.g. “1 in 2 R@1” translates to “With 1 correct answer among 2 possible answers, what % of the time is the correct answer chosen when selecting only 1?”
Not bad!
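The metric described above can be computed along these lines. This is a minimal sketch with made-up scores; `recall_at_k` and `chance_recall` are hypothetical helper names.

```python
from typing import Sequence


def recall_at_k(
    score_lists: Sequence[Sequence[float]],
    true_index: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose true response is among the k highest-scored candidates."""
    hits = 0
    for scores, truth in zip(score_lists, true_index):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(score_lists)


def chance_recall(n_candidates: int, k: int = 1) -> float:
    """Expected recall when picking k of n candidates uniformly at random."""
    return k / n_candidates


# Hypothetical model scores for three queries with 5 candidates each.
scores = [
    [0.1, 0.7, 0.2, 0.3, 0.1],  # true response at index 1: ranked first
    [0.9, 0.2, 0.4, 0.1, 0.3],  # true response at index 2: ranked second
    [0.2, 0.1, 0.1, 0.8, 0.3],  # true response at index 3: ranked first
]
truth = [1, 2, 3]
r1 = recall_at_k(scores, truth, k=1)
r2 = recall_at_k(scores, truth, k=2)
```

The `chance_recall` values for 2, 5 and 10 candidates reproduce the 50%, 20% and 10% chance column in the table.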
Future Directions
1. Create intents for all utterances using smart indexing from something like UMLS
2. Refine the problem statement to exploit the potential of the data
3. Use transfer learning from large GloVe vectors trained on a large corpus of Reddit comments to get more at the semantics of comments
4. Experiment with predicting the next response given all thread history
5. Try hierarchical attention instead of an RNN, as these models are showing a lot of promise
Appendix
UMLS
Unified Medical Language System
● Brings together many health
vocabularies and standards
● Contains 3 tools:
○ Metathesaurus: Terms and codes from
many vocabularies, including CPT®,
ICD-10-CM, LOINC®, MeSH®, RxNorm,
and SNOMED CT®
○ Semantic Network: Broad categories
(semantic types) and their relationships
(semantic relations)
○ SPECIALIST Lexicon and Lexical
Tools: Natural language processing
tools
Modeling Considerations
● Rule-based:
○ Expensive to build
○ Expensive to maintain
○ Requires precise knowledge
○ E.g.: “If patient has cough, check for fever”; “if in flu season and fever, then patient has flu”
Health Concepts in Utterances
Health Concept Entities
• 61 Total Distinct Concept Entities
• Mean of 20 health entity behaviors per
utterance
Identifying Health Intents
● Ranking of utterances by amount of health behavior content, normalized by length
[("Hey, how's your husband doing now? Hope everything is okay.", 0.0),
 ('I know this is beside your point, but youre 16 and know you have high cholesterol. How? Why?', 1.1831455647530769),
 ("So why are you posting on here then, if you had two 'real' doctors giving you advice? What answer are you looking for here? ", 1.2548015599879458),
 ('How long ago did you change your diet, as in when did you have the kidney stones?', 1.2584445094145018),
 ('How old is your partner?\n\nDo you know her diagnosis (ie why they did her surgery)?', 1.2593890745195433),
 ('How long ago was your thyroid levels checked? Do you have any pain in your abdomen?', 1.2635706685998762),
 ('Obviously you got blood work done here, any strange findings?', 1.2784053615073325),
 ("And you want to know if he did the burn on purpose or not? There's absolutely no way of telling, you'll have to ask him I guess.", 1.311204647849376),
 ('dysarthria (slurring your words) is not a common side effect of synthroid. Do you take birth control pills? Do you have a history of migraines? \n\n Have you had any numbness, weakness, paralysis or changes in the way you walk?\nHave you noticed any other weird things going on with your body when you have these episodes? \n\nHave you had trouble with your vision recently? Any trouble swallowing? Any sudden loss of vision in one eye ever, even a long time ago? ', 1.312565699208395),
 ('How long have you been suffering? Hope you get rid of dat nasty pain soon.', 1.3201720990394112)]
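The exact normalization behind the scores above is not given on the slide. As a rough illustration of the idea only, the sketch below counts health terms per word and sorts utterances in ascending order; `HEALTH_TERMS` is a hypothetical stand-in for what a UMLS tagger would recognize, and the resulting numbers are not comparable to the scores shown.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for the set of health terms a UMLS tagger would recognize.
HEALTH_TERMS: Set[str] = {"pain", "thyroid", "abdomen", "diet", "kidney"}


def health_density(utterance: str) -> float:
    """Number of health terms divided by the number of words."""
    words = [w.strip(".,?!").lower() for w in utterance.split()]
    if not words:
        return 0.0
    return sum(w in HEALTH_TERMS for w in words) / len(words)


def rank_utterances(utterances: List[str]) -> List[Tuple[str, float]]:
    """Sort utterances from least to most health content per word."""
    return sorted(((u, health_density(u)) for u in utterances), key=lambda pair: pair[1])


ranked = rank_utterances([
    "Any pain in your abdomen?",
    "Hope everything is okay.",
    "Was your thyroid checked?",
])
```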
Turns in conversation
# of Initial Words vs Quantity of Responses
● No real relationship between the # of words and the number of responses: that is, writing a long initial post doesn’t help or hurt your chances of getting a lot of input
Query:
Because TMAU and many
other metabolic diseases
are very rare. Your other
question is a bit too
general to answer, but
yes, there are advances
all the time in
gastroenterology and
metabolic diseases.
Possible Answer 4:
Could it be tonsil stones perhaps? I have both swollen tonsils
after Is it rare because people don't have the disease, or
because it goes undiagnosed? Or is it difficult to diagnose?
There's a large community over on the MEBO Research site,
with members easily in the thousands. What defines a "rare"
disease? What percentage of the population has to have a
similar or identical diagnosis for it be considered common?
Possible Answer 2:
Thanks a lot! :) You've given me peace of mind.
Possible Answer 3:
Just don't take anymore, you aren't withdrawing from anything. Cut the
b***.
Possible Answer 1:
Curiosity... closure... a few other reasons. I
understand that it would depend on the cause of
death and that if its not a P.E., they might not be
able to tell what it was now depending on how the
body was prepared. But... specifically regarding the
pulmonary embolism... can they at least say
whether it was a P.E. or not?

Automated health responses

  • 1. Automated Health Responses Austin Powell 3rd Annual Regional Python Conference August 16 - 19, 2018 | San Francisco, CA
  • 2. About Me Position: • Healthcare data scientist Tools: • Python (any library that is useful), R, PySpark, SQL, Julia • A well-worded email Hobbies: • Backpacking • Long distance runner • Eating a lot after running
  • 3. Introduction ● Motivation of project from healthcare perspective ● Data ● Scope ● Current result
  • 5. Why healthcare data science is so difficult Privacy • Privacy is paramount • Data is very difficult to anonymize • Privacy is paramount Data is the new gold • Healthcare data is extremely valuable • Models trained on sandboxed data are very valuable and it is unclear to what extent they can be reverse-engineered Healthcare Data Is Messy: • EMR (Electronic Medical Records) data are notoriously complicated and messy • Not standardized • Domain and specialty heavy
  • 6. Problem Statement Getting from B -> A ● B: Be able to have a system such as Google Smart Reply or Health provider community forum ● … ● … ● … ● … ● … ● A: Use open-sourced data to demonstrate intelligent and contextually health-type responses
  • 7. Where to get text data for discussions about your health with medical professionals that will not violate anyone’s privacy?
  • 10. The Data Summary statistics: • Data gathered from 2014 to 2018 • % of posts by Reddit certified medical professionals: 49.0% • ~ 30k Threads • ~ 106k posts (after cleaning) • ~ 26.3k Users • ~ 16.3k Medical Professionals Notes about data: • More intent in conversations than typically in normal forum or 1-to-1 conversations (especially Reddit) • Signs of more male presence, especially in male-specific problems • Surprisingly tame for Reddit!
  • 11. What are they posting about?
  • 12.
  • 14. Scope  Descriptive vs prescriptive: ● Framing the problem for the “patient” as opposed to trying to solve it for them  Addressing specifics within the problem: ● Number of speakers ● Change in topic ● Outcome
  • 15. Structuring For ML For each thread: ○ Thread start becomes initial query. Following first-level responses become become available responses to initial query ○ E.g.  User1: initial question  User 2: response  User 3: response  User 2: response  User 3: response  User1: response
  • 16. Looking at Turns ● 96k query/response pairs Turns ● How dynamic a thread/conversation ● Mean: 4.5 ● Median: 4.0 ● Min: 2 ● Max: 106
  • 17. Query: Alright. Give me a bit. Any food you can't eat? Egg products? Possible Answer 4: Thanks It's an evolving situation. It seems that it takes awhile for everyone to get on the same page regarding his medications and test results. Apparently the positive staph results they got could have very well been caused by poor procedure. It is all the little things that are adding up, like the meals, or coming in and asking a bedridden 93 year old incontinent man, suffering from dementia, if he needs to get up and go to the rest room and then leaving when he fails to answer. I'm trying to find a patient advocate or something now. Supposed to meet with a 'social worker' as well, but I don't know when. Possible Answer 1: I agree with @stephaniecaseys that it is a skin tag. Also not a doc, but I had mine removed at the dermatologist. It was very quick. It was messing with my bra strap. They can grow bigger as well, especially if you have a tendency to mess with it. It will swell. Possible Answer 3: You're not the first person to have that happen to. Happens to me just about every time I'm out in the bush. Unless you are in an area with known cases of Lyme disease, I doubt it is of any concern. Possible Answer 2: I can eat pretty much anything
  • 19. Tagging Medical Entities with UMLS • 'persistent dry cough', 'C0857172', 1.0, {'Sign or Symptom’} • 'dextromethorphan', 'C0011816’, {'Clinical Drug', 'Organic Chemical', 'Pharmacologic Substance’} • 'benzonatate', 'C0053229', 1.0, {'Organic Chemical', 'Pharmacologic Substance’} • 'hydrocodone', 'C0020264', 1.0, {'Laboratory Procedure', 'Organic Chemical', 'Pharmacologic Substance’} • 'temporary', 'C3245481', 1.0, {'Body Location or Region', 'Intellectual Product’} • 'counter', 'C1705206', 1.0, {'Health Care Activity', 'Medical Device’} • 'inhaler', 'C0021461', 1.0, {'Medical Device', 'Organism Function’}] • 'throat', 'C0031354', 1.0, {'Body Part, Organ, or Organ Component', 'Body Substance', 'Intellectual Product', 'Pharmacologic Substance’} • 'injury', 'C3263723', 1.0, {'Injury or Poisoning’} • 'throat', 'C0031354', 1.0, {'Body Part, Organ, or Organ Component', 'Body Substance', 'Intellectual Product', 'Pharmacologic Substance’} • 'ginger', 'C0939895', 1.0, {'Immunologic Factor', 'Inorganic Chemical', 'Organic Chemical', 'Pharmacologic Substance’} • …
  • 21. Modeling Considerations ● Generative modeling is cool, but has disadvantages: ○ Lots of data ○ May not be precise enough health context ○ Pretty far away from being able to augment responses Keras Blog: Seq2Seq Cryptic response with generative approach: Q: Husband deteriorating before my eyes, doctors at a loss, no one will help; Reddit docs, I need you. A: I don't think this is a single pain is not a doctor but I have a similar symptoms and the story
  • 22. Response Retrieval Approach ● Easier to measure and optimize than generative approach ● Easier to apply rules by business user to directed outcome: ● Allows clinician to automate follow-up questions that might often ask in email/phone/visit exchanges Changes from Original Paper: The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems Query Response I’ve had pain in my knee after running. Why? Non-relevant answer 1 Non-relevant answer 2 Relevant answer 1 Non-relevant answer 3 Relevant answer 2
  • 23. Dual Encoder ● Embeddings are initialized with pre-trained Common Crawl Glove 840b, 300d and allowed to train on new utterances & responses ● 1 hidden layers: one for context, one for response ● Different choices are made for output, but often is a binary probability over up to 10 possible responses Image credit: The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
  • 24. How Good is our Dual Encoder Doing? Method/Metric Expected recall due to chance TF-IDF LSTM GRU 1 in 2 R@1 50.0% 64.0% 91.1% 92.1% 1 in 5 R@1 20.0% 54.5% 75.5% 79.6% 1 in 10 R@1 10.0% 48.7% 65.1% 68.7% The above metrics seek to quantify how good the models are at extracting the important responses. It is a standard metric for response retrieval models. E.G. “1 in 2 R@1” translates to “At a rate of 1 correct answer out of 2 possible answers, how what is the % of correct answers if only selecting 1. Not bad!
  • 25. Future Directions 1. Create intents for all utterances using a smart indexing from something like UMLS 2. Refine problem statement to exploit the potential of data 3. User transfer learning from large Glove vectors on large corpus of Reddit comments to get more at semantics of comments. 4. Experiment with predicting next response given all thread history 5. Hierarchical attention instead of RNN as these are showing a lot of promise.
  • 27. UMLS Unified Medical Language System ● Brings together many health vocabularies and standards ● Contains 3 tools: ○ Metathesaurus: Terms and codes from many vocabularies, including CPT®, ICD-10-CM, LOINC®, MeSH®, RxNorm, and SNOMED CT® ○ Semantic Network: Broad categories (semantic types) and their relationships (semantic relations) ○ SPECIALIST Lexicon and Lexical Tools: Natural language processing tools
  • 28. Modeling Considerations ● Rule-based: ○ Expensive to build ○ Expensive to maintain ○ Requires precise knowledge ○ E.g.: “If patient has cough, check for fever”, if in flu and fever then patient has flu
  • 30. Health Concept Entities • 61 Total Distinct Concept Entities • Mean of 20 health entity behaviors per utterance
  • 31. Identifying Health Intents ● Ranking of utterances number of health behavior content normalized by length [("Hey, how's your husband doing now? Hope everything is okay.", 0.0), ('I know this is beside your point, but youre 16 and know you have high cholesterol. How? Why?', 1.1831455647530769), ("So why are you posting on here then, if you had two 'real' doctors giving you advice? What answer are you looking for here? ", 1.2548015599879458), ('How long ago did you change your diet, as in when did you have the kidney stones?', 1.2584445094145018), ('How old is your partner?nnDo you know her diagnosis (ie why they did her surgery)?', 1.2593890745195433), ('How long ago was your thyroid levels checked? Do you have any pain in your abdomen?', 1.2635706685998762), ('Obviously you got blood work done here, any strange findings?', 1.2784053615073325), ("And you want to know if he did the burn on purpose or not? There's absolutely no way of telling, you'll have to ask him I guess.", 1.311204647849376), ('dysarthria (slurring your words) is not a common side effect of synthroid. Do you take birth control pills? Do you have a history of migraines? nn Have you had any numbness, weakness, paralysis or changes in the way you walk?nHave you noticed any other weird things going on with your body when you have these episodes? nnHave you had trouble with your vision recently? Any trouble swallowing? Any sudden loss of vision in one eye ever, even a long time ago? ', 1.312565699208395), ('How long have you been suffering? Hope you get rid of dat nasty pain soon.', 1.3201720990394112)]
  • 33. # of Initial Words vs Quantity of Responses ● No real relationship between the number of words in the initial post and the number of responses: that is, writing a long initial post doesn’t help or hurt your chances of getting a lot of input
  • 34. Query: Because TMAU and many other metabolic diseases are very rare. Your other question is a bit too general to answer, but yes, there are advances all the time in gastroenterology and metabolic diseases. Possible Answer 4: Could it be tonsil stones perhaps? I have both swollen tonsils after Is it rare because people don't have the disease, or because it goes undiagnosed? Or is it difficult to diagnose? There's a large community over on the MEBO Research site, with members easily in the thousands. What defines a "rare" disease? What percentage of the population has to have a similar or identical diagnosis for it be considered common? Possible Answer 2: Thanks a lot! :) You've given me peace of mind. Possible Answer 3: Just don't take anymore, you aren't withdrawing from anything. Cut the b***. Possible Answer 1: Curiosity... closure... a few other reasons. I understand that it would depend on the cause of death and that if its not a P.E., they might not be able to tell what it was now depending on how the body was prepared. But... specifically regarding the pulmonary embolism... can they at least say whether it was a P.E. or not?
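
The ranking on the "Identifying Health Intents" slide can be illustrated with a minimal sketch. The slides do not list the 61 concept entities or say how scores were normalized, so the vocabulary `HEALTH_TERMS`, the helper names, and the "hits divided by token count" normalization are all assumptions made for illustration. The sketch will not reproduce the scores shown on the slide.

```python
import re

# Hypothetical mini-vocabulary (assumption): the talk's actual concept
# entities are not listed in the slides.
HEALTH_TERMS = {
    "cholesterol", "kidney", "thyroid", "pain",
    "abdomen", "diagnosis", "surgery", "blood",
}


def health_score(utterance, vocabulary=HEALTH_TERMS):
    """Count vocabulary hits and divide by the number of tokens.

    The length normalization used here is an assumption, not the
    normalization used in the talk.
    """
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)


def rank_utterances(utterances, vocabulary=HEALTH_TERMS):
    """Return (utterance, score) pairs sorted from lowest to highest score."""
    scored = [(u, health_score(u, vocabulary)) for u in utterances]
    return sorted(scored, key=lambda pair: pair[1])


ranked = rank_utterances([
    "Do you have any pain in your abdomen?",
    "Hey, how's your husband doing now?",
])
```

As on the slide, small talk with no health terms sorts to the front of the list with a score of 0.0, and utterances dense in health terms sort to the end.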

Editor's notes

  1. Tried to structure the talk in proportion to the work that has been done so far, and also to give people outside of healthcare some perspective on why, from my point of view, it poses unique challenges for data science
  2. Healthcare data is extremely valuable: always consider that before you share your data with just any app.
  3. You would think that Kaiser Permanente would be above using Reddit data. Not So!
  4. I think you are highly susceptible to apophenia: seeing patterns in random data
  5. Two points to make: 1) Knowing who is who (medical professional or not) 2) Making sure the optimization is happening more on the clinician side as opposed to the “customer” or
  6. Becomes important as you consider how long the thread goes on
  7. Trimethylaminuria is a disorder in which the body is unable to break down trimethylamine, a chemical compound that has a pungent odor. Trimethylamine has been described as smelling like rotting fish, rotting eggs, garbage, or urine
  8. A Dual Encoder Sequence to Sequence Model for Open-Domain Dialogue Modeling: https://arxiv.org/pdf/1710.10520.pdf
  9. E.g. “1 in 2 R@1” means recall at 1 when the model ranks 2 candidate responses, exactly 1 of which is correct: the fraction of queries for which the top-ranked candidate is the correct one
  10. Expensive to build: some things are easy, e.g. “If patient has cough, check for fever”, if in flu and fever then patient has flu. Healthcare may be one of the worst examples for this, however, since any one manifestation of a symptom probably means a bunch of things. In fact
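
The "1 in N R@k" metric from note 9 can be sketched as follows. This is a minimal illustration, not the evaluation code used in the talk; the function names and the toy candidate lists are assumptions.

```python
def recall_at_k(ranked_candidates, correct, k):
    """Return 1.0 if the correct response is among the top k candidates, else 0.0."""
    return 1.0 if correct in ranked_candidates[:k] else 0.0


def mean_recall_at_k(examples, k):
    """Average recall@k over (ranked_candidates, correct) pairs."""
    if not examples:
        return 0.0
    return sum(recall_at_k(r, c, k) for r, c in examples) / len(examples)


# "1 in 2 R@1": each query has 2 candidates (1 correct, 1 distractor),
# already ordered by model score, and only the top-1 candidate is checked.
# The candidate names below are made up for illustration.
examples = [
    (["reply_a", "reply_b"], "reply_a"),  # correct reply ranked first
    (["reply_c", "reply_d"], "reply_d"),  # correct reply ranked second
]
one_in_two_r1 = mean_recall_at_k(examples, k=1)
one_in_two_r2 = mean_recall_at_k(examples, k=2)
```

With only 2 candidates, R@2 is always 1.0, which is why the metric is usually reported with k smaller than the number of candidates.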