Ching-huei Tsou, senior software engineer in the Watson Algorithms group from IBM Watson Research, presented this at the Cognitive Systems Institute Speaker Series on April 14, 2016.
2. Watson’s post-Jeopardy challenge: Healthcare
Our first domain of exploration is medical decision support because of its mature, complex and meaningful problem-solving nature
After Watson’s win on Jeopardy!, people outside the computer science community assumed that anything that could be phrased as a question could be correctly answered by Watson:
Watson, “Given my medical record <insert hundreds of pages of structured and unstructured data here>, what’s wrong with me?”
3. Filing System
Summarization, Multi-Step Reasoning, Clinical Knowledge QA, EMR Deep QA & Search
*Not to scale
From the generated problem-oriented summary, the physician noticed the patient’s “Creatinine level is high”
What has been done to treat his diabetic nephropathy?
What else can I try?
Clinical Decision Support System
EMR
What are the causes of creatinine elevation?
What are the most likely causes for this patient?
Toward a Clinical Decision Support System
EMR Analysis, Medical QA, Reasoning, EMR QA
5. Electronic Medical Records
Unstructured Data
Clinical Notes
Semi-structured Data, e.g. diagnoses, medications, lab test results
100s of notes for a typical patient; 1,000s for older patients with inpatient notes
6. Promises of Electronic Medical/Health Record
Why a Dr. went into medicine
Not why a Dr. went into medicine
Prevent medical errors
Reduce health care costs
Increase administrative efficiency
Decrease paperwork
Expand access to affordable care
7. Today’s EMR is Broken
Digitizing medical records does not reduce physicians’ cognitive load
Today’s EMR is largely billing-oriented
Billing compliance regulations require that notes stand on their own, which
may promote duplication of text
Detailed coding = bill = better quality?
General Coding ($29,716):
428.0 CHF NOS
428.9 HF NOS
Specific Coding ($53,670):
428.21 Ac Systolic HF
428.23 Ac-Chr Systolic HF
428.31 Ac Diastolic HF
428.33 Ac-Chr Diastolic HF
428.41 Ac Comb S&D HF
428.43 Ac-Chr Comb S&D HF
8. How Can EMR Summarization Help a Physician?
Consider a physician who is about to see a patient in an outpatient setting
Perhaps this is the first encounter for the physician with the patient, or
It has been a while since the physician has seen this patient
Before seeing the patient, the physician may want to know
What are the patient’s current problems?
When was a problem last discussed / addressed?
How is a problem being managed?
Current medications?
Related lab test results?
Most questions are problem-oriented
9. Problem-Oriented Medical Record (POMR) Summary
Problems List, linked to:
Medications (“treated by”)
Lab tests (“measured by”)
Clinical Notes & timeline (“discussed in”)
Procedures
Vitals
POMR, as originally defined by Dr. Lawrence Weed in the 1960s, is the official record-keeping method in most US hospitals
Problem list is also a mandatory section in the CCD (Continuity of Care Document), part of HL7’s CDA (Clinical Document Architecture) standard
The key to success?
An accurate problem list
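One way to picture the problem-oriented record is as a problem entry that carries its “treated by”, “measured by” and “discussed in” links. The sketch below is only an illustration of that shape, not the EMRA data model: the class name, field names and `summarize` helper are hypothetical, and the example values are borrowed from the diabetes example elsewhere in this talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Problem:
    """One problem-list entry with links to its supporting records (hypothetical shape)."""
    name: str
    treated_by: List[str] = field(default_factory=list)    # medications
    measured_by: List[str] = field(default_factory=list)   # lab tests
    discussed_in: List[str] = field(default_factory=list)  # clinical notes


def summarize(problem: Problem) -> str:
    """Render a one-line, problem-oriented summary."""
    return (
        f"{problem.name}"
        f" | treated by: {', '.join(problem.treated_by) or 'none'}"
        f" | measured by: {', '.join(problem.measured_by) or 'none'}"
        f" | discussed in: {', '.join(problem.discussed_in) or 'none'}"
    )


# Example values taken from the diabetes slides of this talk.
diabetes = Problem(
    name="Diabetes",
    treated_by=["Victoza (liraglutide)"],
    measured_by=["HbA1c"],
    discussed_in=["Endocrinology note 03/06/2013", "Endocrinology note 07/17/2013"],
)
print(summarize(diabetes))
```

Keeping the links on the problem itself means most of the physician's pre-visit questions (current medications, related labs, last discussion) become lookups on a single entry.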
10. The Problem List Challenge
Unfortunately, manually maintained problem lists are not
accurate
Our assessment of existing problem lists based on a gold standard
indicates the challenge
Entered Problem List Accuracy:
Recall (Sensitivity) = 0.55
Precision (Positive Predictive Rate) = 0.28
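[Figure: Venn diagram of ground-truth problems versus problems on the entered list. TP: problems in both sets. FN (ground-truth problems missing from the entered list): patient’s pre-existing problems; no time to update the list. FP (entered problems not in the ground truth): resolved problems, acute problems, problems added for billing purposes, rule-out diagnoses.]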
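The recall and precision figures on this slide follow from the TP / FP / FN counts in the diagram. A minimal sketch of that arithmetic, treating each list as a set of problem names; the two example lists are made up for illustration and are not from the study data.

```python
def precision_recall(entered, ground_truth):
    """Compare an entered problem list against a gold-standard list."""
    entered, ground_truth = set(entered), set(ground_truth)
    tp = len(entered & ground_truth)   # on both lists
    fp = len(entered - ground_truth)   # entered but not a true problem
    fn = len(ground_truth - entered)   # true problem missing from the entered list
    precision = tp / (tp + fp) if entered else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


# Illustrative lists only (not real patient data).
entered = {"hypertension", "type 2 diabetes", "pneumonia", "chest pain"}
truth = {"hypertension", "type 2 diabetes", "ckd", "osteoarthritis"}
print(precision_recall(entered, truth))
```

Here the resolved pneumonia and the symptom "chest pain" count as false positives, while the two undocumented chronic problems count as false negatives.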
12. Problem List Definition
Problem List: A list of current and active diagnoses as well as past diagnoses relevant to the current care of the patient
CMS (Centers for Medicare & Medicaid Services) Meaningful Use Stage 1
13. Problem-List Ground-Truth Annotation Guidelines
What to include:
1. Chronic diseases like diabetes, hypertension, hyperlipidemia, etc.
2. History of cardiovascular events such as CVA, MI, DVT, PE.
3. Non-injury-related musculoskeletal conditions like degenerative disc disease, osteoarthritis, osteoporosis, and rickets
4. History of drug or alcohol ABUSE
5. All psychiatric diagnoses
6. Obesity and obesity-related problems like sleep apnea, fatty liver disease, etc.
7. Resolved problems of high importance such as recurrent PNA, anemia, etc.
8. Complications from other disease processes, such as diabetic neuropathy, CKD from hypertension, etc.
9. Malignant neoplasms (or history of) regardless of patient status and any benign neoplasms that need to be monitored
What should NOT be included:
1. All injuries
2. Resolved problems of either low importance, or those which have been corrected by surgery (bronchitis, pneumonia, cholecystitis with cholecystectomy, hernia that has been surgically corrected, appendicitis with appendectomy, etc.).
3. Most dermatologic conditions, including warts and transient skin rashes of low importance that are resolved. The only exception is acne, which is included regardless of severity.
4. Signs or symptoms of disease: chest pain, headache, abdominal pain, epistaxis, hematuria, etc. Usually these will have some corresponding diagnosis. If not, they are not included. The only exception is lumbago, which because of its usual chronicity IS included.
5. Severity of disease, as these tend to wax and wane in many chronic problems.
6. Cause of death or anything from an autopsy report
Where to take information from:
1. Any clinical note, operative report, telephone encounter, etc., where a specific diagnosis is discussed.
2. Do not make inferences, i.e., if a note says fasting glucose of 156, unless it explicitly says this patient has diabetes, leave the diagnosis off.
3. Words like “probable” or “suspected” before a given diagnosis are situation dependent. Sometimes a later note will confirm or refute that diagnosis.
Tips:
1. Remember that notes have places for allergies, past surgeries, procedures, etc., so leaving things off of a problem list doesn’t mean the information isn’t available.
2. Try to make the diagnosis as concise as possible; abbreviations are acceptable.
3. If you’re unsure, include it; it can always be removed during adjudication.
Guidelines are subject to interpretation, and extensive domain knowledge is required
14. EMRA Problem List Generation
Candidate Generation: find everything that looks like a disorder in the clinical notes
Scoring & Ranking: look for contextual information and supporting evidence
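The two stages can be sketched as a high-recall lookup followed by an evidence-based score. This is a toy illustration under stated assumptions, not the EMRA implementation: the lexicon `DISORDER_TERMS`, the `treats` mapping, the 0.5 medication bonus and all function names are hypothetical, and the notes are invented.

```python
import re

# Hypothetical mini-lexicon; a real system would map mentions to a clinical vocabulary.
DISORDER_TERMS = {"hypertension", "diabetes", "pneumonia"}


def generate_candidates(notes):
    """Stage 1: collect every lexicon term mentioned in any note (favors recall)."""
    found = set()
    for note in notes:
        words = set(re.findall(r"[a-z]+", note.lower()))
        found |= DISORDER_TERMS & words
    return found


def score(candidate, notes, medications, treats):
    """Stage 2: share of notes mentioning the candidate, plus a bonus for a supporting drug."""
    mention_rate = sum(candidate in note.lower() for note in notes) / len(notes)
    med_support = any(treats.get(med) == candidate for med in medications)
    return mention_rate + (0.5 if med_support else 0.0)  # 0.5 is an arbitrary illustrative weight


def rank(notes, medications, treats):
    """Order candidates from best to least supported."""
    candidates = generate_candidates(notes)
    return sorted(candidates, key=lambda c: score(c, notes, medications, treats), reverse=True)


notes = [
    "Hypertension: yes. Continue lisinopril.",
    "Rule out pneumonia.",
    "BP elevated, hypertension discussed.",
]
ranked = rank(notes, medications=["lisinopril"], treats={"lisinopril": "hypertension"})
print(ranked)
```

The rule-out mention of pneumonia still becomes a candidate, which is why the second stage has to weigh context and supporting evidence before anything reaches the problem list.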
17. Context-aware Computing
Given the context, we have no problem reading the sentences above, even though the characters H and A (and B and 13) are identical
18. Context in EMR
Word: Hypertension
Sentence: Hypertension: Yes
Section: Assessment and Plan (Hypertension: Yes)
Note: Mentioned in several other notes
Medication & Labs: Taking HTN drugs, elevated BP
Similar Pattern in Other EMRs: Other patients with a similar pattern have been diagnosed with hypertension
19. EMRA Problem List Generation
EMRA Problem List Accuracy:
Recall (Sensitivity) = 0.84
Precision (Positive Predictive Rate) = 0.52
Recall Oriented F2 = 0.75
Entered Problem List Accuracy:
Recall (Sensitivity) = 0.55
Precision (Positive Predictive Rate) = 0.28
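The recall-oriented F2 figure can be reproduced from the reported recall and precision with the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). A short sketch; the function name is ours, and the F2 value for the entered list is derived here rather than quoted from the slide.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# EMRA list from this slide: precision 0.52, recall 0.84
print(round(f_beta(0.52, 0.84, beta=2), 2))   # 0.75
# Entered list baseline: precision 0.28, recall 0.55 (F2 derived, not stated on the slide)
print(round(f_beta(0.28, 0.55, beta=2), 2))   # 0.46
```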
[Figure: EMRA problem list generation pipeline. Inputs: EMR notes and structured data (medications, orders, labs, etc.). Stages shown: text segmentation, information extraction (clinical factors extraction), candidate generation, feature generation, scoring / weighting, and grouping (merging and clustering closely related problems). Intermediate results: CUIs of unique disorders (100s); CUIs of unique medications (10s), orders, labs, etc.; candidate problems (10s). Features: CUI confidence, note section, term frequency, and relationships to meds and labs (LSA / DSRD, CUI path). Per-feature plots map each feature to a score between 0 and 1.0: confidence (x-axis 0 to 0.4), term frequency (0 to 10), LSA score (0 to 0.3), path pattern (e.g. “A may treat B”), and note section / note type (e.g. PMH).]
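The per-feature score plots in this figure suggest that each raw feature is first mapped onto a 0 to 1.0 scale before scoring / weighting. The sketch below assumes a linear ramp that saturates at the upper axis tick, and a simple weighted average; both are our assumptions, since the actual curve shapes and learned weights are not recoverable from the slide text. The names `ramp`, `UPPER` and `combine` are hypothetical.

```python
def ramp(value, upper):
    """Map a raw feature to [0, 1]: linear from 0, saturating at `upper` (assumed shape)."""
    if upper <= 0:
        raise ValueError("upper must be positive")
    return max(0.0, min(1.0, value / upper))


# Upper bounds read from the x-axis ticks on the slide.
UPPER = {"cui_confidence": 0.4, "term_frequency": 10, "lsa_score": 0.3}


def combine(features, weights):
    """Weighted average of the ramped feature scores (weights are hypothetical)."""
    total = sum(weights.values())
    return sum(weights[k] * ramp(features[k], UPPER[k]) for k in weights) / total


# Illustrative candidate: moderate CUI confidence, many mentions, strong LSA relation.
features = {"cui_confidence": 0.2, "term_frequency": 20, "lsa_score": 0.3}
weights = {"cui_confidence": 1.0, "term_frequency": 1.0, "lsa_score": 1.0}
print(combine(features, weights))
```

Normalizing first keeps a high-count feature such as term frequency from dominating features that live on a much smaller scale.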
20. EMRA Problem List Generation
EMRA Problem List Accuracy:
Recall (Sensitivity) = 0.70
Precision (Positive Predictive Rate) = 0.73
Precision Oriented F1 = 0.72
Entered Problem List Accuracy:
Recall (Sensitivity) = 0.55
Precision (Positive Predictive Rate) = 0.28
22. EMR Summarization
Watson generates and groups problems by clinical relevance
Watson groups medications by clinical relevance
Each panel contains answers to a pre-defined question
23. Context-aware User Interface
Labs show elevated glucose and A1C, among others…
When a problem is selected
Current and related meds
are highlighted
Relevant notes
are highlighted
24. Is the patient's diabetes well-controlled?
What was patient's last HbA1c? When was it taken?
Patient's hemoglobin A1c is red indicating it is not within normal range.
Patient’s HbA1c has been high except for a single reading in 2013, so
patient's diabetes has NOT been well-controlled.
A1C went down,
why?
A1C went up,
why?
25. Is the patient's diabetes well-controlled?
A1C went down; why?
A1C went up in most recent test despite being on Victoza (liraglutide); why?
Endocrinology note on 03/06/2013
Endocrinology note on 07/17/2013
EMRA makes it easy to find and bring up relevant notes
26. Semantic Find
Acute problems are normally not considered problems, and don’t show up in the Summarization UI
Patient comes in complaining of a hearing problem
Has the patient experienced this before?
Was the patient started on any treatment?
28. Quality Assessment
“I’d consider Watson extremely useful if it can
find one important problem that is missed by
physician”
Neil Mehta, M.D., Internist, Cleveland Clinic
29. Quality Assessment
6 Cleveland Clinic physicians reviewed 15 EMRs to generate their own problem lists, and then compared and rated the problem lists
Each physician reviewed 5 EMRs, and each EMR was reviewed by 2 physicians
Watson-generated lists were given after physicians completed their own lists. Physicians were asked to rate the Watson-generated problems one by one and as a whole
For each problem: is it correct? Is it on your list? If correct, how important is it?
As a whole: rate each list from 1-10 (Likert scale)
[Figure: problems by importance rating (Very Important, Important, Somewhat Important, Not at all Important) for the Ground Truth, Physician, and Watson lists]
31. Quality Assessment
Simple linear regression indicates that the most important factor for a higher Watson rating is “Percentage of very important problems that are missed by physician and found by Watson”
On average, Watson found 1.2 very important or important problems missed by the physician per EMR (avg. 6 problems)
33. Type of False-Positive Problems
Transient problem: 51%
Correct: 21%
Redundant problem: 11%
Certainty error: 5%
System error: 4%
Noise: 4%
Negation error: 3%
Human error: 1%
Error analysis showed most of the false positives are “transient problems”
Transient problems are true findings or disorders of the patient that are less important to the medical care
Minor / self-limited problem
Waxing and waning, e.g. seasonal
Resolved
The definition is somewhat subjective: a resolved problem to one physician may be a significant past medical history to another physician
34. Problem List Definition
CMS (Centers for Medicare & Medicaid Services) Meaningful Use Stage 1
Problem List: A list of current and active diagnoses as well as past diagnoses relevant to the current care of the patient
Every known finding / risk factor / disorder of the patient
• “ideal” problem list
for a nephrologist
• The blue list contains
too many irrelevant
problems
• “ideal” problem list
from an internist
• The green list is too
specific and not
comprehensive
35. The Problem List Challenge
[Figure: body systems shown: Cardiovascular, Digestive, Endocrine, Respiratory, Genitourinary]
38. Active Learning (Sample Complexity)
[Figure: F2 measure (y-axis, 0.5 to 0.8) versus number of training EMRs (x-axis, 0 to 300)]
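Active learning aims to reach a given F2 with fewer annotated EMRs by choosing which records to send for labeling next. The slide does not say which query strategy was used; the sketch below assumes uncertainty sampling, one common choice, and the function name, EMR ids and probabilities are all hypothetical.

```python
def most_uncertain(probabilities, k):
    """Pick the k unlabeled items whose predicted probability is closest to 0.5."""
    return sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))[:k]


# Hypothetical model confidence for four unlabeled EMRs.
predicted = {"emr_a": 0.95, "emr_b": 0.52, "emr_c": 0.10, "emr_d": 0.40}
to_annotate = most_uncertain(predicted, k=2)
print(to_annotate)
```

Records the model is already confident about (here `emr_a` and `emr_c`) add little information, so annotation effort goes to the borderline cases first.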
39. Current Research Direction
Learning
  Today: Supervised (batch learning)
  Work in Progress: Supervised (active learning)
Features
  Today: Knowledge-based features, O(100), selected using ADT tree / boosting
  Work in Progress: Features, O(1,000), extracted and selected by DNN (e.g. auto-encoder)
Temporal Aspect
  Today: Modeled implicitly
  Work in Progress: Explicitly clustering multivariate time series