1. Mining Electronic Health Records
Go Beyond Ontology Based Text Mining
October 15th 2015
Mining Electronic Health Records #110/16/2015
2. Ontotext
• Information management company providing text analysis, data management and state-of-the-art semantic technology
• 70 software developers in Sofia, Bulgaria
• Presence in London and New York
• Clients include BBC, FT, AstraZeneca, DoD, Wiley & Sons
• Over 400 person-years in R&D to create a one-stop shop for:
– Content enrichment
– Data management
– Graph database engine
6. Ontology Based IE
• An ontology models a discrete knowledge domain
• All ontology concepts have a definition
• All ontology concepts have alternative labels
• Where appropriate, ontology concepts have additional labels
• Inference can be applied
[Diagram: example ontology fragment. Nodes: COPD, Disease, Respiratory Disease, Symptom, Shortness of Breath. Labels: Chronic Obstructive Pulmonary Disease, COLD, Chronic Airflow Obstruction. Edges: rdf:type, skos:prefLabel, skos:altLabel, hasSymptom.]
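The idea that inference can be applied over such an ontology can be sketched with plain tuples. This is a minimal illustration, not the deck's implementation: the exact edges below (in particular that COPD is typed as a respiratory disease and that respiratory disease is a subclass of disease) are assumptions, since the diagram's layout is not recoverable from the text.

```python
# Assumed triples; node names are taken from the slide, edge placement is a guess.
TRIPLES = {
    ("COPD", "rdf:type", "RespiratoryDisease"),
    ("RespiratoryDisease", "rdfs:subClassOf", "Disease"),
    ("COPD", "skos:prefLabel", "Chronic Obstructive Pulmonary Disease"),
    ("COPD", "skos:altLabel", "COLD"),
    ("COPD", "skos:altLabel", "Chronic Airflow Obstruction"),
    ("COPD", "hasSymptom", "ShortnessOfBreath"),
    ("ShortnessOfBreath", "rdf:type", "Symptom"),
}


def infer_types(triples):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for s2, p2, o2 in list(inferred):
                if p2 == "rdfs:subClassOf" and s2 == o:
                    new = (s, "rdf:type", o2)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


def labels(triples, concept):
    """Return all preferred and alternative labels of a concept, sorted."""
    return sorted(
        o for s, p, o in triples
        if s == concept and p in ("skos:prefLabel", "skos:altLabel")
    )
```

After inference, COPD is also typed as Disease, so a search for diseases would find text annotated with any of its labels.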
7. Ontology Based IE - problems
• Ontologies do not model a domain completely (at both the instance level and the label level)
– Extend ontologies
– Ontology enrichment via instance mappings
• Labels contain additional qualifying information
– Define rewrite and ignore rules for literals
• Labels do not reflect natural language
– Apply “flexible” gazetteers
• Ambiguity in terminology
– Pre-filtering
– Ranking
– Semantic instance mappings
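Rewrite and ignore rules for literals could look roughly like the sketch below. The rule set is hypothetical and only modelled on vocabulary labels that carry qualifiers such as "NOS", ", unspecified" or "(disorder)"; a production rule set would be curated per vocabulary.

```python
import re

# Hypothetical rewrite rules: each strips one kind of qualifying information.
REWRITE_RULES = [
    (re.compile(r"\s+NOS$"), ""),               # "not otherwise specified"
    (re.compile(r",\s*unspecified$"), ""),      # trailing ", unspecified"
    (re.compile(r"\s*\([a-z ]+\)$"), ""),       # trailing tag such as "(disorder)"
    (re.compile(r"^[A-Z]{2,}\s+-\s+"), ""),     # leading abbreviation such as "CAFL - "
]

# Hypothetical ignore rule: drop literals that contain no letters at all.
IGNORE_RULES = [re.compile(r"^[\d\W]*$")]


def normalize_label(label):
    """Return the cleaned label, or None if an ignore rule matches."""
    for pattern in IGNORE_RULES:
        if pattern.search(label):
            return None
    for pattern, replacement in REWRITE_RULES:
        label = pattern.sub(replacement, label)
    return label.strip()
```

The cleaned labels are what a gazetteer would load, so that "Chronic obstructive lung disease (disorder)" can match the plain phrase in a clinical note.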
8. Vocabulary Enrichment – Semantic Mappings
Labels of ICD 10 CM and SNOMED CT US concepts linked via skos:closeMatch:
Chronic obstructive airway disease NOS
Chronic obstructive lung disease NOS
Chronic obstructive pulmonary disease, unspecified
Chronic obstructive lung disease
Chronic obstructive airways disease NOS
Chronic obstructive lung disease (disorder)
CAFL - Chronic airflow limitation
Chronic irreversible airway obstruction
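A minimal sketch of vocabulary enrichment through semantic mappings: each concept receives the labels of the concepts it is mapped to. The concept identifiers are placeholders, and the assignment of labels to ICD 10 CM versus SNOMED CT US is assumed for illustration only.

```python
# Placeholder identifiers; the split of labels between vocabularies is assumed.
LABELS = {
    "icd10cm:EXAMPLE": [
        "Chronic obstructive airway disease NOS",
        "Chronic obstructive lung disease NOS",
        "Chronic obstructive pulmonary disease, unspecified",
    ],
    "snomed:EXAMPLE": [
        "Chronic obstructive lung disease",
        "Chronic obstructive airways disease NOS",
        "Chronic obstructive lung disease (disorder)",
        "CAFL - Chronic airflow limitation",
        "Chronic irreversible airway obstruction",
    ],
}

# skos:closeMatch links between the two vocabularies.
CLOSE_MATCH = [("icd10cm:EXAMPLE", "snomed:EXAMPLE")]


def enrich_labels(labels, close_matches):
    """Give each concept the union of its own labels and those of its direct matches."""
    enriched = {concept: set(values) for concept, values in labels.items()}
    for a, b in close_matches:
        merged = set(labels.get(a, [])) | set(labels.get(b, []))
        enriched.setdefault(a, set()).update(merged)
        enriched.setdefault(b, set()).update(merged)
    return enriched
```

Only direct mappings are merged here, because skos:closeMatch is not transitive; chaining it across several vocabularies could merge concepts that are not actually equivalent.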
10. Vocabulary Enrichment – Synonym Enrichment
Tumor
Tumour
Abdomen
Abd
Tumor of abdomen
Tumor of abd
Tumour of abdomen
Tumour of abd
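The label list above is the cross product of word-level synonyms. A minimal sketch, assuming a hypothetical template with named slots:

```python
from itertools import product

# Word-level synonym sets taken from the slide.
SYNONYMS = {
    "tumor": ["Tumor", "Tumour"],
    "abdomen": ["abdomen", "abd"],
}


def expand_label(template, slots):
    """Yield every label obtained by filling each slot with each of its variants."""
    names = list(slots)
    for combination in product(*(slots[name] for name in names)):
        yield template.format(**dict(zip(names, combination)))


generated = list(expand_label("{tumor} of {abdomen}", SYNONYMS))
```

Two variants per slot yield four labels; the count grows multiplicatively with each added slot or synonym, which is worth keeping in mind for gazetteer size.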
12. Ontology Based IE – example
13. Flexible Gazetteers
• Pre-coordinated terms cannot match all natural
language terms, especially those used in narrative
medical text!
– Inversions: concept “knee injury” vs. “injury of knee” in text
– Gaps due to additional qualifiers: concept “periorbital swelling” vs. “periorbital soft tissue swelling” in text
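One way to tolerate inversions and gaps is to match a concept's tokens in any order inside a small window of text tokens. This is a sketch under assumptions (a tiny stopword list, a fixed gap limit), not the gazetteer described in the deck.

```python
import re

# Assumed stopword list; function words are dropped before matching.
STOPWORDS = {"of", "the", "in"}


def tokens(text):
    """Lowercase the text and return its alphabetic tokens without stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def flexible_match(concept, text, max_gap=2):
    """True if all concept tokens occur, in any order, within a window of text
    tokens whose size is the number of concept tokens plus max_gap."""
    wanted = set(tokens(concept))
    text_tokens = tokens(text)
    window = len(wanted) + max_gap
    for start in range(len(text_tokens)):
        if wanted <= set(text_tokens[start:start + window]):
            return True
    return False
```

A larger `max_gap` raises recall but also the risk of joining words from unrelated phrases, so the limit would need tuning on real notes.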
14. Detection of negations
• The ability to reliably identify negated medical
statements in text may significantly affect the quality
of the extracted information.
– Adverbial negation
– Negation in noun phrases
– Prepositional negation
– Adjective negation
– Verb negation
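A simple trigger-and-window heuristic, in the spirit of NegEx-style rules, illustrates how the negation types might be detected. It is not the authors' method; the trigger phrases and their assignment to categories are assumptions made for this sketch.

```python
import re

# Assumed trigger phrases per negation type; a real lexicon would be far larger.
NEGATION_TRIGGERS = {
    "adverbial": ["not", "never"],
    "noun phrase": ["no", "absence of"],
    "prepositional": ["without"],
    "adjective": ["negative for", "free of"],
    "verb": ["denies", "denied", "rules out"],
}


def negation_type(sentence, concept, window=5):
    """Return the category of a trigger found within `window` tokens before
    the concept mention, or None if the mention is not negated."""
    text = sentence.lower()
    position = text.find(concept.lower())
    if position < 0:
        return None
    preceding = re.findall(r"[a-z]+", text[:position])[-window:]
    scope = " " + " ".join(preceding) + " "
    for category, triggers in NEGATION_TRIGGERS.items():
        for trigger in triggers:
            if f" {trigger} " in scope:
                return category
    return None
```

The sketch only looks left of the mention; post-posed negation ("pneumonia was ruled out") would need a second scope to the right.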
15. Temporality Identification
• Temporal resolution for events in clinical notes is
crucial for an accurate definition of patient history,
current medical condition and assigned treatment.
• Identified temporality classes are:
– Historical
– Hypothetical (“Not particular”)
– Recent
• Temporal data must be normalized against the medical document's metadata (date of report/visit)!
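Normalizing against the document date can be sketched as follows. The supported expression pattern and the 30-day cut-off between "Recent" and "Historical" are assumptions for illustration; the deck does not define either.

```python
import re
from datetime import date, timedelta

# Days per supported unit; months and years are left out to avoid calendar ambiguity.
UNIT_DAYS = {"day": 1, "week": 7}


def normalize_relative(expression, document_date):
    """Resolve '<n> day(s) ago' or '<n> week(s) ago' against the document date.
    Returns None for expressions this sketch does not cover."""
    match = re.fullmatch(r"(\d+)\s+(day|week)s?\s+ago", expression.strip().lower())
    if not match:
        return None
    days = int(match.group(1)) * UNIT_DAYS[match.group(2)]
    return document_date - timedelta(days=days)


def temporality_class(event_date, document_date, recent_days=30):
    """Classify an event as 'Recent' or 'Historical' using an assumed cut-off."""
    age = (document_date - event_date).days
    return "Recent" if age <= recent_days else "Historical"
```

The same phrase ("3 days ago") resolves to different calendar dates in different notes, which is why the report or visit date has to travel with every extracted event.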
17. Post-coordination Patterns
• It is impossible to fully describe medical knowledge in
terms of fully qualified concepts!
• Natural language does not follow the standardized
descriptions defined by domain ontologies!
• Concepts must describe basic entities
• Entity properties can be described by different
qualifier classes
• Patterns can generate new concepts, combining
specific instance and qualifier classes
18. Post-coordination Patterns - Examples
• Example pattern:
<disease> or <morphologic abnormality> as the rightmost concept in a noun
phrase, preceded by <qualifier> and <body structure>
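The example pattern can be sketched over a noun phrase that has already been annotated with semantic classes. The input format, the class names as strings, and the sample phrase are hypothetical; the sketch also accepts the qualifier and body structure in either order, which the slide does not specify.

```python
# Semantic classes allowed for the rightmost (head) concept of the noun phrase.
HEAD_CLASSES = {"disease", "morphologic abnormality"}


def post_coordinate(noun_phrase):
    """noun_phrase: list of (text, semantic_class) pairs in text order.
    Return a composed concept as a dict, or None if the pattern does not apply."""
    if len(noun_phrase) < 3:
        return None
    *modifiers, (head_text, head_class) = noun_phrase
    if head_class not in HEAD_CLASSES:
        return None
    qualifiers = [text for text, cls in modifiers if cls == "qualifier"]
    structures = [text for text, cls in modifiers if cls == "body structure"]
    # Every modifier must be a qualifier or a body structure, with at least one of each.
    if not qualifiers or not structures:
        return None
    if len(qualifiers) + len(structures) != len(modifiers):
        return None
    return {"head": head_text, "qualifier": qualifiers, "body_structure": structures}


# Hypothetical annotated phrase: "severe periorbital swelling".
example = post_coordinate([
    ("severe", "qualifier"),
    ("periorbital", "body structure"),
    ("swelling", "morphologic abnormality"),
])
```

The composed result is a new concept built from basic entities, so the ontology does not need a pre-coordinated entry for every qualifier and site combination.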
19. Data Modeling
• Based on normalized data
• … but allowing extension with free text
• Allow data fusion with background knowledge
• Capture all aspects of the extracted information
• Tightly coupled with the context
• Provide provenance and confidence score
• Explorable! Not just searchable
20. Data Modeling
[Diagram: patient record as an RDF graph, with one provenance graph per source document]
Data provenance: graph <http://linkedlifedata.com/resource/document/CD8672>
Patient XYZ
– rdf:type: Patient
– hasGender: male
– hasBirthDate: 1956/09/20 (xsd:date)
– hasDiagnose: http://linkedlifedata.com/resource/icd9cm/157.9 (hasStatus: current)
http://linkedlifedata.com/resource/icd9cm/157.9
– rdf:type: Disease
– skos:prefLabel: Malignant neoplasm of pancreas
Data provenance: graph <http://linkedlifedata.com/resource/document/CN127753>
Patient XYZ
– hasTreatment: http://linkedlifedata.com/resource/treatment/DT127753
http://linkedlifedata.com/resource/treatment/DT127753
– rdf:type: Treatment
– hasDrug: http://linkedlifedata.com/resource/drug/irinotecan
– hasDosage: 180 mg/ 1 m2 for 80 min
21. Data Modeling – KB
[Diagram: maximum daily dosage for a drug, as modelled in the knowledge base]
Data provenance: graph <http://linkedlifedata.com/resource/drugBroshure/CAMPTOSAR>
Maximum Daily Dosage: http://linkedlifedata.com/resource/drugDosage/DD127753
– rdf:type: Dosage
– hasMedication: http://linkedlifedata.com/resource/drug/irinotecan
– hasPopulationGroup: Adult
– hasAdministration Route: http://linkedlifedata.com/resource/route/subcutaneus
– hasAdministration Form: http://linkedlifedata.com/resource/form/injection
– hasIndication: http://linkedlifedata.com/resource/icd9cm/157.9
– hasDosageValue: 180
– hasDosageUnit: mg
– hasDenominatorValue: 1
– hasDenominatorUnit: m2
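Fusing extracted data with this background knowledge could mean checking a dosage found in a note against the maximum daily dosage in the knowledge base. The sketch below is an assumption about how that might be done: the parsing pattern and function names are hypothetical, while the property names and values mirror the slide.

```python
import re

# Knowledge-base entry with the property names and values shown on the slide.
KB_MAX_DOSAGE = {
    "hasDosageValue": 180.0,
    "hasDosageUnit": "mg",
    "hasDenominatorValue": 1.0,
    "hasDenominatorUnit": "m2",
}

# Hypothetical pattern for free-text dosages such as "180 mg/ 1 m2 for 80 min".
DOSAGE_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(\w+)\s*/\s*(\d+(?:\.\d+)?)\s*(\w+)")


def parse_dosage(text):
    """Parse '<value> <unit>/ <value> <unit>' into the KB property layout."""
    match = DOSAGE_PATTERN.match(text)
    if not match:
        return None
    return {
        "hasDosageValue": float(match.group(1)),
        "hasDosageUnit": match.group(2),
        "hasDenominatorValue": float(match.group(3)),
        "hasDenominatorUnit": match.group(4),
    }


def exceeds_maximum(dosage, kb_entry):
    """Compare dose per denominator unit; refuse to compare mismatched units."""
    for key in ("hasDosageUnit", "hasDenominatorUnit"):
        if dosage[key] != kb_entry[key]:
            raise ValueError(f"unit mismatch on {key}")
    extracted = dosage["hasDosageValue"] / dosage["hasDenominatorValue"]
    maximum = kb_entry["hasDosageValue"] / kb_entry["hasDenominatorValue"]
    return extracted > maximum
```

Raising on a unit mismatch rather than guessing a conversion keeps the check conservative; unit conversion would be a separate, explicit step.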
22. Semantic Data Exploration and Mining
• Build Linked Data out of extracted facts and
background knowledge
• Semantic Faceted Search
• Cross Entity Search & Exploration
• Expert Text Mining Search in pre-annotated
documents
– Combine semantic annotations with PoS elements
– Identify post-coordination patterns
– Identify relation patterns
– Query expansion using background knowledge
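Query expansion using background knowledge can be sketched as collecting the labels of a concept and of everything narrower than it. The concept names and the hierarchy below are assumed for illustration; the labels are those from the COPD example earlier in the deck.

```python
# Assumed background knowledge: labels per concept and a narrower-concept hierarchy.
LABELS = {
    "RespiratoryDisease": ["Respiratory Disease"],
    "COPD": [
        "Chronic Obstructive Pulmonary Disease",
        "COLD",
        "Chronic Airflow Obstruction",
    ],
}
NARROWER = {"RespiratoryDisease": ["COPD"]}


def expand_query(concept, labels, narrower):
    """Return the labels of the concept and of all its narrower descendants."""
    terms, stack, seen = set(), [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(current)
        terms.update(labels.get(current, []))
        stack.extend(narrower.get(current, []))
    return terms
```

A search for the broader concept then also retrieves documents that only mention "COLD" or "Chronic Airflow Obstruction".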
23. • Information Extraction from EHRs is still a challenge!
• Making use of the extracted data is even more
challenging
• Ontotext provides the technology stack to make it work!
life-sciences@ontotext.com
Thank you!