Segmenting & Merging Domain-specific Modules for Clinical Informatics
1. Segmenting & Merging Domain-specific
Modules for Clinical Informatics
Chimezie Ogbuji
Cleveland Clinic & Case Western Reserve University
Sivaram Arabandi
Case Western Reserve University
Songmao Zhang
Chinese Academy of Sciences
Guo-Qiang Zhang
Case Western Reserve University
2. Introduction
● What are we doing and why are we doing it?
– Generally
– Specifically
● What are the criteria for success?
● What are existing best practices and well-
documented challenges of ontology re-use?
3. Introduction
● Construct domain-specific ontologies to support
data curation and ongoing clinical research
activity
● PhysioMIMI is an informatics infrastructure for
collection, management, and analysis of sleep-
related data
● Our method was used to bootstrap a Sleep
Domain Ontology (SDO)
5. Goal / Criteria for Success
● Want to (automatically)
– Generate anatomy and clinical terminology modules
that make use of principled normal forms, are
minimal in size, and preserve the meaning of re-
used symbols
– As much as is computationally feasible
– Be able to facilitate the customization of a large
source ontology such as SNOMED-CT
● Provide a framework for bootstrapping
terminology for a specific domain
6. Desiderata for Clinical Terminology
● There is a critical need for formal, reproducible
methods for recognizing and filling gaps in
medical terminologies (Cimino 1998)
● Clinical terminology systems need to extend
smoothly and quickly in response to the needs
of users (Rector 1999)
– A fixed, enumerated list of concepts can never be
complete and results in a combinatorial explosion of
terms (exhaustive pre-coordination)
7. Desiderata (cont.)
● Post-coordination is a contrasting approach
where a set of atomic concepts is used to
create new terms on demand rather than a
priori
● Rector 2003 proposed a set of normalization
criteria and an approach for decomposing and
recombining disjoint, homogeneous taxonomies
● Goal is for trees of primitive terms to serve as a
terminological framework that minimizes implicit
differentia
– Discrete coordinate system
8. Background
● Related efforts regarding
– Ontology merging
– Ontology modularization
● Review formalisms for ontology modularization
– What is a deductive, conservative extension?
– What is a module?
● What is the difference between a segment and
a module?
9. Related Work
● Noy and Musen (2000)
– Discuss how to either automate the merging and
alignment or guide the user, suggesting conflicts
and actions to take
– Rely on lexical matching of term names
● Bontas and Tolksdorf (2005)
– Similar goal to that of Noy & Musen
– User provides a list of term matches between
source & target
– Follow semantic connections from these terms
10. Related Work (cont.)
● Bontas et al. (2005) identify the following
challenges in ontology re-use:
– Automated translation of source ontologies into
common KR format
– Customization of source ontology
– Performance challenges of large medical ontologies
11. Related Work (cont.)
● d'Aquin et al. (2006)
– Use a modularization algorithm based on a
traversal paradigm
– Describe 3 generic steps of dynamic knowledge
selection algorithms:
● Selection of relevant ontologies
● Modularization via an algorithm
● Merging of ontology modules in a meaningful way
– Claim all entailments are preserved but do not
demonstrate how this is guaranteed
12. Modularization
● The size of major medical ontologies
prohibits the use of deductive reasoning
● In addition and more relevant here, their size is
a significant challenge to terminology
management
● Ontology modularization is a blossoming field in
logic engineering
13. Deductive, Conservative Extensions
● Grau et al. (2008) define a formal relationship
between DL ontologies: deductive, conservative
extension
● Use case: we are developing ontology P and
want to re-use a set of symbols from ontology Q
without changing their meaning
● If the symbols they have in common are re-
used in this way then:
– P + Q is a conservative extension of Q
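Stated a little more formally, under the usual reading of Grau et al. (the symbol S for the set of re-used symbols is notation introduced here, not taken from the slides):

```latex
Let $\mathbf{S}$ be the set of symbols that $P$ re-uses from $Q$.
$P \cup Q$ is a deductive, conservative extension of $Q$ with respect to
$\mathbf{S}$ if, for every axiom $\alpha$ with
$\mathrm{Sig}(\alpha) \subseteq \mathbf{S}$,
\[
  P \cup Q \models \alpha \quad\Longleftrightarrow\quad Q \models \alpha .
\]
```

In other words, adding P produces no new consequences that can be stated using only the re-used symbols.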
14. Module
● When answering a query over the terms of O
(its signature, or vocabulary), importing O'1
should give the same answers as importing all
of O':
– O'1 is a more manageable fragment of O'
● Then we say O'1 is a module for O in O'
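A sketch of the same condition in symbols, following the usual formulation attributed to Grau et al. (2008); the notation Sig(·) for the signature of an ontology or axiom is an assumption introduced here:

```latex
Let $O'_1 \subseteq O'$. Then $O'_1$ is a module for $O$ in $O'$ if,
for every axiom $\alpha$ with $\mathrm{Sig}(\alpha) \subseteq \mathrm{Sig}(O)$,
\[
  O \cup O' \models \alpha \quad\Longleftrightarrow\quad O \cup O'_1 \models \alpha .
\]
```

That is, O ∪ O' is a deductive, conservative extension of O ∪ O'1 with respect to the signature of O.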
15. Materials
● SNOMED-CT
● FMA
● Common anatomy signature
16. Materials
● There is a reasonable consensus around two
reference ontologies in clinical medicine
– SNOMED-CT and the Foundational Model of
Anatomy (FMA)
● Both leverage an underlying formal knowledge
representation
17. SNOMED-CT
● A comprehensive terminological framework for
clinical documentation and reporting.
● Comprises about half a million concepts:
– Clinical findings, procedures, body structures,
organisms, substances, pharmaceutical products,
specimens, quantitative measures, and clinical
situations
● Has an underlying description logic (EL)
– EL has been proven to be suitable for medical
terminology
18. SNOMED-CT Challenges
● Its size deters the use of logical inference
systems to manage and process it (due to
performance issues)
● Most description logic systems run into
challenges with memory exhaustion when
classifying it in its entirety
● In some cases, its definitions are inconsistent or
incomplete
● However, it is the de facto reference for clinical
terminology
19. SNOMED-CT SEP Triplets
● SNOMED-CT uses SEP triplets to model
anatomy concepts and their relationships to
each other
● For every proper SNOMED-CT anatomy
concept (an Entire class), there are two auxiliary
classes:
– A Structure class
– A Part class
● Main motivation is to rely on subsumption to
reason about part-whole relationships
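A hedged sketch of how an SEP triplet is commonly read in description logic terms. The subscripts S, E, P and the role name part_of are notation assumed here for illustration; the second display is one possible reading in a more expressive logic, not a statement of how SNOMED-CT itself is encoded:

```latex
For an anatomy concept $X$ with Structure, Entire and Part classes
$X_S$, $X_E$, $X_P$:
\[
  X_E \sqsubseteq X_S, \qquad X_P \sqsubseteq X_S .
\]
The fact ``$A$ is part of $B$'' is encoded as a subsumption:
\[
  A_S \sqsubseteq B_P \quad (\text{and hence } A_S \sqsubseteq B_S).
\]
In a logic with existential restriction and disjunction, the intended
meaning can be spelled out as
\[
  X_P \equiv \exists\, \mathit{part\_of}.\, X_E, \qquad
  X_S \equiv X_E \sqcup X_P .
\]
```

With this encoding, a finding located in a part of an organ is classified under findings located in the Structure class of that organ, using subsumption alone.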
21. Foundational Model of Anatomy
● Aims to conceptualize the physical
objects and spaces that constitute the human
body
● Leverages a frame-based knowledge
representation to formulate over 75,000
concepts including:
– Macroscopic, microscopic, and sub-cellular
canonical anatomy
● Anatomy is fundamental to biomedical domains
22. FMA (cont.)
● Concepts are connected by several
mereological relations
● Primarily concerned with part_of and has_part
● Adheres to a strict, Aristotelian modeling
paradigm
– Ensures definitions are consistent and state the
essence of anatomical entities in terms of their
characteristics
● We use a 2006 OWL translation of the version
in the OBO Foundry
23. Common Anatomy Signature
● There is a significant overlap between anatomy
terms in SNOMED-CT and FMA
● Bodenreider and Zhang (2006) analyzed this
overlap
● Leveraged lexical and structural analysis
● Identified ~ 7500 common concepts
– Referred to as Sanatomy
● Key to the general applicability of our method
within the domain of clinical medicine
24. Normal Forms
● The SNOMED-CT manual likewise describes
methods for generating normal forms
● Canonical forms comprised of maximally
decomposed logical expressions
– Entailments from full SNOMED-CT still follow
from normal forms
● Useful for comparing post-coordinated
expressions during retrieval or analysis of data
25. Methods
● Start with a list of user-specified SNOMED-CT
concepts
– Determines the domain
● 3-step process resulting in
– A SNOMED-CT module: O'snct-fma
– Transliteration of SEP triplets
– An FMA segment: O'fma-snct
● Segmentation heuristic
● Directly merge into a single ontology
26. Core Procedure
● Extract normal forms from SNOMED-CT
● SNOMED-CT anatomy terms in Sanatomy that are
reached during the extraction are replaced and
used as seeds to extract a segment from the
FMA
● Axioms involving SNOMED-CT anatomy terms
in Sanatomy and the terms themselves are
replaced such that they preserve the intent of
the SEP triplet scheme using FMA terms
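The replacement step can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the triple layout `(concept, role, filler)`, the mapping format, the role name `part_of`, the term labels, and in particular the three rewrite rules, which are one plausible reading of the SEP scheme and not necessarily the rules used in this work.

```python
# Hypothetical data shapes: a definition is (concept, role, filler);
# the mapping sends a SNOMED-CT anatomy term to (FMA term, SEP kind).

def reify_filler(fma_term, kind):
    """Rewrite one mapped anatomy term as an expression over FMA terms.

    Assumed rules (illustrative only):
      entire    -> X
      part      -> some part_of X
      structure -> X or (some part_of X)
    """
    part = ("some", "part_of", fma_term)
    if kind == "entire":
        return fma_term
    if kind == "part":
        return part
    if kind == "structure":
        return ("or", fma_term, part)
    raise ValueError(f"unknown SEP kind: {kind}")


def replace_anatomy_fillers(definitions, snomed_to_fma):
    """Replace mapped fillers and collect FMA seeds for segment extraction."""
    rewritten = []
    seeds = set()
    for concept, role, filler in definitions:
        if filler in snomed_to_fma:
            fma_term, kind = snomed_to_fma[filler]
            rewritten.append((concept, role, reify_filler(fma_term, kind)))
            seeds.add(fma_term)
        else:
            # Terms outside the common anatomy signature are left untouched
            rewritten.append((concept, role, filler))
    return rewritten, seeds


# Toy input with made-up labels
definitions = [
    ("Corticobasal degeneration", "finding_site", "Cerebral cortex structure"),
    ("Corticobasal degeneration", "finding_site", "Basal ganglion structure"),
    ("Corticobasal degeneration", "associated_morphology", "Degeneration"),
]
snomed_to_fma = {
    "Cerebral cortex structure": ("Cerebral cortex", "structure"),
    "Basal ganglion structure": ("Basal ganglion", "structure"),
}

rewritten, seeds = replace_anatomy_fillers(definitions, snomed_to_fma)
for triple in rewritten:
    print(triple)
print(sorted(seeds))
```

The returned `seeds` set plays the role of the starting terms for the FMA segment extraction.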
28. Segmentation Heuristic
● Seidenberg and Rector (2006) describe an
ontology segmentation heuristic that starts with
a set of terms and creates an extract from an
ontology around those terms
– Traverses ontology structure and is limited by user-
specified recursion depth
● Inspiration for modularization algorithm of
d'Aquin et al. (2006)
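A minimal Python sketch of a depth-limited traversal in the spirit of this heuristic. The toy ontology, the `supers`/`links` layout, and the function name `extract_segment` are assumptions for illustration; this is not the published algorithm or the implementation used here. Superclass links are followed upwards without limit, while other links are followed only up to `max_depth` hops.

```python
from collections import deque

# Toy ontology (made-up): class -> superclasses and role-linked classes
TOY = {
    "Cerebral cortex": {"supers": ["Region of brain"], "links": ["Cerebral hemisphere"]},
    "Region of brain": {"supers": ["Organ region"], "links": []},
    "Cerebral hemisphere": {"supers": ["Region of brain"], "links": ["Brain"]},
    "Organ region": {"supers": [], "links": []},
    "Brain": {"supers": ["Organ"], "links": []},
    "Organ": {"supers": [], "links": []},
}


def extract_segment(ontology, seeds, max_depth):
    """Return (segment, periphery).

    segment:   classes reached from the seeds
    periphery: classes at the depth limit that still have links
               pointing outside the segment
    """
    best = {}  # class -> smallest link depth at which it was reached
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        term, depth = queue.popleft()
        # Skip unknown classes and classes already reached at this depth or less
        if term not in ontology or best.get(term, max_depth + 1) <= depth:
            continue
        best[term] = depth
        # Subsumption is traversed upwards only and does not consume depth
        for sup in ontology[term]["supers"]:
            queue.append((sup, depth))
        # Other links consume one unit of depth each
        if depth < max_depth:
            for target in ontology[term]["links"]:
                queue.append((target, depth + 1))
    segment = set(best)
    periphery = {
        term for term, depth in best.items()
        if depth == max_depth
        and any(t not in segment for t in ontology[term]["links"])
    }
    return segment, periphery


segment, periphery = extract_segment(TOY, ["Cerebral cortex"], max_depth=1)
print(sorted(segment))
print(sorted(periphery))
```

Classes in `periphery` are where the depth limit cut the traversal short, so they are natural starting points for a later, wider extraction.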
30. Segments vs. Modules
● The segmentation heuristic we use contrasts
with the approaches of Grau et al. (2008), which
produce modules with 100% semantic fidelity
● We sacrifice semantic fidelity for an expedient
extraction process
● The (tractable) calculation of deductive,
conservative extensions for EL is an open
research problem
● Or at the very least a challenging problem
31. Reifying SEP triplets
● Need to replace SNOMED-CT anatomy terms
in a way that preserves the intent of the SEP
anatomy scheme
● Transcribe them into a more expressive
description logic
● Define a set of rules to determine how axioms
involving mapped SNOMED-CT terms are
replaced
● Schulz et al. (1998) describe how to logically
identify components of an SEP triplet
32. Definitions
● Terms:
– Osnomed is the short normal form of SNOMED-CT
starting from a user-specified term set
● Anatomy module for a clinical domain
– O'snct-fma is a module for Osnomed in Ofma with respect to
Sanatomy
● Clinical domain module for anatomy
– O'fma-snct is a module for Ofma in Osnomed with respect to
Sanatomy
34. Results
● The applied domain
– Sleep studies (Polysomnograms)
● Quantitative analysis
– With and without the use of normal forms
● Example
● How the goals were met
● Advantages
● Challenges
36. Analysis
● Results:
– 825 (718) classes in O'snct-fma
– 901 (648) classes in O'fma-snct
– 81 (53) SNOMED-CT anatomy concepts in Sanatomy
were reached
– 43 (35) were structures, 37 (17) were entire parts,
one was a part
*Numbers in parentheses are the counts obtained when normal forms are used
37. Analysis (cont.)
● Of the 366 (85) disorders and procedures, 23
(4) were cross-boundary definitions
● 266 (232) FMA classes were at the periphery of
the segment extraction heuristic
● Candidates for subsequent FMA extraction
– Incrementally expand the domain by
connections to related parts of human
anatomy
38. SEP Reification Example
● In SNOMED-CT, Corticobasal Degeneration is
a disorder that has (as its finding sites):
– Cerebral cortex (structure)
– Basal ganglion (structure)
● As a result of the SEP reification, it is defined
as follows
39. Achieving the Goals
Goal / Approach
1. Goal: Identify and fill gaps in clinical terminology
– Approach: Allow an informatician to seed and control
the extraction
2. Goal: Use canonical, normalized representations
– Approach: Take advantage of normal form
transformations
3. Goal: Has sufficient expressive power
– Approach: Leveraging more expressive KR
4. Goal: Re-uses the FMA
– Approach: Use a set of rules to reify SEP triplets
40. Advantages
● We further demonstrate the general value of
ontology segmentation within the context of
biomedical terminology
● Address the challenge of managing terminology
and filling in gaps using reference ontologies in
a coordinated way
● The use of a more expressive DL to reify SEP
triplets is similar to the approach of
Suntisrivaraporn (2007)
– We use terms from a reference ontology of
anatomy
41. FMA Enrichment
● Provides partitive axioms that connect the
cerebral cortex to 100 other subordinate
anatomical entities
42. Advantages (cont.)
● O'snct-fma is a deductive, conservative extension
of its combination with O'fma-snct
– Every inclusion axiom involving FMA terms
alone in the combination also holds in FMA as
a whole
– The reification process takes advantage of the
fidelity of the SNOMED-CT to FMA mappings
● Any application that uses the FMA can still use
the combination without loss of meaning of the
FMA terms
43. Challenges
● The use of the disjunction operator introduces the
need for a more expressive description logic
than EL++
● Subsumption links are only traversed upwards
from target terms
– Found that downward traversal significantly
impacts the size of the segment
44. Cross-module Definitions
● SNOMED-CT concepts in O'snct-fma defined by
role restrictions whose filler classes involve
anatomy terms in Sanatomy
● These embody the kinds of explicit definitions
that normal forms attempt to facilitate
● In some cases, the definitions are enriched due
to connections to FMA
– Resulting in richer entailment
45. Conclusion (cont.)
● However for an application that uses SNOMED-
CT, the same disease may have 2 sites where
one is a SNOMED-CT concept and the other is
an FMA concept.