Keynote at the AI in Medicine Conference (AIME 2005), giving an overview of the work in Ontology Mapping to people in Medical Informatics (which includes explaining the what and why of ontologies in general).
1. Ontology mapping:
a way out of
the medical tower of Babel?
Frank van Harmelen
Vrije Universiteit Amsterdam
The Netherlands
2. Before we start…
a talk on ontology mappings
is a difficult talk to give:
no consensus in the field
• on the merits of the different approaches
• on classifying the different approaches
no one can speak with authority on
the solution
this is a personal view, with a sell-by date
other speakers will entirely disagree
(or disapprove)
3. Good overviews of the topic
Knowledge Web D2.2.3:
“State of the art on ontology alignment”
Ontology Mapping Survey
talk by Siyamed Seyhmus SINIR
ESWC'05 Tutorial on
Schema and Ontology Matching
by Pavel Shvaiko & Jérôme Euzenat
KER 2003 paper by Kalfoglou & Schorlemmer
These are all different & incompatible…
5. The Medical tower of Babel
MeSH
• Medical Subject Headings, National Library of Medicine
• 22.000 descriptors
EMTREE
• Commercial (Elsevier), drugs and diseases
• 45.000 terms, 190.000 synonyms
UMLS
• Integrates 100 different vocabularies
SNOMED
• 200.000 concepts, College of American Pathologists
Gene Ontology
• 15.000 terms in molecular biology
NCI Cancer Ontology
• 17.000 classes (about 1M definitions)
7. What are ontologies &
what are they used for
[Diagram: world, concept, language]
No shared understanding:
conceptual and terminological confusion
Agree on a conceptualization.
Make it explicit in some language.
Actors: both humans and machines
8. Ontologies come in very
different kinds
From lightweight to heavyweight:
• Yahoo topic hierarchy
• Open directory (400.000 general categories)
• Cyc, 300.000 axioms
From very specific to very general
• METAR code (weather conditions at air terminals)
• SNOMED (medical concepts)
• Cyc (common sense knowledge)
10. In short
(for the duration of this talk)
Ontologies are not
definitive descriptions of
what exists in the world (= philosophy)
Ontologies are
models of the world
constructed
to facilitate communication
Yes, ontologies exist
(because we build them)
12. Ontology mapping is
old & inevitable
Ontology mapping is old
• db schema integration
• federated databases
Ontology mapping is inevitable
• ontology language is standardised,
• don't even try to standardise contents
13. Ontology mapping is
important
database integration,
heterogeneous database retrieval
(traditional)
catalog matching (e-commerce)
agent communication (theory only)
web service integration (urgent)
P2P information sharing (emerging)
personalisation (emerging)
14. Ontology mapping is
now urgent
Ontology mapping has acquired
new urgency
• physical and syntactic integration is more or less solved
(open world, web)
• automated mappings are now required (P2P)
• shift from off-line to run-time matching
Ontology mapping has new opportunities
• larger volumes of data
• richer schemas (relational vs. ontology)
• applications where partial mappings work
15. Different aspects
of ontology mapping
how to discover a mapping
how to represent a mapping
• subset/equal/disjoint/overlap/is-somehow-related-to
• logical/equational/category-theoretical
• atomic/complex arguments
• confidence measure
how to use it
We only talk about “how to discover”
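Although the talk focuses on discovery, the representation choices listed above can be made concrete. The following is a minimal sketch, assuming one flat record per correspondence; only the relation types come from the slide, while the names `Relation`, `Correspondence`, `confident` and the concept names `o1:X`, `o2:Y` are hypothetical, and the confidence values are illustrative, not measured.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation types listed on the slide."""
    SUBSET = "subset"
    EQUAL = "equal"
    DISJOINT = "disjoint"
    OVERLAP = "overlap"
    RELATED = "is-somehow-related-to"


@dataclass(frozen=True)
class Correspondence:
    """One mapping element between two ontologies (hypothetical structure)."""
    source: str            # atomic class name or complex expression, ontology 1
    target: str            # atomic class name or complex expression, ontology 2
    relation: Relation
    confidence: float = 1.0

    def __post_init__(self):
        # Reject confidence values outside the unit interval.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def confident(mapping, threshold):
    """Keep only correspondences whose confidence reaches the threshold."""
    return [c for c in mapping if c.confidence >= threshold]


# Illustrative mapping between two hypothetical ontologies o1 and o2.
mapping = [
    Correspondence("o1:X", "o2:Y", Relation.SUBSET, 0.9),
    Correspondence("o1:Z", "o2:W", Relation.OVERLAP, 0.6),
]
```

Keeping `source` and `target` as plain strings leaves room for complex expressions as arguments; a logical or category-theoretical representation would need richer types.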
17. Different approaches to
ontology matching
Linguistics & structure
Shared vocabulary
Instance-based matching
Shared background knowledge
18. Linguistic &
structural mappings
normalisation
(case, blanks, digits, diacritics)
lemmatization, N-grams,
edit-distance, Hamming distance,
distance = fraction of common parents
elements are similar if
their parents/children/siblings are similar
decreasing order of boredom
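The linguistic and structural measures above can be sketched in a few functions. This is a minimal illustration, not any particular matcher: the function names are hypothetical, Dice over character trigrams is one common choice of N-gram similarity (an assumption), and the structural measure takes "fraction of common parents" as shared parents over all parents (distance is then 1 minus this similarity).

```python
import re
import unicodedata


def normalise(label):
    """Lower-case, strip diacritics and digits, collapse blanks."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"\d+", " ", text.lower())
    return " ".join(text.split())


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def ngram_similarity(a, b, n=3):
    """Dice coefficient over the sets of character n-grams."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def parent_similarity(x, y, parents1, parents2):
    """Structural: fraction of (normalised) parent labels the two share."""
    px = {normalise(p) for p in parents1.get(x, [])}
    py = {normalise(p) for p in parents2.get(y, [])}
    union = px | py
    return len(px & py) / len(union) if union else 0.0
```

For example, `edit_distance("kitten", "sitting")` is 3, and "tumor" vs "tumour" share two of seven trigrams in total, giving a Dice score of 4/7.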
21. Matching through
shared vocabulary
Used in mapping geospatial databases
from German land-registration authorities
(small)
Used in mapping bio-medical and
genetic thesauri
(large)
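One way to exploit a shared vocabulary is to define the concepts of both ontologies in terms of it and compare the definitions. A minimal sketch, assuming every concept is a conjunction of shared-vocabulary terms (so more conjuncts means a more specific concept); the land-use concepts below are hypothetical and not taken from the German land-registration data.

```python
def relation(def_a, def_b):
    """Relation between two concepts defined as conjunctions of shared terms.

    A concept with a superset of conditions denotes a subset of instances.
    """
    if def_a == def_b:
        return "equivalentTo"
    if def_a > def_b:          # A imposes every condition of B, and more
        return "subClassOf"
    if def_a < def_b:
        return "superClassOf"
    return None                # nothing derivable from the definitions alone


def match(onto1, onto2):
    """Compare every concept of onto1 with every concept of onto2."""
    found = []
    for a, def_a in onto1.items():
        for b, def_b in onto2.items():
            rel = relation(def_a, def_b)
            if rel is not None:
                found.append((a, rel, b))
    return found


# Hypothetical land-use concepts, defined over one shared vocabulary.
onto1 = {"Forest": frozenset({"area", "vegetation", "trees"})}
onto2 = {
    "VegetationArea": frozenset({"area", "vegetation"}),
    "Woodland": frozenset({"area", "vegetation", "trees"}),
    "WaterBody": frozenset({"area", "water"}),
}
```

Here `match(onto1, onto2)` finds Forest to be a subclass of VegetationArea and equivalent to Woodland, and reports nothing for WaterBody.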
24. Matching through
shared instances
Used by Ichise et al (IJCAI’03) to
successfully map parts of Yahoo to
parts of Google
Yahoo = 8402 classes, 45.000 instances
Google = 8343 classes, 82.000 instances
Only 6000 shared instances
70% - 80% accuracy obtained (!)
Conclusions from authors:
• semantics is needed to improve on this ceiling
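The core idea of instance-based matching is that two classes correspond when they classify the same instances. A minimal sketch, assuming Jaccard overlap restricted to shared instances and a best-match threshold; this is a generic illustration and not necessarily the statistic used by Ichise et al. The directory categories and sites are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def instance_matches(classes1, classes2, threshold=0.5):
    """Map each class of ontology 1 to its best class in ontology 2.

    Only instances that occur in both ontologies count as evidence.
    """
    shared = set().union(*classes1.values()) & set().union(*classes2.values())
    mapping = {}
    for c1, inst1 in classes1.items():
        best, best_score = None, 0.0
        for c2, inst2 in classes2.items():
            score = jaccard(inst1 & shared, inst2 & shared)
            if score > best_score:
                best, best_score = c2, score
        if best is not None and best_score >= threshold:
            mapping[c1] = (best, best_score)
    return mapping


# Hypothetical directory fragments; sites play the role of instances.
classes1 = {
    "Arts/Museums": {"rijksmuseum.nl", "louvre.fr", "moma.org"},
    "Sports/Soccer": {"ajax.nl", "fifa.com"},
}
classes2 = {
    "Arts/Galleries": {"louvre.fr", "moma.org", "tate.org.uk"},
    "Sports/Football": {"fifa.com", "ajax.nl", "uefa.com"},
}
```

Restricting to shared instances matters when, as on the slide, only a small fraction of instances occurs in both hierarchies: unshared instances would otherwise dilute every score.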
27. Ontology mapping
using background knowledge
Case study 1
Work with
• Zharko Aleksovski @ Philips
• Michel Klein @ VU
• KIK @ AMC
28. Overview of test data
Two terminologies from
intensive care domain
OLVG list
• List of reasons for ICU admission
AMC list
• List of reasons for ICU admission
DICE hierarchy
• Additional hierarchical knowledge describing
the reasons for ICU admission
29. OLVG list
developed by a clinician
3000 reasons for ICU admission
1390 used in first 24 hours of stay
• 3600 patients since 2000
based on ICD9 + additional material
List of problems for patient admission
Each reason for admission is described with
one label
• Labels consist of 1.8 words on average
• redundancy because of spelling mistakes
• implicit hierarchy (e.g. many fractures)
30. AMC list
List of 1460 problems for ICU admission
Each problem is described using
5 aspects from the DICE terminology:
2500 concepts (5000 terms), 4500 links
• Abnormality (size: 85)
• Action taken (size: 55)
• Body system (size: 13)
• Location (size: 1512)
• Cause (size: 255)
expressed in OWL
allows for subsumption & part-of reasoning
31. Why mapping
AMC list ↔ OLVG list?
allow easy entering of OLVG data
re-use of data in
• epidemiology
• quality of care assessment
• data-mining (patient prognosis)
32. Linguistic mapping:
Compare each pair of concepts
Use labels and synonyms of concepts
Heuristic method to discover
equivalence and subclass relations
e.g. “Long brain tumor” is more specific than “Long tumor”
First round
• compare with complete DICE
• 313 suggested matches, around 70 % correct
Second round:
• only compare with “reasons for admission” subtree
• 209 suggested matches, around 90 % correct
High precision, low recall (“the easy cases”)
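The slide does not spell out the heuristic, so the following is only a plausible sketch of label-based matching: labels are reduced to word sets (with synonyms mapped to one form), equal sets suggest equivalence, and extra words such as the modifier “brain” suggest a more specific concept. The function names and the synonym table are hypothetical.

```python
import re


def tokens(label, synonyms):
    """Lower-cased word set of a label, with synonyms mapped to one form."""
    words = re.findall(r"[a-z]+", label.lower())
    return frozenset(synonyms.get(w, w) for w in words)


def label_relation(label_a, label_b, synonyms):
    """Heuristic: extra words in A's label make A more specific than B."""
    ta, tb = tokens(label_a, synonyms), tokens(label_b, synonyms)
    if ta == tb:
        return "equivalent"
    if ta > tb:
        return "more specific"
    if ta < tb:
        return "more general"
    return None


def suggest(source_labels, candidate_labels, synonyms):
    """Suggest matches for every pair of labels.

    Narrowing candidate_labels (e.g. to one subtree) trades recall
    for precision.
    """
    found = []
    for a in source_labels:
        for b in candidate_labels:
            rel = label_relation(a, b, synonyms)
            if rel is not None:
                found.append((a, rel, b))
    return found
```

Restricting `candidate_labels` mirrors the second round above: fewer candidates mean fewer accidental word overlaps, hence higher precision, while pairs whose labels share no words are still missed.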
33. Using background knowledge
Use properties of concepts
Use other ontologies to discover
relation between properties
34. Semantic match
Given: DICE problem list, linked to the DICE aspect taxonomies
• Abnormality taxonomy
• Action taxonomy
• Body system taxonomy
• Location taxonomy
• Cause taxonomy
Implicit (?): links from the OLVG problem list to these taxonomies, found by lexical match
Matching: OLVG problem list ↔ DICE problem list, via property match
35. Semantic match
Taxonomy of body parts:
• Blood vessel is more general than Vein and Artery
• Artery is more general than Aorta
Lexical match: “Aorta thoracalis dissection” has location Aorta
Lexical match: “Dissection of artery” has location Artery
Reasoning: Artery is more general than Aorta, which implies
Location match: “Dissection of artery” has more general location
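The reasoning step on this slide is a subsumption check in the body-part taxonomy. A minimal sketch, assuming the taxonomy is a tree stored as child → parent edges (with cycles the loop would not terminate) and that the lexical matching has already produced the location of each problem; the function names are hypothetical.

```python
# "is more general" edges from the slide, stored child -> parent.
BODY_PARTS = {"Vein": "Blood vessel", "Artery": "Blood vessel", "Aorta": "Artery"}


def is_subsumed(specific, general, parent_of):
    """True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while True:
        if node == general:
            return True
        if node not in parent_of:
            return False
        node = parent_of[node]


def location_match(loc_a, loc_b, parent_of):
    """Compare the locations of problem A and problem B in the taxonomy."""
    if loc_a == loc_b:
        return "same location"
    if is_subsumed(loc_a, loc_b, parent_of):
        return "B has more general location"
    if is_subsumed(loc_b, loc_a, parent_of):
        return "A has more general location"
    return None


# Locations as found by lexical matching (taken from the slide).
location = {"Aorta thoracalis dissection": "Aorta", "Dissection of artery": "Artery"}
```

With these inputs, comparing “Aorta thoracalis dissection” (A) with “Dissection of artery” (B) yields “B has more general location”, the conclusion drawn on the slide.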
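36. Example: “Heroin intoxication”
– “drugs overdose”
Cause taxonomy: Drugs is more general than Heroin
• Lexical match: “Heroin intoxication” has cause Heroin
• Lexical match: “Drugs overdosis” has cause Drugs
• Cause match: “Heroin intoxication” has more specific cause
Abnormality taxonomy: Intoxicatie (intoxication) is more general than Overdosis (overdose)
• Lexical match: “Heroin intoxication” has abnormality Intoxicatie
• Lexical match: “Drugs overdosis” has abnormality Overdosis
• Abnormality match: “Heroin intoxication” has more general abnormality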
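In this example the aspect matches point in opposite directions: a more specific cause but a more general abnormality. The slides do not say how aspect matches are combined, so the rule below is an assumption: if all shared aspects agree in direction, the problems are related by subsumption; if they disagree, only a weaker "is-somehow-related-to" link (one of the relation types listed earlier) is reported. Function names are hypothetical; the taxonomy edges and aspect values come from the slide.

```python
# Taxonomy edges from the slide, stored child -> parent, per aspect.
TAXONOMIES = {
    "cause": {"Heroin": "Drugs"},                  # Drugs is more general
    "abnormality": {"Overdosis": "Intoxicatie"},   # Intoxicatie is more general
}

PROBLEMS = {
    "Heroin intoxication": {"cause": "Heroin", "abnormality": "Intoxicatie"},
    "Drugs overdosis": {"cause": "Drugs", "abnormality": "Overdosis"},
}


def below(x, y, parent_of):
    """True if y equals x or is an ancestor of x (taxonomy assumed a tree)."""
    while x != y and x in parent_of:
        x = parent_of[x]
    return x == y


def compare_aspect(x, y, parent_of):
    """'=' same value, '<' A's value more specific, '>' more general."""
    if x == y:
        return "="
    if below(x, y, parent_of):
        return "<"
    if below(y, x, parent_of):
        return ">"
    return None


def semantic_match(a, b):
    """Combine per-aspect comparisons over the aspects both problems have."""
    shared = set(a) & set(b)
    if not shared:
        return None
    results = {compare_aspect(a[k], b[k], TAXONOMIES.get(k, {})) for k in shared}
    if None in results:
        return None                       # some aspect values are unrelated
    if results == {"="}:
        return "equivalent"
    if results <= {"=", "<"}:
        return "more specific"
    if results <= {"=", ">"}:
        return "more general"
    return "is-somehow-related-to"        # directions disagree
```

For the slide's pair, the cause comparison gives "<" and the abnormality comparison ">", so the sketch reports only "is-somehow-related-to".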
37. Example results
• OLVG: Acute respiratory failure ↔ DICE: Asthma cardiale (via abnormality)
• OLVG: Aspergillus fumigatus ↔ DICE: Aspergilloom (via cause)
• OLVG: duodenum perforation ↔ DICE: Gut perforation (via abnormality, cause)
• OLVG: HIV ↔ DICE: AIDS (via cause)
• OLVG: Aorta thoracalis dissectie type B ↔ DICE: Dissection of artery (via location, abnormality)
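39. Approximate matching
Translate every class-name into a
propositional formula
(both DNF and CNF versions):
$A = \bigcup_i A_i$ (DNF), $B = \bigcap_k B_k$ (CNF)
$A \subseteq B \iff \left(\bigcup_i A_i \subseteq \bigcap_k B_k\right) \iff \forall i,k\; (A_i \subseteq B_k)$
ignore an increasing number of
(i,k)-subsumption pairs
varies from classical to trivial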
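The pairwise decomposition makes approximation easy to state: count the failing $(i,k)$ pairs and tolerate up to a fraction $\epsilon$ of them. A minimal sketch, assuming the translation of class names into formulas is already done, literals are strings with `~` marking negation, and the example atoms are hypothetical. A conjunction of literals entails a disjunction of literals exactly when the conjunction is contradictory, the disjunction is tautologous, or they share a literal.

```python
def complement(lit):
    """Negate a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def pair_holds(conjunct, clause):
    """Does the conjunction A_i entail the disjunction B_k?"""
    if any(complement(l) in conjunct for l in conjunct):   # A_i contradictory
        return True
    if any(complement(l) in clause for l in clause):       # B_k tautologous
        return True
    return bool(conjunct & clause)                          # shared literal


def approx_subsumed(dnf_a, cnf_b, epsilon):
    """A is subsumed by B if at most a fraction epsilon of (i, k) pairs fail.

    epsilon = 0 gives classical subsumption; epsilon = 1 makes it trivial.
    """
    pairs = [(ai, bk) for ai in dnf_a for bk in cnf_b]
    failed = sum(not pair_holds(ai, bk) for ai, bk in pairs)
    return failed <= epsilon * len(pairs)


# Hypothetical formulas: A in DNF (list of conjuncts), B in CNF (list of clauses).
A = [frozenset({"heart", "disease"}), frozenset({"lung", "disease"})]
B = [frozenset({"disease"}), frozenset({"heart", "valve"})]
```

Here three of the four pairs hold, so A is not classically subsumed by B (ε = 0), but it is once ε ≥ 0.25, and at ε = 1 every pair of classes would be related.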
40. Results
(obtained on a different domain)
[Chart: numbers of “B subClass of A”, “A subClass of B” and “equivalences” found (y-axis 0 to 600.000), plotted over an x-axis from 0 to 1]
42. Case Study:
Map GALEN & Tambis,
using UMLS as background knowledge
Select three topics with sufficient overlap
• Substances
• Structures
• Processes
Define some
partial & ad-hoc manual mappings
between individual concepts
Represent mappings in C-OWL
Use semantics of C-OWL
to verify and complete mappings
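A simplified sketch of completing and verifying partial mappings, assuming only C-OWL-style "into" bridge rules, read as "the image of source concept x is contained in target concept y". From such a rule, every source concept below x is also mapped into every target concept above y (completion), and a source concept mapped into two target concepts declared disjoint is flagged as suspect (verification). This is not a C-OWL reasoner and not necessarily the procedure used in the case study; the concept fragments are hypothetical stand-ins for GALEN and Tambis.

```python
def subsumers(concept, parents):
    """All concepts that subsume `concept`, including itself."""
    seen, stack = {concept}, [concept]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def complete(into, src_parents, tgt_parents):
    """Close into-bridges (x, y) under the source and target taxonomies."""
    src_concepts = (set(src_parents)
                    | {p for ps in src_parents.values() for p in ps}
                    | {x for x, _ in into})
    completed = set()
    for x, y in into:
        for w in src_concepts:
            if x in subsumers(w, src_parents):      # w is below x in the source
                for z in subsumers(y, tgt_parents):  # z is above y in the target
                    completed.add((w, z))
    return completed


def verify(completed, disjoint):
    """Source concepts whose image would have to be empty (disjoint targets)."""
    targets = {}
    for x, y in completed:
        targets.setdefault(x, set()).add(y)
    return sorted(x for x, ys in targets.items()
                  if any(a in ys and b in ys for a, b in disjoint))


# Hypothetical fragments standing in for GALEN (source) and Tambis (target).
src_parents = {"Haemoglobin": ["Protein"], "Protein": ["Substance"]}
tgt_parents = {"protein": ["biomolecule"], "biomolecule": ["chemical"],
               "transport": ["process"]}
disjoint = [("chemical", "process")]
into = {("Protein", "protein"), ("Haemoglobin", "transport")}  # manual, partial
```

Completing `into` adds, for instance, Haemoglobin → chemical; verification then flags Haemoglobin, because the (deliberately wrong) manual bridge to `transport` puts it under both `chemical` and `process`.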
49. “Conclusions”
Ontology mapping is (still) hard & open
Many different approaches will be required:
• linguistic,
• structural
• statistical
• semantic
• …
Currently no theory or roadmap on
which approach is good for which problems