Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Evaluating Mapping Repair Systems with Large Biomedical Ontologies
1. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Evaluating Mapping Repair Systems
with Large Biomedical Ontologies
Ernesto Jiménez-Ruiz Christian Meilicke
Bernardo Cuenca Grau Ian Horrocks
Department of Computer Science, University of Oxford
Research Group Data and Web Science, University of Mannheim, Germany
26th Description Logics Workshop
24 July 2013
3. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Ontology mappings
Mappings M are tuples he1, e2, n, ρi
• e1, e2 are entities in the input ontologies O1 and O2
• n a confidence value between 0 and 1
• ρ is the semantic relationship between e1 and e2 (e.g.
subsumption, equivalence or disjointness)
Formalized as OWL 2 axioms
• Where the semantic relationship ρ is one of {≡, ⊑, ⊒, ⊥}
• Confidence values n are represented as axiom annotations
• No extra semantics
4. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Ontology mappings
Mappings M are tuples he1, e2, n, ρi
• e1, e2 are entities in the input ontologies O1 and O2
• n a confidence value between 0 and 1
• ρ is the semantic relationship between e1 and e2 (e.g.
subsumption, equivalence or disjointness)
Formalized as OWL 2 axioms
• Where the semantic relationship ρ is one of {≡, ⊑, ⊒, ⊥}
• Confidence values n are represented as axiom annotations
• No extra semantics
5. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Incoherent mappings
A set of Mappings M wrt O1 and O2 is incoherente if
• there exists a class A such that
• O1 ∪ O2 ∪ M |= A ⊑ ⊥, and
• O1 ∪ O2 6|= A ⊑ ⊥.
7. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Mapping Repair
R is a repair for M wrt O1 and O2
• if M R is coherent.
• R = M is a candiate repair.
R is a diagnosis for M wrt O1 and O2
• if R′ ⊂ R is not a repair for M.
• Diagnosis are (typically) expensive to obtain.
8. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Mapping Repair
R is a repair for M wrt O1 and O2
• if M R is coherent.
• R = M is a candiate repair.
R is a diagnosis for M wrt O1 and O2
• if R′ ⊂ R is not a repair for M.
• Diagnosis are (typically) expensive to obtain.
9. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Computing Mapping Repairs
Standard justification-based ontology debugging
• are expensive since require complete reasoning.
• do not scale when the number of unsatisfiabilities is large.
An approximate Repair R≈
• will not guarantee that M R≈ is coherent;
• but aims at reducing the number of unsatisfiabilities,
• while preserving M as much as possible.
10. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Computing Mapping Repairs
Standard justification-based ontology debugging
• are expensive since require complete reasoning.
• do not scale when the number of unsatisfiabilities is large.
An approximate Repair R≈
• will not guarantee that M R≈ is coherent;
• but aims at reducing the number of unsatisfiabilities,
• while preserving M as much as possible.
12. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Evaluation of Ontology Matching Systems
Ontology Alignment Evaluation Initiative (OAEI)
• A annual campaign for the evaluation of ontology matching
systems.
• Deadline September 1.
• http://oaei.ontologymatching.org/
• The matching problems are organized in several tracks.
• Each track involves different test ontologies and reference
alignments.
• Large BioMed track: FMA, NCI and SNOMED CT.
13. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Evaluation of Ontology Matching Systems: OAEI
Quality of a set of mappings M
• Precision and recall wrt reference mappings or gold standard
• Precision (Pre) = |M ∩ MGS|/|M|
• Recall (Rec) = |M ∩ MGS|/|MGS|
• The F-score (F) = (2 × Pre × Rec)/(Pre + Rec).
• Coherence of M wrt O1 and O2.
• Computation times are also considered.
14. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Mapping coherence in the Large BioMed track
• FMA v2.0 (78,989 classes), NCI v.08.05d (66,724 classes) and
SNOMED CT v. Jan. 2009 (306,591 classes).
• Evaluated top-7 systems
• Unsat. provided by HermiT and ELK (in SNOMED-NCI)
System
FMA-NCI FMA-SNOMED SNOMED-NCI
Unsat. Ratio Unsat. Ratio Unsat. Ratio
LogMapnoe 9 0.01% 10 0.003% ≥0 ≥0%
LogMap 9 0.01% 10 0.003% ≥17 ≥0.005%
ServOMap 48,743 27% 273,242 71% ≥313,643 ≥84%
ServOMapL 50,334 28% 99,726 26% ≥314,939 ≥84%
GOMMA 5,574 4% 10,752 3% ≥266,051 ≥71%
GOMMAbk 12,939 9% 119,657 31% ≥313,015 ≥84%
YAM++ 50,550 29% 106,107 28% ≥269,107 ≥72%
15. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Mapping Repair Systems
Mapping repair with Alcomo and LogMap-Repair
• Alcomo is specialised to repair incoherent mappings.
• LogMap, originally implemented as an matching system, can
also operate as a mapping repair system (LogMap-Repair).
• Evaluated using incomplete reasoning.
• Compute and approximate repair R≈ for a given M wrt O1
and O2.
16. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Mapping Repair Systems
Mapping repair with Alcomo and LogMap-Repair
• Alcomo is specialised to repair incoherent mappings.
• LogMap, originally implemented as an matching system, can
also operate as a mapping repair system (LogMap-Repair).
• Evaluated using incomplete reasoning.
• Compute and approximate repair R≈ for a given M wrt O1
and O2.
18. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Alcomo mapping repair system
Implemented Repair Techniques
• Complete Reasoning: classical Black-Box appoach (inefficient)
• Incomplete reasoning: pattern-based Approach (efficient but
incomplete)
• Combinations of both techniques (complete, but still not
efficient)
Incomplete reasoning
• Alcomo guarantees that the computed R≈ is a subset of the
diagnosis.
• Alcomo never removes too much.
19. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Alcomo mapping repair system
Implemented Repair Techniques
• Complete Reasoning: classical Black-Box appoach (inefficient)
• Incomplete reasoning: pattern-based Approach (efficient
but incomplete)
• Combinations of both techniques (complete, but still not
efficient)
Incomplete reasoning
• Alcomo guarantees that the computed R≈ is a subset of the
diagnosis.
• Alcomo never removes too much.
20. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Alcomo mapping repair system
Pattern-based Repair
• First classify O1 and O2
• Check each pair of mappings for conflict patterns
• See two example patterns below:
A
C
#i
#i
B#j
D#j
F#j
F#j D#j
A
C
#i
#i
B#j
D#j
F#j
F#j D#j
disjoint
disjoint
d
i
s
j
o
i
n
t
21. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Alcomo: algorithm used in the experiments (iii)
Input: O1, O2: input ontologies; M: input mappings
Output: M′: output mappings; R≈: approximate mapping repair.
1: M′
:= ∅
2: R≈
:= ∅
3: classify O1 and O2
4: C := ConflictPairs(O1, O2, M)
5: for each m ∈ M do ⊲ iterate over M desc. order w.r.t. confidences
6: coh := true
7: for each m′
∈ M′
do
8: if (m′
, m) ∈ C then
9: coh := false
10: break
11: end if
12: end for
13: if coh = true then M′
:= M′
∪ {m}
14: else R≈
:= R≈
∪ {m}
15: end for
16: return hM′
, R≈
i
24. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
LogMap-Repair system
Propositional Horn SAT with Dowling-Gallier (D-G)
• LogMap-Reapir implements the SAT algorithm D-G
• D-G is call for every class C and the propositional theory PC :
• the rule (true → C);
• the propositional representations P1 and P2 of the input
ontologies; and
• the propositional representation of the mappings.
Recording conflictive mappings
• LogMap-Repair extends D-G to record mappings involved in
an unsatifiability.
25. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
LogMap-Repair system
Propositional Horn SAT with Dowling-Gallier (D-G)
• LogMap-Reapir implements the SAT algorithm D-G
• D-G is call for every class C and the propositional theory PC :
• the rule (true → C);
• the propositional representations P1 and P2 of the input
ontologies; and
• the propositional representation of the mappings.
Recording conflictive mappings
• LogMap-Repair extends D-G to record mappings involved in
an unsatifiability.
27. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
LogMap-Repair system
A “greedy” repair algorithm
• A (local) repair RC is computed for each (unsat.) class C
(top-bottom approach)
• The algorithm identifies subsets of the conflictive mappings of
increasing size, and stops when a repair is found.
• LogMap finds all repairs of “smallest” size for C.
• For example: R1 = {m4} and R2 = {m6}
• The repair with less confidence is selected .
28. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
LogMap-Repair system
A “greedy” repair algorithm
• A (local) repair RC is computed for each (unsat.) class C
(top-bottom approach)
• The algorithm identifies subsets of the conflictive mappings of
increasing size, and stops when a repair is found.
• LogMap finds all repairs of “smallest” size for C.
• For example: R1 = {m4} and R2 = {m6}
• The repair with less confidence is selected .
29. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
LogMap-Repair system
Characteristics of the repair algorithm
• LogMap-Repair identifies a (global) repair R≈ =
S
RC .
• P1 ∪ P2 ∪ M R≈ ∪ {true → C} is satisfiable for each C
in P1 ∪ P2.
• The propositional encoding of O1 and O2 is incomplete.
• The algorithm does not ensure satisfiability of each class in
O1 ∪ O2 ∪ M R≈.
• R≈ is an approximate repair of M wrt O1 and O2.
30. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
LogMap-Repair system
Characteristics of the repair algorithm
• LogMap-Repair identifies a (global) repair R≈ =
S
RC .
• P1 ∪ P2 ∪ M R≈ ∪ {true → C} is satisfiable for each C
in P1 ∪ P2.
• The propositional encoding of O1 and O2 is incomplete.
• The algorithm does not ensure satisfiability of each class in
O1 ∪ O2 ∪ M R≈.
• R≈ is an approximate repair of M wrt O1 and O2.
32. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Evaluation
Evaluation settings
• Three matching problems of the OAEI 2012 Large BioMed
track: FMA-NCI, FMA-SNOMED and SNOMED-NCI.
• Top matching systems in this track apart from LogMap.
• Server with 16 CPUs and allocating 15 Gb RAM.
33. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Repair in the FMA-NCI matching problem
System
Matching Results OAEI 2012 Repair with Alcomo Repair with LogMap
|M| t (s) F Unsat. |R≈
| t (s) F Unsat. |R≈
| t (s) F Unsat.
ServOMap 4,932 204 0.819 48,743 97 321 0.820 9 115 21 0.819 9
ServOMapL 5,400 251 0.841 50,334 126 342 0.841 9 166 24 0.839 9
GOMMA 5,686 217 0.839 5,574 123 321 0.839 15 142 20 0.838 9
GOMMAbk 6,330 231 0.837 12,939 184 341 0.836 29 259 33 0.833 9
YAM++ 5,476 1,304 0.862 50,550 116 324 0.862 10 141 21 0.861 9
Average 5,565 441 0.840 33,628 129 330 0.840 14 165 24 0.838 9
34. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Repair in the FMA-SNOMED matching problem
System
Matching Results OAEI 2012 Repair with Alcomo Repair with LogMap
|M| t (s) F Unsat. |R≈
| t (s) F Unsat. |R≈
| t (s) F Unsat.
ServOMap 12,642 532 0.770 273,242 829 2,672 0.749 0 2,640 284 0.721 0
ServOMapL 13,210 517 0.794 99,729 975 2,779 0.767 0 2,953 274 0.752 0
GOMMA 11,648 1,994 0.291 10,752 463 2,840 0.293 0 618 245 0.291 0
GOMMAbk 25,660 1,893 0.708 119,657 1,726 2,809 0.698 1,363 5,295 314 0.678 0
YAM++ 14,088 23,900 0.765 106,107 783 2,800 0.760 0 3,461 262 0.720 0
Average 15,450 5,767 0.666 121,897 955 2,780 0.653 273 2,993 276 0.632 0
35. Preliminaries OAEI Alcomo LogMap-Repair Evaluation
Repair in the SNOMED-NCI matching problem
System
Matching Results OAEI 2012 Repair with Alcomo Repair with LogMap
|M| t (s) F Unsat. |R≈
| t (s) F Unsat. |R≈
| t (s) F Unsat.
ServOMap 24,924 654 0.664 ≥313,643 937 3,223 0.663 ≥35 1,671 276 0.666 ≥296
ServOMapL 27,928 738 0.678 ≥314,939 1,076 2,917 0.677 ≥2,055 1,656 314 0.679 ≥1,241
GOMMA 27,386 1,820 0.606 ≥266,051 1,903 2,947 0.603 ≥2,085 2,949 303 0.607 ≥37
GOMMAbk 34,090 1,940 0.635 ≥313,015 2,720 3,098 0.638 ≥30,583 5,003 435 0.641 ≥1
YAM++ 28,206 30,155 0.680 ≥269,107 757 2,964 0.679 ≥0 1,049 305 0.680 ≥0
Average 28,507 7,061 0.653 ≥295,351 1,479 3,030 0.652 ≥6,952 2,465 326 0.655 ≥315
36. Conclusions
• Computed (approximate) repairs are not aggressive:
• 5% (Alcomo).
• 11% (LogMap-Repair).
• The repair process does not represent a bottleneck in the
matching process.
• Very fast using LogMap-Repair.
• The impact on the F-score is (on average) negligible.
• The incoherence of the repaired mapping sets is always
significantly reduced.
• Alcomo and LogMap-Repair could complement each
other.
37. Conclusions and future work
• Feasibility of integrating the mapping repair techniques within
the ontology matching process.
• Good basis to compare ontology and mapping repair systems:
• Call for Mapping/Ontology repair systems (September 2013)
• http://
38. Questions?
• Alcomo main contact:
Christian Meilicke (@).
• LogMap-Repair main contact:
Ernesto Jimnez-Ruiz (ernesto.jimenez.ruiz@gmail.com)
Thank you for your attention
• Alcomo: http://
• LogMap-Repair:
http://www.cs.ox.ac.uk/isg/tools/LogMap/