LogMap: Logic-based and Scalable Ontology Matching
1. Intro Indexation Extraction Repair Overlapping Evaluation
LogMap
Logic-based and Scalable Ontology Matching
Ernesto Jiménez-Ruiz Bernardo Cuenca Grau
Information Systems Group
Department of Computer Science, University of Oxford
International Semantic Web Conference
27th October 2011
3. Intro Indexation Extraction Repair Overlapping Evaluation
Our approach in a nutshell
LogMap is a...
• Highly scalable ontology matching system that
can deal with very large ontologies containing tens (and even
hundreds) of thousands of classes (e.g. FMA, NCI and
SNOMED CT).
• Equipped with built-in reasoning and diagnosis capabilities.
4. Intro Indexation Extraction Repair Overlapping Evaluation
Motivation (I)
Why ontology matching tools?
• To integrate and migrate data between applications.
• (Biomedical) ontologies are being developed by
different groups, and
• these groups use different classifications and naming schemes.
6. Intro Indexation Extraction Repair Overlapping Evaluation
Motivation (II)
Challenges to be addressed
• Sufficient scalability to deal with large ontologies such as
FMA, NCI or SNOMED CT
• Detecting and repairing inconsistencies.
• Reasoning with O1 ∪ O2 ∪ M aggravates the scalability problem
• Logic-based but scalable techniques
10. Intro Indexation Extraction Repair Overlapping Evaluation
Lexical indexation
Inverted Files
• Each entry is a “set” of words corresponding to entity labels
• Labels are extended with lexicons and stemming algorithms
Inverted index for NCI labels
Entry                 Cls ids
secretion             49901
cellular,secretion    49901
cellular,secrete      49901
trapezoid             37975,62999
trapezoid,bone        62999
smegma                60791

Index for NCI class URIs
Cls id    URI
49901     NCI:CellularSecretion
37975     NCI:Trapezoid
62999     NCI:TrapezoidBone
60791     NCI:Smegma

Inverted index for FMA labels
Entry                 Cls ids
secretion             36792
bone,trapezoid        20948,47996
trapezoid             20948
smegma                60947

Index for FMA class URIs
Cls id    URI
36792     FMA:Secretion
47996     FMA:Bone of Trapezoid
20948     FMA:Trapezoid
60947     FMA:Smegma
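
The way such inverted files can yield candidate mappings can be sketched in Python. This is only an illustration, assuming that candidates come from intersecting the entry sets of the two indexes; the function names are hypothetical and the data is copied from the NCI and FMA entries above.

```python
from collections import defaultdict

def build_inverted_index(labels):
    """Map each set of label words to the ids of the classes carrying it."""
    index = defaultdict(set)
    for cls_id, words in labels:
        index[frozenset(words)].add(cls_id)
    return index

# Entries copied from the NCI and FMA inverted indexes above.
nci_index = build_inverted_index([
    (49901, ["secretion"]),
    (49901, ["cellular", "secretion"]),
    (49901, ["cellular", "secrete"]),
    (37975, ["trapezoid"]),
    (62999, ["trapezoid"]),
    (62999, ["trapezoid", "bone"]),
    (60791, ["smegma"]),
])
fma_index = build_inverted_index([
    (36792, ["secretion"]),
    (20948, ["bone", "trapezoid"]),
    (47996, ["bone", "trapezoid"]),
    (20948, ["trapezoid"]),
    (60947, ["smegma"]),
])

def candidate_anchors(index_a, index_b):
    """Pair up classes whose lexical entries occur in both indexes."""
    pairs = set()
    for entry in index_a.keys() & index_b.keys():
        for a in index_a[entry]:
            for b in index_b[entry]:
                pairs.add((a, b))
    return pairs

anchors = candidate_anchors(nci_index, fma_index)
```

With this data, `anchors` contains five (NCI id, FMA id) pairs, including both (37975, 20948) and (62999, 20948) for the ambiguous "trapezoid" entry. Entries are stored as sets, so "trapezoid,bone" and "bone,trapezoid" match regardless of word order.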
11. Intro Indexation Extraction Repair Overlapping Evaluation
Structural indexation
Interval labelling schema
• LogMap indexes the “classified” hierarchy.
• Each concept is associated with two preorders and intervals.
• The cost of typical taxonomical queries is reduced.
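
The idea behind interval labelling can be sketched as follows. This is a simplified illustration, not LogMap's actual schema: it assumes a tree-shaped hierarchy and a single preorder numbering, whereas the slide mentions two preorders and real class hierarchies are DAGs that may need several intervals per concept. The toy hierarchy and function names are hypothetical.

```python
def label_intervals(children, root):
    """Assign each node (preorder number, largest preorder among descendants)."""
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in children.get(node, []):
            visit(child)
        # After visiting all descendants, counter holds the last number used.
        intervals[node] = (start, counter)

    visit(root)
    return intervals

def is_subclass(intervals, sub, sup):
    """Subsumption test: sub's preorder number lies inside sup's interval."""
    low, high = intervals[sup]
    return low <= intervals[sub][0] <= high

# Hypothetical toy hierarchy, already classified.
children = {
    "Thing": ["Bone", "Secretion"],
    "Bone": ["CarpalBone"],
    "CarpalBone": ["Trapezoid"],
}
intervals = label_intervals(children, "Thing")
```

Once the intervals are computed, `is_subclass(intervals, "Trapezoid", "Bone")` is answered with two integer comparisons instead of a traversal of the hierarchy, which is where the reduced query cost comes from.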
16. Intro Indexation Extraction Repair Overlapping Evaluation
Computation of confidence values
Based on...
• The string-based algorithm ISUB
• A principle of locality
• Correct mappings (C1 ≡ C2) are likely to have similar scopes
• ISUB is used to “map” the corresponding scopes of C1 and C2
• Dice’s coefficient (adapted) provides the similarity between scopes
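
As a rough illustration of the scope comparison, the standard Dice coefficient between two sets is 2|A ∩ B| / (|A| + |B|). The sketch below assumes scopes are plain sets of labels and uses exact label equality where LogMap would use ISUB to match scope members; the adaptation mentioned above is not specified here, so this is the textbook formula only. The scope contents are invented for the example.

```python
def dice(scope_a, scope_b):
    """Standard Dice coefficient between two sets (0.0 for two empty sets)."""
    if not scope_a and not scope_b:
        return 0.0
    shared = len(scope_a & scope_b)
    return 2 * shared / (len(scope_a) + len(scope_b))

# Hypothetical scopes (neighbouring class labels) of two mapped classes.
scope_c1 = {"bone", "carpal bone", "wrist"}
scope_c2 = {"bone", "carpal bone", "hand"}
similarity = dice(scope_c1, scope_c2)
```

Here two of the three labels in each scope coincide, so the similarity is 2·2 / (3 + 3) = 2/3. Disjoint scopes give 0.0.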
17. Intro Indexation Extraction Repair Overlapping Evaluation
Computation of confidence values
FMA:Trapezoid ≡ NCI:Trapezoid (no scope) vs
FMA:Trapezoid ≡ NCI:TrapezoidBone (with scope)
18. Intro Indexation Extraction Repair Overlapping Evaluation
Mapping discovery
Exploiting initial anchors
• Also based on the principle of locality
• If C1 ≡ C2 is a correct anchor...
• ...their respective scopes are likely to contain new mappings
23. Intro Indexation Extraction Repair Overlapping Evaluation
Unsatisfiability checking
Propositional Horn SAT with Dowling-Gallier (D-G)
• LogMap implements the Dowling-Gallier (D-G) Horn SAT algorithm
• D-G is called for every class C on the propositional theory PC, which consists of:
• the rule (true → C);
• the propositional representations P1 and P2 of the input
ontologies; and
• the propositional representation PM of the mappings.
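
A minimal sketch of this check, assuming Horn rules are given as (body, head) pairs and that unsatisfiability is signalled by deriving the atom 'false'. It follows the counter-based propagation idea of Dowling-Gallier but is not LogMap's implementation; the rule encoding and the toy ontologies are hypothetical.

```python
from collections import defaultdict, deque

def horn_unsat(rules, query):
    """Return True if asserting `query` (the rule true -> query) derives 'false'.

    Each rule is (body, head): body is a tuple of atoms, head is an atom
    or the string 'false'.
    """
    remaining = []                 # number of body atoms not yet derived, per rule
    heads = []
    watchers = defaultdict(list)   # atom -> indexes of rules with it in the body
    queue = deque([query])
    for i, (body, head) in enumerate(rules):
        body = set(body)
        remaining.append(len(body))
        heads.append(head)
        for atom in body:
            watchers[atom].append(i)
        if not body:               # a fact: its head holds unconditionally
            queue.append(head)
    derived = set()
    while queue:
        atom = queue.popleft()
        if atom in derived:
            continue
        derived.add(atom)
        if atom == "false":
            return True
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0:  # whole body derived, so the head fires
                queue.append(heads[i])
    return False

# Toy theory: P1, P2 and the mappings PM (equivalences split into two rules).
rules = [
    (("O1:A",), "O1:B"),                 # P1: A is a subclass of B
    (("O2:A",), "O2:C"),                 # P2: A is a subclass of C
    (("O2:B", "O2:C"), "false"),         # P2: B and C are disjoint
    (("O1:A",), "O2:A"), (("O2:A",), "O1:A"),   # mapping O1:A ≡ O2:A
    (("O1:B",), "O2:B"), (("O2:B",), "O1:B"),   # mapping O1:B ≡ O2:B
]
```

With these rules, `horn_unsat(rules, "O1:A")` returns True because the two mappings together force O1:A under the disjoint classes O2:B and O2:C, while `horn_unsat(rules, "O1:B")` returns False. Each rule's counter is decremented at most once per body atom, which is what keeps the propagation linear in the size of the theory.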
25. Intro Indexation Extraction Repair Overlapping Evaluation
Characteristics of our class satisfiability problem
Our class satisfiability algorithm is . . .
• sound
• If LogMap finds a class unsatisfiable, it is indeed unsatisfiable.
• worst-case linear in the size of the (classified) ontologies.
• incomplete, but incompleteness is mitigated:
• Most of the relevant non-propositional reasoning is already
performed when classifying input ontologies independently
• Mappings are Horn propositional axioms
• Most new entailments caused by the mappings are likely to be
computable using Horn propositional reasoning only
26. Intro Indexation Extraction Repair Overlapping Evaluation
Computing repair plans
Recording conflictive mappings
• LogMap extends D-G to record conflictive mappings
• For example: {m4, m5, m6, m7}
• Equivalence mappings are split into two propositional rules.
• Repairs may only consider one of the rules.
28. Intro Indexation Extraction Repair Overlapping Evaluation
Computing repair plans
A “greedy” repair algorithm
• The repairs R are computed in order for each unsat. class
• The algorithm identifies subsets of the conflictive mappings of
increasing size, and stops when a repair is found.
• LogMap finds all repairs of “smallest” size.
• For example: R1 = {m4} and R2 = {m6}
• The repair with the lowest confidence is selected.
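
The greedy search described above can be sketched as follows. This is an illustration under stated assumptions: each mapping id stands for a single propositional rule, the satisfiability check is a naive forward-chaining loop rather than D-G, and a repair's confidence is taken to be the sum of the confidences of its mappings (the aggregation is not specified here). The toy rules and confidence values are invented so that the smallest repairs come out as {m4} and {m6}.

```python
from itertools import combinations

def derives_false(rules, query):
    """Naive Horn forward chaining: does asserting `query` derive 'false'?"""
    derived = {query}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return "false" in derived

def greedy_repair(ontology_rules, mappings, confidence, unsat_class):
    """Try subsets of increasing size; return the cheapest smallest repair."""
    names = sorted(mappings)
    for size in range(1, len(names) + 1):
        repairs = []
        for removed in combinations(names, size):
            kept = [mappings[m] for m in names if m not in removed]
            if not derives_false(ontology_rules + kept, unsat_class):
                repairs.append(removed)
        if repairs:  # stop at the first size for which a repair exists
            return min(repairs, key=lambda r: sum(confidence[m] for m in r))
    return None

# Hypothetical conflict: 'false' needs X, Y and Z; Y has two derivations.
ontology_rules = [
    (("O1:A",), "O1:P"),
    (("O1:A",), "O1:Q"),
    (("O2:X", "O2:Y", "O2:Z"), "false"),
]
mappings = {
    "m4": (("O1:A",), "O2:X"),
    "m5": (("O1:P",), "O2:Y"),
    "m6": (("O1:A",), "O2:Z"),
    "m7": (("O1:Q",), "O2:Y"),
}
confidence = {"m4": 0.9, "m5": 0.8, "m6": 0.6, "m7": 0.7}
repair = greedy_repair(ontology_rules, mappings, confidence, "O1:A")
```

In this toy case the size-1 repairs are ("m4",) and ("m6",); removing m5 or m7 alone does not help because the other one still derives O2:Y. The function returns ("m6",), the repair with the lower confidence.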
30. Intro Indexation Extraction Repair Overlapping Evaluation
Repair of property anchors
• Also relies on the intersection of inverted files
• However, their repair is not yet integrated with D-G
• Currently, a candidate mapping between p1 and p2 is valid only
if both their respective domains D1, D2 and ranges R1, R2 are
“compatible”.
• That is, mappings D1 ≡ D2 and R1 ≡ R2 do not lead to
unsatisfiability.
33. Intro Indexation Extraction Repair Overlapping Evaluation
Overlapping estimation
• LogMap also returns two fragments representing the
overlapping between the input ontologies
• Correct mappings are unlikely to involve classes outside these
fragments.
• The overlapping is computed in two steps:
• Computation of ‘weak’ anchors
• Module extraction
34. Intro Indexation Extraction Repair Overlapping Evaluation
Overlapping estimation
Computation of ‘weak’ anchors
Extended inverted index for FMA
Lexical entry                  Cls ids
ductule,efferent,epithelium    45211
common,branch,artery           1170,7842

Index for FMA class URIs
Cls id    Cls name
45211     EpitheliumOfEfferentDuctuleOfTestis
1170      BranchOfCommonCochlearArtery
7842      BranchOfCommonInterosseousArtery

Extended inverted index for NCI
Lexical entry                  Cls ids
ductule,efferent,epithelium    27924
common,branch,artery           1204,8087,27727

Index for NCI class URIs
Cls id    Cls name
27924     EfferentDuctuleEpithelium
1204      CommonCarotidArteryBranch
8087      CommonIliacArteryBranch
27727     CommonFemoralArteryBranch

• Not valid (in general) as candidate mappings.
• Useful to detect concepts with similar lexicon
36. Intro Indexation Extraction Repair Overlapping Evaluation
Overlapping estimation
Module extraction
• Classes involved in (weak) mappings are used as the module
signature.
• Concretely, locality-based modules have been used.
48. Intro Indexation Extraction Repair Overlapping Evaluation
Participation in the OAEI Campaign 2011
Benchmark track (8 out of 14)
• LogMap relies on the lexical similarities
• LogMap (Precision: 0.99, Recall: 0.50, F-measure: 0.67)
• MapSSS (Precision: 0.97, Recall: 0.64, F-measure: 0.77)
49. Conclusions and future work
• LogMap is a highly scalable ontology matching tool with
built-in reasoning and diagnosis capabilities.
• LogMap is the only matching system that has been shown to be
able to deal with ontologies containing tens and hundreds
of thousands of classes.
• There is still plenty of room for improvement.
• LogMap 2.0
• Major improvements w.r.t. the current version
• Will soon be available for download.