Intro Indexation Extraction Repair Overlapping Evaluation
LogMap
Logic-based and Scalable Ontology Matching
Ernesto Jiménez-Ruiz Bernardo Cuenca Grau
Information Systems Group
Department of Computer Science, University of Oxford
International Semantic Web Conference
27th October 2011
Outline
Introduction
Indexation
Mapping extraction
Mapping repair
Overlapping estimation
Evaluation
Our approach in a nutshell
LogMap is a...
• Highly scalable ontology matching system that
can deal with very large ontologies containing tens (and even
hundreds) of thousands of classes (e.g. FMA, NCI and
SNOMED CT).
• Equipped with built-in reasoning and diagnosis capabilities.
Motivation (I)
Why ontology matching tools?
• To integrate and migrate data between applications.
• (Biomedical) ontologies are being developed by
different groups, which use different classifications
and naming schemes.
Motivation (II)
Challenges to be addressed
• Sufficient scalability to deal with large ontologies such as
FMA, NCI or SNOMED CT
• Detecting and repairing inconsistencies.
• Reasoning with O1 ∪ O2 ∪ M aggravates scalability problem
• Logic-based but scalable techniques
The anatomy of LogMap
Lexical indexation
Inverted Files
• Each entry is a “set” of words corresponding to entity labels
• Labels are extended with lexicons and stemming algorithms
Inverted index for NCI labels
Entry                 Cls ids
secretion             49901
cellular,secretion    49901
cellular,secrete      49901
trapezoid             37975,62999
trapezoid,bone        62999
smegma                60791

Index for NCI class URIs
Cls id    URI
49901     NCI:CellularSecretion
37975     NCI:Trapezoid
62999     NCI:TrapezoidBone
60791     NCI:Smegma

Inverted index for FMA labels
Entry                 Cls ids
secretion             36792
bone,trapezoid        20948,47996
trapezoid             20948
smegma                60947

Index for FMA class URIs
Cls id    URI
36792     FMA:Secretion
47996     FMA:Bone of Trapezoid
20948     FMA:Trapezoid
60947     FMA:Smegma
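A minimal Python sketch of how an inverted file of this kind could be built. The function name build_inverted_index, the input format and the whitespace tokenisation are assumptions made for illustration; the lexicon expansion and stemming that LogMap applies are left out.

```python
from collections import defaultdict

def build_inverted_index(labels):
    """Map each set of label words to the ids of the classes carrying that label.

    `labels` maps a class id to a list of label strings (hypothetical input format).
    """
    index = defaultdict(set)
    for cls_id, cls_labels in labels.items():
        for label in cls_labels:
            # An entry is the *set* of words in a label, so word order is irrelevant.
            entry = frozenset(label.lower().split())
            index[entry].add(cls_id)
    return dict(index)

# Class ids follow the NCI example; the label strings themselves are illustrative.
nci_labels = {
    49901: ["cellular secretion", "secretion"],
    37975: ["trapezoid"],
    62999: ["trapezoid bone", "trapezoid"],
    60791: ["smegma"],
}
nci_index = build_inverted_index(nci_labels)
print(sorted(nci_index[frozenset({"trapezoid"})]))  # [37975, 62999]
```

Using a frozenset as the key means that "trapezoid,bone" and "bone,trapezoid" land on the same entry, which is what allows the two ontologies to be compared entry by entry.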
Structural indexation
Interval labelling schema
• LogMap indexes the “classified” hierarchy.
• Each concept is associated with two preorders and intervals.
• The cost of typical taxonomical queries is reduced.
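The idea behind interval labelling can be sketched in Python under a simplifying assumption: the hierarchy is a tree, so a single preorder numbering and one descendant interval per class are enough. The names label_intervals and is_subclass are hypothetical, and the schema used by LogMap (two preorders, several intervals) is richer than this.

```python
def label_intervals(children, root):
    """Assign each class a preorder number and the interval covering its descendants."""
    pre, interval = {}, {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        pre[node] = counter
        for child in children.get(node, []):
            visit(child)
        # Every descendant received a number in (pre[node], counter].
        interval[node] = (pre[node], counter)

    visit(root)
    return pre, interval

def is_subclass(sub, sup, pre, interval):
    """True if `sub` equals or lies below `sup`: an interval test, no traversal."""
    low, high = interval[sup]
    return low <= pre[sub] <= high

# Small fragment of the FMA hierarchy used in the examples of this talk.
children = {
    "AnatomicalEntity": ["PortionBodySubstance"],
    "PortionBodySubstance": ["Secretion"],
    "Secretion": ["Smegma"],
}
pre, interval = label_intervals(children, "AnatomicalEntity")
print(is_subclass("Smegma", "Secretion", pre, interval))   # True
print(is_subclass("Secretion", "Smegma", pre, interval))   # False
```

Once the labels are computed, a subsumption query costs two integer comparisons, which is the kind of saving the last bullet refers to.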
Computing initial anchors
Intersection of inverted files
Entry             FMA ids         NCI ids         Mappings
secretion         36792           49901           FMA:Secretion ≡ NCI:CellularSecretion
smegma            60947           60791           FMA:Smegma ≡ NCI:Smegma
trapezoid         20948           37975, 62999    FMA:Trapezoid ≡ NCI:Trapezoid
                                                  FMA:Trapezoid ≡ NCI:TrapezoidBone
trapezoid,bone    20948, 47996    62999           FMA:Trapezoid ≡ NCI:TrapezoidBone
                                                  FMA:Bone of Trapezoid ≡ NCI:TrapezoidBone
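The intersection step can be sketched in a few lines of Python. The function name compute_anchors and the dictionary-of-frozensets representation are assumptions for illustration; the ids are the ones from the FMA and NCI examples.

```python
def compute_anchors(index1, index2):
    """Intersect two inverted files: every shared entry yields candidate mappings."""
    anchors = set()
    for entry in index1.keys() & index2.keys():
        for id1 in index1[entry]:
            for id2 in index2[entry]:
                anchors.add((id1, id2))
    return anchors

fma_index = {
    frozenset({"secretion"}): {36792},
    frozenset({"bone", "trapezoid"}): {20948, 47996},
    frozenset({"trapezoid"}): {20948},
    frozenset({"smegma"}): {60947},
}
nci_index = {
    frozenset({"secretion"}): {49901},
    frozenset({"cellular", "secretion"}): {49901},
    frozenset({"cellular", "secrete"}): {49901},
    frozenset({"trapezoid"}): {37975, 62999},
    frozenset({"trapezoid", "bone"}): {62999},
    frozenset({"smegma"}): {60791},
}
anchors = compute_anchors(fma_index, nci_index)
for fma_id, nci_id in sorted(anchors):
    print(fma_id, nci_id)
```

Because the anchors are collected in a set, the pair FMA:Trapezoid / NCI:TrapezoidBone, which is produced by two different entries, appears only once.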
Computation of confidence values
Based on. . .
• The string based algorithm ISUB
• A principle of locality
• Correct mappings (C1 ≡ C2) are likely to have similar scopes
• ISUB is used to “map” the corresponding scopes of C1 and C2
• Dice’s coeff. (adapted) provides the similarity between scopes
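One plausible reading of the scope comparison, sketched in Python. Everything here is an assumption made for illustration: difflib's ratio stands in for ISUB (it is a different measure), the 0.8 threshold is arbitrary, the scope contents are made up, and the exact adaptation of Dice's coefficient used by LogMap is not given in these slides.

```python
from difflib import SequenceMatcher

def string_sim(a, b):
    # Stand-in for ISUB: difflib's ratio is a different measure, used here
    # only so that the sketch runs with the standard library.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def scope_similarity(scope1, scope2, threshold=0.8):
    """Dice-style overlap between two scopes (sets of class names).

    An element counts as shared if some element of the other scope
    is lexically similar enough to it.
    """
    if not scope1 or not scope2:
        return 0.0
    matched1 = sum(
        1 for a in scope1 if any(string_sim(a, b) >= threshold for b in scope2)
    )
    matched2 = sum(
        1 for b in scope2 if any(string_sim(a, b) >= threshold for a in scope1)
    )
    return (matched1 + matched2) / (len(scope1) + len(scope2))

# Hypothetical scopes (neighbouring class names) of two mapped classes.
scope_c1 = {"Carpal bone", "Short bone"}
scope_c2 = {"Carpal Bone", "Bone"}
print(scope_similarity(scope_c1, scope_c2))
```

A mapping whose classes sit in similar neighbourhoods scores close to 1, while a mapping between classes with unrelated neighbourhoods scores close to 0.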
FMA:Trapezoid ≡ NCI:Trapezoid (no scope) vs
FMA:Trapezoid ≡ NCI:TrapezoidBone (with scope)
Mapping discovery
Exploiting initial anchors
• Also based on the principle of locality
• If C1 ≡ C2 is a correct anchor, their respective scopes are
likely to contain new mappings
Propositional Horn representation
• LogMap represents the “extended” hierarchies as
Propositional Horn clauses
• This is key to LogMap’s scalability
Propositional FMA (P1)
(1) Smegma → Secretion
(2) Secretion → PortionBodySubstance
(3) PortionBodySubstance → AnatomicalEntity

Propositional NCI (P2)
(8) Smegma → ExocrineGlandFluid
(9) ExocrineGlandFluid → Anatomy
(10) CellularSecretion → TransmembraneTransport
(11) TransmembraneTransport → TransportProcess
(12) TransportProcess → BiologicalProcess
(13) Anatomy ∧ BiologicalProcess → false
(14) ExocrineGlandFluid ∧ ExfolCells → Smegma

Computed mappings (PM)
(m4) FMA:Secretion → NCI:CellularSecretion
(m5) NCI:CellularSecretion → FMA:Secretion
(m6) FMA:Smegma → NCI:Smegma
(m7) NCI:Smegma → FMA:Smegma
Unsatisfiability checking
Propositional Horn SAT with Dowling-Gallier (D-G)
• LogMap implements the D-G SAT algorithm
• D-G is called for every class C on the propositional theory PC,
which consists of:
• the rule (true → C);
• the propositional representations P1 and P2 of the input
ontologies; and
• the propositional representation PM of the mappings.
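A compact Python sketch of counter-based forward chaining in the spirit of Dowling-Gallier, applied to the FMA/NCI example rules. The function name horn_unsat and the (body, head) rule encoding are assumptions for illustration, not LogMap's actual code; the rule (true → C) is modelled by seeding the derivation with C.

```python
from collections import defaultdict, deque

def horn_unsat(rules, start):
    """Return True if `false` is derivable from `rules` plus the fact `start`.

    Each rule is (body, head): body is a non-empty tuple of atoms,
    head is an atom or the string "false".
    """
    remaining = [len(set(body)) for body, _ in rules]  # unmet body atoms per rule
    watching = defaultdict(list)                       # atom -> rules using it
    for i, (body, _) in enumerate(rules):
        for atom in set(body):
            watching[atom].append(i)

    derived = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for i in watching[atom]:
            remaining[i] -= 1
            if remaining[i] == 0:  # whole body derived: fire the rule
                head = rules[i][1]
                if head == "false":
                    return True
                if head not in derived:
                    derived.add(head)
                    queue.append(head)
    return False

P1 = [
    (("FMA:Smegma",), "FMA:Secretion"),
    (("FMA:Secretion",), "FMA:PortionBodySubstance"),
    (("FMA:PortionBodySubstance",), "FMA:AnatomicalEntity"),
]
P2 = [
    (("NCI:Smegma",), "NCI:ExocrineGlandFluid"),
    (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),
    (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),
    (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),
    (("NCI:TransportProcess",), "NCI:BiologicalProcess"),
    (("NCI:Anatomy", "NCI:BiologicalProcess"), "false"),
    (("NCI:ExocrineGlandFluid", "NCI:ExfolCells"), "NCI:Smegma"),
]
PM = [
    (("FMA:Secretion",), "NCI:CellularSecretion"),  # m4
    (("NCI:CellularSecretion",), "FMA:Secretion"),  # m5
    (("FMA:Smegma",), "NCI:Smegma"),                # m6
    (("NCI:Smegma",), "FMA:Smegma"),                # m7
]
print(horn_unsat(P1 + P2 + PM, "FMA:Smegma"))  # True
print(horn_unsat(P1 + P2, "FMA:Smegma"))       # False
```

Each atom enters the queue at most once and each rule counter is decremented at most once per body atom, so the work is linear in the total size of the rules. With the mappings, FMA:Smegma reaches both Anatomy and BiologicalProcess and rule (13) fires; without them it stays satisfiable.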
Characteristics of our class satisfiability problem
Our class satisfiability algorithm is . . .
• sound
• If LogMap finds a class unsatisfiable, it is indeed unsatisfiable.
• worst-case linear in the size of the (classified) ontologies.
• incomplete, but incompleteness is mitigated:
• Most of the relevant non-propositional reasoning is already
performed when classifying input ontologies independently
• Mappings are Horn propositional axioms
• Most new entailments caused by the mappings are likely to be
computable using Horn propositional reasoning only
Computing repair plans
Recording conflictive mappings
• LogMap extends D-G to record conflictive mappings
• For example: {m4, m5, m6, m7}
• Equivalence mappings are split into two propositional rules.
• Repairs may only consider one of the rules.
Computing repair plans
A “greedy” repair algorithm
• The repairs R are computed in order for each unsat. class
• The algorithm identifies subsets of the conflictive mappings of
increasing size, and stops when a repair is found.
• LogMap finds all repairs of “smallest” size.
• For example: R1 = {m4} and R2 = {m6}
• The repair with the lowest confidence is selected.
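The search over subsets of increasing size can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, satisfiability is checked with a naive fixpoint rather than the extended D-G, the confidence of a repair is taken to be the sum of its mappings' confidences, and the confidence values are made up.

```python
from itertools import combinations

def derives_false(rules, start):
    """Naive Horn forward chaining (a simple fixpoint, not the linear-time D-G)."""
    derived = {start}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(a in derived for a in body):
                if head == "false":
                    return True
                derived.add(head)
                changed = True
    return False

def smallest_repairs(ontology_rules, mappings, conflicting, cls):
    """Return all minimum-size sets of mapping rules whose removal makes `cls` satisfiable."""
    for size in range(1, len(conflicting) + 1):
        found = []
        for candidate in combinations(sorted(conflicting), size):
            kept = [rule for name, rule in mappings.items() if name not in candidate]
            if not derives_false(ontology_rules + kept, cls):
                found.append(set(candidate))
        if found:  # stop at the first size that yields a repair
            return found
    return []

def pick_repair(repairs, confidence):
    """Select the repair whose mappings have the lowest total confidence."""
    return min(repairs, key=lambda r: sum(confidence[m] for m in r))

ontology_rules = [
    (("FMA:Smegma",), "FMA:Secretion"),
    (("NCI:Smegma",), "NCI:ExocrineGlandFluid"),
    (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),
    (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),
    (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),
    (("NCI:TransportProcess",), "NCI:BiologicalProcess"),
    (("NCI:Anatomy", "NCI:BiologicalProcess"), "false"),
]
mappings = {
    "m4": (("FMA:Secretion",), "NCI:CellularSecretion"),
    "m5": (("NCI:CellularSecretion",), "FMA:Secretion"),
    "m6": (("FMA:Smegma",), "NCI:Smegma"),
    "m7": (("NCI:Smegma",), "FMA:Smegma"),
}
repairs = smallest_repairs(ontology_rules, mappings, {"m4", "m5", "m6", "m7"}, "FMA:Smegma")
print(repairs)  # [{'m4'}, {'m6'}]

confidence = {"m4": 0.6, "m5": 0.6, "m6": 0.9, "m7": 0.9}  # made-up values
print(pick_repair(repairs, confidence))  # {'m4'}
```

For FMA:Smegma the search stops at size one and returns the two repairs named above, R1 = {m4} and R2 = {m6}; removing m5 or m7 alone does not help because neither rule lies on a derivation of false from FMA:Smegma.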
Repair of property anchors
• Also relies on the intersection of inverted files
• However, their repair is not yet integrated with D-G
• Currently, a candidate mapping between p1 and p2 is valid only
if both their respective domains D1, D2 and ranges R1, R2 are
“compatible”.
• That is, mappings D1 ≡ D2 and R1 ≡ R2 do not lead to
unsatisfiability.
Overlapping estimation
• LogMap also returns two fragments representing the
overlapping between the input ontologies
• Correct mappings are unlikely to involve classes outside these
fragments.
• The overlapping is performed in two steps:
• Computation of ‘weak’ anchors
• Module extraction
Overlapping estimation
Computation of ‘weak’ anchors
Extended inverted index for FMA
Lexical entry                  Cls ids
ductule,efferent,epithelium    45211
common,branch,artery           1170,7842

Index for FMA class URIs
Cls id    Cls name
45211     EpitheliumOfEfferentDuctuleOfTestis
1170      BranchOfCommonCochlearArtery
7842      BranchOfCommonInterosseousArtery

Extended inverted index for NCI
Lexical entry                  Cls ids
ductule,efferent,epithelium    27924
common,branch,artery           1204,8087,27727

Index for NCI class URIs
Cls id    Cls name
27924     EfferentDuctuleEpithelium
1204      CommonCarotidArteryBranch
8087      CommonIliacArteryBranch
27727     CommonFemoralArteryBranch
• Not valid (in general) as candidate mappings.
• Useful to detect concepts with similar lexicon
Overlapping estimation
Module extraction
• Classes involved in (weak) mappings are used as the module
signature.
• Concretely, locality-based modules have been used.
Evaluation
Used ontologies
• SNOMED CT Jan. 2009 version (306,591 classes)
• NCI version 08.05d (66,724 classes)
• FMA version 2.0 (78,989 classes)
• OAEI ontologies: NCI Anatomy (3,304 classes), Mouse
Anatomy (2,744 classes), conference and benchmark
ontologies (< 200 classes).
Evaluation
Tasks
• Repair of gold standards
• Matching large ontologies
• Overlapping estimation
• Participation in the OAEI 2011
Repair of gold standards
• UMLS as a reference to align FMA-NCI, FMA-SNOMED and
SNOMED-NCI
• The OAEI 2010 anatomy track gold standard
                 GS Mappings          Repaired Mappings
Ontologies       Total     Unsat.     Total           ⊑        Time (s)
FMA-NCI          3,024     655        2,898 (96%)     78       10.6
FMA-SNOMED       9,072     6,179      8,111 (89%)     1,619    81.4
SNOMED-NCI       19,622    20,944     18,322 (93%)    837      812.4
Mouse-NCIAnat    1,520     0          1,520           -        -
Repair of gold standards
Non reported/repaired unsatisfiable class
Matching large ontologies
Mappings computed by LogMap
              Found Mapp.          Output Mapp.             Time (s)
Ontologies    Total     Unsat.     Total           ⊑        Anchors    Total
FMA-NCI       3,185     597        3,000 (94%)     43       28.3       69.8
FMA-SNOMED    2,068     570        2,059 (99%)     32       35.6       92.2
SNOMED-NCI    14,250    10,452     13,562 (95%)    1,540    528.6      1370.0
Matching large ontologies
Precision and recall w.r.t. Gold Standard
              Found Mappings                     Output Mappings
Ontologies    Precision   Recall   F-score       Precision   Recall   F-score
FMA-NCI       0.767       0.843    0.803         0.811       0.840    0.825
FMA-SNOMED    0.767       0.195    0.312         0.771       0.195    0.312
SNOMED-NCI    0.753       0.585    0.659         0.786       0.582    0.668
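The F-score column can be checked with a short Python function. The sketch assumes the balanced F-score (harmonic mean of precision and recall), which reproduces the FMA-NCI figures; the function name f_score is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# FMA-NCI, found mappings and output mappings respectively.
print(round(f_score(0.767, 0.843), 3))  # 0.803
print(round(f_score(0.811, 0.840), 3))  # 0.825
```

For FMA-NCI, repair raises precision from 0.767 to 0.811 while recall barely moves, which is why the F-score improves after repair.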
Matching large ontologies
Lexical similarity of GS mappings
              GS ISUB ≥ 0.95       GS ISUB ≥ 0.80       GS ISUB ≥ 0.50
Ontologies    % Mapp.   Recall     % Mapp.   Recall     % Mapp.   Recall
FMA-NCI       88%       0.96       93%       0.90       97%       0.87
FMA-SNOMED    21%       0.95       64%       0.30       92%       0.21
SNOMED-NCI    62%       0.94       75%       0.77       89%       0.65
Overlapping estimation
Overlapping computed by LogMap
                      Overlapping for O1           Overlapping for O2
Ontologies (O1-O2)    O′1       % O1    Recall     O′2       % O2    Recall
FMA-NCI               6,512     8%      0.95       12,867    19%     0.97
FMA-SNOMED            20,278    26%     0.92       50,656    17%     0.94
SNOMED-NCI            70,705    23%     0.86       33,829    51%     0.96
Participation in the OAEI Campaign 2011
Anatomy track (2nd out of 6)
System Precision Recall F-score Time (s) Incoherence
AgrMaker 0.943 0.892 0.917 634 -
LogMap 0.948 0.846 0.894 24 0%
CODI 0.965 0.825 0.889 1,890 -
Lily 0.814 0.734 0.772 563 -
AROMA 0.742 0.625 0.679 39 -
CSA 0.465 0.757 0.576 4,685 -
Participation in the OAEI Campaign 2011
Conference track (3rd out of 14)
System Precision Recall F-score Incoherence
YAM++ 0.78 0.56 0.65 -
CODI 0.74 0.57 0.64 -
LogMap 0.84 0.5 0.63 0%
AgrMaker 0.65 0.59 0.62 -
MassMatch 0.83 0.42 0.56 -
CSA 0.5 0.6 0.55 -
CIDER 0.64 0.45 0.53 -
. . . . . . . . . . . . . . .
Participation in the OAEI Campaign 2011
Benchmark track (8th out of 14)
• LogMap relies on the lexical similarities
• LogMap (Precision: 0.99, Recall: 0.50, F-measure: 0.67)
• MapSSS (Precision: 0.97, Recall: 0.64, F-measure: 0.77)
Conclusions and future work
• LogMap is a highly scalable ontology matching tool with
built-in reasoning and diagnosis capabilities.
• LogMap is the only matching system that has been shown to be
able to deal with ontologies containing tens and hundreds
of thousands of classes.
• There is still plenty of room for improvement.
• LogMap 2.0
• Major improvements w.r.t. the current version
• Will soon be available for download.
Questions?
Contact
• LogMap Project:
http://www.cs.ox.ac.uk/isg/projects/LogMap/
• ernesto.jimenez.ruiz@gmail.com
Thank you for your attention
Acknowledgements
• Funding support of the Royal Society and EPSRC.
• Organizers of the OAEI campaign.
• R. Berlanga and V. Nebot.

Contenu connexe

Tendances

Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet AllocationMarco Righini
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterDatabricks
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 
Machine learning with graph
Machine learning with graphMachine learning with graph
Machine learning with graphDing Li
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 
DeepWalk: Online Learning of Representations
DeepWalk: Online Learning of RepresentationsDeepWalk: Online Learning of Representations
DeepWalk: Online Learning of RepresentationsBryan Perozzi
 
Word2vec algorithm
Word2vec algorithmWord2vec algorithm
Word2vec algorithmAndrew Koo
 
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAIYurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAILviv Startup Club
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020andyseaborne
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityJoshua Shinavier
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Languageandimou
 
Incorporating Diversity in a Learning to Rank Recommender System
Incorporating Diversity in a Learning to Rank Recommender SystemIncorporating Diversity in a Learning to Rank Recommender System
Incorporating Diversity in a Learning to Rank Recommender SystemJacek Wasilewski
 
MongoDB company and case studies - john hong
MongoDB company and case studies - john hong MongoDB company and case studies - john hong
MongoDB company and case studies - john hong Ha-Yang(White) Moon
 
Migrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMigrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMongoDB
 
Graph Representation Learning
Graph Representation LearningGraph Representation Learning
Graph Representation LearningJure Leskovec
 
淺談 Java GC 原理、調教和 新發展
淺談 Java GC 原理、調教和新發展淺談 Java GC 原理、調教和新發展
淺談 Java GC 原理、調教和 新發展Leon Chen
 

Tendances (20)

Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
Machine Learning na globo-com
Machine Learning na globo-comMachine Learning na globo-com
Machine Learning na globo-com
 
ShEx by Example
ShEx by ExampleShEx by Example
ShEx by Example
 
Gnn overview
Gnn overviewGnn overview
Gnn overview
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Machine learning with graph
Machine learning with graphMachine learning with graph
Machine learning with graph
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
DeepWalk: Online Learning of Representations
DeepWalk: Online Learning of RepresentationsDeepWalk: Online Learning of Representations
DeepWalk: Online Learning of Representations
 
Word2vec algorithm
Word2vec algorithmWord2vec algorithm
Word2vec algorithm
 
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAIYurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Language
 
Incorporating Diversity in a Learning to Rank Recommender System
Incorporating Diversity in a Learning to Rank Recommender SystemIncorporating Diversity in a Learning to Rank Recommender System
Incorporating Diversity in a Learning to Rank Recommender System
 
MongoDB company and case studies - john hong
MongoDB company and case studies - john hong MongoDB company and case studies - john hong
MongoDB company and case studies - john hong
 
Migrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMigrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDB
 
Graph Representation Learning
Graph Representation LearningGraph Representation Learning
Graph Representation Learning
 
淺談 Java GC 原理、調教和 新發展
淺談 Java GC 原理、調教和新發展淺談 Java GC 原理、調教和新發展
淺談 Java GC 原理、調教和 新發展
 

Similaire à LogMap: Logic-based and Scalable Ontology Matching

Evaluating Mapping Repair Systems with Large Biomedical Ontologies
Evaluating Mapping Repair Systems with Large Biomedical OntologiesEvaluating Mapping Repair Systems with Large Biomedical Ontologies
Evaluating Mapping Repair Systems with Large Biomedical OntologiesErnesto Jimenez Ruiz
 
2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekingeProf. Wim Van Criekinge
 
2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekingeProf. Wim Van Criekinge
 
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSEVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSAksw Group
 
Metody logiczne w analizie danych
Metody logiczne w analizie danych Metody logiczne w analizie danych
Metody logiczne w analizie danych Data Science Warsaw
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...
EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...
EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...Daniel Su
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Li Shen
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmArvind Surve
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmArvind Surve
 
June 25-26, Workshop
 June 25-26,  Workshop June 25-26,  Workshop
June 25-26, WorkshopFahadahammed2
 
Metagenomic Data Analysis: Computational Methods and Applications
Metagenomic Data Analysis: Computational Methods and ApplicationsMetagenomic Data Analysis: Computational Methods and Applications
Metagenomic Data Analysis: Computational Methods and ApplicationsFabio Gori
 
From Billions to Quintillions: Paving the way to real-time motif discovery in...
From Billions to Quintillions: Paving the way to real-time motif discovery in...From Billions to Quintillions: Paving the way to real-time motif discovery in...
From Billions to Quintillions: Paving the way to real-time motif discovery in...J On The Beach
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streamingc.titus.brown
 
SERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolSERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolHenry Muccini
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENEWorkshop
 

Similaire à LogMap: Logic-based and Scalable Ontology Matching (20)

Evaluating Mapping Repair Systems with Large Biomedical Ontologies
Evaluating Mapping Repair Systems with Large Biomedical OntologiesEvaluating Mapping Repair Systems with Large Biomedical Ontologies
Evaluating Mapping Repair Systems with Large Biomedical Ontologies
 
Bioinformatica t3-scoring matrices
Bioinformatica t3-scoring matricesBioinformatica t3-scoring matrices
Bioinformatica t3-scoring matrices
 
sequencea.ppt
sequencea.pptsequencea.ppt
sequencea.ppt
 
2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge2016 bioinformatics i_alignments_wim_vancriekinge
2016 bioinformatics i_alignments_wim_vancriekinge
 
2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge2015 bioinformatics alignments_wim_vancriekinge
2015 bioinformatics alignments_wim_vancriekinge
 
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSEVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
 
Metody logiczne w analizie danych
Metody logiczne w analizie danych Metody logiczne w analizie danych
Metody logiczne w analizie danych
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...
EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...
EpiMOLAS: An Intuitive Web-based Framework for Genome-Wide DNA Methylation An...
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
 
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias BoehmApache SystemML Optimizer and Runtime techniques by Matthias Boehm
Apache SystemML Optimizer and Runtime techniques by Matthias Boehm
 
June 25-26, Workshop
 June 25-26,  Workshop June 25-26,  Workshop
June 25-26, Workshop
 
Metagenomic Data Analysis: Computational Methods and Applications
Metagenomic Data Analysis: Computational Methods and ApplicationsMetagenomic Data Analysis: Computational Methods and Applications
Metagenomic Data Analysis: Computational Methods and Applications
 
From Billions to Quintillions: Paving the way to real-time motif discovery in...
From Billions to Quintillions: Paving the way to real-time motif discovery in...From Billions to Quintillions: Paving the way to real-time motif discovery in...
From Billions to Quintillions: Paving the way to real-time motif discovery in...
 
ChIP-seq - Data processing
ChIP-seq - Data processingChIP-seq - Data processing
ChIP-seq - Data processing
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streaming
 
SERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolSERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_school
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the Cloud
 
Bioinformatica t4-alignments
Bioinformatica t4-alignmentsBioinformatica t4-alignments
Bioinformatica t4-alignments
 


LogMap: Logic-based and Scalable Ontology Matching

  • 1. Intro Indexation Extraction Repair Overlapping Evaluation LogMap Logic-based and Scalable Ontology Matching Ernesto Jiménez-Ruiz Bernardo Cuenca Grau Information Systems Group Department of Computer Science, University of Oxford International Semantic Web Conference 27th October 2011
  • 2. Outline: Introduction, Indexation, Mapping extraction, Mapping repair, Overlapping estimation, Evaluation
  • 3. Our approach in a nutshell
      LogMap is...
      • A highly scalable ontology matching system that can deal with very large ontologies containing tens (and even hundreds) of thousands of classes (e.g. FMA, NCI and SNOMED CT).
      • Equipped with built-in reasoning and diagnosis capabilities.
  • 4. Motivation (I): why ontology matching tools?
      • To integrate and migrate data between applications.
      • (Biomedical) ontologies are being developed by different groups, and
      • use different classifications and naming schemas.
  • 6. Motivation (II): challenges to be addressed
      • Sufficient scalability to deal with large ontologies such as FMA, NCI or SNOMED CT.
      • Detecting and repairing inconsistencies.
      • Reasoning with O1 ∪ O2 ∪ M aggravates the scalability problem.
      • Logic-based but scalable techniques.
  • 8. The anatomy of LogMap
  • 10. Lexical indexation: inverted files
      • Each entry is a “set” of words corresponding to entity labels.
      • Labels are extended with lexicons and stemming algorithms.

      Inverted index for NCI labels
      Entry                  Cls ids
      secretion              49901
      cellular,secretion     49901
      cellular,secrete       49901
      trapezoid              37975,62999
      trapezoid,bone         62999
      smegma                 60791

      Index for NCI class URIs
      Cls id    URI
      49901     NCI:CellularSecretion
      37975     NCI:Trapezoid
      62999     NCI:TrapezoidBone
      60791     NCI:Smegma

      Inverted index for FMA labels
      Entry                  Cls ids
      secretion              36792
      bone,trapezoid         20948,47996
      trapezoid              20948
      smegma                 60947

      Index for FMA class URIs
      Cls id    URI
      36792     FMA:Secretion
      47996     FMA:Bone of Trapezoid
      20948     FMA:Trapezoid
      60947     FMA:Smegma
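The inverted files on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a dict from class id to a list of labels) and the function name `build_inverted_index` are hypothetical, and the lexicon and stemming extensions mentioned on the slide are simulated by listing the extra labels by hand.

```python
from collections import defaultdict

def build_inverted_index(labels):
    """Map each lexical entry (a set of words) to the ids of the classes using it."""
    index = defaultdict(set)
    for cls_id, cls_labels in labels.items():
        for label in cls_labels:
            # An entry is an unordered set of words, so word order is ignored.
            entry = frozenset(label.lower().split())
            index[entry].add(cls_id)
    return index

# Labels chosen to reproduce the NCI example on the slide (assumed input format).
nci_labels = {
    49901: ["secretion", "cellular secretion", "cellular secrete"],
    37975: ["trapezoid"],
    62999: ["trapezoid", "trapezoid bone"],
    60791: ["smegma"],
}
nci_index = build_inverted_index(nci_labels)
```

Using `frozenset` keys means that "trapezoid bone" and "bone trapezoid" fall into the same entry, which is what later allows entries from two ontologies to be compared directly.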
  • 11. Structural indexation: interval labelling schema
      • LogMap indexes the “classified” hierarchy.
      • Each concept is associated with two preorders and intervals.
      • The cost of typical taxonomical queries is reduced.
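To illustrate why interval labelling makes taxonomical queries cheap, here is a simplified sketch. It assumes a tree-shaped hierarchy and a single preorder numbering, whereas the slide mentions two preorders and intervals per concept; the class `Bone` and both function names are hypothetical.

```python
def label_intervals(children, root):
    """Assign each node a preorder number and the interval spanned by its subtree."""
    pre = {}
    interval = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        pre[node] = counter
        for child in children.get(node, []):
            visit(child)
        # All descendants received numbers in (pre[node], counter].
        interval[node] = (pre[node], counter)

    visit(root)
    return pre, interval

def is_descendant(pre, interval, a, b):
    """True if a is a proper descendant of b, answered by one interval test."""
    lo, hi = interval[b]
    return lo < pre[a] <= hi

# Small hierarchy loosely based on the FMA fragment used later in the deck.
children = {
    "AnatomicalEntity": ["PortionBodySubstance", "Bone"],
    "PortionBodySubstance": ["Secretion"],
    "Secretion": ["Smegma"],
}
pre, interval = label_intervals(children, "AnatomicalEntity")
```

Once the labels are computed, a subsumption query is two integer comparisons instead of a traversal of the hierarchy.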
  • 14. Computing initial anchors: intersection of inverted files

      Inverted index for NCI labels
      Entry                  Cls ids
      secretion              49901
      cellular,secretion     49901
      cellular,secrete       49901
      trapezoid              37975,62999
      trapezoid,bone         62999
      smegma                 60791

      Index for NCI class URIs
      Cls id    URI
      49901     NCI:CellularSecretion
      37975     NCI:Trapezoid
      62999     NCI:TrapezoidBone
      60791     NCI:Smegma

      Inverted index for FMA labels
      Entry                  Cls ids
      secretion              36792
      bone,trapezoid         20948,47996
      trapezoid              20948
      smegma                 60947

      Index for FMA class URIs
      Cls id    URI
      36792     FMA:Secretion
      47996     FMA:Bone of Trapezoid
      20948     FMA:Trapezoid
      60947     FMA:Smegma
  • 15. Computing initial anchors: intersection of inverted files

      Entry            FMA ids        NCI ids        Mappings
      secretion        36792          49901          FMA:Secretion ≡ NCI:CellularSecretion
      smegma           60947          60791          FMA:Smegma ≡ NCI:Smegma
      trapezoid        20948          37975, 62999   FMA:Trapezoid ≡ NCI:Trapezoid
                                                     FMA:Trapezoid ≡ NCI:TrapezoidBone
      trapezoid,bone   20948, 47996   62999          FMA:Trapezoid ≡ NCI:TrapezoidBone
                                                     FMA:Bone of Trapezoid ≡ NCI:TrapezoidBone
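The intersection step can be sketched as follows. This is a minimal illustration, not LogMap's code: the function name `candidate_anchors` and the dict-of-frozensets representation are assumptions, while the ids are taken from the tables above.

```python
def candidate_anchors(index1, index2):
    """Pair up class ids whose lexical entries occur in both inverted indexes."""
    anchors = set()
    # Only entries present in both indexes can produce a candidate mapping.
    for entry in index1.keys() & index2.keys():
        for id1 in index1[entry]:
            for id2 in index2[entry]:
                anchors.add((id1, id2))
    return anchors

fma_index = {
    frozenset({"secretion"}): {36792},
    frozenset({"bone", "trapezoid"}): {20948, 47996},
    frozenset({"trapezoid"}): {20948},
    frozenset({"smegma"}): {60947},
}
nci_index = {
    frozenset({"secretion"}): {49901},
    frozenset({"cellular", "secretion"}): {49901},
    frozenset({"cellular", "secrete"}): {49901},
    frozenset({"trapezoid"}): {37975, 62999},
    frozenset({"trapezoid", "bone"}): {62999},
    frozenset({"smegma"}): {60791},
}
anchors = candidate_anchors(fma_index, nci_index)  # set of (FMA id, NCI id) pairs
```

The pair (20948, 62999), i.e. FMA:Trapezoid ≡ NCI:TrapezoidBone, is produced by two different entries but appears once in the result because anchors are collected in a set.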
  • 16. Computation of confidence values
      Based on...
      • The string-based algorithm ISUB.
      • A principle of locality:
          • Correct mappings (C1 ≡ C2) are likely to have similar scopes.
          • ISUB is used to “map” the corresponding scopes of C1 and C2.
          • Dice’s coefficient (adapted) provides the similarity between scopes.
  • 17. Computation of confidence values
      FMA:Trapezoid ≡ NCI:Trapezoid (no scope) vs FMA:Trapezoid ≡ NCI:TrapezoidBone (with scope)
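As a rough illustration of the scope comparison, the standard Dice coefficient between two sets is shown below. The slides say LogMap uses an adapted version, which is not reproduced here. The sketch also assumes the scopes have already been mapped into a common vocabulary (the step the slides attribute to ISUB), and the scope contents are invented for the example.

```python
def dice(scope1, scope2):
    """Standard Dice coefficient: 2|A ∩ B| / (|A| + |B|), with 0.0 for two empty scopes."""
    if not scope1 and not scope2:
        return 0.0
    return 2 * len(scope1 & scope2) / (len(scope1) + len(scope2))

# Hypothetical scopes (neighbouring classes) of two candidate classes.
scope_c1 = {"bone", "carpal bone", "wrist"}
scope_c2 = {"bone", "carpal bone", "hand"}
similarity = dice(scope_c1, scope_c2)  # two shared elements out of 3 + 3
```

A candidate whose classes have no overlapping scope scores 0.0, matching the slide's contrast between a mapping with no scope and one with scope.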
  • 18. Mapping discovery: exploiting initial anchors
      • Also based on the principle of locality.
      • If C1 ≡ C2 is a correct anchor...
      • ...their respective scopes are likely to have new mappings.
  • 19. Mapping discovery: exploiting initial anchors
  • 21. Propositional Horn representation
      • LogMap represents the “extended” hierarchies as propositional Horn clauses.
      • This is key to LogMap’s scalability.

      Propositional FMA (P1)
      (1)  Smegma → Secretion
      (2)  Secretion → PortionBodySubstance
      (3)  PortionBodySubstance → AnatomicalEntity

      Computed mappings (PM)
      (m4) FMA:Secretion → NCI:CellularSecretion
      (m5) NCI:CellularSecretion → FMA:Secretion
      (m6) FMA:Smegma → NCI:Smegma
      (m7) NCI:Smegma → FMA:Smegma

      Propositional NCI (P2)
      (8)  Smegma → ExocrineGlandFluid
      (9)  ExocrineGlandFluid → Anatomy
      (10) CellularSecretion → TransmembraneTransport
      (11) TransmembraneTransport → TransportProcess
      (12) TransportProcess → BiologicalProcess
      (13) Anatomy ∧ BiologicalProcess → false
      (14) ExocrineGlandFluid ∧ ExfolCells → Smegma
  • 23. Unsatisfiability checking: propositional Horn SAT with Dowling-Gallier (D-G)
      • LogMap implements the D-G SAT algorithm.
      • D-G is called for every class C and the propositional theory PC, which consists of:
          • the rule (true → C);
          • the propositional representations P1 and P2 of the input ontologies; and
          • the propositional representation PM of the mappings.
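The check described on this slide can be sketched as counter-based forward chaining in the style of Dowling-Gallier. This is a simplified sketch rather than LogMap's implementation: the rule encoding as (body, head) pairs, the `FMA:`/`NCI:` prefixes on every atom, and the function name `horn_unsat` are assumptions. The rules are those of the FMA/NCI example in the deck.

```python
from collections import defaultdict, deque

def horn_unsat(rules, cls):
    """Return True if the rule (true -> cls) plus the Horn rules derives 'false'."""
    remaining = [len(set(body)) for body, _ in rules]  # unsatisfied body atoms per rule
    watchers = defaultdict(list)                       # atom -> rules with it in the body
    for i, (body, _) in enumerate(rules):
        for atom in set(body):
            watchers[atom].append(i)
    derived = {cls}
    queue = deque([cls])
    while queue:
        atom = queue.popleft()
        if atom == "false":
            return True
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0 and rules[i][1] not in derived:
                derived.add(rules[i][1])
                queue.append(rules[i][1])
    return False

P1 = [(("FMA:Smegma",), "FMA:Secretion"),
      (("FMA:Secretion",), "FMA:PortionBodySubstance"),
      (("FMA:PortionBodySubstance",), "FMA:AnatomicalEntity")]
P2 = [(("NCI:Smegma",), "NCI:ExocrineGlandFluid"),
      (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),
      (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),
      (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),
      (("NCI:TransportProcess",), "NCI:BiologicalProcess"),
      (("NCI:Anatomy", "NCI:BiologicalProcess"), "false"),
      (("NCI:ExocrineGlandFluid", "NCI:ExfolCells"), "NCI:Smegma")]
PM = [(("FMA:Secretion",), "NCI:CellularSecretion"),   # m4
      (("NCI:CellularSecretion",), "FMA:Secretion"),   # m5
      (("FMA:Smegma",), "NCI:Smegma"),                 # m6
      (("NCI:Smegma",), "FMA:Smegma")]                 # m7

smegma_unsat = horn_unsat(P1 + P2 + PM, "FMA:Smegma")
```

Each atom is dequeued at most once and each rule counter is decremented at most once per body atom, which is where the linear worst-case behaviour comes from. With the mappings included, FMA:Smegma reaches both NCI:Anatomy (via m6) and NCI:BiologicalProcess (via m4), so rule (13) fires.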
  • 24. Unsatisfiability checking
  • 25. Characteristics of our class satisfiability problem
      Our class satisfiability algorithm is...
      • sound:
          • If LogMap finds a class unsatisfiable, it is indeed unsatisfiable.
      • worst-case linear in the size of the (classified) ontologies.
      • incomplete, but incompleteness is mitigated:
          • Most of the relevant non-propositional reasoning is already performed when classifying the input ontologies independently.
          • Mappings are Horn propositional axioms.
          • Most new entailments caused by the mappings are likely to be computable using Horn propositional reasoning only.
  • 26. Computing repair plans: recording conflictive mappings
      • LogMap extends D-G to record conflictive mappings.
      • For example: {m4, m5, m6, m7}.
      • Equivalence mappings are split into two propositional rules.
      • Repairs may only consider one of the rules.
  • 28. Computing repair plans: a “greedy” repair algorithm
      • The repairs R are computed in order for each unsatisfiable class.
      • The algorithm identifies subsets of the conflictive mappings of increasing size, and stops when a repair is found.
      • LogMap finds all repairs of “smallest” size.
      • For example: R1 = {m4} and R2 = {m6}.
      • The repair with the lowest confidence is selected.
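The subset search described above can be sketched as follows, using the FMA/NCI example from the deck. Several things here are assumptions made for illustration: the confidence values are invented, a repair's confidence is taken to be the sum of its mappings' confidences, and a naive fixpoint stands in for the extended D-G procedure.

```python
from itertools import combinations

def derives_false(rules, cls):
    """Naive forward chaining: does asserting cls derive 'false'?"""
    derived = {cls}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return "false" in derived

def smallest_repairs(ontology_rules, mappings, cls):
    """All minimum-size sets of mapping rules whose removal makes cls satisfiable."""
    names = sorted(mappings)
    for size in range(len(names) + 1):          # subsets of increasing size
        found = []
        for removed in combinations(names, size):
            kept = [mappings[n] for n in names if n not in removed]
            if not derives_false(ontology_rules + kept, cls):
                found.append(set(removed))
        if found:                               # stop at the first size that works
            return found
    return []

ontology_rules = [
    (("FMA:Smegma",), "FMA:Secretion"),
    (("NCI:Smegma",), "NCI:ExocrineGlandFluid"),
    (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),
    (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),
    (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),
    (("NCI:TransportProcess",), "NCI:BiologicalProcess"),
    (("NCI:Anatomy", "NCI:BiologicalProcess"), "false"),
]
mappings = {
    "m4": (("FMA:Secretion",), "NCI:CellularSecretion"),
    "m5": (("NCI:CellularSecretion",), "FMA:Secretion"),
    "m6": (("FMA:Smegma",), "NCI:Smegma"),
    "m7": (("NCI:Smegma",), "FMA:Smegma"),
}
confidence = {"m4": 0.7, "m5": 0.7, "m6": 0.9, "m7": 0.9}  # hypothetical values

repairs = smallest_repairs(ontology_rules, mappings, "FMA:Smegma")
chosen = min(repairs, key=lambda r: sum(confidence[m] for m in r))
```

For FMA:Smegma the search finds the two single-rule repairs {m4} and {m6} named on the slide. Which one is selected depends entirely on the confidence values, so the choice of {m4} here reflects only the invented numbers.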
  • 30. Repair of property anchors
      • Also relies on the intersection of inverted files.
      • However, their repair is not yet integrated with D-G.
      • Currently, a candidate mapping between p1 and p2 is valid only if both their respective domains D1, D2 and ranges R1, R2 are “compatible”.
      • That is, mappings D1 ≡ D2 and R1 ≡ R2 do not lead to unsatisfiability.
  • 33. Overlapping estimation
      • LogMap also returns two fragments representing the overlapping between the input ontologies.
      • Correct mappings are unlikely to involve classes outside these fragments.
      • The overlapping is computed in two steps:
          • Computation of ‘weak’ anchors
          • Module extraction
  • 34. Overlapping estimation: computation of ‘weak’ anchors

      Extended inverted index for FMA
      Lexical entry                  Cls ids
      ductule,efferent,epithelium    45211
      common,branch,artery           1170,7842

      Index for FMA class URIs
      Cls id    Cls name
      45211     EpitheliumOfEfferentDuctuleOfTestis
      1170      BranchOfCommonCochlearArtery
      7842      BranchOfCommonInterosseousArtery

      Extended inverted index for NCI
      Lexical entry                  Cls ids
      ductule,efferent,epithelium    27924
      common,branch,artery           1204,8087,27727

      Index for NCI class URIs
      Cls id    Cls name
      27924     EfferentDuctuleEpithelium
      1204      CommonCarotidArteryBranch
      8087      CommonIliacArteryBranch
      27727     CommonFemoralArteryBranch

      • Not valid (in general) as candidate mappings.
      • Useful to detect concepts with similar lexicon.
  • 36. Overlapping estimation: module extraction
      • Classes involved in (weak) mappings are used as the module signature.
      • Concretely, locality-based modules have been used.
  • 38. Evaluation: ontologies used
      • SNOMED CT Jan. 2009 version (306,591 classes)
      • NCI version 08.05d (66,724 classes)
      • FMA version 2.0 (78,989 classes)
      • OAEI ontologies: NCI Anatomy (3,304 classes), Mouse Anatomy (2,744 classes), conference and benchmark ontologies (< 200 classes).
  • 39. Evaluation: tasks
      • Repair of gold standards
      • Matching large ontologies
      • Overlapping estimation
      • Participation in the OAEI 2011
  • 40. Repair of gold standards
      • UMLS as a reference to align FMA-NCI, FMA-SNOMED and SNOMED-NCI.
      • The OAEI 2010 anatomy track gold standard.

                       GS Mappings              Repaired Mappings
      Ontologies       Total     Unsat.         Total     ⊑        Time (s)
      FMA-NCI          3,024     655 (96%)      2,898     78       10.6
      FMA-SNOMED       9,072     6,179 (89%)    8,111     1,619    81.4
      SNOMED-NCI       19,622    20,944 (93%)   18,322    837      812.4
      Mouse-NCIAnat    1,520     0              1,520     -        -
  • 41. Repair of gold standards: non reported/repaired unsatisfiable class
  • 42. Matching large ontologies: mappings computed by LogMap

                     Found Mapp.              Output Mapp.       Time (s)
      Ontologies     Total     Unsat.         Total     ⊑        Anchors   Total
      FMA-NCI        3,185     597 (94%)      3,000     43       28.3      69.8
      FMA-SNOMED     2,068     570 (99%)      2,059     32       35.6      92.2
      SNOMED-NCI     14,250    10,452 (95%)   13,562    1,540    528.6     1370.0
  • 43. Matching large ontologies: precision and recall w.r.t. the gold standard

                     Found Mappings                   Output Mappings
      Ontologies     Precision  Recall   F-score      Precision  Recall   F-score
      FMA-NCI        0.767      0.843    0.803        0.811      0.840    0.825
      FMA-SNOMED     0.767      0.195    0.312        0.771      0.195    0.312
      SNOMED-NCI     0.753      0.585    0.659        0.786      0.582    0.668
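The F-score column is the harmonic mean of precision and recall. A small sketch, with the function name `f_score` chosen here for illustration, reproduces the FMA-NCI row from the rounded precision and recall values:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

fma_nci_found = round(f_score(0.767, 0.843), 3)   # found mappings
fma_nci_output = round(f_score(0.811, 0.840), 3)  # output mappings
```

For the other rows, recomputing from the rounded table entries can differ from the reported F-score in the last digit, presumably because the slides computed it from unrounded precision and recall.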
  • 44. Matching large ontologies: lexical similarity of GS mappings

                     GS ISUB ≥ 0.95       GS ISUB ≥ 0.80       GS ISUB ≥ 0.50
      Ontologies     % Mapp.   Recall     % Mapp.   Recall     % Mapp.   Recall
      FMA-NCI        88%       0.96       93%       0.90       97%       0.87
      FMA-SNOMED     21%       0.95       64%       0.30       92%       0.21
      SNOMED-NCI     62%       0.94       75%       0.77       89%       0.65
  • 45. Overlapping estimation: overlapping computed by LogMap

      Ontologies     Overlapping for O1             Overlapping for O2
      O1-O2          O′1       % O1     Recall      O′2       % O2     Recall
      FMA-NCI        6,512     8%       0.95        12,867    19%      0.97
      FMA-SNOMED     20,278    26%      0.92        50,656    17%      0.94
      SNOMED-NCI     70,705    23%      0.86        33,829    51%      0.96
  • 46. Participation in the OAEI Campaign 2011: Anatomy track (2nd out of 6)

      System      Precision  Recall   F-score   Time (s)   Incoherence
      AgrMaker    0.943      0.892    0.917     634        -
      LogMap      0.948      0.846    0.894     24         0%
      CODI        0.965      0.825    0.889     1,890      -
      Lily        0.814      0.734    0.772     563        -
      AROMA       0.742      0.625    0.679     39         -
      CSA         0.465      0.757    0.576     4,685      -
  • 47. Participation in the OAEI Campaign 2011: Conference track (3rd out of 14)

      System       Precision  Recall   F-score   Incoherence
      YAM++        0.78       0.56     0.65      -
      CODI         0.74       0.57     0.64      -
      LogMap       0.84       0.5      0.63      0%
      AgrMaker     0.65       0.59     0.62      -
      MassMatch    0.83       0.42     0.56      -
      CSA          0.5        0.6      0.55      -
      CIDER        0.64       0.45     0.53      -
      ...          ...        ...      ...       ...
  • 48. Participation in the OAEI Campaign 2011: Benchmark track (8th out of 14)
      • LogMap relies on the lexical similarities.
      • LogMap (Precision: 0.99, Recall: 0.50, F-measure: 0.67)
      • MapSSS (Precision: 0.97, Recall: 0.64, F-measure: 0.77)
  • 49. Conclusions and future work
      • LogMap is a highly scalable ontology matching tool with built-in reasoning and diagnosis capabilities.
      • LogMap is the only matching system that has been shown to be able to deal with ontologies containing tens and hundreds of thousands of classes.
      • There is still plenty of room for improvement.
      • LogMap 2.0:
          • Major improvements w.r.t. the current version.
          • Will soon be available for download.
  • 51. Questions?
      Contact
      • LogMap Project: http://www.cs.ox.ac.uk/isg/projects/LogMap/
      • ernesto.jimenez.ruiz@gmail.com
      Thank you for your attention
      Acknowledgements
      • Funding support of the Royal Society and EPSRC.
      • Organizers of the OAEI campaign.
      • R. Berlanga and V. Nebot.