Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an...
Presentation at MTSR 2012
1. Date: 30/11/2012
SSONDE: Semantic Similarity On
liNked Data Entities
Riccardo Albertoni
ralbertoni@delicias.dia.fi.upm.es
Ontology Engineering Group. Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Joint work with Monica De Martino (CNR-IMATI-GE)
MTSR 2012,
6th Metadata and Semantics Research Conference
28-30 November 2012 - Cádiz (Spain)
2. 2
Presentation Outline
1. How SSONDE fits with other linked data technologies
• What is it for? What is it not for?
2. Characteristics of instance similarity in SSONDE
• The theory behind SSONDE’s similarity is detailed in Albertoni R. and De Martino M.; Asymmetric and context dependent semantic similarity among ontology instances, Journal of Data Semantics, LNCS, 2008.
3. SSONDE Architecture and Examples on Linked Data
3. 3
Linked Data Crawling architectural pattern
[Figure: the Linked Data crawling architectural pattern, annotated with the tools LDSpider/Fuseki, LDIF and SSONDE. SSONDE is used to build analysis services: cluster analysis and explorative search on resources.]
Source: Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). 1-136. Morgan & Claypool
4. 4
SSONDE Instance similarity
is not meant
• to align ontologies/schemas;
• to interlink/consolidate entities;
aims at
• providing a method for comparing entities represented as instances in an ontology-driven repository or as entities exposed in linked data;
• supporting explorative searches.
assumes all the integration steps have already been done
• it works at the Application Layer of the Linked Data Crawling Architectural Pattern
main characteristics (which make SSONDE unique of its kind)
• Context, to represent similarity criteria (algorithm parameters);
• Asymmetry, to emphasize containment between instances.
Example: comparing researchers
7. 7
Different Contexts
The researchers, publications, … are instances.
Contexts and the researchers’ features (data/object properties) considered in the similarity:
Researchers’ Experience
• Age
• Number of publications
• Number of projects
Researchers’ Scientific Interest
• Common publications
• Common research projects
• Similar research interests
Callouts on the slide: "It is used only in this context!!" and "They are used in both the contexts!!" (some features matter in a single context, others in both).
8. 8
Formalization of Application Context
A function that, for each recursion path, specifies which data/object properties to consider and which operations to apply to them
Example: Researchers’ Scientific Interest
• Common publications
• Common research projects
• Similar research interests
[ResearchStaff] {{Φ}, {{Publication, Inter}, {WorkAtProject, Inter}, {interest, Simil}}}
[ResearchStaff, Interest] {{{TopicName, Inter}}, {{RelatedTopic, Inter}}}
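The formalization above can be read as a lookup table keyed by recursion path. The following is a minimal sketch of that reading, not SSONDE's actual context format: the dictionary layout and the helper `criteria_for` are assumptions made for illustration, and only the class, property and operation names come from the slide. It assumes that the first set of each entry lists data properties and the second lists object properties.

```python
# Hypothetical in-memory encoding of the slide's context (not SSONDE's file format).
INTER = "Inter"  # operation name from the slide
SIMIL = "Simil"  # operation name from the slide

CONTEXT = {
    # [ResearchStaff]: no data properties (Φ), three object properties
    ("ResearchStaff",): {
        "attributes": [],
        "relations": [("Publication", INTER), ("WorkAtProject", INTER), ("interest", SIMIL)],
    },
    # [ResearchStaff, Interest]: one data property, one object property
    ("ResearchStaff", "Interest"): {
        "attributes": [("TopicName", INTER)],
        "relations": [("RelatedTopic", INTER)],
    },
}


def criteria_for(path):
    """Return (attributes, relations) to consider at the given recursion path."""
    entry = CONTEXT.get(tuple(path), {"attributes": [], "relations": []})
    return entry["attributes"], entry["relations"]


attributes, relations = criteria_for(["ResearchStaff"])
```

Keying the table by the whole path, rather than by class alone, is what lets the same class be compared on different criteria depending on how the recursion reached it.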
9. 9
Why an Asymmetric Similarity?
Sim(a,b) might differ from Sim(b,a)
• Sim is not the inverse of a metric distance, so metric properties cannot be exploited to prune comparisons
Here asymmetry is adopted to highlight the containment between instances A and B
Example of containment (comparing with respect to publications only):
• A is a Ph.D. student who has always published with his tutor B
[Figure: A and B linked to the publications pub 1, pub 2 and pub 3; every publication of A is also a publication of B]
• A is contained in B (A<<B): A can be replaced by B
• B is not contained in A: if you replace B with A, some experience is lost
10. 10
SSONDE’s Asymmetric Similarity
It returns Sim(A,B), which ranges in [0,1]
It is proportional to the number of data and object property values that A shares with B
• If A is contained in B, Sim(A,B) = 1
• If A is not contained in B, Sim(A,B) < 1
• If A and B don’t share any “features”, Sim(A,B) = 0
• If A has exactly the same characteristics as B (A<<B, B<<A), Sim(A,B) = Sim(B,A) = 1
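The properties listed above can be checked on a deliberately simplified, set-based measure: the fraction of A's features that B also has. This is only a sketch of the containment idea. SSONDE's actual measure is context dependent and recursive, as described in the cited Journal of Data Semantics paper. The function name and the assignment of publications to the student and the tutor are assumptions.

```python
def containment_similarity(a_features, b_features):
    """Fraction of A's features shared with B (asymmetric by construction)."""
    a, b = set(a_features), set(b_features)
    if not a:
        return 0.0  # assumption: an instance without features shares nothing
    return len(a & b) / len(a)


# Hypothetical data: the student co-authored everything with the tutor.
student = {"pub 1", "pub 2"}
tutor = {"pub 1", "pub 2", "pub 3"}

sim_student_tutor = containment_similarity(student, tutor)  # 1.0: student is contained in tutor
sim_tutor_student = containment_similarity(tutor, student)  # 2/3: tutor is not contained in student
```

Dividing by the size of A's feature set, rather than by the size of the union, is the design choice that makes the measure asymmetric.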
11. 11
Results comparing young and senior researchers of IMATI
[Figure: two similarity matrices, one for the Research Experience context and one for the Research Interest context]
The darker the matrix cell, the higher the similarity
13. 13
SSONDE ARCHITECTURE
[Figure: SSONDE architecture diagram. Recoverable labels:]
• Configuration: list of instances (Java class to generate the list); ref. context; ref. rules (e.g., JENA rules); kind of store
• Similarity: Context Layer, Ontology Layer, Data Layer
• Data wrappers: JENA TDB, JENA SDB, JENA MEM, ... Virtuoso wrapper
• Stores: TDB repository, SDB repository, RDF dumps, Virtuoso, ….
• Output: similarity matrix in CSV; n-most similar entities in JSON
• Web of Data: RDF dumps, HTTP dereferenceable URIs, SPARQL endpoints; linked datasets served by third parties
• Linked data consumption: crawling architectural pattern (LDIF, LDSpider + Fuseki) with a local data store/cache
14. 14
SSONDE: a building block for new analysis services
SSONDE applied on “real linked data”
• Analysing Habitat and Species
• published in NatureSDIplus (ECP-2007-GEO-317007), a European project developing a Spatial Data Infrastructure for Nature Conservation.
• to rank habitats according to the species they host, giving an insight into the inter-dependencies between habitats and species
• Analysing overlaps among scientific interests
• a subset of the linked dataset provided by data.cnr.it, published by third parties as part of the SemanticScout framework (Gangemi et al.)
• to compare IMATI-CNR researchers according to their research interests
17. 17
Configuration file 1
{
  "StoreConfiguration":{
    "KindOfStore":"JENATDB",
    "RDFDocumentURIs":[ ],
    "TDBDirectory":"data/CNRIT/TDB-0.8.9/CNRR/"
  },
  "InstanceConfiguration":{
    "InstanceURIsClass":"application.dataCNRIt.GetResearcherIMATIplusCoauthor"
  },
  "OutputConfiguration":{
    "KindOfOutput":"JSONOrderedResult",
    "NumberOfOrderedResult":"20",
    "FilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"
  },
  "ContextConfiguration":{
    "ContextFilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"
  }
}
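As a quick sanity check of the configuration above, the sketch below parses it with a standard JSON parser and pulls out the fields a run depends on. The key names and values are copied from the slide, with straight quotes around "20" because JSON does not accept typographic quotes. The `summarize` helper is hypothetical and is not part of SSONDE.

```python
import json

# Configuration from the slide, with straight quotes throughout.
CONFIG_TEXT = """
{ "StoreConfiguration":{
    "KindOfStore":"JENATDB",
    "RDFDocumentURIs":[ ],
    "TDBDirectory":"data/CNRIT/TDB-0.8.9/CNRR/" },
  "InstanceConfiguration":{
    "InstanceURIsClass":"application.dataCNRIt.GetResearcherIMATIplusCoauthor" },
  "OutputConfiguration":{
    "KindOfOutput":"JSONOrderedResult",
    "NumberOfOrderedResult":"20",
    "FilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json" },
  "ContextConfiguration":{
    "ContextFilePath":"conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx" }
}
"""


def summarize(config_text):
    """Hypothetical helper: parse the configuration and return its key settings."""
    cfg = json.loads(config_text)
    return {
        "store": cfg["StoreConfiguration"]["KindOfStore"],
        "instances_class": cfg["InstanceConfiguration"]["InstanceURIsClass"],
        # The slide stores the number as a string, so convert it explicitly.
        "top_n": int(cfg["OutputConfiguration"]["NumberOfOrderedResult"]),
        "context_file": cfg["ContextConfiguration"]["ContextFilePath"],
    }


summary = summarize(CONFIG_TEXT)
```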
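Callouts on the configuration sections:
• InstanceConfiguration: the list of LOD entity URIs, produced by a Java class implementing ListOfInputInstances
• OutputConfiguration: the similarity matrix in CSV, or a JSON encoding of the n most similar entities
• ContextConfiguration: the context, encoded in an in-house text format (hopefully soon in JSON)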
18. 18
Data.cnr.it – defining a context
[Figure: RDF graph fragment with parts labelled "Crawled by Data.CNR.it" and "Crawled by DBPEDIA". Nodes: researchers (Res 225, Res 226), publications (pub: 22, pub: 26, pub: 29) and topics (Topic:2, Topic:23, Topic:25, Topic:26, Topic:27). Edges: pub:autoreCNRDi, dc:subject, skos:broader.]
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX pub: <http://www.cnr.it/ontology/cnr/pubblicazioni.owl#>
[owl:Thing, dc:subject] -> {{}, {(skos:broader, Inter)}}
[owl:Thing] -> {{}, {(pub:autoreCNRDi, Inter), (dc:subject, Simil)}}
Callouts on the context:
• No data properties are considered in this context (the first set of each entry is empty)
• pub:autoreCNRDi: Publications
• dc:subject: Interests
• skos:broader: Interest Hierarchy
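To make the role of the two context entries concrete, here is a toy evaluation on an invented graph. It is a sketch under several assumptions, not SSONDE's algorithm: Inter is read as "fraction of A's values shared with B", Simil as "for each value of A, take the best recursive match among B's values", all properties are weighted equally, and the recursion path omits the leading owl:Thing. The resource identifiers (res:A, topic:rdf, and so on) are hypothetical. Only the property names and operations come from the slide.

```python
# Toy graph: subject -> property -> set of objects (all identifiers are invented).
GRAPH = {
    "res:A": {"pub:autoreCNRDi": {"pub:1", "pub:2"}, "dc:subject": {"topic:rdf"}},
    "res:B": {"pub:autoreCNRDi": {"pub:1", "pub:2", "pub:3"}, "dc:subject": {"topic:sparql"}},
    "topic:rdf": {"skos:broader": {"topic:semanticweb"}},
    "topic:sparql": {"skos:broader": {"topic:semanticweb", "topic:databases"}},
}

# Context from the slide, keyed by recursion path (owl:Thing omitted).
CONTEXT = {
    (): [("pub:autoreCNRDi", "Inter"), ("dc:subject", "Simil")],
    ("dc:subject",): [("skos:broader", "Inter")],
}


def values(entity, prop):
    return GRAPH.get(entity, {}).get(prop, set())


def sim(a, b, path=()):
    """Asymmetric similarity of a with respect to b under CONTEXT (simplified)."""
    if a == b:
        return 1.0
    scores = []
    for prop, op in CONTEXT.get(path, []):
        va, vb = values(a, prop), values(b, prop)
        if not va:
            continue  # assumption: a property that a lacks is ignored
        if op == "Inter":
            scores.append(len(va & vb) / len(va))
        elif vb:  # Simil: recurse along the property
            best = [max(sim(x, y, path + (prop,)) for y in vb) for x in va]
            scores.append(sum(best) / len(best))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


sim_ab = sim("res:A", "res:B")  # 1.0: A's publications and topic hierarchy are covered by B
sim_ba = sim("res:B", "res:A")  # 7/12: B has an extra publication and an extra broader topic
```

In this toy graph the two researchers declare different topics, yet they still score above zero on interests because the topics share a broader concept, which is the effect the skos:broader entry is meant to capture.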
20. 20
Hierarchical clustering: scientific clusters are discovered
Tool: Hierarchical Clustering Explorer 3.0, Human-Computer Interaction Lab, University of Maryland. http://www.cs.umd.edu/hcil/multi-cluster/.
21. 21
What next?
(i) semantic similarity optimization:
(a) the caching of intermediate similarity results
(b) the adoption of the MapReduce paradigm to speed up the assessment of semantic similarity;
(ii) domain driven extensions at the data layer:
(a) defining new data layer measures suited for geo-referenced entities
(b) multilingual similarity
(iii) definition of interfaces sifting entities according to their similarity, exploiting visualization frameworks such as Exhibit, Google Visualization and JavaScript InfoVis Toolkit.
22. 22
SSONDE framework
• pushes our instance similarity as a ready-to-go tool for the analysis of linked data
• its Java code is available on Google Code
• http://purl.oclc.org/NET/SSONDE
• licensed as open source code (GNU GPL v3)
Do not hesitate to contact us if
• SSONDE can be deployed in some of your future projects (proposals)
• you are interested in contributing to the SSONDE open framework
THANKS for your kind attention!!!
Questions / Discussion / Suggestions
23. 23
SSONDE Framework
• R. Albertoni, M. De Martino, SSONDE: Semantic Similarity On liNked Data Entities, 6th Metadata
and Semantics Research Conference, 28-30 November 2012 - Cádiz (Spain) [to appear]
• Framework Installation & use http://code.google.com/p/ssonde/wiki/GettingStarted
Semantic Similarity Theoretical Framework
• Albertoni R. and De Martino M.; Asymmetric and context dependent semantic similarity among
ontology instances, Journal of Data Semantics, LNCS, 2008.
• Albertoni R. and De Martino M.; Semantic similarity of ontology instances tailored on the
application context. Full paper at On the Move to Meaningful Internet Systems 2006: CoopIS, DOA,
GADA, and ODBASE, volume 4275 of LNCS, pages 1020–1038. Springer, 2006.
Issues adapting theoretical framework to Linked Data
• Albertoni R., De Martino M.; Semantic Similarity and Selection of Resources Published
According to Linked Data Best Practice, OnToContent 2010, Part of the OTM (OTM'10)
Further Applications
Comparing EUNIS habitats wrt their species
• Albertoni R., De Martino M.; Semantic Technology to Exploit Digital Content Exposed as Linked
Data, eChallenges e-2011, 26-28 October 2011 Florence, Italy
Comparing shapes metadata (not Linked Data)
• Albertoni R., De Martino M.; Using Context Dependent Semantic Similarity to Browse
Information Resources: an Application for the Industrial Design, First workshop on multimedia
Annotation and Retrieval enabled by Shared Ontologies, Genoa, Italy, (2007)
A complete list of references on SSONDE and its Instance Similarity