2. Data interlinking
Similarity-based approach
Key-based interlinking
Ontology matching & data interlinking
Tools
The problem: RDF data interlinking
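[Figure: two RDF graphs to interlink. The first contains the triple ⟨http://data.bnf.fr/12144801/edgar_allan_poe_the_gold_bug/, dc:title, "The gold bug"⟩ and describes a Work (title "The gold bug", lang "en") whose creator is the Writer "E. Poe" (firstname, lastname). The second describes Book and Person resources (nodes b, a1, a2; labels "The raven", "Baudelaire", "Mallarmé"; properties orig, name, author, translator). Both use rdf:type. Correspondences between the two graphs are labelled ≈, ≥, ≤, ≥.]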
Jérôme Euzenat, Data interlinking
Goal of the lecture
Provide an overview of the problem of data interlinking
Describe broad categories of solutions
Point to useful tools for generating links
Mostly about generating links, not on finding how to generate them
Outline
Data interlinking
Similarity-based approach
Key-based interlinking
Ontology matching & data interlinking
Tools
Data interlinking
I use (with the same meaning):
instance matching
entity linking
data interlinking
I do not use:
record linkage
data deduplication
entity reconciliation
coreference resolution
The data interlinking problem
Data interlinking is the task of finding the same entities within different datasets
(RDF graphs).
[Figure: Data source 1 and Data source 2 are connected by interlinking, which produces owl:sameAs links]
The data interlinking process
[Figure: the interlinking process takes two data sources, together with sample links, parameters and resources, and produces the resulting links]
The data interlinking process (2)
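[Figure: from the two datasets d and d′, extraction produces a linkage spec and generation applies it to produce the links l; extraction followed by generation makes up interlinking]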
Approaches to data interlinking
There are two main approaches to data interlinking:
similarity-based: resources are compared through a similarity measure;
if they are similar enough, they are considered the same.
key-based: sufficient conditions for two resources to be the same are
induced, then used to find the same entities.
Classification of similarities
Data interlinking techniques may be based on:
Data ID (URIs);
Data keys
External relations: (explicit or implicit) links to other resources
Data description (content)
Manual resource matching
[Figure: URI1 and URI2 are linked by owl:sameAs through manual observation]
This does not scale.
But may be good for a first sample or reference.
Crowdsourcing?
URI matching
[Figure: URI1 and URI2 are linked by owl:sameAs through URI transformation]
http://dbpedia.org/resource/Johann_Sebastian_Bach owl:sameAs
http://www.lastfm.fr/music/Johann+Sebastian+Bach
http://rdf.insee.fr/geo/regions-2011.rdf#REG_11 ?
http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/FR10
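A minimal sketch of URI transformation, assuming (as the first example suggests) that a candidate Last.fm URI can be derived from a DBpedia local name by replacing underscores with plus signs. The function names and the rewriting rule are illustrative assumptions, not an established mapping; the second example (INSEE vs. NUTS codes) shows that such a rule is not always available.

```python
# Hypothetical rewriting rule: DBpedia local name -> Last.fm artist URI.
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"
LASTFM_PREFIX = "http://www.lastfm.fr/music/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def dbpedia_to_lastfm(uri: str) -> str:
    """Rewrite a DBpedia resource URI into a candidate Last.fm URI."""
    if not uri.startswith(DBPEDIA_PREFIX):
        raise ValueError(f"not a DBpedia resource URI: {uri}")
    local_name = uri[len(DBPEDIA_PREFIX):]
    return LASTFM_PREFIX + local_name.replace("_", "+")


def same_as_triple(source_uri: str) -> str:
    """Serialise the candidate link as an N-Triples statement."""
    return f"<{source_uri}> <{OWL_SAME_AS}> <{dbpedia_to_lastfm(source_uri)}> ."


if __name__ == "__main__":
    print(same_as_triple("http://dbpedia.org/resource/Johann_Sebastian_Bach"))
```

The generated URI is only a candidate: nothing guarantees that the target resource exists, so such links would normally be checked before being published.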
Id matching
[Figure: two resources carrying the same id are linked by owl:sameAs]
Such identifiers include:
Social security numbers
ISBN, DOI, MAC addresses, etc.
Identifiers issued by authorities: ISO (countries, languages), IATA (airports)
Most databases are built on such identifiers... but they are often local to the
database.
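As an illustration of id matching, here is a small sketch that links resources sharing the same ISBN. The URIs, the ISBN values and the normalisation rule (dropping hyphens and spaces) are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def normalize_isbn(raw: str) -> str:
    """Keep only digits and the check character X, so that
    '978-0-13-468599-1' and '9780134685991' compare equal."""
    return "".join(ch for ch in raw if ch.isdigit() or ch in "xX").upper()


def link_by_id(source: Dict[str, str], target: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source URI, target URI) pairs whose identifiers are equal."""
    # Index the target once so that each source resource is a dictionary lookup.
    index: Dict[str, List[str]] = {}
    for uri, isbn in target.items():
        index.setdefault(normalize_isbn(isbn), []).append(uri)
    links = []
    for uri, isbn in source.items():
        for other in index.get(normalize_isbn(isbn), []):
            links.append((uri, other))
    return links


if __name__ == "__main__":
    # Hypothetical data: URI -> ISBN.
    library_a = {"http://example.org/a/book1": "978-0-13-468599-1"}
    library_b = {
        "http://example.org/b/item9": "9780134685991",
        "http://example.org/b/item3": "0-306-40615-2",
    }
    print(link_by_id(library_a, library_b))
```

This only works when both datasets use the same global identifier scheme; identifiers local to one database cannot be matched this way.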
Context-based similarity
[Figure: URI1 and URI2 are both related to an external resource (here VIAF); this context-based "similarity" yields an owl:sameAs link]
Process:
Project your data onto another resource (DBpedia, GeoNames, VIAF, etc.)
Assess relations between the considered terms
Import the relation into the dataset
Content-based similarity
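[Figure: the two graphs of the Poe example. First graph: a Work with title "The gold bug" and creator "E. Poe" (firstname, lastname), typed Writer and Work through rdf:type. Second graph: Book and Person resources (nodes b, a1, a2; labels "Le corbeau", "Le scarabée d'or", "Baudelaire", "Poe"; properties orig, name, title, author, translator). The similarity between the descriptions is computed to decide on owl:sameAs links.]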
Two main approaches:
bag of text
structured similarity
Term-based similarity
[Figure: the same two graphs reduced to the terms they contain ("The gold bug", "E. Poe", firstname, lastname, Writer, Work, type; "Le corbeau", "Le scarabée d'or", "Baudelaire", "Poe", orig, name, title, author, translator, Person, Book, type), on which a "bag of words" similarity is computed to decide on owl:sameAs links.]
Various tools:
Normalisation (stemmers, tokenizers)
Use of linguistic resources (WordNet)
Translation
Many similarity measures, especially from information retrieval
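The following sketch shows one of the simplest measures of this family: a cosine similarity over term-frequency vectors, after lowercasing and tokenisation. It is an illustrative choice among the many information-retrieval measures the slide alludes to; the function names are made up for the example and no stemming, stop-word filtering or translation is applied.

```python
import math
import re
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def bag_of_words_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm = math.sqrt(sum(v * v for v in tf_a.values())) * math.sqrt(
        sum(v * v for v in tf_b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # The untranslated French title shares no token with the English one,
    # which is why translation appears among the tools listed above.
    print(bag_of_words_cosine("The gold bug", "The raven"))
    print(bag_of_words_cosine("The gold bug", "Le scarabée d'or"))
```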
Structure similarity
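[Figure: the same two graphs reduced to their structure (properties title, creator, firstname, lastname, type on one side; orig, name, title, author, translator, type on the other), on which a structure similarity is computed to decide on owl:sameAs links.]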
Techniques:
Based on graph matching techniques
Can be used to learn weights on properties (but need matching)
Problem: scalability
Cross-lingual RDF data interlinking
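[Figure: are these two resources the same (owl:sameAs ?)
http://a.org/Mus999, described in French with properties nom, lieu, adresse, ville, rue, zip and values "Musée du Louvre", "France", "Paris", "99, rue de Rivoli", "75001";
http://bb.cn/盧浮宮, described in Chinese with properties 稱號 (name) "盧浮宮" (Louvre) and 位於 (located in) "法國巴黎" (Paris, France).]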
Similarity-based data interlinking
Hypothesis: the higher the similarity between two resources, the higher the probability that they denote the same object.
[Figure: four successive views of the comparison deciding on owl:sameAs:
RESOURCE vs. RESOURCE: SIMILARITY;
DOCUMENT vs. DOCUMENT: SIMILARITY (virtual documents, see the reference below);
DOCUMENT(zh) vs. DOCUMENT(en): translation yields a DOCUMENT(en) and a DOCUMENT(zh), two SIM values are computed and combined into a SIMILARITY;
BabelNet(IDs) vs. BabelNet(IDs): SIMILARITY.]
Yuzhong Qu, Wei Hu, Gong Cheng: Constructing virtual documents for ontology matching. WWW 2006: 23-31.
General cross-lingual interlinking framework
1 Virtual Documents
2 Language Normalization
3 Similarity Computation
4 Link Generation
Building virtual documents by levels
[Figure: virtual document built for http://dbpedia.org/resource/Charles_Perrault
Level 1: "Charles Perrault", dbpedia:France
Level 2: "France is a sovereign country in Western Europe that includes overseas regions and territories..."]
Machine translation: parameters
1 Virtual Documents: Level 1 | Level 2
2.1 Machine Translation: ZH→EN
2.2 NLP Preprocessing: Lowercase+Tokenize | + Filter stop words | + Stemming (Porter) | + Bigrams (terms)
3 Similarity Computation: TF+cosine | TF*IDF+cosine
4 Link Generation: Greedy | Hungarian
32 settings have been explored in total (2 × 1 × 4 × 2 × 2).
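To make the link generation step concrete, here is a minimal sketch of the greedy strategy: pairs are taken in decreasing order of similarity, and a pair is kept only if neither resource is already linked. The similarity values, the resource names and the optional threshold are assumptions made for the example, not the settings used in the experiments.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def greedy_links(similarity: Dict[Pair, float], threshold: float = 0.0) -> List[Pair]:
    """Greedy one-to-one link generation from pairwise similarities."""
    links: List[Pair] = []
    used_sources, used_targets = set(), set()
    # Visit pairs from the most to the least similar.
    for (a, b), score in sorted(similarity.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break  # all remaining pairs are below the threshold
        if a in used_sources or b in used_targets:
            continue  # one of the two resources is already linked
        links.append((a, b))
        used_sources.add(a)
        used_targets.add(b)
    return links


if __name__ == "__main__":
    # Hypothetical similarities between two Chinese and two English resources.
    sim = {
        ("zh1", "en1"): 0.9,
        ("zh1", "en2"): 0.8,
        ("zh2", "en1"): 0.85,
        ("zh2", "en2"): 0.1,
    }
    print(greedy_links(sim))
    print(greedy_links(sim, threshold=0.2))
```

On this toy input the greedy strategy first fixes (zh1, en1), which leaves only the weak pair (zh2, en2), for a total similarity of 1.0. An optimal assignment, as computed by the Hungarian algorithm, would instead pick (zh1, en2) and (zh2, en1) for a total of 1.65; this is the trade-off between the two link generation options.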
Lcase+Tokenization with TF*IDF at Level 1
[Figure: result plot; legend intervals: 0 - 0.11, 0.11 - 0.15, 0.15 - 0.25, 0.25 - 0.35, 0.35 - 0.45, 0.45 - 1]
Database keys
A set of attributes which uniquely identifies elements of a relation
e.g., Book: isbn; People: firstname, lastname, birthplace, birthdate
usually given and used to check integrity
They may be used for identifying same entities across two databases.
But they require alignments.
Example of interlinking with keys and
alignments
Are the resources bnf:cb118949856 and bne:XX1721208 the same?
if BNF ontology states foaf:Person owl:hasKey {foaf:name, dc:dates}
and we have the following alignment
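[Figure: the two descriptions and the alignment between them.
bnf:cb118949856, a foaf:Person:
foaf:name "Albert Camus"
rda:dateOfBirth 07-11-1913
rda:dateOfDeath 04-01-1960
rda:biographicalInformation "Romancier, dramaturge et essayiste"
rda:countryAssociatedWithThePerson http://id.loc.gov/vocabulary/countries/fr
rda:placeOfBirth "Mondovi (Algérie)"
dc:dates 1913-1960
bne:XX1721208, a frbrer:C1005:
frber:P3039 "Camus, Albert"
frber:P3040 1913-1960
rda:sourceConsulted "Aut [...]1980"
Correspondences between the classes and properties of the two descriptions are labelled ≡, ≡, ≈, ≈; the question is whether an owl:sameAs link holds between the two resources.]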
Key-based interlinking methods
Database keys allow for identifying entities: if they are aligned, this can be
used for linking.
Advantages
they are logically grounded
they minimize the number of properties to compare (if we use
minimal keys)
Drawbacks
Require alignment between properties and classes
Very few key axioms are available, and they are not necessarily useful for
interlinking
We overcome these drawbacks by introducing link keys
Link key
A link key
$\{\langle p_1, q_1\rangle, \ldots, \langle p_n, q_n\rangle\}\{\langle p'_1, q'_1\rangle, \ldots, \langle p'_m, q'_m\rangle\}\ \mathrm{linkkey}\ \langle c, d\rangle$
holds iff
for all pairs of instances $a$ and $b$ belonging respectively to classes $c$ and $d$ of
ontologies $O$ and $O'$,
if $a$ and $b$ share at least one value (object) for each pair of
properties $p_i$ and $q_i$ respectively,
and $a$ and $b$ share all their values (objects) for each pair of
properties $p'_j$ and $q'_j$ respectively,
then they are the same ($\langle a, \text{owl:sameAs}, b\rangle$).
Example:
$\{\langle \text{foaf:name}, \text{frbr:P3039}\rangle\}\{\langle \text{dc:dates}, \text{frbr:P3040}\rangle\}\ \mathrm{linkkey}\ \langle \text{foaf:Person}, \text{frbr:C1005}\rangle$
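A minimal sketch of how such a link key can be applied to generate links, assuming each instance is described by a dictionary from properties to sets of values. The first set of property pairs is read as "share at least one value" and the second as "share all values"; requiring the shared value sets to be non-empty, the pre-normalised name values and the second BNE resource are assumptions of this example.

```python
from typing import Dict, Iterable, Set, Tuple

Description = Dict[str, Set[str]]   # property -> set of values
PropertyPair = Tuple[str, str]      # (source property, target property)


def satisfies(a: Description, b: Description,
              in_pairs: Iterable[PropertyPair],
              eq_pairs: Iterable[PropertyPair]) -> bool:
    """Check the link key condition for one pair of instances."""
    for p, q in in_pairs:
        # At least one shared value is required.
        if not (a.get(p, set()) & b.get(q, set())):
            return False
    for p, q in eq_pairs:
        values_a, values_b = a.get(p, set()), b.get(q, set())
        # All values must be shared (assumption: and there must be some).
        if not values_a or values_a != values_b:
            return False
    return True


def generate_links(source: Dict[str, Description], target: Dict[str, Description],
                   in_pairs, eq_pairs) -> Set[Tuple[str, str]]:
    """Return the owl:sameAs candidates entailed by the link key."""
    return {(x, y)
            for x, a in source.items()
            for y, b in target.items()
            if satisfies(a, b, in_pairs, eq_pairs)}


if __name__ == "__main__":
    # Toy data; names are assumed to be normalised beforehand.
    bnf = {"bnf:cb118949856": {"foaf:name": {"albert camus"}, "dc:dates": {"1913-1960"}}}
    bne = {
        "bne:XX1721208": {"frbr:P3039": {"albert camus"}, "frbr:P3040": {"1913-1960"}},
        "bne:other": {"frbr:P3039": {"albert camus"}, "frbr:P3040": {"1900-1950"}},
    }
    print(generate_links(bnf, bne,
                         in_pairs=[("foaf:name", "frbr:P3039")],
                         eq_pairs=[("dc:dates", "frbr:P3040")]))
```

The class restriction (foaf:Person, frbr:C1005) is left implicit here: the two dictionaries are assumed to contain only instances of the relevant classes.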
Link key extraction
Problem: How to induce such link keys from data?
The number of sets of property pairs is exponential
Our approach:
discover only candidate link keys.
evaluate them in order to select only the “good” ones
Candidate link key
A candidate link key is a set of property pairs $\{\langle p_1, q_1\rangle, \ldots, \langle p_k, q_k\rangle\}$ that
1. would generate at least one link if used as a link key
2. is maximal for at least one link, or is the intersection of several
candidate link keys
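The definition can be read operationally. The sketch below is a naive illustration, not the authors' extraction algorithm: it only considers the "share at least one value" condition, compares every pair of instances (quadratic), and then closes the resulting sets under intersection. Data layout, property names and function names are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Description = Dict[str, Set[str]]            # property -> set of values
Candidate = FrozenSet[Tuple[str, str]]       # set of property pairs


def shared_pairs(a: Description, b: Description) -> Candidate:
    """Maximal set of property pairs on which a and b share a value."""
    return frozenset((p, q)
                     for p, values_a in a.items()
                     for q, values_b in b.items()
                     if values_a & values_b)


def candidate_link_keys(source: Dict[str, Description],
                        target: Dict[str, Description]) -> Set[Candidate]:
    # Condition 2, first case: sets that are maximal for at least one link.
    candidates = {shared_pairs(a, b) for a in source.values() for b in target.values()}
    candidates.discard(frozenset())  # instance pairs sharing nothing yield no candidate
    # Condition 2, second case: close under (non-empty) intersection.
    changed = True
    while changed:
        changed = False
        for k1, k2 in combinations(list(candidates), 2):
            common = k1 & k2
            if common and common not in candidates:
                candidates.add(common)
                changed = True
    return candidates


if __name__ == "__main__":
    # Hypothetical descriptions.
    source = {"a1": {"name": {"Paris"}, "code": {"75"}},
              "a2": {"name": {"Essonne"}, "code": {"91"}}}
    target = {"b1": {"label": {"Paris"}, "id": {"75"}},
              "b2": {"label": {"Essonne"}, "id": {"XX"}}}
    for key in candidate_link_keys(source, target):
        print(sorted(key))
```

Every returned set generates at least one link (condition 1), since it is contained in the maximal set of some pair of instances.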
Supervised selection measures
If a sample of reference links is available:
Positive examples (L+) : a set of owl:sameAs links
Negative examples (L−) : a set of owl:differentFrom links
Idea: Approximate precision and recall on that sample
Definition (Relative precision and recall)
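\[ \mathrm{precision}(K, L^+, L^-) = \frac{|L^+ \cap L_{D,D'}(K)|}{|(L^+ \cup L^-) \cap L_{D,D'}(K)|} \]
\[ \mathrm{recall}(K, L^+) = \frac{|L^+ \cap L_{D,D'}(K)|}{|L^+|} \]
where $L_{D,D'}(K)$ is the set of links generated by $K$ between datasets $D$ and $D'$.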
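The two measures translate directly into set operations. In this sketch links are represented as pairs of identifiers; returning 0.0 when a denominator is empty is a convention chosen for the example, and the sample data is hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[str, str]


def relative_precision(generated: Set[Link], positive: Set[Link], negative: Set[Link]) -> float:
    """Share of correct links among the generated links that the sample judges."""
    judged = (positive | negative) & generated
    return len(positive & generated) / len(judged) if judged else 0.0


def relative_recall(generated: Set[Link], positive: Set[Link]) -> float:
    """Share of the positive sample links that were generated."""
    return len(positive & generated) / len(positive) if positive else 0.0


if __name__ == "__main__":
    generated = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}   # links produced by K
    positive = {("a1", "b1"), ("a4", "b4")}                  # owl:sameAs sample
    negative = {("a2", "b2")}                                # owl:differentFrom sample
    print(relative_precision(generated, positive, negative))  # 1 correct out of 2 judged
    print(relative_recall(generated, positive))               # 1 found out of 2 positives
```

Note that the generated link ("a3", "b9") does not affect precision, since the sample says nothing about it; this is what makes the measures relative to the sample.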
Unsupervised selection measures
When no reference link is available.
Idea: measure how close the extracted links would be to being
one-to-one and total.
Definition (Discriminability)
\[ \mathrm{disc}(K, D, D') = \frac{\min(|\{a : \langle a, b\rangle \in L_{D,D'}(K)\}|, |\{b : \langle a, b\rangle \in L_{D,D'}(K)\}|)}{|L_{D,D'}(K)|} \]
Definition (Coverage)
\[ \mathrm{cov}(K, D, D') = \frac{|\{a : \langle a, b\rangle \in L_{D,D'}(K)\} \cup \{b : \langle a, b\rangle \in L_{D,D'}(K)\}|}{|\{a : c(a) \in D\} \cup \{b : d(b) \in D'\}|} \]
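Both definitions can be sketched with plain sets. The code assumes that identifiers of the two datasets are distinct (so that the union counts each instance once) and returns 0.0 for empty denominators, which is a convention of this example. A harmonic mean helper is included since that combination is what the evaluation reports.

```python
from typing import Set, Tuple

Link = Tuple[str, str]


def discriminability(links: Set[Link]) -> float:
    """1.0 when the links are one-to-one, lower when instances are linked several times."""
    if not links:
        return 0.0
    sources = {a for a, _ in links}
    targets = {b for _, b in links}
    return min(len(sources), len(targets)) / len(links)


def coverage(links: Set[Link], source_instances: Set[str], target_instances: Set[str]) -> float:
    """Share of the instances of classes c and d that appear in some link."""
    linked = {a for a, _ in links} | {b for _, b in links}
    universe = source_instances | target_instances
    return len(linked) / len(universe) if universe else 0.0


def harmonic_mean(x: float, y: float) -> float:
    return 2 * x * y / (x + y) if x + y else 0.0


if __name__ == "__main__":
    # Hypothetical links: a1 is linked twice, a3 and b4 are not linked at all.
    links = {("a1", "b1"), ("a1", "b2"), ("a2", "b3")}
    disc = discriminability(links)
    cov = coverage(links, {"a1", "a2", "a3"}, {"b1", "b2", "b3", "b4"})
    print(disc, cov, harmonic_mean(disc, cov))
```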
Experimental evaluation
These selection measures were evaluated on public datasets.
Finding links between French municipalities described in two different
datasets:
Insee dataset: 36700 instances;
Geonames dataset: 36552 instances.
The reference link set is composed of:
Positive links: 36552 owl:sameAs statements;
owl:differentFrom links derived from owl:sameAs links (closed world
assumption).
Evaluation
Correlation between the harmonic mean of discriminability and coverage and
F-measure:
[Figure: lattice of candidate link keys (numbers refer to the property pairs listed below):
{1} {2} {3, 4} {5, 6}
{7, 1} {2, 1} {3, 4, 1} {3, 2, 4}
{3, 7, 4, 1} {3, 2, 4, 1}
{3, 7, 2, 4, 1}
Regions of the lattice are annotated: high F-measure ≈ .99 with h-mean(disc., cov) ≈ .99; good F-measure ≈ .89 with h-mean(disc., cov) ≈ .89; bad F-measure ≈ 0 with h-mean(disc., cov) ≈ 0.]
1 = ⟨nom, name⟩
2 = ⟨nom, alternateName⟩
3 = ⟨subdivisionDe, parentFeature⟩
4 = ⟨subdivisionDe, parentADM3⟩
5 = ⟨codeINSEE, population⟩
6 = ⟨codeCommune, population⟩
7 = ⟨nom, officialName⟩
Why use ontologies?
Because it is obvious that we must compare the instances of equivalent
classes based on equivalent properties.
More precisely:
For reducing the search space for finding link keys and similarities
For reducing the scope of linkage specifications
Because the same linkage rules do not work for all classes
Because classes and properties are, like other features, hints of the similarity
between resources
Ex. With similarity and with keys
Data interlinking through a common ontology
[Figure: URI1 and URI2 belong to datasets described by the same ontology o; resource matching between them produces owl:sameAs links]
Matching with a common ontology
+ Focus the search: only match instances of the same class;
– Not sufficient: it remains to identify corresponding entities
+ If keys are defined (OWL 2), this is done;
+ At least we know which properties to compare;
– Inferring secondary keys may be useful;
– Correcting discrepancies: record linkage.
Record linkage
        Record 1      Record 2
Name    Johann        Johannes
Date    1665-03-21    31/03/1665
Place   München       Monaco di Bavaria
Having a common ontology does not solve all problems.
Different types of mismatch
Different domains, connected (BIM, Energy demand)
⇒ few correspondences, any type
Same domain, different models (engineer, policy maker)
⇒ many correspondences, mostly equivalence
Same domain, different granularity (city management, building design)
⇒ many correspondences, mostly subsumption
Data interlinking with different ontologies (implicit alignment)
[Figure: URI1 and URI2 belong to datasets described by different ontologies o and o′; resource matching between them produces owl:sameAs links]
Data interlinking with different ontologies (explicit alignment)
[Figure: URI1 and URI2 belong to datasets described by different ontologies o and o′, related by an alignment A; resource matching between them produces owl:sameAs links]
Ontology matching for data interlinking
[Figure: ontology matching between o and o′ produces an alignment A, which data interlinking uses to produce owl:sameAs links between URI1 and URI2]
Heterogeneity problem
Resources being expressed in different ways must be reconciled before being
used.
Mismatch between formalized knowledge can occur when:
different languages are used (OWL vs. Topic maps);
different terminologies are used:
English vs. Chinese;
Book vs. Monograph.
different models are used:
different classes: Autobiography vs. Paperback;
classes vs. property: Essay vs. literarygenre;
classes vs. instances: One physical book as an instance vs. one work as
an instance.
different scopes and granularity are used:
Only books vs. cultural items vs. any product;
Books detailed to the print and translation level vs. books as works.
Ontology alignment
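[Figure: two ontologies and an alignment between them.
First ontology: classes Item, DVD, Book, Paperback, Hardcover, CD, Person; properties price, title, doi, creator, pp, author; datatypes integer, string, uri.
Second ontology: classes Monograph, Essay, Literary critics, Politics, Biography, Autobiography, Literature, Human, Writer; properties pages, isbn, author, title, subject.
Correspondences are labelled ≥, ≥, ≥, ≤, ≥.]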
Example: Administrative ontology
[Figure: administrative ontology. Classes: Territoire FR, Pays, Region, Departement, Arrondissement, Commune; properties: code, nom, chef-lieu, subdivision; datatypes: integer, string]
Example: NUTS dataset
NUTSRegion table:
level  code   name           hasParentRegion
0      FR     FRANCE
1      FR1    ÎLE DE FRANCE  FR
2      FR10   Île de France  FR1
3      FR101  Paris          FR10
3      FR104  Essonne        FR10
Example: Linking INSEE and NUTS
NUTS: Nomenclature of territorial units for statistics
#INSEE  INSEE name      NUTS Level  #NUTS
1       Pays            0           34
                        1           142
26      Région          2           344
100     Département     3           1488
342     Arrondissement
4036    Canton          4
52422   Commune         5
Example: Linking INSEE and NUTS
[Figure: INSEE instances (PAYS_FR, REG_11, DEP_75, DEP_77, DEP_78, COM_75056; classes Territoire FR, Pays, Region, Departement, Commune) are linked by owl:sameAs to NUTS instances (FR, UK, FR1, FR10, FR101, FR102, FR103; classes Region, Country, NUTSRegion, LAURegion)]
Example: interesting sets
[Figure: interesting datasets to interlink: nuts, ons, ordnance s., ign, insee, geonames, dbpedia, freebase]
A simple algorithm
Find matching concepts [concept matching];
For each of them, determine matching properties based on the similarity
between their values in both datasets [property matching];
From them find property combinations identifying corresponding entities
[key extraction];
Link corresponding entities [link generation].
For instance, nom/Region (INSEE) ⊆ name/NUTSRegion (NUTS): the values of nom for INSEE regions are included in the values of name for NUTS regions, and moreover
they are unambiguous.
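The property matching and key extraction steps can be sketched as two checks: how much of the values of one property are included in the values of another, and whether a property's values identify instances unambiguously. The data layout (instance → property → set of values), the function names and the toy values are assumptions of this example.

```python
from typing import Dict, Set

Dataset = Dict[str, Dict[str, Set[str]]]   # instance -> property -> set of values


def values_of(dataset: Dataset, prop: str) -> Set[str]:
    """All values taken by a property in a dataset."""
    return {v for description in dataset.values() for v in description.get(prop, set())}


def inclusion(source: Dataset, p: str, target: Dataset, q: str) -> float:
    """Share of the values of p (in source) also found as values of q (in target)."""
    values_p = values_of(source, p)
    return len(values_p & values_of(target, q)) / len(values_p) if values_p else 0.0


def is_unambiguous(dataset: Dataset, prop: str) -> bool:
    """True if no value of prop is carried by two different instances."""
    owner: Dict[str, str] = {}
    for instance, description in dataset.items():
        for value in description.get(prop, set()):
            if owner.setdefault(value, instance) != instance:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical extracts of the INSEE regions and of the NUTS regions.
    insee_regions = {"REG_11": {"nom": {"Île de France"}},
                     "REG_24": {"nom": {"Centre"}}}
    nuts_regions = {"FR10": {"name": {"Île de France"}},
                    "FR24": {"name": {"Centre"}},
                    "FR1": {"name": {"ÎLE DE FRANCE"}}}
    print(inclusion(insee_regions, "nom", nuts_regions, "name"))
    print(is_unambiguous(insee_regions, "nom"), is_unambiguous(nuts_regions, "name"))
```

When the inclusion is total and both properties are unambiguous, the pair (nom, name) can serve as a key for linking the instances of the matched classes.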
INSEE and NUTS: ontology alignment
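[Figure: alignment between the INSEE ontology (classes Territoire FR, Pays, Region, Departement, Arrondissement, Canton, Commune; properties code, nom, chef-lieu, subdivision; datatypes integer, string) and the NUTS ontology (classes Region, Country, NUTSRegion, LAURegion; properties name, level, code, hasSubRegion). Correspondences are labelled = or ≤.]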
Simple alignments are not sufficient
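[Figure: INSEE side: classes Territoire FR, Region, Departement, Commune; instances DEP_75 and COM_75056 with property nom. NUTS side: classes Region, NUTSRegion; instance FR101 with property name. The value "Paris" and correspondences labelled = and ≤ connect the two sides.]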
Expressive alignments are necessary
[Figure: an expressive alignment relating Region to NUTSRegion constrained through level (value 2) and hasParentRegion (value FR), together with subdivision = hasSubRegion and nom = name]
What does this mean?
Ontology alignments are schema-level expressions of correspondences;
They are useful for focussing the search;
Expressive alignments are necessary;
They can be turned into SPARQL-based link generators.
But it is also necessary to express instance-level constraints:
for converting data (e.g., mph vs. m/s);
for expressing matching constraints on data (e.g., similarity).
Data interlinking and ontology matching
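[Figure: a Matcher takes the ontologies o and o′ of the datasets d and d′ and produces an alignment A; a Generator uses A on the datasets to produce the links l]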
Tools for data interlinking
Linkage spec  extraction    generation
similarity    LIMES         Silk, LIMES, OpenRefine
key           LinkKeyDisco  SPARQL
Silk
Silk is robust software for interlinking datasets.
It relies on an expressive specification of linking conditions:
Declare data sources (DataSource);
Circumscribe entities to compare (Source/TargetDataset);
Describe how to compare them (LinkageRule):
Select properties to compare through paths (Input);
Compute distances between them (Compare+threshold);
Aggregate all comparisons (Aggregate);
Select those pairs of entities to be linked (Filter);
Generate links (Output+thresholds).
Other issue: performances
$n \times m$ vs. $\frac{n}{3} \times \frac{m}{3} + \frac{n}{3} \times \frac{m}{3} + \frac{n}{3} \times \frac{m}{3}$
10 × 10 = 100
1000 × 1000 = 1000000
100000 × 100000 = 10000000000
Other issue: performances
Blocking: index+cluster
Dataset 1 Dataset 2
Other issue: performances
Blocks can be obtained from:
clustering values in index
predefined block (based on equality)
classes in an ontology (blocks are defined as class expressions)
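A minimal sketch of blocking with predefined blocks based on equality: instances are indexed by a blocking key, and only instances falling in the same block are compared. The blocking key used here (first letter of the name) and the toy data are assumptions of the example; any of the block sources listed above could play the same role.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Tuple

Description = Dict[str, str]   # property -> value


def block_by(dataset: Dict[str, Description],
             key: Callable[[Description], Hashable]) -> Dict[Hashable, List[str]]:
    """Index the instances of a dataset by their blocking key."""
    blocks: Dict[Hashable, List[str]] = defaultdict(list)
    for instance, description in dataset.items():
        blocks[key(description)].append(instance)
    return blocks


def candidate_pairs(source: Dict[str, Description], target: Dict[str, Description],
                    key: Callable[[Description], Hashable]) -> Iterator[Tuple[str, str]]:
    """Yield only the pairs of instances that fall in the same block."""
    source_blocks, target_blocks = block_by(source, key), block_by(target, key)
    for block in source_blocks.keys() & target_blocks.keys():
        for a in source_blocks[block]:
            for b in target_blocks[block]:
                yield a, b


if __name__ == "__main__":
    source = {"s1": {"name": "Paris"}, "s2": {"name": "Pau"}, "s3": {"name": "Lyon"}}
    target = {"t1": {"name": "Paris"}, "t2": {"name": "Lille"}, "t3": {"name": "Metz"}}
    first_letter = lambda description: description["name"][0].lower()
    pairs = sorted(candidate_pairs(source, target, first_letter))
    print(pairs)   # 3 comparisons instead of 3 × 3 = 9
```

The price of blocking is that two matching instances placed in different blocks are never compared, so the blocking key has to be chosen with recall in mind.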
Other issue: evaluation
[Figure: interlinking the datasets d and d′ produces links l, which evaluation compares with reference links to compute Precision, Recall and F-measure]
Other issue: learning
[Figure: the datasets d and d′ and training links are given to interlinking, which produces links l; an evaluation step completes the loop]
Conclusion
Data interlinking is one of the most critical tasks in linked data
... but not only there, e.g., smart cities
It faces many problems due to:
heterogeneity (format, languages, convention)
size
Interlinking can be based on similarities or keys
There is active work to infer such interlinking patterns
Further reading
T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data
Space, Morgan & Claypool (US), 2011 http://linkeddatabook.com/
J. Euzenat, P. Shvaiko, Ontology matching, 2nd ed., Springer,
Heidelberg (DE), 2013 http://book.ontologymatching.org
K. Stefanidis, V. Efthymiou, M. Herschel, V. Christophides, Entity
Resolution in the Web of Data, Tutorial, WWW conference, Seoul
(KR), 2014 http://www.csd.uoc.gr/~vefthym/er/
Silk http://silk-framework.com/
Alignment API http://alignapi.gforge.inria.fr
Al 4 SC http://al4sc.inrialpes.fr
Thanks
To my colleagues Manuel Atencia, Jérôme David, Nicolas Guillouet and
François Scharffe
The Datalift and Lindicle projects
The Ready4SmartCities project