Data interlinking
Jérôme Euzenat
Montbonnot, France
Jerome.Euzenat@inria.fr
June 10, 2015
Data interlinking
Similarity-based approach
Key-based interlinking
Ontology matching & data interlinking
Tools
The problem: RDF data interlinking
http://data.bnf.fr/12144801/edgar allan poe the gold bug/, dc:title, “The gold bug”
[Figure: two RDF graphs. In the first, the Work "The gold bug" (lang en) has creator E. Poe, a Writer described by firstname and lastname. In the second, the Book "The raven" has an author and two translators (Baudelaire, Mallarmé), Persons with a name, and orig (original) links. Nodes of the two graphs are related by ≈, ≥ and ≤.]
Jérôme Euzenat, Data interlinking
Goal of the lecture
Provide an overview of the problem of data interlinking
Describe broad categories of solutions
Point to useful tools for generating links
Mostly about generating links, not about finding how to generate them
Outline
Data interlinking
Similarity-based approach
Key-based interlinking
Ontology matching & data interlinking
Tools
Data interlinking
I use (with the same meaning):
instance matching
entity linking
data interlinking
I do not use:
record linkage
data deduplication
entity reconciliation
coreference resolution
The data interlinking problem
Data interlinking is the task of finding same entities within different datasets
(RDF graphs).
[Figure: interlinking Data source 1 and Data source 2 produces owl:sameAs links between them.]
The data interlinking process
[Figure: the interlinking process takes two data sources, and possibly sample links, parameters and resources, and produces the resulting links.]
The data interlinking process (2)
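[Figure: interlinking in two steps: a linkage specification is extracted from the datasets d and d′, then links l are generated from this specification.]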
Approaches to data interlinking
There are two main approaches to data interlinking:
similarity-based: resources are compared with a similarity measure;
if they are similar enough, they are considered the same.
key-based: sufficient conditions for two resources to be the same are
induced and used to find identical entities.
Classification of similarities
Data interlinking techniques may be based on:
Data ID (URIs);
Data keys;
External relations: (explicit or implicit) links to other resources;
Data description (content).
Manual resource matching
[Figure: URI1 and URI2 are linked by owl:sameAs after manual observation.]
This does not scale.
But it may be useful for building a first sample or a reference.
Crowdsourcing?
URI matching
[Figure: a URI transformation turns URI1 into URI2, yielding an owl:sameAs link.]
http://dbpedia.org/resource/Johann_Sebastian_Bach owl:sameAs
http://www.lastfm.fr/music/Johann+Sebastian+Bach
http://rdf.insee.fr/geo/regions-2011.rdf#REG_11 owl:sameAs ?
http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/FR10
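The DBpedia/Last.fm pair suggests a purely syntactic URI transformation. A minimal sketch, assuming (from this single example only) that the Last.fm URI keeps the DBpedia local name and replaces underscores with `+`; the helpers `dbpedia_to_lastfm` and `uri_links` are hypothetical:

```python
from typing import Iterable, List, Optional, Tuple

DBPEDIA = "http://dbpedia.org/resource/"
LASTFM = "http://www.lastfm.fr/music/"


def dbpedia_to_lastfm(uri: str) -> Optional[str]:
    """Hypothetical rule inferred from one example: same local name, '_' -> '+'."""
    if not uri.startswith(DBPEDIA):
        return None
    return LASTFM + uri[len(DBPEDIA):].replace("_", "+")


def uri_links(sources: Iterable[str], targets: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Link a source URI to a target URI when the transformation produces it."""
    known = set(targets)
    links = []
    for s in sources:
        t = dbpedia_to_lastfm(s)
        if t is not None and t in known:
            links.append((s, "owl:sameAs", t))
    return links
```

No such rule relates the INSEE and Eurostat URIs of the second example: the local names of INSEE region 11 and NUTS FR10 share nothing, so other techniques are needed there.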
Id matching
[Figure: finding the same id in the descriptions of two resources yields an owl:sameAs link.]
Such identifiers include:
Social security numbers
ISBN, DOI, MAC addresses, etc.
authorities: ISO (countries, languages), IATA (airports)
Most databases are built on such identifiers. . . but they are often local to the
database.
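Even shared identifiers may need normalising before they can be compared: the same book may carry an ISBN-10 in one dataset and an ISBN-13 in the other. A sketch, assuming resources come as hypothetical `{uri: isbn}` dictionaries; the conversion uses the standard 978 prefix and ISBN-13 check digit:

```python
from typing import Dict, List, Tuple


def isbn13(raw: str) -> str:
    """Normalise an ISBN-10 or ISBN-13 (hyphens allowed) to its 13 digits."""
    chars = [c for c in raw.upper() if c.isdigit() or c == "X"]
    if len(chars) == 13:
        return "".join(chars)
    if len(chars) != 10:
        raise ValueError(f"not an ISBN: {raw!r}")
    core = "978" + "".join(chars[:9])  # drop the ISBN-10 check digit
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)  # ISBN-13 check digit


def link_by_isbn(source: Dict[str, str], target: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs of resources whose normalised ISBNs are equal."""
    index: Dict[str, List[str]] = {}
    for uri, isbn in target.items():
        index.setdefault(isbn13(isbn), []).append(uri)
    return [(s, t) for s, isbn in source.items() for t in index.get(isbn13(isbn), [])]
```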
Context-based similarity
[Figure: URI1 and URI2 are compared through an external resource (VIAF), giving a context-based “similarity” and an owl:sameAs link.]
Process:
Project your data onto another resource (DBpedia, GeoNames, VIAF, etc.);
Assess the relations between the terms considered;
Import the relation into the dataset.
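Once both datasets have been projected onto a common external resource, importing the relation amounts to composing links. A sketch, assuming the projections are already available as hypothetical `{local_uri: external_record}` dictionaries (e.g. VIAF records); it only derives owl:sameAs and does not perform the projection itself:

```python
from typing import Dict, List, Tuple


def links_via_authority(proj1: Dict[str, str], proj2: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Two resources projected onto the same external record are assumed to be
    the same; the relation is imported as an owl:sameAs link."""
    by_record: Dict[str, List[str]] = {}
    for r2, record in proj2.items():
        by_record.setdefault(record, []).append(r2)
    return [(r1, "owl:sameAs", r2)
            for r1, record in proj1.items()
            for r2 in by_record.get(record, [])]
```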
Content-based similarity
[Figure: the Work "The gold bug" (creator E. Poe, a Writer with firstname and lastname) and a Book whose author is Poe and whose translator is Baudelaire (Persons with a name), with titles "Le corbeau" and "Le scarabée d’or" and an orig link to the original; a similarity is computed between the two descriptions, yielding owl:sameAs.]
Two main approaches:
bag of text
structured similarity
Term-based similarity
[Figure: the same descriptions seen as bags of words ("The gold bug", "E. Poe", "Writer", "Work" on one side; "Le corbeau", "Le scarabée d’or", "Baudelaire", "Poe", "Person", "Book" on the other); a “bag of words” similarity is computed, yielding owl:sameAs.]
Various tools:
Normalisation (Stemmer, Tokenizers)
Use of linguistic resources (WordNet)
Translation
Many similarity measures, especially from information retrieval
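A minimal sketch of bag-of-words similarity with TF and TF*IDF weights and the cosine measure, using only lowercasing and tokenization as normalisation (no stemming, WordNet or translation); function names are illustrative:

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> List[str]:
    """Minimal normalisation: lowercase, then split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def tf(text: str) -> Dict[str, float]:
    """Term-frequency vector of the bag of words."""
    return {t: float(n) for t, n in Counter(tokens(text)).items()}


def idf(corpus: List[str]) -> Dict[str, float]:
    """Inverse document frequency of each term over the corpus."""
    df = Counter(t for doc in corpus for t in set(tokens(doc)))
    return {t: math.log(len(corpus) / n) for t, n in df.items()}


def tfidf(text: str, weights: Dict[str, float]) -> Dict[str, float]:
    return {t: f * weights.get(t, 0.0) for t, f in tf(text).items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With TF*IDF, frequent words such as "the" weigh less, so two titles sharing only an article end up less similar than with raw TF.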
Structure similarity
[Figure: the same descriptions reduced to their structure (properties title, creator, firstname, lastname, orig, name, author, translator and type); a structure similarity is computed, yielding owl:sameAs.]
Techniques:
Based on graph matching techniques
Can be used to learn weights on properties (but requires a matching)
Problem: scalability
Cross-lingual RDF data interlinking
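[Figure: http://a.org/Mus999 has nom (name) "Musée du Louvre" and a lieu (place) with adresse (rue "99, rue de Rivoli", zip 75001, ville Paris) in France; http://bb.cn/盧浮宮 has 稱號 (title) "盧浮宮" (the Louvre) and 位於 (located in) "法國巴黎" (Paris, France). owl:sameAs ?]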
Similarity-based data interlinking
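[Figure: four variants of similarity-based interlinking, each ending with the question owl:sameAs ?:
compute the similarity between two resources directly;
compute the similarity between two virtual documents built from the resources (Yuzhong Qu, Wei Hu, Gong Cheng: Constructing virtual documents for ontology matching. WWW 2006: 23-31);
translate a Chinese document into English (or an English document into Chinese) before computing the similarity;
map both documents to BabelNet identifiers and compute the similarity between these.]
Hypothesis: the higher the similarity, the higher the probability that the two resources denote the same object.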
General cross-lingual interlinking framework
1 Virtual documents
2 Language normalization
3 Similarity computation
4 Link generation
Building virtual documents by levels
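[Figure: building the virtual document of http://dbpedia.org/resource/Charles_Perrault. Level 1 covers the resource's own description (e.g. its label "Charles Perrault" and its link to dbpedia:France); Level 2 adds the description of the linked resources, e.g. the abstract of dbpedia:France: "France is a sovereign country in Western Europe that includes overseas regions and territories..."]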
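A sketch of virtual documents built by levels, assuming that Level 1 collects the literals attached to the resource and Level 2 also those of the resources it points to; the lecture's exact definition may differ (e.g. it may also use URI local names). Triples are plain tuples, literals are written between double quotes by convention of this sketch, and the predicates `ex:country` and `ex:abstract` are hypothetical:

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_literal(term: str) -> bool:
    # Convention of this sketch only: literals are written between double quotes.
    return term.startswith('"')


def virtual_document(resource: str, triples: Iterable[Triple], level: int = 1) -> List[str]:
    """Collect the literals reachable from `resource` within `level` hops."""
    triples = list(triples)
    words: List[str] = []
    frontier: Set[str] = {resource}
    visited: Set[str] = set()
    for _ in range(level):
        visited |= frontier
        next_frontier: Set[str] = set()
        for s, _p, o in triples:
            if s not in frontier:
                continue
            if is_literal(o):
                words.append(o.strip('"'))
            elif o not in visited:
                next_frontier.add(o)  # explored at the next level
        frontier = next_frontier
    return words
```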
Machine translation: parameters
1 Virtual documents: Level 1, Level 2
2.1 Machine translation: ZH→EN
2.2 NLP preprocessing: Lowercase+Tokenize; + Filter stop words; + Stemming (Porter); + Bigrams (terms)
3 Similarity computation: TF+cosine; TF*IDF+cosine
4 Link generation: Greedy; Hungarian
32 settings have been explored in total (2 levels × 4 preprocessing variants × 2 similarity measures × 2 link generation methods).
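A sketch of greedy one-to-one link generation from a similarity table; the input format (a dictionary from hypothetical source/target pairs to similarities) and the threshold parameter are assumptions of the sketch:

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def greedy_links(sim: Dict[Pair, float], threshold: float = 0.0) -> List[Tuple[str, str, float]]:
    """Repeatedly select the most similar remaining pair whose source and target
    are both still unlinked; stop once similarities fall below the threshold."""
    linked_src, linked_tgt = set(), set()
    links = []
    for (a, b), s in sorted(sim.items(), key=lambda item: item[1], reverse=True):
        if s < threshold:
            break
        if a in linked_src or b in linked_tgt:
            continue
        links.append((a, b, s))
        linked_src.add(a)
        linked_tgt.add(b)
    return links
```

On `{(a1, b1): 0.9, (a1, b2): 0.8, (a2, b1): 0.85, (a2, b2): 0.1}`, greedy selection keeps a1–b1 and a2–b2 (total 1.0), whereas the Hungarian method maximises the total and picks a1–b2 and a2–b1 (total 1.65). In Python, `scipy.optimize.linear_sum_assignment` with `maximize=True` solves that assignment problem.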
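Lowercase+Tokenization with TF*IDF at Level 1
[Figure: results of lowercasing and tokenization with TF*IDF on Level 1 virtual documents; legend ranges 0-0.11, 0.11-0.15, 0.15-0.25, 0.25-0.35, 0.35-0.45, 0.45-1.]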
Adding noise
BabelNet method: parameters
1 Virtual documents: Level 1, Level 2
2 Multilingual KB mapping
3 Similarity computation: TF+cosine; TF*IDF+cosine
4 Link generation: Greedy; Hungarian
Database keys
A set of attributes which uniquely identifies elements of a relation
e.g., Book: {isbn}; Person: {firstname, lastname, birthplace, birthdate}
Usually declared in the schema and used to check integrity
They may be used for identifying the same entities across two databases.
But this requires an alignment between the two schemas.
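Checking whether a set of attributes behaves as a key on a table is a simple uniqueness test. A sketch over rows given as dictionaries (the example rows are illustrative, and a missing attribute counts as `None`):

```python
from typing import Any, Dict, Iterable, List


def is_key(rows: List[Dict[str, Any]], attributes: Iterable[str]) -> bool:
    """True if no two rows agree on all the given attributes."""
    attrs = tuple(attributes)
    seen = set()
    for row in rows:
        value = tuple(row.get(a) for a in attrs)  # missing attribute -> None
        if value in seen:
            return False
        seen.add(value)
    return True
```

Holding on the rows at hand does not make an attribute set a key in general; the schema declaration is what guarantees it. Using such a key across two databases still requires aligning the attributes of both.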
Example of interlinking with keys and alignments
Are the resources bnf:cb118949856 and bne:XX1721208 the same?
Yes, if the BnF ontology states foaf:Person owl:hasKey {foaf:name, dc:dates}
and we have the following alignment:
[Figure: bnf:cb118949856, a foaf:Person with foaf:name "Albert Camus", rda:dateOfBirth 07-11-1913, rda:dateOfDeath 04-01-1960, dc:dates "1913-1960", rda:biographicalInformation "Romancier, dramaturge et essayiste" (novelist, playwright and essayist), rda:countryAssociatedWithThePerson http://id.loc.gov/vocabulary/countries/fr and rda:placeOfBirth "Mondovi (Algérie)"; bne:XX1721208, a frbrer:C1005 with frbrer:P3039 "Camus, Albert", frbrer:P3040 "1913-1960" and rda:sourceConsulted "Aut [...]1980". Classes and properties are related by equivalence (≡) and similarity (≈) correspondences, leading to owl:sameAs.]
Key-based interlinking methods
Database keys allow for identifying entities: if they are aligned, this can be
used for linking.
Advantages
they are logically grounded
they minimize the number of properties to compare (if minimal keys are
used)
Drawbacks
Require alignment between properties and classes
Very few key axioms are available, and they are not necessarily useful for
interlinking
We overcome these drawbacks by introducing link keys
Link key
A link key
$\{\langle p_1,q_1\rangle,\ldots,\langle p_n,q_n\rangle\}\,\{\langle p'_1,q'_1\rangle,\ldots,\langle p'_m,q'_m\rangle\}\ \mathrm{linkkey}\ \langle c,d\rangle$
holds iff
for all pairs of instances $a$ and $b$ belonging respectively to classes $c$ and $d$ of ontologies $O$ and $O'$,
if $a$ and $b$ share at least one value (object) for each pair of properties $p_i$ and $q_i$,
and $a$ and $b$ share all their values (objects) for each pair of properties $p'_j$ and $q'_j$,
then they are the same ($\langle a, \text{owl:sameAs}, b\rangle$).
Example:
$\{\langle \text{foaf:name}, \text{frbr:P3039}\rangle\}\,\{\langle \text{dc:dates}, \text{frbr:P3040}\rangle\}\ \mathrm{linkkey}\ \langle \text{foaf:Person}, \text{frbr:C1005}\rangle$
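Applying a link key is a direct transcription of the definition. A sketch, assuming instance descriptions are `{property: set_of_values}` dictionaries already restricted to the classes c and d, that values are already normalised (the BnE record actually reads "Camus, Albert"), and that "share all their values" requires non-empty value sets; `bne:other` is hypothetical:

```python
from typing import Dict, List, Set, Tuple

Description = Dict[str, Set[str]]  # property -> set of values (objects)
PropertyPair = Tuple[str, str]


def satisfies(a: Description, b: Description,
              intersects: List[PropertyPair], equals: List[PropertyPair]) -> bool:
    """Check both link key conditions on one pair of instances."""
    for p, q in intersects:  # share at least one value
        if not a.get(p, set()) & b.get(q, set()):
            return False
    for p, q in equals:  # share all their values (assumed non-empty)
        va, vb = a.get(p, set()), b.get(q, set())
        if not va or va != vb:
            return False
    return True


def apply_link_key(c_instances: Dict[str, Description], d_instances: Dict[str, Description],
                   intersects: List[PropertyPair], equals: List[PropertyPair]) -> List[Tuple[str, str, str]]:
    """Generate owl:sameAs links between the instances of c and of d."""
    return [(x, "owl:sameAs", y)
            for x, a in c_instances.items()
            for y, b in d_instances.items()
            if satisfies(a, b, intersects, equals)]
```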
Link key extraction
Problem: How to induce such link keys from data?
The number of sets of property pairs is exponential
Our approach:
discover only candidate link keys.
evaluate them in order to select only the “good” ones
Candidate link key
A candidate link key is a set of property pairs $\{\langle p_1,q_1\rangle,\ldots,\langle p_k,q_k\rangle\}$ that
1. would generate at least one link if used as a link key
2. is maximal for at least one link, or is the intersection of several
candidate link keys
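A naive sketch of this definition, under two simplifying assumptions: every property pair is read with the "share at least one value" condition only, and class restrictions are ignored. It enumerates all instance pairs, so it is quadratic; it illustrates the definition rather than the extraction algorithm used in the lecture, and the data is hypothetical:

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Description = Dict[str, Set[str]]       # property -> values
Candidate = FrozenSet[Tuple[str, str]]  # set of property pairs (p, q)


def maximal_candidates(src: Dict[str, Description], tgt: Dict[str, Description]) -> Set[Candidate]:
    """For each pair (a, b), the largest set of property pairs on which a and b
    share a value; used as a link key it would generate at least (a, b)."""
    found: Set[Candidate] = set()
    for pa in src.values():
        for qb in tgt.values():
            shared = frozenset((p, q)
                               for p, vs in pa.items()
                               for q, ws in qb.items() if vs & ws)
            if shared:
                found.add(shared)
    return found


def candidate_link_keys(src: Dict[str, Description], tgt: Dict[str, Description]) -> Set[Candidate]:
    """Close the maximal candidates under non-empty pairwise intersection."""
    candidates = maximal_candidates(src, tgt)
    while True:
        new = {k1 & k2 for k1, k2 in combinations(candidates, 2)} - candidates - {frozenset()}
        if not new:
            return candidates
        candidates |= new
```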
Supervised selection measures
If a sample of reference links is available:
Positive examples (L+) : a set of owl:sameAs links
Negative examples (L−) : a set of owl:differentFrom links
Idea: Approximate precision and recall on that sample
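Definition (Relative precision and recall)
$$\mathrm{precision}(K, L^+, L^-) = \frac{|L^+ \cap L_{D,D'}(K)|}{|(L^+ \cup L^-) \cap L_{D,D'}(K)|}$$
$$\mathrm{recall}(K, L^+) = \frac{|L^+ \cap L_{D,D'}(K)|}{|L^+|}$$
where $L_{D,D'}(K)$ is the set of links generated by the link key $K$ between datasets $D$ and $D'$.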
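The two measures translate directly into set operations. A sketch, assuming links are `(a, b)` tuples and returning 0.0 when a measure is undefined (a choice of this sketch); generated links absent from both samples are simply not judged:

```python
from typing import Set, Tuple

Link = Tuple[str, str]


def relative_precision(generated: Set[Link], positive: Set[Link], negative: Set[Link]) -> float:
    """Precision computed only on the generated links the sample can judge."""
    judged = (positive | negative) & generated
    return len(positive & generated) / len(judged) if judged else 0.0


def relative_recall(generated: Set[Link], positive: Set[Link]) -> float:
    """Share of the positive sample links that the link key generates."""
    return len(positive & generated) / len(positive) if positive else 0.0
```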
Unsupervised selection measures
When no reference link is available.
Idea: measure how close the extracted links are to being one-to-one and total.
Definition (Discriminability)
$$\mathrm{disc}(K, D, D') = \frac{\min(|\{a : \langle a,b\rangle \in L_{D,D'}(K)\}|,\ |\{b : \langle a,b\rangle \in L_{D,D'}(K)\}|)}{|L_{D,D'}(K)|}$$
Definition (Coverage)
$$\mathrm{cov}(K, D, D') = \frac{|\{a : \langle a,b\rangle \in L_{D,D'}(K)\} \cup \{b : \langle a,b\rangle \in L_{D,D'}(K)\}|}{|\{a : c(a) \in D\} \cup \{b : d(b) \in D'\}|}$$
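A sketch of both measures and of their harmonic mean, assuming links are `(a, b)` tuples and that the instances of c in D and of d in D′ are given as lists of identifiers (hypothetical data below):

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def discriminability(links: Set[Link]) -> float:
    """1.0 when the links are one-to-one; lower when a resource gets several links."""
    if not links:
        return 0.0
    sources = {a for a, _ in links}
    targets = {b for _, b in links}
    return min(len(sources), len(targets)) / len(links)


def coverage(links: Set[Link], c_instances: Iterable[str], d_instances: Iterable[str]) -> float:
    """Share of the instances of c (in D) and d (in D') involved in at least one link."""
    universe = set(c_instances) | set(d_instances)
    linked = {a for a, _ in links} | {b for _, b in links}
    return len(linked) / len(universe) if universe else 0.0


def h_mean(x: float, y: float) -> float:
    return 2 * x * y / (x + y) if x + y else 0.0
```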
Experimental evaluation
These selection measures were evaluated on public datasets.
Finding links between French municipalities described in two different
datasets:
Insee dataset: 36700 instances;
Geonames dataset: 36552 instances.
The reference link set is composed of:
Positive links: 36552 owl:sameAs statements;
Negative links: owl:differentFrom links derived from the owl:sameAs links (closed world
assumption).
Evaluation
The algorithm extracted 11 candidate link keys:
{1} {2} {3, 4} {5, 6}
{7, 1} {2, 1} {3, 4, 1} {3, 2, 4}
{3, 7, 4, 1} {3, 2, 4, 1}
{3, 7, 2, 4, 1}
[Figure: coverage and discriminability of the candidates.]
where each number denotes a pair of properties (INSEE, Geonames):
1 = ⟨nom, name⟩, 2 = ⟨nom, alternateName⟩
3 = ⟨subdivisionDe, parentFeature⟩, 4 = ⟨subdivisionDe, parentADM3⟩
5 = ⟨codeINSEE, population⟩, 6 = ⟨codeCommune, population⟩
7 = ⟨nom, officialName⟩
Evaluation
Correlation between the harmonic mean of discriminability and coverage and the
F-measure:
candidates with h-mean(disc., cov.) ≈ .99 have a high F-measure (≈ .99);
candidates with h-mean(disc., cov.) ≈ .89 have a good F-measure (≈ .89);
candidates with h-mean(disc., cov.) ≈ 0 have a bad F-measure (≈ 0).
Why using ontologies?
Because instances of equivalent classes must obviously be compared based on
equivalent properties.
More precisely:
To reduce the search space for finding link keys and similarities
To reduce the scope of linkage specifications
Because different classes call for different linkage rules
Because classes and properties are, like other features, hints of the similarity
between resources
E.g., with similarity and with keys
Data interlinking through a common ontology
[Figure: resource matching between datasets described by the same ontology o produces owl:sameAs links between URI1 and URI2.]
Matching with a common ontology
+ Focus the search: only match instances of the same class;
– Not sufficient: it remains to identify corresponding entities
+ If keys are defined (OWL 2), this is done;
+ At least we know which properties to compare;
– Inferring secondary keys may be useful;
– Correcting discrepancies: record linkage.
Record linkage
Record 1: Name Johann, Date 1665-03-21, Place München
Record 2: Name Johannes, Date 31/03/1665, Place Monaco di Bavaria
Having a common ontology does not solve all problems.
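Even with the same fields, each value needs its own normalisation before comparison. A sketch on the two records above, assuming a hypothetical place-name gazetteer `PLACE_VARIANTS` and only the two date formats of the example; it reports field-level evidence rather than deciding, since whether a ten-day date gap is acceptable is a policy choice:

```python
from datetime import date, datetime
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical gazetteer mapping place-name variants to a canonical name.
PLACE_VARIANTS = {"monaco di bavaria": "münchen", "munich": "münchen"}


def parse_date(text: str) -> Optional[date]:
    """Accept the two formats of the example: ISO and day/month/year."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def canonical_place(name: str) -> str:
    key = name.strip().lower()
    return PLACE_VARIANTS.get(key, key)


def compare_records(r1: Dict[str, str], r2: Dict[str, str]) -> Dict[str, object]:
    """Compare two records field by field after normalisation."""
    d1, d2 = parse_date(r1["Date"]), parse_date(r2["Date"])
    return {
        "name_similarity": SequenceMatcher(None, r1["Name"].lower(), r2["Name"].lower()).ratio(),
        "date_gap_days": abs((d1 - d2).days) if d1 and d2 else None,
        "same_place": canonical_place(r1["Place"]) == canonical_place(r2["Place"]),
    }
```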
Different types of mismatch
Different domains, connected (BIM, Energy demand)
⇒ few correspondences, any type
Same domain, different models (engineer, policy maker)
⇒ many correspondences, mostly equivalence
Same domain, different granularity (city management, building design)
⇒ many correspondences, mostly subsumption
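Data interlinking with different ontologies (implicit alignment)
[Figure: resource matching between datasets described by two different ontologies o and o′ produces owl:sameAs links between URI1 and URI2; the alignment between o and o′ remains implicit.]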
Data interlinking with different ontologies (explicit alignment)
[Figure: resource matching between datasets described by different ontologies o and o′, using an explicit alignment A between them, produces owl:sameAs links between URI1 and URI2.]
Ontology matching for data interlinking
[Figure: ontology matching between o and o′ produces an alignment A, which data interlinking uses to produce owl:sameAs links between URI1 and URI2.]
Heterogeneity problem
Resources being expressed in different ways must be reconciled before being
used.
Mismatches between formalized knowledge can occur when:
different languages are used (OWL vs. Topic maps);
different terminologies are used:
English vs. Chinese;
Book vs. Monograph.
different models are used:
different classes: Autobiography vs. Paperback;
classes vs. property: Essay vs. literarygenre;
classes vs. instances: One physical book as an instance vs. one work as
an instance.
different scopes and granularities are used:
Only books vs. cultural items vs. any product;
Books detailed to the print and translation level vs. books as works.
Ontology alignment
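[Figure: an alignment between two ontologies. The first has classes Item, DVD, Book, Paperback, Hardcover, CD and Person, and properties price, title, doi, creator, pp and author, with datatypes integer, string and uri. The second has classes Monograph, Essay, Literary critics, Politics, Biography, Autobiography, Literature, Human and Writer, and properties pages, isbn, author, title and subject. Correspondences between them are marked ≥ and ≤.]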
Expressive alignments (EDOAL)
[Figure: EDOAL correspondences: Pocket ≥ (Volume with size ≤ 14); (Book whose author = topic) = Autobiography.]
∀x, Pocket(x) ⇐ Volume(x) ∧ ∃y, size(x, y) ∧ y ≤ 14
∀x, (Book(x) ∧ ∃y, author(x, y) ∧ topic(x, y)) ≡ Autobiography(x)
Example: INSEE dataset
Région table (nom = name, chef-lieu = administrative centre):
code | nom | chef-lieu
11 | Île-de-France | 75056
21 | Champagne-Ardenne | 51108
22 | Picardie | 80021
Sous-région table (subregions):
région | département
11 | 75
11 | 77
11 | 78
11 | 91
11 | 92
11 | 93
Example: Administrative ontology
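[Figure: the INSEE administrative ontology: Territoire FR with subclasses Pays, Region, Departement, Arrondissement and Commune; properties code, nom, chef-lieu and subdivision, with integer and string values.]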
Example: NUTS dataset
NUTSRegion table:
level | code | name | hasParentRegion
0 | FR | FRANCE |
1 | FR1 | ÎLE DE FRANCE | FR
2 | FR10 | Île de France | FR1
3 | FR101 | Paris | FR10
3 | FR104 | Essonne | FR10
Example: Linking INSEE and NUTS
NUTS: Nomenclature of territorial units for statistics
#INSEE | INSEE name | NUTS level | #NUTS
1 | Pays | 0 | 34
| | 1 | 142
26 | Région | 2 | 344
100 | Département | 3 | 1488
342 | Arrondissement | |
4036 | Canton | 4 |
52422 | Commune | 5 |
Example: Linking INSEE and NUTS
[Figure: INSEE instances PAYS FR (Pays), REG 11 (Region), DEP 75, DEP 77 and DEP 78 (Departement) and COM 75056 (Commune), all under Territoire FR, linked by owl:sameAs to NUTS instances FR (Country), FR1, FR10, FR101, FR102 and FR103 (NUTSRegion); the NUTS side also shows UK and the classes Region and LAURegion.]
Example: Linksets
Specific datasets containing links between URIs of other datasets.
<http://www.example.org/linkset/INSEE-NUTS>
a void:Linkset ;
void:target <http://rdf.insee.fr/geo/regions-2011.rdf> ;
void:target <http://nuts.psi.enakting.org/id/> .
insee:PAYS_FR owl:sameAs nuts:FR .
insee:REG_11 owl:sameAs nuts:FR10 .
insee:DEP_75 owl:sameAs nuts:FR101 .
insee:DEP_77 owl:sameAs nuts:FR102 .
insee:DEP_78 owl:sameAs nuts:FR103 .
Example: interesting sets
[Figure: interesting datasets to link: nuts, ons, Ordnance Survey, ign, insee, geonames, dbpedia, freebase.]
A simple algorithm
Find matching concepts [concept matching];
For each of them, determine matching properties based on the similarity
between their values in both datasets [property matching];
From them find property combinations identifying corresponding entities
[key extraction];
Link corresponding entities [link generation].
For instance, $nom/Region_{INSEE} \subseteq name/NUTSRegion_{NUTS}$ (every INSEE region
name is also a NUTS region name) and, moreover, these names are unambiguous.
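The property-matching and key steps can be sketched with value inclusion and a uniqueness test. A sketch, assuming datasets given as hypothetical `{instance: {property: set_of_values}}` dictionaries and a normalisation that ignores case and hyphens; `nuts:FRx` is a made-up identifier:

```python
from typing import Dict, List, Set, Tuple

Dataset = Dict[str, Dict[str, Set[str]]]  # instance -> property -> values


def normalise(value: str) -> str:
    # Assumed normalisation: case and hyphens are ignored.
    return value.lower().replace("-", " ")


def values_of(data: Dataset, prop: str) -> Set[str]:
    return {normalise(v) for desc in data.values() for v in desc.get(prop, set())}


def property_matches(src: Dataset, tgt: Dataset, min_inclusion: float = 0.9) -> List[Tuple[str, str, float]]:
    """Pairs (p, q) such that most values of p in src are also values of q in tgt."""
    src_props = {p for desc in src.values() for p in desc}
    tgt_props = {q for desc in tgt.values() for q in desc}
    matches = []
    for p in sorted(src_props):
        vp = values_of(src, p)
        for q in sorted(tgt_props):
            ratio = len(vp & values_of(tgt, q)) / len(vp) if vp else 0.0
            if ratio >= min_inclusion:
                matches.append((p, q, ratio))
    return matches


def unambiguous(data: Dataset, prop: str) -> bool:
    """True if no normalised value of prop is shared by two instances."""
    owner: Dict[str, str] = {}
    for inst, desc in data.items():
        for v in desc.get(prop, set()):
            if owner.setdefault(normalise(v), inst) != inst:
                return False
    return True
```

Adding the level-1 entry FR1 ("ÎLE DE FRANCE") makes NUTS names ambiguous under this normalisation, which illustrates why the correspondence needs to restrict NUTSRegion to level 2.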
INSEE and NUTS: ontology alignment
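[Figure: alignment between the INSEE ontology (Territoire FR, Pays, Region, Departement, Arrondissement, Canton, Commune; properties code, nom, chef-lieu, subdivision; datatypes integer, string) and the NUTS ontology (Region, Country, NUTSRegion, LAURegion; properties name, level, code, hasSubRegion), with correspondences marked = and ≤.]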
Simple alignments are not sufficient
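[Figure: with simple correspondences (Region, Departement and Commune ≤ NUTSRegion; nom = name), both DEP 75 (a département) and COM 75056 (a commune) match FR101, whose name is Paris, on their nom: names alone do not tell which INSEE entity corresponds to it.]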
Expressive alignments are necessary
[Figure: expressive correspondences: Region = NUTSRegion restricted to level = 2 and hasParentRegion = FR; subdivision = hasSubRegion; nom = name.]
What does this mean?
Ontology alignments are schema-level expressions of correspondences;
They are useful for focussing the search;
Expressive alignments are necessary;
They can be turned into SPARQL-based link generators.
But it is also necessary to express instance-level constraints:
for converting data (e.g., mph vs. m/s);
for expressing matching constraints on data (e.g., similarity).
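An instance-level constraint combines a conversion with a matching tolerance. A sketch for the mph vs. m/s case, assuming speeds are plain numbers and a hypothetical 1% relative tolerance; the conversion factor follows from 1 mile = 1609.344 m and 1 hour = 3600 s:

```python
import math

MPH_TO_MS = 1609.344 / 3600  # = 0.44704


def mph_to_ms(speed_mph: float) -> float:
    return speed_mph * MPH_TO_MS


def same_speed(speed_mph: float, speed_ms: float, rel_tol: float = 0.01) -> bool:
    """Convert first, then compare with a relative tolerance."""
    return math.isclose(mph_to_ms(speed_mph), speed_ms, rel_tol=rel_tol)
```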
Data interlinking and ontology matching
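[Figure: a matcher aligns ontologies o and o′ into an alignment A; a generator applies A to datasets d and d′ to produce links l.]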
Tools for data interlinking
Approach | Linkage spec extraction | Link generation
similarity | LIMES | Silk, LIMES, OpenRefine
key | LinkKeyDisco | SPARQL
Silk
Silk is a robust tool for interlinking datasets.
It relies on an expressive specification of linking conditions:
Declare data sources (DataSource);
Circumscribe entities to compare (Source/TargetDataset);
Describe how to compare them (LinkageRule):
Select properties to compare through paths (Input);
Compute distances between them (Compare+threshold);
Aggregate all comparisons (Aggregate);
Select those pairs of entities to be linked (Filter);
Generate links (Output+thresholds).
A Silk script
Consider a linking script between INSEE and NUTS:
<Silk>
<Prefix id="nuts"
namespace="http://ec.europa.eu/.../geographic.rdf#" />
<Prefix id="insee"
namespace="http://rdf.insee.fr/geo/" />
<DataSource id="nuts2008"
type="sparqlEndpoint">
<Param name="endpointURI"
value="http://localhost:9091/.../internal"/>
<Param name="graph"
value="http://localhost:9091/.../nuts2008-complete-1"/>
</DataSource>
<DataSource id="insee2010"
type="sparqlEndpoint">
<Param name="endpointURI"
value="http://localhost:9091/.../internal"/>
<Param name="graph"
value="http://localhost:9091/.../source/regions-2010-1"/>
</DataSource>
<Thresholds accept="0.9" verify="0.7" />
<Outputs>
<Output type="sparul">
<Param name="graphUri"
value="http://localhost:9091/.../source/insee-nuts-silk"/>
<Param name="uri"
value="http://localhost:9091/.../lifted/"/>
<Param name="parameter" value="update"/>
</Output>
</Outputs>
<Interlinks>
<Interlink id="linkingNUTS">
<LinkType>owl:sameAs</LinkType>
<SourceDataset dataSource="nuts2008" var="s">
<RestrictTo>?s rdf:type nuts:NUTSRegion.
?s nuts:level 2.
</RestrictTo>
</SourceDataset>
<TargetDataset dataSource="insee2010" var="ss">
<RestrictTo>?ss rdf:type insee:Region</RestrictTo>
</TargetDataset>
<LinkageRule>
<Aggregate type="max">
<Compare metric="levenshteinDistance"
threshold=".2">
<Input path="?s/nuts:name"/>
<Input path="?ss/insee:nom"/>
</Compare>
</Aggregate>
</LinkageRule>
</Interlink>
</Interlinks>
</Silk>
Silk: prefix and sources
<Silk>
<Prefix id="nuts" namespace="http://ec.europa.eu/.../geographic.rdf#" />
<Prefix id="insee" namespace="http://rdf.insee.fr/geo/" />
<DataSource id="nuts2008" type="sparqlEndpoint">
<Param name="endpointURI" value="http://localhost:9091/.../internal"/>
<Param name="graph" value="http://localhost:9091/.../nuts2008-complete-1"/>
</DataSource>
<DataSource id="id1" type="file">
<Param name="file" value="/Skratch/TutoLinking/admin/regions-2010.rdf"/>
<Param name="format" value="RDF/XML" />
</DataSource>
Sources can be files or SPARQL endpoint.
Silk rules
<Interlinks>
<Interlink id="linkingNUTS">
<LinkType>owl:sameAs</LinkType>
<SourceDataset dataSource="nuts2008" var="s">
<RestrictTo>?s rdf:type nuts:NUTSRegion.
?s nuts:level 2.
</RestrictTo>
</SourceDataset>
<TargetDataset dataSource="insee2010" var="ss">
<RestrictTo>?ss rdf:type insee:Region</RestrictTo>
</TargetDataset>
<Thresholds accept="0.9" verify="0.7" />
<Outputs>
<Output type="sparul">
<Param name="graphUri" value="http://localhost:9091/.../source/insee-nut
<Param name="uri" value="http://localhost:9091/.../lifted/"/>
<Param name="parameter" value="update"/>
</Output>
</Outputs>
Silk rules (cont'd)
<LinkageRule>
<Aggregate type="max">
<Compare metric="levenshteinDistance" threshold=".2">
<Input path="?s/nuts:name"/>
<Input path="?ss/insee:nom"/>
</Compare>
</Aggregate>
</LinkageRule>
</Interlink>
</Interlinks>
</Silk>
Linkage rules can:
transform the data (lowercase, tokenize, convert to integers, etc.),
use comparison metrics (equality, Levenshtein, Jaro-Winkler, etc.), and
aggregate their values (average, min, max, etc.).
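The flow of such a linkage rule (compare, aggregate, then apply accept/verify thresholds) can be mimicked in a few lines. This is not Silk's code and does not reproduce its exact scoring: the sketch assumes a hypothetical mapping of a length-normalised Levenshtein distance to a score in [0, 1], reusing the values `threshold=".2"`, `accept="0.9"` and `verify="0.7"` of the script:

```python
from typing import Iterable


def levenshtein(s: str, t: str) -> int:
    """Edit distance by dynamic programming, keeping one row at a time."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        current = [i]
        for j, ct in enumerate(t, 1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (cs != ct)))  # substitution
        previous = current
    return previous[-1]


def compare(v1: str, v2: str, threshold: float = 0.2) -> float:
    """Hypothetical scoring: lowercase, normalise the distance by length,
    map distance 0 to 1.0 and distances at or beyond the threshold to 0.0."""
    a, b = v1.lower(), v2.lower()
    distance = levenshtein(a, b) / max(len(a), len(b), 1)
    return max(0.0, 1.0 - distance / threshold)


def linkage_rule(names: Iterable[str], noms: Iterable[str]) -> float:
    """Aggregate type="max" over all pairs of ?s/nuts:name and ?ss/insee:nom values."""
    noms = list(noms)
    return max((compare(x, y) for x in names for y in noms), default=0.0)


def decide(score: float, accept: float = 0.9, verify: float = 0.7) -> str:
    if score >= accept:
        return "accept"
    return "verify" if score >= verify else "reject"
```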
Silk workbench
EDOAL Alignments
<Cell>
<entity1><e:Class rdf:about="&insee;Region"/></entity1>
<entity2>
<e:Class>
<e:and rdf:parseType="Collection">
<e:Class rdf:about="&nuts;NUTSRegion"/>
<e:AttributeValueRestriction>
<e:onAttribute><e:Property rdf:about="&nuts;level"/></e:onAttribute>
<e:comparator rdf:resource="&edoal;equals"/>
<e:value><e:Literal e:type="&xsd;integer" e:string="2" /></e:value>
</e:AttributeValueRestriction>
<e:AttributeValueRestriction>
<e:onAttribute>
<e:Relation rdf:about="&nuts;hasParentRegion" />
</e:onAttribute>
<e:comparator rdf:resource="&edoal;equals"/>
<e:value><e:Instance rdf:about="&esdata;FR" /></e:value>
</e:AttributeValueRestriction>
</e:and>
</e:Class>
</entity2>
<relation>equivalence</relation>
</Cell>
Link keys in the Alignment API
<e:linkkey>
<e:Linkkey>
<e:binding>
<e:Intersects>
<e:property1><e:Property rdf:about="&insee;nom" /></e:property1>
<e:property2><e:Property rdf:about="&nuts;name" /></e:property2>
</e:Intersects>
<e:Equals>
<e:property1>
<e:Property>
<e:inverse><e:Property rdf:about="&insee;subdivision" /></e:inverse>
</e:property1>
<e:property2><e:Property rdf:about="&nuts;hasParentRegion" /></e:property2>
</e:Equals>
</e:binding>
</e:Linkkey>
</e:linkkey>
Query generation
PREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?r
FROM <http://rdf.insee.fr/geo/regions-2011.rdf>
WHERE {
?r rdf:type insee:Region .
}
PREFIX nuts: <http://ec.europa.eu/eurostat/ramon/ontologies/geographi
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?n
FROM <http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/>
WHERE {
?n rdf:type nuts:NUTSRegion .
?n nuts:level 2 .
?n nuts:hasParentRegion nuts:FR .
}
Data transformation
PREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT {
?r rdf:type nuts:NUTSRegion .
?r nuts:level 2 .
?r nuts:hasParentRegion nuts:FR .
}
FROM <http://rdf.insee.fr/geo/regions-2011.rdf>
WHERE {
?r rdf:type insee:Region .
}
SameAs link generation
PREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>
PREFIX nuts: <http://ec.europa.eu/eurostat/ramon/ontologies/geographi
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT { ?r owl:sameAs ?n . }
FROM <http://rdf.insee.fr/geo/regions-2011.rdf>
FROM <http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/>
WHERE {
?r rdf:type insee:Region .
?r insee:nom ?l .
?n rdf:type nuts:NUTSRegion .
?n nuts:name ?l .
?n nuts:level "2"^^xsd:integer .
?n nuts:hasParentRegion nuts:FR .
}
Other issue: performance
Comparing all pairs: n × m
With 3 blocks on each side: n/3 × m/3 + n/3 × m/3 + n/3 × m/3 = (n × m)/3
10 × 10 = 100
1000 × 1000 = 1000000
100000 × 100000 = 10000000000
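The counts above are simple arithmetic. The following lines reproduce them, together with the 3-block case, under the assumption that both datasets split into k blocks of equal size and that entities are compared only within matching blocks. The function names are illustrative.

```python
def all_pairs(n, m):
    """Comparisons when every entity of one dataset meets every entity of the other."""
    return n * m


def equal_blocks(n, m, k):
    """Comparisons when both datasets are split into k equally sized blocks and
    entities are only compared within matching blocks (assumes k divides n and m)."""
    return k * (n // k) * (m // k)


for size in (10, 1_000, 100_000):
    print(f"{size} x {size} = {all_pairs(size, size)}")
# 10 x 10 = 100
# 1000 x 1000 = 1000000
# 100000 x 100000 = 10000000000

print(all_pairs(3_000, 3_000), equal_blocks(3_000, 3_000, 3))  # 9000000 3000000
```

Splitting into k blocks divides the work by k. In practice, however, the blocks are rarely of equal size.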
Other issue: performance
Blocking: index + cluster
[Figure: Dataset 1, Dataset 2]
Other issue: performance
Blocks can be obtained from:
clustering values in an index
predefined blocks (based on equality)
classes in an ontology (blocks are defined as class expressions)
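A minimal sketch of the second option, predefined blocks based on equality. Entities are indexed by a blocking key, and only entities that share a key value are compared. The key used here, the first letter of the normalized name, is a hypothetical choice made for illustration, and the data is invented.

```python
from collections import defaultdict


def build_blocks(dataset, key):
    """Index entities by a blocking key computed from their description."""
    blocks = defaultdict(list)
    for iri, description in dataset.items():
        blocks[key(description)].append(iri)
    return blocks


def blocked_pairs(d1, d2, key):
    """Candidate pairs: only entities sharing the same blocking key are compared."""
    b1, b2 = build_blocks(d1, key), build_blocks(d2, key)
    return [(a, b) for k in b1 if k in b2 for a in b1[k] for b in b2[k]]


def first_letter(description):
    """Hypothetical blocking key: first letter of the normalized name."""
    return description["name"].strip().lower()[:1]


d1 = {"a1": {"name": "Lyon"}, "a2": {"name": "Lille"}, "a3": {"name": "Paris"}}
d2 = {"b1": {"name": "LYON"}, "b2": {"name": "Paris "}, "b3": {"name": "Nice"}}

pairs = blocked_pairs(d1, d2, first_letter)
print(pairs)  # [('a1', 'b1'), ('a2', 'b1'), ('a3', 'b2')]
print(len(pairs), "instead of", len(d1) * len(d2))  # 3 instead of 9
```

Blocking trades completeness for speed. Two descriptions of the same entity whose keys differ, for example because of a typo in the first letter, are never compared. The choice of key therefore bounds the achievable recall.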
Other issue: evaluation
[Figure: interlinking the two datasets (d) produces links l; evaluation compares l with the reference links]
Precision
Recall
F-measure
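The three measures compare the links l with the reference links. Precision is the share of found links that are correct. Recall is the share of reference links that were found. The F-measure is their harmonic mean. A minimal sketch over invented link sets, assuming the convention that an empty set scores 0.0:

```python
def evaluate(found, reference):
    """Precision, recall and F-measure of a set of links against reference links."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    # Convention assumed here: an empty set yields 0.0 rather than being undefined.
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical links and reference links (entity identifiers are invented).
links = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b9")}
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b5"), ("a6", "b6")}

p, r, f = evaluate(links, reference)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
# precision=0.75 recall=0.60 F-measure=0.67
```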
Other issue: learning
[Figure: training links are given to the interlinking of the two datasets (d), which produces links l; evaluation follows]
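The slide does not say how training links are used. One simple possibility, assumed here for illustration, is to learn the similarity threshold of a similarity-based interlinker. The sketch picks the candidate threshold whose links best agree with the training links, measured by F-measure. The similarity is a stand-in, Python's difflib ratio on lowercased labels. The labels, training links and candidate thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1] (difflib ratio on lowercased labels)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def interlink(d1, d2, threshold):
    """Link every pair of entities whose labels are at least `threshold` similar."""
    return {(i1, i2)
            for i1, label1 in d1.items()
            for i2, label2 in d2.items()
            if similarity(label1, label2) >= threshold}


def f_measure(found, reference):
    """Harmonic mean of precision and recall (0.0 when nothing is correct)."""
    correct = len(found & reference)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(found), correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def learn_threshold(d1, d2, training_links, candidates):
    """Pick the candidate threshold whose links best agree with the training links."""
    return max(candidates,
               key=lambda t: f_measure(interlink(d1, d2, t), training_links))


# Hypothetical labels and training links.
d1 = {"a1": "Rhone-Alpes", "a2": "Bretagne", "a3": "Corse"}
d2 = {"b1": "Rhone Alpes", "b2": "Bretagne", "b3": "Centre"}
training = {("a1", "b1"), ("a2", "b2")}
candidates = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

best = learn_threshold(d1, d2, training, candidates)
print(best, sorted(interlink(d1, d2, best)))
```

The learned threshold would then be applied to the rest of the data. The evaluation step checks it against links that were held out from training.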
Conclusion
Data interlinking is one of the most critical tasks in linked data
. . . but not only there, e.g. smart cities
It faces many problems due to:
heterogeneity (formats, languages, conventions)
size
Interlinking can be based on similarities or keys
There is active work to infer such interlinking patterns
Further reading
T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data
Space, Morgan & Claypool (US), 2011 http://linkeddatabook.com/
J. Euzenat, P. Shvaiko, Ontology matching, 2nd ed., Springer,
Heidelberg (DE), 2013 http://book.ontologymatching.org
K. Stefanidis, V. Efthymiou, M. Herschel, V. Christophides, Entity
Resolution in the Web of Data, Tutorial, WWW conference, Seoul
(KR), 2014 http://www.csd.uoc.gr/~vefthym/er/
Silk http://silk-framework.com/
Alignment API http://alignapi.gforge.inria.fr
Al 4 SC http://al4sc.inrialpes.fr
Thanks
To my colleagues Manuel Atencia, Jérôme David, Nicolas Guillouet and
François Scharffe
The Datalift and Lindicle projects
The Ready4SmartCities project
http://exmo.inria.fr
Jerome . Euzenat @ inria . fr

biosynthesis of the cell wall and antibiotics
 
Main Exam Applied biochemistry final year
Main Exam Applied biochemistry final yearMain Exam Applied biochemistry final year
Main Exam Applied biochemistry final year
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
 
CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docx
 
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdfPests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
 
Pests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPRPests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPR
 
PSP3 employability assessment form .docx
PSP3 employability assessment form .docxPSP3 employability assessment form .docx
PSP3 employability assessment form .docx
 
Basic Concepts in Pharmacology in molecular .pptx
Basic Concepts in Pharmacology in molecular  .pptxBasic Concepts in Pharmacology in molecular  .pptx
Basic Concepts in Pharmacology in molecular .pptx
 
Controlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform EnvironmentControlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform Environment
 
Role of Herbs in Cosmetics in Cosmetic Science.
Role of Herbs in Cosmetics in Cosmetic Science.Role of Herbs in Cosmetics in Cosmetic Science.
Role of Herbs in Cosmetics in Cosmetic Science.
 
Physics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and EngineersPhysics Serway Jewett 6th edition for Scientists and Engineers
Physics Serway Jewett 6th edition for Scientists and Engineers
 
Intensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptxIntensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptx
 
Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptx
 
Pests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPR
 
Substances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening TestSubstances in Common Use for Shahu College Screening Test
Substances in Common Use for Shahu College Screening Test
 
Bureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptxBureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptx
 
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
 
TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)
 

Data Interlinking

  • 1. Data interlinking. Jérôme Euzenat, Montbonnot, France, Jerome.Euzenat@inria.fr, June 10, 2015
  • 2. The problem: RDF data interlinking. Example triple: http://data.bnf.fr/12144801/edgar_allan_poe_the_gold_bug/, dc:title, “The gold bug”. [Figure: two RDF graphs describing works of Poe with different vocabularies. One describes "The gold bug" (a Work, lang en) whose creator, E. Poe, is a Writer with a firstname and a lastname; the other describes a Book b (orig "The raven") with an author and two translators, Baudelaire and Mallarmé, all of type Person with a name. Correspondences ≈, ≥ and ≤ relate the two descriptions.]
  • 3. Goal of the lecture: provide an overview of the problem of data interlinking; describe broad categories of solutions; point to useful tools for generating links. The lecture is mostly about generating links, not about finding out how to generate them.
  • 4. Outline: data interlinking; similarity-based approach; key-based interlinking; ontology matching & data interlinking; tools.
  • 5. Data interlinking. I use, with the same meaning: instance matching, entity linking, data interlinking. I do not use: record linkage, data deduplication, entity reconciliation, coreference resolution.
  • 6. The data interlinking problem. Data interlinking is the task of finding identical entities across different datasets (RDF graphs). [Figure: interlinking data source 1 and data source 2 produces owl:sameAs links.]
  • 7. The data interlinking process. [Figure: the interlinking step takes two data sources, sample links, parameters and resources, and outputs the resulting links.]
  • 8. The data interlinking process (2). [Figure: an extraction step derives a linkage specification from the datasets d and d′; a generation (interlinking) step applies it to produce the links l.]
  • 9. Approaches to data interlinking. There are two main approaches: similarity-based, where resources are compared through a similarity measure and are considered the same if they are similar enough; key-based, where sufficient conditions for two resources to be the same are induced and used to find identical entities.
  • 10. Classification of similarities. Data interlinking techniques may be based on: data identifiers (URIs); data keys; external relations (explicit or implicit links to other resources); data description (content).
  • 11. Manual resource matching. [Figure: URI1 and URI2 are linked by owl:sameAs after manual observation.] This does not scale, but it may be good for producing a first sample or a reference. Crowdsourcing?
  • 12. URI matching. [Figure: URI1 is transformed to obtain URI2, yielding owl:sameAs.] Example: http://dbpedia.org/resource/Johann_Sebastian_Bach owl:sameAs http://www.lastfm.fr/music/Johann+Sebastian+Bach. But: http://rdf.insee.fr/geo/regions-2011.rdf#REG_11 ? http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/FR10
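The Bach example can be reproduced by a simple rewriting rule. The sketch below assumes, as the example suggests, that DBpedia local names encode spaces as underscores and that the last.fm URIs encode them as `+`; `dbpedia_to_lastfm` and `uri_links` are hypothetical helpers, not part of any tool. The INSEE/Eurostat pair shows the limit: such rewriting presumably only helps when both publishers derive their URIs from a shared label.

```python
from typing import Iterable, Iterator, Optional, Tuple
from urllib.parse import quote_plus, unquote

# Prefixes taken from the example URIs; any other pair of sites needs its own rule.
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"
LASTFM_PREFIX = "http://www.lastfm.fr/music/"


def dbpedia_to_lastfm(uri: str) -> Optional[str]:
    """Rewrite a DBpedia resource URI into the candidate last.fm URI (or None)."""
    if not uri.startswith(DBPEDIA_PREFIX):
        return None
    local = unquote(uri[len(DBPEDIA_PREFIX):])  # decode %-escapes, e.g. %C3%A9 -> é
    label = local.replace("_", " ")              # DBpedia: '_' stands for a space
    return LASTFM_PREFIX + quote_plus(label)     # last.fm: a space becomes '+'


def uri_links(sources: Iterable[str], targets: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs whose rewritten source URI exists among the targets."""
    target_set = set(targets)
    for s in sources:
        t = dbpedia_to_lastfm(s)
        if t is not None and t in target_set:
            yield s, t
```

Checking the rewritten URI against the target set, rather than trusting it blindly, avoids asserting links to resources that do not exist.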
  • 13. Id matching. [Figure: two records carrying the same id are linked by owl:sameAs.] Such identifiers include social security numbers; ISBN, DOI, MAC addresses, etc.; and authority codes: ISO (countries, languages), IATA (airports). Most databases are built on such identifiers, but they are often local to the database.
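Even shared identifiers usually need normalizing before they can be compared: the same book may be recorded with an ISBN-10 in one dataset and an ISBN-13 in another. A minimal sketch, assuming the standard conversion (prefix `978`, drop the ISBN-10 check digit, recompute the EAN-13 check digit); `normalize_isbn`, `link_by_isbn` and the resource names in the usage are hypothetical, and the ISBN-10 check digit is not validated.

```python
from typing import Dict, List, Tuple


def normalize_isbn(raw: str) -> str:
    """Return the 13-digit form of an ISBN-10 or ISBN-13 (hyphens and spaces ignored)."""
    digits = raw.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X"):
        core = "978" + digits[:9]  # drop the ISBN-10 check digit, add the 978 prefix
        # EAN-13 check digit: weights 1, 3, 1, 3, ... over the first 12 digits
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    raise ValueError(f"not an ISBN: {raw!r}")


def link_by_isbn(books1: Dict[str, str], books2: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair resources (URI -> ISBN maps) whose ISBNs normalize to the same ISBN-13."""
    index: Dict[str, List[str]] = {}
    for uri, isbn in books2.items():
        index.setdefault(normalize_isbn(isbn), []).append(uri)
    return [(u1, u2)
            for u1, isbn in books1.items()
            for u2 in index.get(normalize_isbn(isbn), [])]
```

For instance, `normalize_isbn("0-306-40615-2")` and `normalize_isbn("978-0-306-40615-7")` both give `"9780306406157"`, so the two records are linked.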
  • 14. Context-based similarity. [Figure: URI1 and URI2 are both related to an external resource (VIAF); this context-based "similarity" yields owl:sameAs.] Process: project your data into another resource (DBpedia, GeoNames, VIAF, etc.); assess the relations between the considered terms; import the relation into the dataset.
  • 15. Content-based similarity. [Figure: the graph describing "The gold bug" by E. Poe (a Work whose creator is a Writer) is compared with a graph describing a Book with the French titles "Le scarabée d'or" and "Le corbeau", author Poe and translator Baudelaire; the computed similarity yields owl:sameAs.] Two main approaches: bag-of-words (term-based) similarity and structural similarity.
  • 16. Term-based similarity. [Figure: the two graphs, reduced to their terms, are compared with a "bag of words" similarity yielding owl:sameAs.] Various tools: normalisation (stemmers, tokenizers); use of linguistic resources (WordNet); translation; many similarity measures, especially from information retrieval.
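A minimal sketch of a bag-of-words similarity: lowercasing and tokenization as normalisation, a small stop-word filter, then cosine similarity over term frequencies, one of the information-retrieval measures mentioned above. The stop-word list is a hypothetical illustration; stemming, WordNet and translation are left out.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list; a real one would be language-specific and larger.
STOP_WORDS = {"the", "a", "an", "of", "le", "la", "d"}


def bag_of_words(text: str) -> Counter:
    """Lowercase, split into letter-only tokens, drop stop words, count occurrences."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(b1: Counter, b2: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(b1[t] * b2[t] for t in b1.keys() & b2.keys())
    n1 = math.sqrt(sum(v * v for v in b1.values()))
    n2 = math.sqrt(sum(v * v for v in b2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

"The gold bug, E. Poe" and "Gold bug by Poe" share three of their four terms and score 0.75, whereas "Le scarabée d'or" and "The gold bug" share no term and score 0.0, which is why translation appears among the tools.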
  • 17. Structure similarity. [Figure: the two graphs, reduced to their structure (properties and types), are compared with a structural similarity yielding owl:sameAs.] Techniques: based on graph matching; can be used to learn weights on properties (but this requires a matching). Problem: scalability.
  • 18. Cross-lingual RDF data interlinking. [Figure: should http://a.org/Mus999, described in French (nom "Musée du Louvre"; lieu, adresse, rue, zip and ville giving 99, rue de Rivoli, 75001 Paris, France), be linked by owl:sameAs to http://bb.cn/盧浮宮, described in Chinese (稱號, "title": 盧浮宮, "Louvre"; 位於, "located in": 法國巴黎, "Paris, France")?]
  • 19. Similarity-based data interlinking. Hypothesis: the higher the similarity between two resources, the higher the probability that they are the same object. [Figure: resources are compared through a similarity; alternatively, each resource is turned into a virtual document and the documents are compared (Yuzhong Qu, Wei Hu, Gong Cheng: Constructing virtual documents for ontology matching. WWW 2006: 23-31). Across languages, a Chinese and an English document are compared either after translating one into the language of the other, or after mapping both to BabelNet identifiers.]
  • 20. General cross-lingual interlinking framework: 1. virtual documents; 2. language normalization; 3. similarity computation; 4. link generation.
  • 21. Building virtual documents by levels. Level 1 uses the description of the resource itself, e.g., for http://dbpedia.org/resource/Charles_Perrault, the literal "Charles Perrault" and the linked resource dbpedia:France. Level 2 adds the descriptions of the linked resources, e.g., for dbpedia:France: "France is a sovereign country in Western Europe that includes overseas regions and territories…"
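The two levels can be sketched over a toy graph encoding. Assumptions, all hypothetical: a graph maps each subject to its (predicate, object) pairs; an object that is itself a subject of the graph counts as a resource, anything else as a literal; level 1 keeps the resource's literals and the local names of the resources it points to, level 2 also appends the literals of those neighbours. Predicates such as `ex:country` in the usage are made up.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical graph encoding: subject -> list of (predicate, object). An object
# that is itself a subject of the graph is treated as a resource, otherwise as a literal.
Graph = Dict[str, List[Tuple[str, str]]]


def local_name(uri: str) -> str:
    """Last segment of a URI or prefixed name, with '_' read as a space."""
    return re.split(r"[/#:]", uri)[-1].replace("_", " ")


def virtual_document(uri: str, graph: Graph, level: int = 1) -> str:
    """Level 1: literals of `uri` plus local names of the resources it points to.
    Level 2: additionally the literals describing those neighbouring resources."""
    parts: List[str] = []
    neighbours: List[str] = []
    for _predicate, obj in graph.get(uri, []):
        if obj in graph:
            neighbours.append(obj)
            parts.append(local_name(obj))
        else:
            parts.append(obj)
    if level >= 2:
        for n in neighbours:
            parts.extend(o for _p, o in graph[n] if o not in graph)
    return " ".join(parts)
```

With `dbpedia:Charles_Perrault` labelled "Charles Perrault" and pointing to `dbpedia:France`, level 1 yields "Charles Perrault France" and level 2 appends the abstract of France. Literals that happen to equal a subject name would be misread as resources; a real implementation would use proper RDF terms.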
  • 22. Machine translation: parameters. Pipeline: 1. virtual documents (level 1 or level 2); 2.1 machine translation (ZH→EN); 2.2 NLP preprocessing, with four cumulative variants: lowercase + tokenize, + filter stop words, + stemming (Porter), + bigrams (terms); 3. similarity computation (TF + cosine or TF*IDF + cosine); 4. link generation (greedy or Hungarian). 32 settings have been explored in total (2 levels × 4 preprocessing variants × 2 similarity measures × 2 link generation methods).
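Of the two link generation options, the greedy one is short enough to sketch. It assumes similarities are given as a map from (source, target) pairs to scores and keeps one-to-one links above a hypothetical `threshold` parameter; the Hungarian method, available for instance as `scipy.optimize.linear_sum_assignment`, is not reproduced here.

```python
from typing import Dict, List, Tuple


def greedy_links(sim: Dict[Tuple[str, str], float], threshold: float = 0.0) -> List[Tuple[str, str]]:
    """Greedy one-to-one link generation from pairwise similarities.

    Pairs are visited by decreasing similarity; a pair is kept if its score is
    above `threshold` and neither resource has been linked yet.
    """
    links: List[Tuple[str, str]] = []
    used_sources, used_targets = set(), set()
    for (s, t), score in sorted(sim.items(), key=lambda kv: kv[1], reverse=True):
        if score <= threshold:
            break  # all remaining pairs score even lower
        if s not in used_sources and t not in used_targets:
            links.append((s, t))
            used_sources.add(s)
            used_targets.add(t)
    return links
```

With similarities a1–b1 0.9, a2–b1 0.85, a1–b2 0.8 and a2–b2 0.3, greedy keeps a1–b1 and a2–b2 (total 1.2), while the Hungarian method maximizes the total and picks a1–b2 and a2–b1 (total 1.65). This is why the two are worth comparing.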
  • 23. Lowercase + tokenization with TF*IDF at level 1. [Figure: results plot; colour legend with value ranges 0–0.11, 0.11–0.15, 0.15–0.25, 0.25–0.35, 0.35–0.45, 0.45–1.]
  • 24. Adding noise. [Figure]
  • 25. BabelNet method: parameters. Pipeline: 1. virtual documents (level 1 or level 2); 2. multilingual knowledge base mapping; 3. similarity computation (TF + cosine or TF*IDF + cosine); 4. link generation (greedy or Hungarian).
  • 26. Database keys. A key is a set of attributes which uniquely identifies the elements of a relation, e.g., Book: isbn; People: firstname, lastname, birthplace, birthdate. Keys are usually given and used to check integrity. They may be used for identifying the same entities across two databases, but this requires alignments.
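Once the key attributes of two databases are aligned, linking reduces to a join on the key values. A minimal sketch, assuming the alignment is given as two attribute lists in matching order; `key_links`, the French attribute names and the record ids in the usage are hypothetical, and values are compared exactly.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def key_links(records1: Dict[str, Record], records2: Dict[str, Record],
              key1: Sequence[str], key2: Sequence[str]) -> List[Tuple[str, str]]:
    """Link records that agree on every key attribute.

    key1[i] in the first database is assumed to be aligned with key2[i] in the
    second one; records missing a key attribute are never linked.
    """
    index: Dict[Tuple[str, ...], List[str]] = {}
    for rid, rec in records2.items():
        if all(a in rec for a in key2):
            index.setdefault(tuple(rec[a] for a in key2), []).append(rid)
    links = []
    for rid, rec in records1.items():
        if all(a in rec for a in key1):
            links.extend((rid, other) for other in index.get(tuple(rec[a] for a in key1), []))
    return links
```

Indexing one side by its key tuple keeps the cost roughly linear in the number of records, instead of comparing every pair.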
  • 27. Example of interlinking with keys and alignments. Are the resources bnf:cb118949856 and bne:XX1721208 the same? They are if the BnF ontology states foaf:Person owl:hasKey {foaf:name, dc:dates} and we have the following alignment. [Figure: bnf:cb118949856, a foaf:Person with foaf:name "Albert Camus", rda:dateOfBirth 07-11-1913, rda:dateOfDeath 04-01-1960, rda:biographicalInformation "Romancier, dramaturge et essayiste" (novelist, playwright and essayist), rda:countryAssociatedWithThePerson http://id.loc.gov/vocabulary/countries/fr, rda:placeOfBirth "Mondovi (Algérie)" and dc:dates "1913-1960"; bne:XX1721208, a frbrer:C1005 with frber:P3039 "Camus, Albert", frber:P3040 "1913-1960" and rda:sourceConsulted "Aut [...]1980". Correspondences (≡, ≈) relate foaf:Person to frbrer:C1005 and foaf:name and dc:dates to frber:P3039 and frber:P3040, yielding owl:sameAs.]
  • 28. Key-based interlinking methods. Database keys allow entities to be identified: if they are aligned, this can be used for linking. Advantages: they are logically grounded; they minimize the number of properties to compare (if minimal keys are used). Drawbacks: they require alignments between properties and classes; very few key axioms are available, and they are not necessarily useful for interlinking. We overcome these drawbacks by introducing link keys.
  • 29. Link key. A link key $\{\langle p_1,q_1\rangle,\ldots,\langle p_n,q_n\rangle\}\{\langle p'_1,q'_1\rangle,\ldots,\langle p'_m,q'_m\rangle\}\ \mathrm{linkkey}\ \langle c,d\rangle$ holds iff, for all pairs of instances a and b belonging respectively to the classes c and d of ontologies O and O′, if a and b share at least one value (object) for each pair of properties $\langle p'_j,q'_j\rangle$, and share all their values (objects) for each pair of properties $\langle p_i,q_i\rangle$, then they are the same ($\langle a, \mathrm{owl{:}sameAs}, b\rangle$). Example: $\{\langle \mathrm{foaf{:}name}, \mathrm{frbr{:}P3039}\rangle\}\{\langle \mathrm{dc{:}dates}, \mathrm{frbr{:}P3040}\rangle\}\ \mathrm{linkkey}\ \langle \mathrm{foaf{:}Person}, \mathrm{frbr{:}C1005}\rangle$.
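Checking whether a link key links two instances is a direct translation of its two conditions. A minimal sketch, assuming each instance is described by a map from property to its set of values; the names `eq_pairs` (all values shared) and `in_pairs` (at least one value shared) are hypothetical. Requiring the "all values" sets to be non-empty is also an assumption, so that two instances lacking a property are not linked. The usage data are illustrative only.

```python
from typing import Dict, Iterable, List, Set, Tuple

Description = Dict[str, Set[str]]  # property -> set of values (objects)
PropertyPair = Tuple[str, str]


def satisfies_link_key(a: Description, b: Description,
                       eq_pairs: Iterable[PropertyPair], in_pairs: Iterable[PropertyPair]) -> bool:
    """True if a and b have the same non-empty value sets on every eq pair
    and share at least one value on every in pair."""
    for p, q in eq_pairs:
        va, vb = a.get(p, set()), b.get(q, set())
        if not va or va != vb:
            return False
    return all(a.get(p, set()) & b.get(q, set()) for p, q in in_pairs)


def link_key_links(instances_c: Dict[str, Description], instances_d: Dict[str, Description],
                   eq_pairs: List[PropertyPair], in_pairs: List[PropertyPair]) -> List[Tuple[str, str]]:
    """Every (a, b), a an instance of c and b of d, that the link key would link."""
    return [(ia, ib)
            for ia, a in instances_c.items()
            for ib, b in instances_d.items()
            if satisfies_link_key(a, b, eq_pairs, in_pairs)]
```

If names must be shared entirely and dates only partially, a work whose P3039 also carries the extra value "A. Camus" is not linked to a person named only "Albert Camus". The "all values" condition is stricter than the "at least one" condition.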
  • 30. Link key extraction. Problem: how can such link keys be induced from data? The number of sets of property pairs is exponential. Our approach: discover only candidate link keys, then evaluate them in order to select only the "good" ones.
  • 31. Candidate link key. A candidate link key is a set of property pairs $\{\langle p_1,q_1\rangle,\ldots,\langle p_k,q_k\rangle\}$ that 1. would generate at least one link if used as a link key, and 2. is maximal for at least one link, or is the intersection of several candidate link keys.
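The definition suggests a direct, if naive, extraction procedure. For each pair of instances, collect the property pairs on which they share a value: this is the maximal set that would still link them. The candidates are those sets plus all their non-empty intersections. The sketch below is an assumption-laden illustration: it handles only "at least one shared value" conditions, compares all instance pairs (quadratic), and is not the lecture's actual extraction algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Description = Dict[str, Set[str]]
Candidate = FrozenSet[Tuple[str, str]]


def candidate_link_keys(instances_c: Dict[str, Description],
                        instances_d: Dict[str, Description]) -> Set[Candidate]:
    # 1. For each pair (a, b), the property pairs on which a and b share a value form
    #    the largest set that, used as a link key, would still link a and b.
    maximal: Set[Candidate] = set()
    for a in instances_c.values():
        for b in instances_d.values():
            shared = frozenset((p, q) for p, va in a.items()
                               for q, vb in b.items() if va & vb)
            if shared:
                maximal.add(shared)
    # 2. Close the set under non-empty pairwise intersection.
    candidates = set(maximal)
    changed = True
    while changed:
        changed = False
        for k1, k2 in combinations(list(candidates), 2):
            k = k1 & k2
            if k and k not in candidates:
                candidates.add(k)
                changed = True
    return candidates
```

Every intersection is a subset of some maximal set, so it still links that set's instance pair. Each candidate therefore satisfies condition 1 of the definition.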
  • 32. Supervised selection measures. If a sample of reference links is available, with positive examples $L^+$ (a set of owl:sameAs links) and negative examples $L^-$ (a set of owl:differentFrom links), the idea is to approximate precision and recall on that sample. Definition (relative precision and recall): $\mathrm{precision}(K, L^+, L^-) = \frac{|L^+ \cap L_{D,D'}(K)|}{|(L^+ \cup L^-) \cap L_{D,D'}(K)|}$ and $\mathrm{recall}(K, L^+) = \frac{|L^+ \cap L_{D,D'}(K)|}{|L^+|}$, where $L_{D,D'}(K)$ is the set of links generated by K between the datasets D and D′.
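The two relative measures translate directly into set operations. In this sketch, links are (source, target) tuples. Returning 0.0 when a denominator is empty is an assumption, since the definitions leave that case open.

```python
from typing import Set, Tuple

Link = Tuple[str, str]


def relative_precision(links: Set[Link], positive: Set[Link], negative: Set[Link]) -> float:
    """|L+ ∩ L(K)| / |(L+ ∪ L-) ∩ L(K)|: links absent from the sample are not judged."""
    judged = (positive | negative) & links
    return len(positive & links) / len(judged) if judged else 0.0


def relative_recall(links: Set[Link], positive: Set[Link]) -> float:
    """|L+ ∩ L(K)| / |L+|."""
    return len(positive & links) / len(positive) if positive else 0.0
```

A generated link that appears in neither $L^+$ nor $L^-$ is simply ignored by the precision, which is what makes it usable with a small sample.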
  • 33. Unsupervised selection measures. When no reference link is available, the idea is to measure how close the extracted links would be to being one-to-one and total. Definition (discriminability): $\mathrm{disc}(K, D, D') = \frac{\min(|\{a : \langle a,b\rangle \in L_{D,D'}(K)\}|,\ |\{b : \langle a,b\rangle \in L_{D,D'}(K)\}|)}{|L_{D,D'}(K)|}$. Definition (coverage): $\mathrm{cov}(K, D, D') = \frac{|\{a : \langle a,b\rangle \in L_{D,D'}(K)\} \cup \{b : \langle a,b\rangle \in L_{D,D'}(K)\}|}{|\{a : c(a) \in D\} \cup \{b : d(b) \in D'\}|}$.
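Both measures, and the harmonic mean used to combine them in the evaluation, can be computed from the generated links and the instances of the two classes. A minimal sketch; returning 0.0 for empty inputs is an assumption.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def discriminability(links: Set[Link]) -> float:
    """min(#linked sources, #linked targets) / #links; equals 1.0 exactly when links are one-to-one."""
    if not links:
        return 0.0
    sources = {a for a, _ in links}
    targets = {b for _, b in links}
    return min(len(sources), len(targets)) / len(links)


def coverage(links: Set[Link], instances_c: Iterable[str], instances_d: Iterable[str]) -> float:
    """Fraction of the instances of c (in D) and of d (in D') that take part in some link."""
    linked = {a for a, _ in links} | {b for _, b in links}
    universe = set(instances_c) | set(instances_d)
    return len(linked) / len(universe) if universe else 0.0


def harmonic_mean(x: float, y: float) -> float:
    return 2 * x * y / (x + y) if x + y else 0.0
```

For the links a1–b1, a1–b2 and a2–b3 over instances a1..a3 and b1..b4, discriminability is 2/3 (a1 is linked twice), coverage is 5/7, and their harmonic mean is 20/29.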
  • 34. Experimental evaluation. These selection measures were evaluated on public datasets, by finding links between French municipalities described in two different datasets: the INSEE dataset (36700 instances) and the GeoNames dataset (36552 instances). The reference link set is composed of positive links, 36552 owl:sameAs statements, and of owl:differentFrom links derived from the owl:sameAs links (closed-world assumption).
  • 35. Evaluation. The algorithm extracted 11 candidate link keys: {1}, {2}, {3, 4}, {5, 6}, {7, 1}, {2, 1}, {3, 4, 1}, {3, 2, 4}, {3, 7, 4, 1}, {3, 2, 4, 1}, {3, 7, 2, 4, 1}, where 1 = ⟨nom, name⟩, 2 = ⟨nom, alternateName⟩, 3 = ⟨subdivisionDe, parentFeature⟩, 4 = ⟨subdivisionDe, parentADM3⟩, 5 = ⟨codeINSEE, population⟩, 6 = ⟨codeCommune, population⟩, 7 = ⟨nom, officialName⟩. [Figure: the candidates plotted by coverage and discriminability.]
  • 36. Evaluation (continued). The harmonic mean of discriminability and coverage correlates with F-measure. The same 11 candidates fall into three groups: bad (h-mean(disc., cov.) ≈ 0, F-measure ≈ 0), good (h-mean ≈ .89, F-measure ≈ .89) and high (h-mean ≈ .99, F-measure ≈ .99). Property pairs are numbered as in the previous slide.
  • 37. Why use ontologies? Because it is obvious that we must compare the instances of equivalent classes based on equivalent properties. More precisely: for reducing the search space when finding link keys and similarities; for reducing the scope of linkage specifications, because different classes call for different linkage rules; because classes and properties are, like other features, hints of the similarity between resources. E.g., with similarity and with keys.
  • 38. Data interlinking through a common ontology. [Figure: resource matching between two datasets described by the same ontology o links URI1 and URI2 with owl:sameAs.]
  • 39. Matching with a common ontology. + Focuses the search: only instances of the same class are matched. – Not sufficient: corresponding entities still have to be identified. + If keys are defined (OWL 2), this is done. + At least we know which properties to compare. – Inferring secondary keys may be useful. – Discrepancies must be corrected: record linkage.
  • 40. Record linkage. [Figure: two records with the same schema: Name "Johann", Date 1665-03-21, Place München; and Name "Johannes", Date 31/03/1665, Place "Monaco di Bavaria" (Italian for Munich).] Having a common ontology does not solve all problems.
  • 41. Different types of mismatch. Different but connected domains (BIM, energy demand) ⇒ few correspondences, of any type. Same domain, different models (engineer, policy maker) ⇒ many correspondences, mostly equivalence. Same domain, different granularity (city management, building design) ⇒ many correspondences, mostly subsumption.
  • 42. Data interlinking with different ontologies (implicit alignment). [Figure: resource matching between datasets described by different ontologies o and o′ links URI1 and URI2 with owl:sameAs.]
  • 43. Data interlinking with different ontologies (explicit alignment). [Figure: resource matching between datasets described by different ontologies o and o′, using an explicit alignment A between them, links URI1 and URI2 with owl:sameAs.]
  • 44. Ontology matching for data interlinking. [Figure: ontology matching between o and o′ produces an alignment A, which data interlinking then uses to link URI1 and URI2 with owl:sameAs.]
  • 45. Heterogeneity problem. Resources expressed in different ways must be reconciled before being used. Mismatches between formalized knowledge can occur when: different languages are used (OWL vs. Topic Maps); different terminologies are used (English vs. Chinese; Book vs. Monograph); different models are used (different classes: Autobiography vs. Paperback; classes vs. properties: Essay vs. literarygenre; classes vs. instances: one physical book as an instance vs. one work as an instance); different scopes and granularities are used (only books vs. cultural items vs. any product; books detailed to the print and translation level vs. books as works).
  • 46. Ontology alignment. [Figure: an alignment between two bibliographic ontologies. The first has classes Item, DVD, Book, Paperback, Hardcover, CD and Person, properties price, title, doi, creator, pp and author, and datatypes integer, string and uri; the second has classes Monograph, Essay, Literary critics, Politics, Biography, Autobiography, Literature, Human and Writer, and properties pages, isbn, author, title and subject. Correspondences are marked ≥ and ≤.]
  • 47. Expressive alignments (EDOAL). [Figure: a Volume whose size is at most 14 is subsumed by Pocket; a Book whose author equals its topic is equivalent to Autobiography.] In logical form: $\forall x,\ \mathit{Pocket}(x) \Leftarrow \mathit{Volume}(x) \wedge \mathit{size}(x,y) \wedge y \leq 14$ and $\forall x,\ \mathit{Book}(x) \wedge \mathit{author}(x,y) \wedge \mathit{topic}(x,y) \equiv \mathit{Autobiography}(x)$.
  • 48. Example: INSEE dataset.
      Région table:
        code | nom               | chef-lieu
        11   | Île-de-France     | 75056
        21   | Champagne-Ardenne | 51108
        22   | Picardie          | 80021
      Sous-région table:
        région | département
        11     | 75
        11     | 77
        11     | 78
        11     | 91
        11     | 92
        11     | 93
  • 49. Example: administrative ontology. [Figure: classes Territoire, FR, Pays, Region, Departement, Arrondissement, Commune; properties code, nom, chef-lieu, subdivision; datatypes integer, string.]
  • 50. Example: NUTS dataset.
      NUTSRegion table:
        level | code  | name          | hasParentRegion
        0     | FR    | FRANCE        |
        1     | FR1   | ÎLE DE FRANCE | FR
        2     | FR10  | Île de France | FR1
        3     | FR101 | Paris         | FR10
        3     | FR104 | Essonne       | FR10
  • 51. Example: linking INSEE and NUTS. NUTS: Nomenclature of territorial units for statistics. [Table with columns #INSEE, INSEE name, NUTS level, #NUTS; cells as extracted: 1 Pays 0 34 | 1 142 | 26 Région 2 344 | 100 Département 3 1488 | 342 Arrondissement | 4036 Canton 4 52422 Commune 5]
  • 52. Example: linking INSEE and NUTS. [Figure: INSEE resources (classes Territoire, FR, Pays, Region, Departement, Commune) PAYS_FR, REG_11, DEP_75, DEP_77, DEP_78, COM_75056 and NUTS resources (classes Region, Country, NUTSRegion, LAURegion) FR, UK, FR1, FR10, FR101, FR102, FR103, connected by five owl:sameAs links: PAYS_FR–FR, REG_11–FR10, DEP_75–FR101, DEP_77–FR102, DEP_78–FR103.]
  • 53. Example: linksets. A linkset is a specific dataset containing links between the URIs of other datasets:
      @prefix void: <http://rdfs.org/ns/void#> .
      @prefix owl:  <http://www.w3.org/2002/07/owl#> .
      # the insee: and nuts: prefixes must be declared for the two target datasets
      <http://www.example.org/linkset/INSEE-NUTS> a void:Linkset ;
          void:target <http://rdf.insee.fr/geo/regions-2011.rdf> ;
          void:target <http://nuts.psi.enakting.org/id/> .
      insee:PAYS_FR owl:sameAs nuts:FR .
      insee:REG_11  owl:sameAs nuts:FR10 .
      insee:DEP_75  owl:sameAs nuts:FR101 .
      insee:DEP_77  owl:sameAs nuts:FR102 .
      insee:DEP_78  owl:sameAs nuts:FR103 .
  • 54. Example: interesting datasets. [Figure: NUTS, ONS, Ordnance Survey, IGN, INSEE, GeoNames, DBpedia, Freebase.]
  • 55. A simple algorithm. 1. Find matching concepts [concept matching]. 2. For each of them, determine matching properties based on the similarity between their values in both datasets [property matching]. 3. From them, find property combinations identifying corresponding entities [key extraction]. 4. Link corresponding entities [link generation]. For instance, $\mathit{nom}/\mathit{Region}_{\mathrm{INSEE}} \subseteq \mathit{name}/\mathit{NUTSRegion}_{\mathrm{NUTS}}$ (the values of nom on INSEE regions are included in the values of name on NUTS regions), and moreover these values are unambiguous.
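Steps 2 and 3 can be sketched for the nom/name example: find property pairs whose value sets are included one in the other, then check that the source property is unambiguous, i.e. no two instances share a value. Assumptions: `property_matches`, `is_unambiguous`, the `min_inclusion` parameter and the usage data (including the NUTS codes) are hypothetical, and values are compared exactly.

```python
from typing import Dict, List, Set, Tuple


def property_matches(values1: Dict[str, Set[str]], values2: Dict[str, Set[str]],
                     min_inclusion: float = 1.0) -> List[Tuple[str, str]]:
    """Pairs (p, q) such that at least `min_inclusion` of p's values are also values of q.

    values1[p] collects the values p takes on the instances of a class in the first
    dataset; values2[q] does the same in the second dataset.
    """
    return [(p, q)
            for p, v1 in values1.items()
            for q, v2 in values2.items()
            if v1 and len(v1 & v2) / len(v1) >= min_inclusion]


def is_unambiguous(instances: Dict[str, Dict[str, Set[str]]], prop: str) -> bool:
    """True if no value of `prop` is shared by two instances, so it can serve as a key."""
    seen: Set[str] = set()
    for description in instances.values():
        for value in description.get(prop, set()):
            if value in seen:
                return False
            seen.add(value)
    return True
```

Exact inclusion would miss "Île-de-France" (INSEE) vs. "Île de France" (NUTS), so real values would first need normalizing or a similarity-based comparison. Unambiguity fails across classes: the département DEP_75 and the commune COM_75056 are both named "Paris".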
  • 56. INSEE and NUTS: ontology alignment. [Figure: correspondences (= and ≤) between the INSEE administrative ontology (Territoire, FR, Pays, Region, Departement, Arrondissement, Canton, Commune; code, nom, chef-lieu, subdivision; integer, string) and the NUTS ontology (Region, Country, NUTSRegion, LAURegion; name, level, code, hasSubRegion).]
  • 57. Simple alignments are not sufficient. [Figure: the INSEE département DEP_75 and commune COM_75056 and the NUTS region FR101 all have the name "Paris"; with simple correspondences only (nom = name, Region, Departement and Commune ≤ NUTSRegion), name equality cannot tell which INSEE resource FR101 corresponds to.]
  • 58. Expressive alignments are necessary. [Figure: an INSEE Region corresponds to a NUTSRegion restricted by level = 2 and hasParentRegion = FR; subdivision corresponds to hasSubRegion and nom to name.]
  • 59. What does this mean? Ontology alignments are schema-level expressions of correspondences; they are useful for focussing the search; expressive alignments are necessary; they can be turned into SPARQL-based link generators. But it is also necessary to express instance-level constraints: for converting data (e.g., mph vs. m/s) and for expressing matching constraints on data (e.g., similarity).
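To illustrate how an alignment might become a SPARQL link generator, the hypothetical helper below builds a CONSTRUCT query from a class correspondence, property correspondences whose values must be equal, and extra conditions on the source (such as the level restriction). It is a sketch under stated assumptions, not the EDOAL-to-SPARQL translation used by any tool. It only expresses value equality, so the instance-level constraints mentioned above (conversions, similarity thresholds) would need additional FILTER clauses.

```python
from typing import Iterable, Sequence, Tuple


def sparql_link_generator(source_class: str, target_class: str,
                          property_pairs: Sequence[Tuple[str, str]],
                          source_conditions: Iterable[str] = ()) -> str:
    """Build a SPARQL CONSTRUCT query emitting owl:sameAs links.

    Each (p, q) pair requires ?s and ?t to share a value of p and q respectively;
    each source condition is a predicate-object fragment added on ?s (e.g. a level
    restriction). Prefixes other than owl: must be declared by the caller.
    """
    lines = [
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>",
        "CONSTRUCT { ?s owl:sameAs ?t }",
        "WHERE {",
        f"  ?s a {source_class} .",
        f"  ?t a {target_class} .",
    ]
    lines += [f"  ?s {condition} ." for condition in source_conditions]
    for i, (p, q) in enumerate(property_pairs):
        lines.append(f"  ?s {p} ?v{i} .")  # same variable on both sides = equal values
        lines.append(f"  ?t {q} ?v{i} .")
    lines.append("}")
    return "\n".join(lines)
```

For example, `sparql_link_generator("nuts:NUTSRegion", "insee:Region", [("nuts:name", "insee:nom")], ["nuts:level 2"])` yields a query linking level-2 NUTS regions to INSEE regions with identical names.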
  • 60. Data interlinking and ontology matching. [Figure: a matcher takes the ontologies o and o′ of the datasets d and d′ and produces an alignment A; a generator uses A to produce the links l between d and d′.]
  • 61. Tools for data interlinking.
        Linkage spec | Extraction   | Generation
        similarity   | LIMES        | Silk, LIMES, OpenRefine
        key          | LinkKeyDisco | SPARQL
  • 62. Silk. Silk is robust software for interlinking datasets. It relies on an expressive specification of linking conditions: declare data sources (DataSource); circumscribe the entities to compare (SourceDataset/TargetDataset); describe how to compare them (LinkageRule): select the properties to compare through paths (Input), compute distances between them (Compare + threshold), aggregate all comparisons (Aggregate), select the pairs of entities to be linked (Filter); generate links (Output + thresholds).
  • 63. A Silk script. Consider a linking script between INSEE and NUTS:
      <Silk>
        <Prefix id="nuts" namespace="http://ec.europa.eu/.../geographic.rdf#" />
        <Prefix id="insee" namespace="http://rdf.insee.fr/geo/" />
        <DataSource id="nuts2008" type="sparqlEndpoint">
          <Param name="endpointURI" value="http://localhost:9091/.../internal"/>
          <Param name="graph" value="http://localhost:9091/.../nuts2008-complete-1"/>
        </DataSource>
        <DataSource id="insee2010" type="sparqlEndpoint">
          <Param name="endpointURI" value="http://localhost:9091/.../internal"/>
          <Param name="graph" value="http://localhost:9091/.../source/regions-2010-1"/>
        </DataSource>
        <Thresholds accept="0.9" verify="0.7" />
        <Outputs>
          <Output type="sparul">
            <Param name="graphUri" value="http://localhost:9091/.../source/insee-nuts-silk"/>
            <Param name="uri" value="http://localhost:9091/.../lifted/"/>
            <Param name="parameter" value="update"/>
          </Output>
        </Outputs>
        <Interlinks>
          <Interlink id="linkingNUTS">
            <LinkType>owl:sameAs</LinkType>
            <SourceDataset dataSource="nuts2008" var="s">
              <RestrictTo>?s rdf:type nuts:NUTSRegion. ?s nuts:level 2.</RestrictTo>
            </SourceDataset>
            <TargetDataset dataSource="insee2010" var="ss">
              <RestrictTo>?ss rdf:type insee:Region</RestrictTo>
            </TargetDataset>
            <LinkageRule>
              <Aggregate type="max">
                <Compare metric="levenshteinDistance" threshold=".2">
                  <Input path="?s/nuts:name"/>
                  <Input path="?ss/insee:nom"/>
                </Compare>
              </Aggregate>
            </LinkageRule>
          </Interlink>
        </Interlinks>
      </Silk>
  • 64. Silk: prefixes and sources

    <Silk>
      <Prefix id="nuts" namespace="http://ec.europa.eu/.../geographic.rdf#" />
      <Prefix id="insee" namespace="http://rdf.insee.fr/geo/" />
      <DataSource id="nuts2008" type="sparqlEndpoint">
        <Param name="endpointURI" value="http://localhost:9091/.../internal"/>
        <Param name="graph" value="http://localhost:9091/.../nuts2008-complete-1"/>
      </DataSource>
      <DataSource id="id1" type="file">
        <Param name="file" value="/Skratch/TutoLinking/admin/regions-2010.rdf"/>
        <Param name="format" value="RDF/XML" />
      </DataSource>

Sources can be files or SPARQL endpoints.
  • 65. Silk rules

    <Interlinks>
      <Interlink id="linkingNUTS">
        <LinkType>owl:sameAs</LinkType>
        <SourceDataset dataSource="nuts2008" var="s">
          <RestrictTo>?s rdf:type nuts:NUTSRegion. ?s nuts:level 2.</RestrictTo>
        </SourceDataset>
        <TargetDataset dataSource="insee2010" var="ss">
          <RestrictTo>?ss rdf:type insee:Region</RestrictTo>
        </TargetDataset>
        <Thresholds accept="0.9" verify="0.7" />
        <Outputs>
          <Output type="sparul">
            <Param name="graphUri" value="http://localhost:9091/.../source/insee-nut
            <Param name="uri" value="http://localhost:9091/.../lifted/"/>
            <Param name="parameter" value="update"/>
          </Output>
        </Outputs>
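The `RestrictTo` patterns decide which entities enter the comparison at all. As a sketch, assuming the data is held as a plain set of (subject, predicate, object) tuples rather than behind a SPARQL endpoint, and ignoring literal datatypes (the level is the string `"2"`), the restriction `?s rdf:type nuts:NUTSRegion. ?s nuts:level 2.` keeps the subjects that have every required predicate/object pair. `restrict_to` and the triples are hypothetical.

```python
Triple = tuple[str, str, str]

def restrict_to(graph: set[Triple], patterns: list[tuple[str, str]]) -> set[str]:
    """Return the subjects s such that (s, p, o) is in the graph for every (p, o) pattern."""
    subjects = {s for s, _, _ in graph}
    return {s for s in subjects if all((s, p, o) in graph for p, o in patterns)}

# Hypothetical toy graph mixing both datasets.
graph = {
    ("nuts:r1", "rdf:type", "nuts:NUTSRegion"), ("nuts:r1", "nuts:level", "1"),
    ("nuts:r2", "rdf:type", "nuts:NUTSRegion"), ("nuts:r2", "nuts:level", "2"),
    ("insee:r11", "rdf:type", "insee:Region"),
}
print(restrict_to(graph, [("rdf:type", "nuts:NUTSRegion"), ("nuts:level", "2")]))
print(restrict_to(graph, [("rdf:type", "insee:Region")]))
```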
  • 66. Silk rules (cont'd)

        <LinkageRule>
          <Aggregate type="max">
            <Compare metric="levenshteinDistance" threshold=".2">
              <Input path="?s/nuts:name"/>
              <Input path="?ss/insee:nom"/>
            </Compare>
          </Aggregate>
        </LinkageRule>
      </Interlink>
    </Interlinks>
    </Silk>

Linkage rules can transform the data (lowercase, tokenize, convert to integers, etc.), compare values with metrics (equality, Levenshtein, Jaro-Winkler, etc.), and aggregate the resulting values (average, min, max, etc.).
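To illustrate the three kinds of operators, here is a sketch with a lowercase transformation, a Levenshtein comparison and max/min/average aggregation. Two choices are assumptions of this sketch, not necessarily Silk's semantics for `levenshteinDistance`: the edit distance is divided by the length of the longer string, so that a threshold such as `.2` reads as a fraction, and a distance within the threshold is mapped linearly to a score that drops to 0 at the threshold.

```python
from statistics import mean

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]

def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def compare(a: str, b: str, threshold: float, transform=str.lower) -> float:
    """Transform both values, then map a distance within the threshold to a score in [0, 1]."""
    d = normalized_distance(transform(a), transform(b))
    return 1.0 - d / threshold if d <= threshold else 0.0

AGGREGATIONS = {"max": max, "min": min, "average": mean}

scores = [compare("Île de France", "Île-de-France", 0.2),  # 2 edits over 13 characters
          compare("Bretagne", "BRETAGNE", 0.2)]            # identical once lowercased
for name, aggregate in AGGREGATIONS.items():
    print(name, round(aggregate(scores), 3))
```

The choice of aggregation matters: `max` links as soon as one property matches well, whereas `min` requires every compared property to match.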
  • 67. Silk workbench
  • 68. EDOAL alignments
This correspondence states that the class insee:Region is equivalent to the NUTS regions of level 2 whose parent region is esdata:FR (France):

    <Cell>
      <entity1><e:Class rdf:about="&insee;Region"/></entity1>
      <entity2>
        <e:Class>
          <e:and rdf:parseType="Collection">
            <e:Class rdf:about="&nuts;NUTSRegion"/>
            <e:AttributeValueRestriction>
              <e:onAttribute><e:Property rdf:about="&nuts;level"/></e:onAttribute>
              <e:comparator rdf:resource="&edoal;equals"/>
              <e:value><e:Literal e:type="&xsd;integer" e:string="2" /></e:value>
            </e:AttributeValueRestriction>
            <e:AttributeValueRestriction>
              <e:onAttribute>
                <e:Relation rdf:about="&nuts;hasParentRegion" />
              </e:onAttribute>
              <e:comparator rdf:resource="&edoal;equals"/>
              <e:value><e:Instance rdf:about="&esdata;FR" /></e:value>
            </e:AttributeValueRestriction>
          </e:and>
        </e:Class>
      </entity2>
      <relation>equivalence</relation>
    </Cell>
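The second entity of this correspondence is a conjunction of a class and two attribute-value restrictions with the `equals` comparator. The sketch below evaluates that class expression over a toy set of triples; the tuple encoding of expressions and the `members` function are hypothetical, not part of EDOAL or the Alignment API, and literal datatypes are ignored.

```python
Triple = tuple[str, str, str]

def members(graph: set[Triple], expr) -> set[str]:
    """Instances of an expression: ('class', C), ('value', p, v) or ('and', e1, e2, ...)."""
    kind = expr[0]
    if kind == "class":
        return {s for s, p, o in graph if p == "rdf:type" and o == expr[1]}
    if kind == "value":  # AttributeValueRestriction with the 'equals' comparator
        return {s for s, p, o in graph if p == expr[1] and o == expr[2]}
    if kind == "and":    # intersection of the members of every conjunct
        return set.intersection(*(members(graph, e) for e in expr[1:]))
    raise ValueError(f"unknown expression kind: {kind}")

nuts_fr_level2 = ("and", ("class", "nuts:NUTSRegion"),
                  ("value", "nuts:level", "2"),
                  ("value", "nuts:hasParentRegion", "esdata:FR"))

# Hypothetical toy graph: r3 has the right level but another parent region.
graph = {
    ("nuts:r2", "rdf:type", "nuts:NUTSRegion"), ("nuts:r2", "nuts:level", "2"),
    ("nuts:r2", "nuts:hasParentRegion", "esdata:FR"),
    ("nuts:r3", "rdf:type", "nuts:NUTSRegion"), ("nuts:r3", "nuts:level", "2"),
    ("nuts:r3", "nuts:hasParentRegion", "esdata:DE"),
}
print(members(graph, nuts_fr_level2))
```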
  • 69. Link keys in the Alignment API
This link key states that an INSEE region and a NUTS region are the same when their names (insee:nom, nuts:name) share at least one value and they are subdivisions of the same region (the inverse of insee:subdivision has the same values as nuts:hasParentRegion):

    <e:linkkey>
      <e:Linkkey>
        <e:binding>
          <e:Intersects>
            <e:property1><e:Property rdf:about="&insee;nom" /></e:property1>
            <e:property2><e:Property rdf:about="&nuts;name" /></e:property2>
          </e:Intersects>
          <e:Equals>
            <e:property1>
              <e:Property>
                <e:inverse><e:Property rdf:about="&insee;subdivision" /></e:inverse>
              </e:Property>
            </e:property1>
            <e:property2><e:Property rdf:about="&nuts;hasParentRegion" /></e:property2>
          </e:Equals>
        </e:binding>
      </e:Linkkey>
    </e:linkkey>
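Under the usual reading of link keys, two instances are linked when, for every `Intersects` pair, their values share at least one element and, for every `Equals` pair, their value sets are identical. This sketch applies the link key above to toy data under that reading. Each entity's property values are Python sets, the inverse of `insee:subdivision` is precomputed as a `parent` entry, parent regions are assumed to carry the same identifier (placeholder `FR`) on both sides, and empty sets are assumed never to satisfy `Equals`. `satisfies`, `linkkey_links` and the data are hypothetical.

```python
Values = dict[str, set[str]]

def satisfies(a: Values, b: Values,
              intersects: list[tuple[str, str]], equals: list[tuple[str, str]]) -> bool:
    """Check a link key: Intersects pairs must share a value, Equals pairs must have equal sets."""
    for p, q in intersects:
        if not a.get(p, set()) & b.get(q, set()):
            return False
    for p, q in equals:
        va, vb = a.get(p, set()), b.get(q, set())
        if not va or va != vb:  # assumption: empty value sets never satisfy Equals
            return False
    return True

def linkkey_links(source, target, intersects, equals):
    """All (source, target) pairs satisfying the link key, sorted for readability."""
    return sorted((s, t) for s, a in source.items() for t, b in target.items()
                  if satisfies(a, b, intersects, equals))

# Toy data: "parent" stands for the inverse of insee:subdivision.
insee = {
    "insee:r11": {"nom": {"Île-de-France"}, "parent": {"FR"}},
    "insee:r53": {"nom": {"Bretagne"}, "parent": {"FR"}},
}
nuts = {
    "nuts:n1": {"name": {"Île-de-France", "Ile de France"}, "hasParentRegion": {"FR"}},
    "nuts:n2": {"name": {"Bretagne"}, "hasParentRegion": {"FR"}},
    "nuts:n3": {"name": {"Bretagne"}, "hasParentRegion": {"XX"}},
}
links = linkkey_links(insee, nuts, intersects=[("nom", "name")],
                      equals=[("parent", "hasParentRegion")])
print(links)
```

Because `Intersects` only needs one shared value, the alternative spelling "Ile de France" on the NUTS side does not prevent the link; `nuts:n3` is rejected by the `Equals` condition despite its matching name.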
  • 70. Query generation
The two sides of the correspondence can be translated into SPARQL queries that select their instances:

    PREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?r
    FROM <http://rdf.insee.fr/geo/regions-2011.rdf>
    WHERE { ?r rdf:type insee:Region . }

    PREFIX nuts: <http://ec.europa.eu/eurostat/ramon/ontologies/geographi
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?n
    FROM <http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/>
    WHERE {
      ?n rdf:type nuts:NUTSRegion .
      ?n nuts:level "2"^^xsd:integer .
      ?n nuts:hasParentRegion nuts:FR .
    }
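Generating such queries from a class expression is mostly string assembly. The sketch below is hypothetical and is not the Alignment API's generator: `select_query` turns a list of (predicate, object) patterns on one variable into a SPARQL SELECT query with the given prefixes and source graph.

```python
def select_query(var: str, patterns: list[tuple[str, str]], graph: str,
                 prefixes: dict[str, str]) -> str:
    """Build `SELECT ?var FROM <graph> WHERE { ?var p o . ... }`."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append(f"SELECT ?{var}")
    lines.append(f"FROM <{graph}>")
    body = " ".join(f"?{var} {p} {o} ." for p, o in patterns)
    lines.append(f"WHERE {{ {body} }}")  # doubled braces produce literal { and }
    return "\n".join(lines)

query = select_query(
    "r", [("rdf:type", "insee:Region")],
    "http://rdf.insee.fr/geo/regions-2011.rdf",
    {"insee": "http://rdf.insee.fr/ontologie-geo-2006.rdf#",
     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"},
)
print(query)
```

The NUTS side would be produced the same way from its three patterns, with the level written as a typed literal.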
  • 71. Data transformation
The correspondence can also be used to transform the INSEE regions into NUTS terms:

    PREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>
    PREFIX nuts: <http://ec.europa.eu/.../geographic.rdf#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    CONSTRUCT {
      ?r rdf:type nuts:NUTSRegion .
      ?r nuts:level "2"^^xsd:integer .
      ?r nuts:hasParentRegion nuts:FR .
    }
    FROM <http://rdf.insee.fr/geo/regions-2011.rdf>
    WHERE { ?r rdf:type insee:Region . }
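A CONSTRUCT query instantiates its template once for every solution of its WHERE clause. Applied to an in-memory set of triples (a hypothetical stand-in for the INSEE graph, with `insee:d29` and `insee:Departement` invented for the example), the transformation above can be sketched as:

```python
Triple = tuple[str, str, str]

def insee_to_nuts(graph: set[Triple]) -> set[Triple]:
    """Instantiate the CONSTRUCT template once per ?r matching `?r rdf:type insee:Region`."""
    regions = {s for s, p, o in graph if p == "rdf:type" and o == "insee:Region"}
    constructed = set()
    for r in regions:
        constructed |= {
            (r, "rdf:type", "nuts:NUTSRegion"),
            (r, "nuts:level", '"2"^^xsd:integer'),
            (r, "nuts:hasParentRegion", "nuts:FR"),
        }
    return constructed

# Toy input; the department triple does not match the WHERE pattern.
graph = {("insee:r53", "rdf:type", "insee:Region"),
         ("insee:d29", "rdf:type", "insee:Departement")}
for triple in sorted(insee_to_nuts(graph)):
    print(triple)
```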
  • 72. SameAs link generation
Using the shared name ?l as the join condition, the owl:sameAs links can be generated directly:

    PREFIX insee: <http://rdf.insee.fr/ontologie-geo-2006.rdf#>
    PREFIX nuts: <http://ec.europa.eu/eurostat/ramon/ontologies/geographi
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    CONSTRUCT { ?r owl:sameAs ?n . }
    FROM <http://rdf.insee.fr/geo/regions-2011.rdf>
    FROM <http://ec.europa.eu/eurostat/ramon/rdfdata/nuts2008/>
    WHERE {
      ?r rdf:type insee:Region .
      ?r insee:nom ?l .
      ?n rdf:type nuts:NUTSRegion .
      ?n nuts:name ?l .
      ?n nuts:level "2"^^xsd:integer .
      ?n nuts:hasParentRegion nuts:FR .
    }
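The WHERE clause is a join on the variable `?l`: a link is produced when an INSEE region and a French level-2 NUTS region carry exactly the same name. Over hypothetical in-memory triples (literal datatypes ignored), the join can be sketched with a dictionary from names to NUTS regions, which avoids comparing all pairs:

```python
from collections import defaultdict

Triple = tuple[str, str, str]

def same_as_links(graph: set[Triple]) -> set[Triple]:
    """Evaluate the WHERE clause as a hash join on the shared name ?l."""
    french_level2 = {s for s, p, o in graph
                     if p == "rdf:type" and o == "nuts:NUTSRegion"
                     and (s, "nuts:level", "2") in graph
                     and (s, "nuts:hasParentRegion", "nuts:FR") in graph}
    by_name = defaultdict(set)  # ?l -> NUTS regions carrying that name
    for s, p, o in graph:
        if p == "nuts:name" and s in french_level2:
            by_name[o].add(s)
    links = set()
    for r, p, label in graph:
        if p == "insee:nom" and (r, "rdf:type", "insee:Region") in graph:
            for n in by_name.get(label, ()):
                links.add((r, "owl:sameAs", n))
    return links

graph = {
    ("insee:r53", "rdf:type", "insee:Region"), ("insee:r53", "insee:nom", "Bretagne"),
    ("insee:r11", "rdf:type", "insee:Region"), ("insee:r11", "insee:nom", "Île-de-France"),
    ("nuts:n1", "rdf:type", "nuts:NUTSRegion"), ("nuts:n1", "nuts:level", "2"),
    ("nuts:n1", "nuts:hasParentRegion", "nuts:FR"), ("nuts:n1", "nuts:name", "Bretagne"),
    ("nuts:n2", "rdf:type", "nuts:NUTSRegion"), ("nuts:n2", "nuts:level", "2"),
    ("nuts:n2", "nuts:hasParentRegion", "nuts:FR"), ("nuts:n2", "nuts:name", "Île de France"),
}
print(same_as_links(graph))
```

The Île-de-France pair is missed because the names differ by two characters: exact equality is fast but brittle, which is what similarity-based rules such as the Silk ones address.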
  • 73. Other issue: performance
Comparing all pairs of entities from two datasets of sizes n and m requires n × m comparisons; splitting each dataset into three blocks and comparing only within blocks requires n/3 × m/3 + n/3 × m/3 + n/3 × m/3 = (n × m)/3 comparisons. The number of pairs grows quickly:
- 10 × 10 = 100
- 1000 × 1000 = 1,000,000
- 100000 × 100000 = 10,000,000,000
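The arithmetic generalises: splitting both datasets into k equal blocks and comparing only within blocks needs k × (n/k × m/k) = (n × m)/k comparisons instead of n × m. A small sketch, assuming (unrealistically) perfectly balanced blocks; exact fractions avoid rounding when n is not a multiple of k:

```python
from fractions import Fraction

def comparisons(n: int, m: int, blocks: int = 1) -> Fraction:
    """Pairs compared when each dataset is split into `blocks` equal-sized blocks."""
    return blocks * Fraction(n, blocks) * Fraction(m, blocks)

for size in (10, 1000, 100000):
    print(size, comparisons(size, size), float(comparisons(size, size, blocks=3)))
```

Blocking divides the cost by a constant only; the quadratic growth remains, so useful blockings use many small blocks.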
  • 74. Other issue: performance
Blocking: index and cluster the entities of both datasets (Dataset 1, Dataset 2) into blocks, so that only entities within the same block are compared.
  • 75. Other issue: performance
Blocks can be obtained from:
- clustering values in an index;
- predefined blocks (based on equality);
- classes in an ontology (blocks are defined as class expressions).
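A minimal sketch of predefined blocks based on equality: entities of both datasets are indexed by a blocking key, and only entities sharing a key are paired. The key used here (first three letters of the lowercased name), the helper names and the toy data are assumptions for illustration; a poorly chosen key can separate entities that should be linked.

```python
from collections import defaultdict
from itertools import product

def block_key(name: str) -> str:
    """Hypothetical blocking key: first three letters of the lowercased name."""
    return name.lower()[:3]

def candidate_pairs(source: dict[str, str], target: dict[str, str]) -> list[tuple[str, str]]:
    """Index both datasets by key and pair only entities that share a block."""
    index_s, index_t = defaultdict(list), defaultdict(list)
    for uri, name in source.items():
        index_s[block_key(name)].append(uri)
    for uri, name in target.items():
        index_t[block_key(name)].append(uri)
    pairs = []
    for key in index_s.keys() & index_t.keys():  # blocks present on both sides
        pairs.extend(product(index_s[key], index_t[key]))
    return sorted(pairs)

source = {"nuts:n1": "Bretagne", "nuts:n2": "Normandie", "nuts:n3": "Corse"}
target = {"insee:r1": "BRETAGNE", "insee:r2": "Normandie", "insee:r3": "Corse"}
print(candidate_pairs(source, target))  # 3 candidate pairs instead of 9
```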
  • 76. Other issue: evaluation
[Figure: two datasets d and d′ are interlinked into a link set l, which is evaluated against reference links in terms of precision, recall and F-measure.]
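With the standard definitions, precision is the fraction of generated links that are in the reference, recall is the fraction of reference links that were generated, and F-measure is their harmonic mean. A sketch on hypothetical link sets:

```python
def evaluate(found: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall and F-measure of generated links against reference links."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}  # two correct links, one wrong
print(evaluate(found, reference))
```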
  • 77. Other issue: learning
[Figure: datasets d and d′ and a set of training links feed the interlinking process, which produces a link set l; l is then evaluated.]
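One simple form of learning, assumed here for illustration since the slide does not name a technique, is to use the training links to choose the similarity threshold that maximises F-measure on the training set. `learn_threshold`, the scores and the training links are hypothetical.

```python
def f_measure(found: set, reference: set) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing correct is found."""
    correct = len(found & reference)
    if not found or not reference or not correct:
        return 0.0
    p, r = correct / len(found), correct / len(reference)
    return 2 * p * r / (p + r)

def learn_threshold(scores: dict[tuple[str, str], float], training: set) -> float:
    """Try each observed score as a threshold; keep the one with the best F-measure."""
    best_t, best_f = 1.0, -1.0
    for t in sorted(set(scores.values())):
        found = {pair for pair, s in scores.items() if s >= t}
        f = f_measure(found, training)
        if f > best_f:  # strict: on ties, the lower threshold is kept
            best_t, best_f = t, f
    return best_t

scores = {("a1", "b1"): 0.95, ("a2", "b2"): 0.85, ("a3", "b7"): 0.6, ("a4", "b8"): 0.3}
training = {("a1", "b1"), ("a2", "b2")}
print(learn_threshold(scores, training))
```

The learned threshold is then applied to the full datasets, and the resulting links are evaluated on links not used for training.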
  • 78. Conclusion
Data interlinking is one of the most critical tasks in linked data… but not only there, e.g. in smart cities.
It faces many problems due to:
- heterogeneity (formats, languages, conventions);
- size.
Interlinking can be based on similarities or on keys.
There is active work on inferring such interlinking patterns.
  • 79. Further reading
- T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Morgan & Claypool (US), 2011. http://linkeddatabook.com/
- J. Euzenat, P. Shvaiko, Ontology matching, 2nd ed., Springer, Heidelberg (DE), 2013. http://book.ontologymatching.org
- K. Stefanidis, V. Efthymiou, M. Herschel, V. Christophides, Entity Resolution in the Web of Data, Tutorial, WWW conference, Seoul (KR), 2014. http://www.csd.uoc.gr/~vefthym/er/
- Silk: http://silk-framework.com/
- Alignment API: http://alignapi.gforge.inria.fr
- Al 4 SC: http://al4sc.inrialpes.fr
  • 80. Thanks
To my colleagues Manuel Atencia, Jérôme David, Nicolas Guillouet and François Scharffe; the Datalift and Lindicle projects; the Ready4SmartCities project.