Tommaso Soru, Edgard Marx, Axel-Cyrille Ngonga Ngomo
AKSW, Department of Computer Science
University of Leipzig, Germany
May 22, 2015
WWW 2015 — Florence, Italy
ROCKER
A Refinement Operator for Key Discovery
State of the Linked Open Data cloud.
353 accessible RDF datasets; ~74 billion triples.
Sources: State of the LOD cloud, LODStats, 2015.
Decentralized data publication.
The real-world entity “Florence, Italy” is described in DBpedia, LinkedGeoData, and GeoNames.
Unique descriptions of resources.
Entity search.
Data integration.
Linked data compression.
Link discovery.
Question answering.
Data quality.
Keys.
Background.
A key is a set of properties which can distinguish
all instances of a class in a knowledge base.
[Figure: RDF graph in which :Oceans_Eleven and :The_Mexican each have :hasActor edges to :Brad_Pitt and :Julia_Roberts; the four resources carry the rdfs:label values “Ocean’s Eleven”, “The Mexican”, “Brad Pitt” and “Julia Roberts”.]
A key is a minimal key
if none of its proper subsets is also a key.
Class: dbpedia-owl:Film
candidate key              distinguishable resources   key?   min-key?
{rdfs:label}               2 / 2                       yes    yes
{:hasActor}                1 / 2                       no     no
{rdfs:label, :hasActor}    2 / 2                       yes    no
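As a rough illustration of the table, the sketch below checks keys and minimal keys on the two films. The edges of the toy graph are read off the figure and are an assumption, as are the helper names `signature`, `distinct_count`, `is_key` and `is_minimal_key`. Counting distinct value sets is one way to reproduce the "distinguishable resources" column; it is not necessarily how ROCKER computes it.

```python
from itertools import combinations

# Toy data (assumed from the figure): film -> property -> set of values.
films = {
    ":Oceans_Eleven": {"rdfs:label": {"Ocean's Eleven"},
                       ":hasActor": {":Brad_Pitt", ":Julia_Roberts"}},
    ":The_Mexican": {"rdfs:label": {"The Mexican"},
                     ":hasActor": {":Brad_Pitt", ":Julia_Roberts"}},
}

def signature(instance, props):
    """Values of one instance restricted to props, as a hashable tuple."""
    return tuple(frozenset(instance.get(p, ())) for p in sorted(props))

def distinct_count(data, props):
    """Number of different value-set signatures among all instances."""
    return len({signature(inst, props) for inst in data.values()})

def is_key(data, props):
    """props is a key if no two instances share the same signature."""
    return distinct_count(data, props) == len(data)

def is_minimal_key(data, props):
    """A key is minimal if no proper subset is a key."""
    props = frozenset(props)
    if not is_key(data, props):
        return False
    if not props:
        return True
    # By key monotonicity it is enough to test subsets with one property removed.
    return not any(is_key(data, frozenset(sub))
                   for sub in combinations(props, len(props) - 1))

for candidate in [{"rdfs:label"}, {":hasActor"}, {"rdfs:label", ":hasActor"}]:
    print(sorted(candidate),
          f"{distinct_count(films, candidate)} / {len(films)}",
          is_key(films, candidate),
          is_minimal_key(films, candidate))
```

Checking only the subsets with one property removed relies on key monotonicity: if any smaller subset were a key, one of these larger subsets would be a key as well.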
A set of properties is called an n-almost-key for a class
if it can distinguish all except n instances of that class.
[Figure: four films and their :filmedIn values. :Interstellar: :Canada, :Iceland, :United_States. :Blade_Runner: :United_States, :United_Kingdom. :2001_A_Space_Odyssey: :United_States, :United_Kingdom. :WALL-E: no value.]
ROCKER’s score function.
The score function expresses
the rate of distinguishable instances in a class,
given a set of properties (i.e., a candidate key).
[Figure: the instances :Interstellar, :Blade_Runner, :2001_A_Space_Odyssey and :WALL-E, one of them marked ✗.]
$$\mathrm{score}(\{\texttt{:filmedIn}\}) = \frac{\left|\{\, s \in S : \forall s' \in S,\ s \neq s' \Rightarrow \mathrm{discr}(s, s', \{\texttt{:filmedIn}\}) \,\}\right|}{|S|} = .75$$
An n-almost-key has a score of at least $\alpha = \frac{|S| - n}{|S|}$.
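The arithmetic behind the .75 and the threshold α can be sketched as follows. The assignment of :filmedIn values to films is read off the figure and is an assumption. The function `score` counts distinct value sets and divides by the number of instances, which is one reading that reproduces the .75 on the slide; the actual ROCKER implementation may compute it differently. The names `score`, `alpha` and `is_n_almost_key` are hypothetical.

```python
from fractions import Fraction

# :filmedIn values per film (assumed from the figure); :WALL-E has no value.
filmed_in = {
    ":Interstellar": {":Canada", ":Iceland", ":United_States"},
    ":Blade_Runner": {":United_States", ":United_Kingdom"},
    ":2001_A_Space_Odyssey": {":United_States", ":United_Kingdom"},
    ":WALL-E": set(),
}

def score(values_by_instance):
    """Distinct value sets divided by the number of instances (assumed reading)."""
    distinct = {frozenset(values) for values in values_by_instance.values()}
    return Fraction(len(distinct), len(values_by_instance))

def alpha(num_instances, n):
    """Minimum score of an n-almost-key: (|S| - n) / |S|."""
    return Fraction(num_instances - n, num_instances)

def is_n_almost_key(values_by_instance, n):
    """True if the score reaches the threshold for the given n."""
    return score(values_by_instance) >= alpha(len(values_by_instance), n)

print(float(score(filmed_in)))        # 3 distinct value sets over 4 films
print(is_n_almost_key(filmed_in, 1))  # threshold (4 - 1) / 4
print(is_n_almost_key(filmed_in, 0))  # threshold 1, i.e. a proper key
```

The empty value set of :WALL-E is treated as a value of its own, in line with the deck's statement that null values are accepted.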
Contribution #1
A more complete definition of key.
All object values are considered (e.g., :United_States).
Null values are accepted (e.g., :WALL-E).
[Figure: :Interstellar :filmedIn :Canada, :Iceland, :United_States; :Blade_Runner :filmedIn :United_States, :United_Kingdom; :WALL-E has no :filmedIn value.]
Properties of a key.
Key monotonicity.
Adding a property to a key yields another key.
{:p1, :p2} ∪ {:p3} = {:p1, :p2, :p3}
Non-key monotonicity.
Removing a property from a non-key yields another non-key.
{:p1, :p2, :p4} ∖ {:p2} = {:p1, :p4}
Proposed approach.
We adopt a refinement operator to refine candidates.
[Figure: refinement lattice over three properties, from ∅ through {:p1}, {:p2}, {:p3} and {:p1, :p2}, {:p1, :p3}, {:p2, :p3} up to {:p1, :p2, :p3}.]
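A minimal sketch of such an operator, assuming that refining a candidate means adding exactly one property to it (this matches the lattice in the figure, but the function names `refine` and `enumerate_lattice` are hypothetical and not taken from ROCKER):

```python
def refine(candidate, properties):
    """Children of a candidate: the candidate plus one property it lacks."""
    return {candidate | {p} for p in properties if p not in candidate}

def enumerate_lattice(properties):
    """All candidates reachable from the empty set by repeated refinement."""
    seen, stack = set(), [frozenset()]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(refine(node, properties))
    return seen

properties = frozenset({":p1", ":p2", ":p3"})
lattice = enumerate_lattice(properties)
print(len(lattice))      # all subsets, including the empty set
print(len(lattice) - 1)  # non-empty candidates
```

For three properties this visits 2³ = 8 subsets, 7 of them non-empty, which shows how quickly the number of candidates grows with the number of properties.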
Pro. The score function induces a quasi-ordering ‘≼’
over the set of all candidates:
P ≼ Q means score(P) ≤ score(Q).
Contra. Visiting the whole refinement tree is intractable:
n properties yield 2ⁿ–1 nodes.
Solutions to intractability.
Prune branches using key monotonicity:
for all descendants of a key;
for all ancestors of a non-key.
Consider only a subset of popular properties.
Provide a “fast search” option which selects one of
the multiple discovery strategies.
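The two pruning rules can be sketched as subset tests. This assumes candidates are stored as frozensets of property names, that a descendant of a key is any superset of it and that an ancestor of a non-key is any subset of it; all function names are hypothetical.

```python
def is_descendant_of_key(candidate, keys):
    """A superset of a known key is itself a key (key monotonicity)."""
    return any(key <= candidate for key in keys)

def is_ancestor_of_non_key(candidate, non_keys):
    """A subset of a known non-key is itself a non-key (non-key monotonicity)."""
    return any(candidate <= non_key for non_key in non_keys)

def needs_scoring(candidate, keys, non_keys):
    """Only candidates that neither rule decides have to be scored."""
    return not (is_descendant_of_key(candidate, keys)
                or is_ancestor_of_non_key(candidate, non_keys))

known_keys = [frozenset({":p1", ":p2"})]
known_non_keys = [frozenset({":p1", ":p3"})]
print(needs_scoring(frozenset({":p1", ":p2", ":p3"}), known_keys, known_non_keys))
print(needs_scoring(frozenset({":p2", ":p3"}), known_keys, known_non_keys))
```

A candidate decided by either rule never has to be scored against the data, which is where the savings come from.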
Algorithm.
[Flowchart with the steps: Frontier := {∅}; Sort by score; Top el. score? (< α / ≥ α); Halt; Refine pivot, remove pivot & add children to frontier; Has children? (yes / no); Next child; Descendant of key? (yes / no); Ancestor of !key? (true / false); Add to !keys; Score? (< α / ≥ α); Add to keys.]
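The exact wiring of the flowchart is not recoverable from the text, so the following is only a simplified sketch and not the ROCKER implementation. It swaps the score-sorted frontier for a plain level-wise pass (all candidates of size k before size k + 1), because that makes it easy to see why every key found is minimal. The ancestor-of-non-key check is left out, since a bottom-up level-wise pass never revisits subsets. The scoring by distinct value sets, the toy data and all names are assumptions.

```python
def score(data, props):
    """Share of distinct value-set signatures among all instances (assumed)."""
    signatures = {tuple(frozenset(inst.get(p, ())) for p in sorted(props))
                  for inst in data.values()}
    return len(signatures) / len(data)

def discover_min_keys(data, properties, alpha=1.0):
    """Level-wise search: refine non-keys, skip descendants of known keys."""
    keys, non_keys = [], []
    frontier = [frozenset()]                    # Frontier := {∅}
    while frontier:
        next_frontier = set()
        for pivot in frontier:                  # refine each pivot
            for prop in properties - pivot:     # next child
                child = pivot | {prop}
                if child in next_frontier or any(k <= child for k in keys):
                    continue                    # already seen, or descendant of a key
                if score(data, child) >= alpha:
                    keys.append(child)          # add to keys
                else:
                    non_keys.append(child)      # add to !keys
                    next_frontier.add(child)    # only non-keys are refined further
        frontier = list(next_frontier)
    return keys, non_keys

# Hypothetical toy data: :p1 and :p3 split the instances the same way.
data = {
    "a": {":p1": {"x"}, ":p2": {"1"}, ":p3": {"u"}},
    "b": {":p1": {"x"}, ":p2": {"2"}, ":p3": {"u"}},
    "c": {":p1": {"y"}, ":p2": {"1"}, ":p3": {"v"}},
    "d": {":p1": {"y"}, ":p2": {"2"}, ":p3": {"v"}},
}
keys, non_keys = discover_min_keys(data, frozenset({":p1", ":p2", ":p3"}))
print(sorted(sorted(k) for k in keys))
```

Because all smaller candidates are processed first, any key that is not a superset of an already known key has only non-key subsets, so it is minimal.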
Refinement operator.
[Figure: animated walk-through of the refinement operator on the lattice over {:p1, :p2, :p3} (∅; {:p1}, {:p2}, {:p3}; {:p1, :p2}, {:p1, :p3}, {:p2, :p3}; {:p1, :p2, :p3}). Legend: frontier, min-keys, max-non-keys, unvisited nodes, visited nodes.]
Related work on key discovery.
Linkkey (Atencia et al., 2014)
• Tool able to retrieve keys.
• Relies on an incomplete definition of key.
• State of the art for small datasets.
SAKey (Symeonidou et al., 2014)
• Tool able to retrieve keys and n-almost-keys.
• Relies on an incomplete definition of key.
• State of the art on bigger datasets.
KD2R (Symeonidou et al., 2011)
• Tool able to retrieve keys.
• Relies on an incomplete definition of key.
Evaluation.
Runtime.
Memory consumption.
Quality of the keys found.
Results – Runtime.
Dataset                            ROCKER      Linkkey     SAKey
OAEI Restaurant1 (10               1,880       1,698       1,028
DBpedia Person Function (10        14,565      OutOfMem    6,221
DBpedia Career Station (10         79,964      OutOfMem    2,199,854
DBpedia Organisation Member (10    1,075,679   227,336     OutOfMem
DBpedia Village (10                4,224,338   OutOfMem    OutOfMem
DBpedia Musical Work (10           2,524,120   OutOfMem    OutOfMem
Dataset sizes in triples. Results in milliseconds.
Results – RAM consumption.
Dataset                            ROCKER      Linkkey     SAKey
OAEI Restaurant1 (10               ~5 MB       ~2 MB       ~2 MB
DBpedia Person Function (10        2.5 GB      > 16 GB     1.8 GB
DBpedia Career Station (10         3.5 GB      > 16 GB     14.0 GB
DBpedia Organisation Member (10    3.8 GB      14.5 GB     > 16 GB
DBpedia Village (10                4.1 GB      > 16 GB     > 16 GB
DBpedia Musical Work (10           5.0 GB      > 16 GB     > 16 GB
Dataset sizes in triples. Experiments were run on an Ubuntu Linux machine with 16 GB of RAM.
Runtime by threshold.
Retrieve all candidates whose score is above a threshold α.
[Figure: runtime in milliseconds for α = 1 and α = .999.]
[Figure: runtime (ms) by threshold for dataset dbpedia:Monument.]
Contributions.
Complete definition of keys by considering multi-object properties and null values.
Better scalability:
• Faster execution on larger datasets.
• Lower memory consumption.
Running ROCKER without restrictions is guaranteed to
return minimal keys.
Info and future work.
ROCKER is part of LIMES, a link discovery framework. Its
source code is online at http://github.com/AKSW/rocker.
A demo is currently under development to show how ROCKER
can improve data quality by searching for n-almost-keys.
We will evaluate ROCKER inside the link discovery
workflow: how can keys help find good link specifications?
Tommaso Soru
PhD student at University of Leipzig
Room P905, Fakultät für Mathematik und Informatik
Augustusplatz 10, D-04109 Leipzig, Germany
tsoru@informatik.uni-leipzig.de
http://tommaso-soru.it
Proceedings
http://www.www2015.it/documents/proceedings/proceedings/p1025.pdf
Slides for "ROCKER – A Refinement Operator for Key Discovery", WWW2015

  • 1. Tommaso Soru, Edgard Marx, Axel-Cyrille Ngonga Ngomo AKSW, Department of Computer Science University of Leipzig, Germany ! ! ! ! ! May 22, 2015 WWW 2015 — Florence, Italy ROCKER A Refinement Operator for Key Discovery 1
  • 2. Text State of the Linked Open Data cloud. 353 accessible RDF datasets; ~74 billion triples. Sources: State of the LOD cloud, LODStats, 2015. 2
  • 3. Decentral data publication. Real-world entity “Florence, Italy” is described in: 3 DBpedia Linked GeoData Geo Names
  • 4. Unique descriptions of resources. Entity search. Data integration. Linked data compression. Link discovery. Question answering. Data quality. 4
  • 5. Unique descriptions of resources. Entity search. Data integration. Linked data compression. Link discovery. Question answering. Data quality. 4 Keys.
  • 6. Background. 5 A key is a set of properties which can distinguish all instances of a class in a knowledge base.
  • 7. Background. 5 A key is a set of properties which can distinguish all instances of a class in a knowledge base. :Brad_Pitt :Julia_Roberts :Oceans_Eleven :The_Mexican :hasActor :hasActor :hasActor :hasActor “Ocean’s Eleven” “Julia Roberts” “The Mexican” “Brad Pitt” rdfs:label rdfs:label rdfs:label rdfs:label
  • 8. 6 A key is a minimal key if none of its subsets is also a key. Background. candidate key distinguishable resources key? min-key? {rdfs:label} 2 / 2 yes yes {:hasActor} 1 / 2 no no {rdfs:label, :hasActor} 2 / 2 yes no dbpedia-owl:Film
  • 9. 7 A set of properties is called an n-almost-key for a class if it can distinguish all except n instances of that class. Background. :Canada :Iceland :United_States :filmedIn :Interstellar :United_States :United_Kingdom :filmedIn :Blade_Runner :United_States :United_Kingdom :filmedIn :2001_A_Space_Odyssey :WALL-E
  • 10. 7 A set of properties is called an n-almost-key for a class if it can distinguish all except n instances of that class. Background. :Canada :Iceland :United_States :filmedIn :Interstellar :United_States :United_Kingdom :filmedIn :Blade_Runner :United_States :United_Kingdom :filmedIn :2001_A_Space_Odyssey :WALL-E ✗
  • 11. 8 ROCKER’s score function. The score function expresses the rate of distinguishable instances in a class, given a set of properties (i.e., a candidate key). :Interstellar :Blade_Runner :2001_A_Space_Odyssey :WALL-E ✗ } score({: filmedIn}) = s ∈S :∀ ′s ∈S s ≠ ′s ⇒ discr(s, ′s ,{: filmedIn}){ } S = .75
  • 12. 8 ROCKER’s score function. The score function expresses the rate of distinguishable instances in a class, given a set of properties (i.e., a candidate key). :Interstellar :Blade_Runner :2001_A_Space_Odyssey :WALL-E ✗ } An n-almost-key has a score of at least .α = S − n S score({: filmedIn}) = s ∈S :∀ ′s ∈S s ≠ ′s ⇒ discr(s, ′s ,{: filmedIn}){ } S = .75
  • 13. Contribution #1 A more complete definition of key. All object values are considered (e.g., ). Null values are accepted (e.g., ). 9 :United_States :WALL-E :Canada :Iceland :United_States :filmedIn :Interstellar :United_States :United_Kingdom :filmedIn :Blade_Runner :WALL-E
  • 14. 10 Properties of a key. Key monotonicity. Adding a property to a key yields another key. {:p1, :p2, :p3}{:p1, :p2} ⋃ {:p3}
  • 15. 10 Properties of a key. Key monotonicity. Adding a property to a key yields another key. Non-key monotonicity. Removing a property from a non-key yields another non-key. {:p1, :p2, :p3}{:p1, :p2} ⋃ {:p3} {:p1, :p4}{:p1, :p2, :p4} ∖ {:p2}
  • 16. 11 Proposed approach. We adopt a refinement operator to refine candidates. {:p1, :p2, :p3} ∅ {:p1, :p3} {:p1} {:p3} {:p1, :p2} {:p2, :p3} {:p2}
  • 17. 12 Proposed approach. Pro. The score function induces a quasi-ordering ‘≼’ over the set of all candidates. P≼Q means score(p) ≤ score(q)
  • 18. 12 Proposed approach. Pro. The score function induces a quasi-ordering ‘≼’ over the set of all candidates. P≼Q means score(p) ≤ score(q) Contra. Visiting the refinement tree is an intractable problem! n properties 2ⁿ–1 nodes
  • 19. Solutions to intractability. Prune branches using key monotonicity: for all descendants of a key; for all ancestors of a non-key. Consider only a subset of popular properties. Provide a “fast search” option which selects one of the multiple discovery strategies. 13
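  • 31. Algorithm. Frontier := {∅}. Sort the frontier by score and check the score of the top element: if it is ≥ α, halt; if it is < α, refine the pivot, remove the pivot and add its children to the frontier. Has children? If no, sort by score again; if yes, take the next child. Ancestor of a non-key? If yes, add to non-keys. If no: descendant of a key? If yes, add to keys. If no, check its score: if ≥ α, add to keys; if < α, add to non-keys.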
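The flowchart can be approximated by a best-first search. This is a simplified sketch rather than the authors' implementation. It assumes the score is the fraction of distinct value combinations, it treats ancestors as subsets and descendants as supersets, and it halts when the frontier is empty instead of using the flowchart's halting test. `ROWS`, `score`, `refine` and `discover_minimal_keys` are hypothetical names.

```python
PROPERTIES = [":p1", ":p2", ":p3"]

# Hypothetical toy instances, one dict per instance.
ROWS = [
    {":p1": 1, ":p2": 1, ":p3": 1},
    {":p1": 1, ":p2": 2, ":p3": 1},
    {":p1": 2, ":p2": 1, ":p3": 2},
    {":p1": 2, ":p2": 2, ":p3": 3},
]


def score(candidate, rows):
    """Fraction of distinct value combinations (assumed score definition)."""
    signatures = {tuple(row.get(p) for p in sorted(candidate)) for row in rows}
    return len(signatures) / len(rows)


def refine(candidate, properties):
    """Children extend the candidate with one property that comes later in the list."""
    start = max((properties.index(p) for p in candidate), default=-1) + 1
    return [candidate | {p} for p in properties[start:]]


def discover_minimal_keys(properties, rows, alpha=1.0):
    keys, non_keys = [], []
    frontier = [frozenset()]
    while frontier:
        # Best-first: the highest-scoring candidate becomes the pivot.
        frontier.sort(key=lambda c: score(c, rows), reverse=True)
        pivot = frontier.pop(0)
        for child in refine(pivot, properties):
            if any(child <= nk for nk in non_keys):
                # Subset of a known non-key: a non-key, no scoring needed.
                non_keys.append(child)
                frontier.append(child)
            elif any(k <= child for k in keys):
                # Superset of a known key: a key, but not a minimal one.
                continue
            elif score(child, rows) >= alpha:
                keys.append(child)
            else:
                non_keys.append(child)
                frontier.append(child)
    # A superset may have been found before its subset; keep minimal keys only.
    return [k for k in keys if not any(other < k for other in keys)]


for key in discover_minimal_keys(PROPERTIES, ROWS):
    print(sorted(key))
```

On the toy data, the search returns {:p1, :p2} and {:p2, :p3}. Lowering `alpha` to 0.75 also accepts {:p3}, which distinguishes three of the four instances.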
  • 32–39. Refinement operator. [Animation over the lattice of {:p1, :p2, :p3}: starting from ∅, each frame shows the current frontier, the min-keys and max-non-keys found so far, and which nodes are visited or unvisited.]
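One way to obtain a refinement tree in which every candidate has exactly one parent is to fix an order on the properties and let each node add only later properties. This ordering is an assumption made for illustration. The sketch below enumerates the resulting tree for three properties.

```python
PROPERTIES = [":p1", ":p2", ":p3"]


def refine(candidate, properties):
    """Children extend the candidate with one property that comes later in the list."""
    start = max((properties.index(p) for p in candidate), default=-1) + 1
    return [candidate | {p} for p in properties[start:]]


def walk(candidate, properties):
    """Depth-first walk that yields every node of the refinement tree once."""
    yield candidate
    for child in refine(candidate, properties):
        yield from walk(child, properties)


nodes = list(walk(frozenset(), PROPERTIES))
print(len(nodes) - 1)  # 7 non-empty candidates, i.e. 2**3 - 1
print([sorted(c) for c in refine(frozenset({":p1"}), PROPERTIES)])
# [[':p1', ':p2'], [':p1', ':p3']]
```

Because no candidate is generated twice, a traversal needs no duplicate detection; the pruning rules then decide which subtrees are skipped.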
  • 40. Related work on key discovery.
      Linkkey (Atencia et al., 2014): tool able to retrieve keys; relies on an incomplete definition of key; state of the art for small datasets.
      SAKey (Symeonidou et al., 2014): tool able to retrieve keys and n-almost-keys; relies on an incomplete definition of key; state of the art on bigger datasets.
      KD2R (Symeonidou et al., 2011): tool able to retrieve keys; relies on an incomplete definition of key.
  • 42. Results – Runtime. Results in milliseconds; dataset sizes in triples.
      Dataset                              ROCKER      Linkkey     SAKey
      OAEI Restaurant1 (10…)               1,880       1,698       1,028
      DBpedia Person Function (10…)        14,565      OutOfMem    6,221
      DBpedia Career Station (10…)         79,964      OutOfMem    2,199,854
      DBpedia Organisation Member (10…)    1,075,679   227,336     OutOfMem
      DBpedia Village (10…)                4,224,338   OutOfMem    OutOfMem
      DBpedia Musical Work (10…)           2,524,120   OutOfMem    OutOfMem
  • 43. Results – RAM consumption. Dataset sizes in triples. Experiments were run on an Ubuntu Linux machine with 16 GB of RAM.
      Dataset                              ROCKER      Linkkey     SAKey
      OAEI Restaurant1 (10…)               ~5 MB       ~2 MB       ~2 MB
      DBpedia Person Function (10…)        2.5 GB      > 16 GB     1.8 GB
      DBpedia Career Station (10…)         3.5 GB      > 16 GB     14.0 GB
      DBpedia Organisation Member (10…)    3.8 GB      14.5 GB     > 16 GB
      DBpedia Village (10…)                4.1 GB      > 16 GB     > 16 GB
      DBpedia Musical Work (10…)           5.0 GB      > 16 GB     > 16 GB
  • 45. Runtime by threshold. Retrieve all candidates whose score is above a threshold α. [Figure: runtime in milliseconds, with series for α = 1 and α = .999.]
  • 47. Runtime by threshold. Retrieve all candidates whose score is above a threshold α. Results for dataset dbpedia:Monument. [Figure: runtime (ms) by threshold.]
  • 48. Contributions. Complete definition of keys by considering multi-object properties and null values. Better scalability: faster execution on larger datasets and lower memory consumption. Running ROCKER without restrictions is guaranteed to return minimal keys.
  • 51. Info and future work. ROCKER is part of LIMES, a link discovery framework. Its source code is online at http://github.com/AKSW/rocker. A demo is currently under development to show how ROCKER can improve data quality by searching for n-almost-keys. We will evaluate ROCKER inside the link discovery workflow, i.e., how can keys help find good link specifications?
  • 52. Tommaso Soru, PhD student at University of Leipzig. Room P905, Fakultät für Mathematik und Informatik, Augustusplatz 10, D-04109 Leipzig, Germany. tsoru@informatik.uni-leipzig.de, http://tommaso-soru.it. Proceedings: http://www.www2015.it/documents/proceedings/proceedings/p1025.pdf