The Linked Data Life-Cycle
Jens Lehmann, Lorenz Bühmann

contributors: Sören Auer, Anja Jentzsch, Christina Unger, Richard Cyganiak, Dimitris Kontokostas, Quan Nguyen, Sebastian Hellmann, Claus Stadler, Daniel Gerber, Axel Ngonga

2013-08-23
Lehmann, Bühmann (Univ. Leipzig)

The Linked Data Life-Cycle

2013-08-23

1 / 252
Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extraction
5 Data Integration / Linking
6 Enrichment
7 Repair
8 Knowledge Base Exploration / Querying

[Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
The Linked Data Principles

The term Linked Data refers to a set of best practices for publishing and
interlinking structured data on the Web.

Linked Data principles:
1 Use URIs as names for things.
2 Use HTTP URIs, so that people can look up those names.
3 When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4 Include links to other URIs, so that they can discover more things.
LOD Cloud

Linked Data Principles Detailed: 1 + 2

1 URI references to identify not just Web documents and digital content, but also real-world objects and abstract concepts
  tangible things: people, places
  abstract things: relationship type of knowing somebody
2 HTTP URIs enable re-use of Web architecture: Linked Data gives emphasis to the "Web" in "Semantic Web"
  Resource dereferencing
  Re-use of standard tools for security, load-balancing etc.
Principles Detailed: 3 Content Negotiation
Humans and machines should be able to retrieve appropriate representations of resources: HTML for humans, RDF for machines
Achievable using an HTTP mechanism called content negotiation
Basic idea: HTTP clients send HTTP headers with each request to indicate what kinds of documents they prefer
Servers can inspect the headers and select an appropriate response
Two strategies:
303 URIs
Hash URIs
Both ensure that objects and the documents that describe them are not confused, and that humans and machines can retrieve appropriate representations
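The header inspection described above can be sketched roughly as follows. This is a minimal illustration, not a full implementation of HTTP content negotiation: the function names and the list of available representations are assumptions, and wildcard subtypes such as text/* are deliberately ignored.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Representations this hypothetical server can produce (assumption).
AVAILABLE = ("text/html", "application/rdf+xml")


def choose_representation(accept_header, available=AVAILABLE):
    """Return the best media type the server offers, or None if nothing matches."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None
```

A browser-style header prefers HTML, while a Linked Data client asking for application/rdf+xml gets the RDF representation of the same resource.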
303 URIs
303 Redirect: instead of sending the object itself over the network, the server responds to the client with the HTTP response code 303 See Other and the URI of a Web document which describes the real-world object
Second step: client dereferences the new URI and gets a Web document describing the real-world object
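The two steps can be simulated without any network access. In the sketch below the URIs, the response table and the function name are all hypothetical; the point is only that the client needs two requests to get from the thing to its description.

```python
# Hypothetical in-memory "server": maps a URI to (status, headers, body).
THING = "http://example.org/resource/Leipzig"  # identifies the real-world object
DOC = "http://example.org/data/Leipzig"        # a Web document describing it

RESPONSES = {
    THING: (303, {"Location": DOC}, ""),
    DOC: (200, {"Content-Type": "text/turtle"}, "# RDF description of Leipzig"),
}


def dereference(uri, max_redirects=5):
    """Follow 303 redirects; return (final_uri, body, number_of_requests)."""
    requests_made = 0
    for _ in range(max_redirects + 1):
        status, headers, body = RESPONSES[uri]
        requests_made += 1
        if status == 303:
            # See Other: the description lives at a different URI
            uri = headers["Location"]
            continue
        return uri, body, requests_made
    raise RuntimeError("too many redirects")


final_uri, description, request_count = dereference(THING)
```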
Hash URIs

The hash URI strategy builds on the characteristic that URIs may contain a special part (the fragment identifier) separated from their base part by a hash symbol (#)
The HTTP protocol requires the fragment part to be stripped off before requesting the URI from the server
→ a URI that includes a hash cannot be retrieved directly and therefore does not necessarily identify a Web document
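What a client does with a hash URI can be shown with Python's standard library. The URI is a made-up example; urldefrag splits it into the part that is actually requested and the fragment that never reaches the server.

```python
from urllib.parse import urldefrag

uri = "http://example.org/vocab#Person"  # hypothetical hash URI

# The client strips the fragment and requests only the base document.
requested, fragment = urldefrag(uri)
```

Here requested is the document URI sent over HTTP and fragment names the thing described inside that document.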
Hash versus 303

Hash URIs
(+) Reduced number of necessary HTTP round-trips → reduces access latency
(-) Descriptions of all resources sharing the same non-fragment URI part are always returned to the client together → can lead to large amounts of data being unnecessarily transmitted to the client

303 URIs
(+) Flexible because the redirection target can be configured separately for each resource (usually points to a single document for each resource, but could also summarise several resources)
(-) Requires two HTTP requests to retrieve a single description of a real-world object
Principles Detailed: 4 Links

If an RDF triple connects URIs in different namespaces/datasets, it is called a link (no unique syntactical definition of link exists)
Basic idea of Linked Data: apply the general hyperlink-based architecture of the World Wide Web to the task of sharing structured data on a global scale
Research challenge: efficient creation of links with high precision and recall
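Since no unique syntactical definition of a link exists, any test for "is this triple a link?" is a heuristic. The sketch below assumes, purely for illustration, that the host name of a URI approximates its dataset; the function names and the second URI are hypothetical.

```python
from urllib.parse import urlsplit


def namespace(uri):
    """Approximate the dataset of a URI by its host name (an assumption)."""
    return urlsplit(uri).netloc.lower()


def is_link(triple):
    """True if subject and object are URIs from different hosts."""
    subject, _predicate, obj = triple
    if not obj.startswith(("http://", "https://")):
        return False  # literal object: not a link between resources
    return namespace(subject) != namespace(obj)


internal = ("http://dbpedia.org/resource/Leipzig",
            "http://dbpedia.org/ontology/country",
            "http://dbpedia.org/resource/Germany")
external = ("http://dbpedia.org/resource/Leipzig",
            "http://www.w3.org/2002/07/owl#sameAs",
            "http://other.example.org/place/42")
```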
Why Linked Data?
Problem: Try to search for these things on the current Web:
Apartments near German-Russian bilingual childcare in Leipzig.
ERP service providers with offices in Vienna and London.
Researchers working on multimedia topics in Eastern Europe.

Information is available on the Web, but opaque to current Web search.
Solution: complement text on Web pages with structured linked open data and intelligently combine/integrate such structured information from different sources
How to get there?

Tim Berners-Lee's 5-star plan

Tim Berners-Lee's 5-star plan for an open web of data
Make data available on the Web under an open license
Make it available as structured data
Use a non-proprietary format
Use URIs to identify things
Link your data to other people's data to provide context

The 0th star
Data catalog with good metadata
Make your data findable
Data on the Web, Open License

Open vs. Closed:
Data used to be closed by default
In the future, it may be open by default.

Publishers: sharing data to make it more visible

E-Commerce: data sharing for increasing traffic

Community: Collaboratively created databases

Good reasons against opening data

Privacy
Competitive advantage
Producing data and charging for it as business model
Can't get license from upstream

Structured Data

Enabling re-use:
Delivering data to end users in different forms
Combining data with other data
3rd party analysis of data

Formats:
Good for re-use / Structured: MS Excel, CSV, XML, JSON, Microdata
Not so good for re-use: Pure websites, MS Word
Bad for re-use: PDF
Really bad for re-use: Only charts/maps without numbers

Non-Proprietary Formats

Specialist tools often have specialist formats
Few people have the tools
Expensive
Difficult to re-use
(Geospatial tools, statistics packages, etc.)

Non-proprietary:
CSV (dead simple)
XML
JSON
RDF (good for 4+5 stars)
URIs as Identifiers

URI-Design: prefer stable, implementation independent URIs

Turning local identifiers into URIs: why?
Make them globally unique
Clarify authority
Make them resolvable
Make them linkable
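Turning a local identifier into a URI mostly means prefixing it with a namespace the publisher controls. The following is a minimal sketch under that assumption; the base namespace and the function name are hypothetical and do not come from the slides.

```python
from urllib.parse import quote

BASE = "http://data.example.org/id/"  # hypothetical namespace owned by the publisher


def mint_uri(kind, local_id, base=BASE):
    """Build a globally unique HTTP URI from a local identifier."""
    # Percent-encode both parts so that characters such as '/' or spaces
    # in a local key cannot change the path structure of the URI.
    return f"{base}{quote(kind, safe='')}/{quote(str(local_id), safe='')}"


employee_uri = mint_uri("employee", 4711)
product_uri = mint_uri("product", "A/12 b")
```

The domain name makes the identifier globally unique and clarifies authority; using HTTP makes it resolvable and linkable.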
Links to Other Data
Hyperlinks are the soul of the Web. The Web of Data is no different.
Summary
Linked Data Principles:
1 Use URIs to name things (not only documents, but also people, locations, concepts, etc.)
2 To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs
3 When someone looks up a URI, provide useful information (structured data in RDF, SPARQL).
4 Include links to other URIs allowing agents to discover more things
5-Star-Data:
Five-star plan for realising an emerging web of data, dataset by dataset
2 stars: re-usable data
3 stars: open standards
4+5 stars: connect data silos
DBpedia

Community effort to extract structured information from Wikipedia and to make this information available on the Web
Allows asking sophisticated queries against Wikipedia, and linking other data sets on the Web to Wikipedia data

Semi-structured Wiki markup → structured information
Wikipedia Limitations

Simple questions that are hard to answer with Wikipedia:
What do Innsbruck and Leipzig have in common?
Who are the mayors of central European towns elevated more than 1000m?
Which movies star both Brad Pitt and Angelina Jolie?
All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
Structure in Wikipedia
Title
Abstract
Infoboxes
Geo-coordinates
Categories
Images
Links
other language versions
other Wikipedia pages
To the Web
Redirects
Disambiguation

...

DBpedia Information Extraction Framework

DBpedia Information Extraction Framework (DIEF)
Started in 2007
Hosted on Sourceforge and Github
Initially written in PHP, but fully rewritten in Scala and Java
Around 40 contributors
See https://www.ohloh.net/p/dbpedia for a detailed overview
Can potentially be adapted to other MediaWikis
Currently Wiktionary: http://wiktionary.dbpedia.org
DIEF - Overview

DIEF - Raw Infobox Extractor
WikiText syntax
{{Infobox Korean settlement
|title = Busan Metropolitan City
...
|area_km2 = 763.46
|pop = 3635389
|region = [[Yeongnam]]
}}

RDF serialization
dbp:Busan dbp:title "Busan Metropolitan City" .
dbp:Busan dbp:area_km2 "763.46"^^xsd:float .
dbp:Busan dbp:pop "3635389"^^xsd:int .
dbp:Busan dbp:region dbp:Yeongnam .
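The step from infobox markup to triples can be imitated in a few lines. This is only a toy sketch of the idea, not the DIEF implementation (which is written in Scala and Java): the function names, the regular expressions and the datatype guessing rules are assumptions made for illustration.

```python
import re

SAMPLE = """{{Infobox Korean settlement
|title = Busan Metropolitan City
|area_km2 = 763.46
|pop = 3635389
|region = [[Yeongnam]]
}}"""


def parse_infobox(wikitext):
    """Return raw key/value pairs from '|key = value' lines."""
    pairs = {}
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


def to_object(value):
    """Guess the RDF object: wiki link, integer, decimal or plain literal."""
    link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
    if link:
        return "dbp:" + link.group(1).strip().replace(" ", "_")
    if re.fullmatch(r"[+-]?\d+", value):
        return f'"{value}"^^xsd:int'
    if re.fullmatch(r"[+-]?\d*\.\d+", value):
        return f'"{value}"^^xsd:float'
    return f'"{value}"'


def extract(subject, wikitext):
    """One triple per infobox attribute, property name taken over unchanged."""
    return [(subject, "dbp:" + key, to_object(value))
            for key, value in parse_infobox(wikitext).items()]


triples = extract("dbp:Busan", SAMPLE)
```

Because property names are copied as they appear in the template, different spellings of the same attribute end up as different properties, which is the diversity problem the mapping-based extractor addresses.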
DIEF - Raw Infobox Extractor/Diversity

DIEF - Mapping-Based Infobox Extractor

Cleaner data:
Combine what belongs together (birth_place, birthplace)
Separate what is different (bornIn, birthplace)
Correct handling of datatypes
Mappings Wiki: http://mappings.dbpedia.org
Everybody can contribute to new mappings or improve existing ones
≈ 170 editors
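The effect of a mapping can be pictured as a lookup from raw infobox attribute names to one ontology property. The table below is a hand-written illustration, not an excerpt from the Mappings Wiki; the function name and the fallback to the raw dbp: namespace are assumptions.

```python
# Illustrative mapping table (assumption): two raw spellings, one ontology property.
MAPPINGS = {
    "birth_place": "dbo:birthPlace",
    "birthplace": "dbo:birthPlace",
}


def map_property(raw_name):
    """Return the mapped ontology property, or the raw property if unmapped."""
    name = raw_name.strip()
    return MAPPINGS.get(name.lower(), "dbp:" + name)


mapped = [map_property(n) for n in ("birth_place", "birthplace", "bornIn")]
```

The two spellings that belong together collapse into one property, while bornIn stays separate until someone decides how it should be mapped.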
URI/IRI schemes
http://{lang.}dbpedia.org is the main domain
For every article there exists a DBpedia resource of the form http://{lang.}dbpedia.org/resource/{ArticleName}
Properties from the raw infobox extractor use the http://{lang.}dbpedia.org/property/ namespace
The ontology is global for all languages and lives in the http://dbpedia.org/ontology/ namespace
Note that for English no language code is used:
http://dbpedia.org as main domain
http://dbpedia.org/resource/{title} for articles
http://dbpedia.org/property/{title} for properties
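The scheme above is regular enough to be written down as a few helper functions. The function names are hypothetical, and replacing spaces in article titles with underscores is an assumption borrowed from Wikipedia's URL convention rather than something stated on the slide.

```python
def dbpedia_domain(lang="en"):
    """English uses no language code; other editions use a subdomain."""
    return "http://dbpedia.org" if lang == "en" else f"http://{lang}.dbpedia.org"


def resource_uri(title, lang="en"):
    # Assumption: spaces become underscores, as in Wikipedia page URLs.
    return f"{dbpedia_domain(lang)}/resource/{title.replace(' ', '_')}"


def property_uri(name, lang="en"):
    return f"{dbpedia_domain(lang)}/property/{name}"


def ontology_uri(name):
    # The ontology namespace is shared by all language editions.
    return f"http://dbpedia.org/ontology/{name}"
```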
Linked Data Publication via 303 Redirects
http://dbpedia.org/resource/Dresden - URI of the city of Dresden
http://dbpedia.org/page/Dresden - information resource describing the city of Dresden in HTML format
http://dbpedia.org/data/Dresden - information resource describing the city of Dresden in RDF/XML format
further formats supported, e.g. http://dbpedia.org/data/Dresden.n3 for N3
DBpedia Links
Data set                   Predicate                  Count         Tool
Amsterdam Museum           owl:sameAs                 627           S
BBC Wildlife Finder        owl:sameAs                 444           S
Book Mashup                rdf:type, owl:sameAs       9 100
Bricklink                  dc:publisher               10 100
CORDIS                     owl:sameAs                 314           S
Dailymed                   owl:sameAs                 894           S
DBLP Bibliography          owl:sameAs                 196           S
DBTune                     owl:sameAs                 838           S
Diseasome                  owl:sameAs                 2 300         S
Drugbank                   owl:sameAs                 4 800         S
EUNIS                      owl:sameAs                 3 100         S
Eurostat (Linked Stats)    owl:sameAs                 253           S
Eurostat (WBSG)            owl:sameAs                 137
CIA World Factbook         owl:sameAs                 545           S
flickr wrappr              dbp:hasPhotoCollection     3 800 000     C
Freebase                   owl:sameAs                 3 600 000     C
GADM                       owl:sameAs                 1 900
GeoNames                   owl:sameAs                 86 500        S
GeoSpecies                 owl:sameAs                 16 000        S
GHO                        owl:sameAs                 196           L
Project Gutenberg          owl:sameAs                 2 500         S
Italian Public Schools     owl:sameAs                 5 800         S
LinkedGeoData              owl:sameAs                 103 600       S
LinkedMDB                  owl:sameAs                 13 800        S
MusicBrainz                owl:sameAs                 23 000
New York Times             owl:sameAs                 9 700
OpenCyc                    owl:sameAs                 27 100        C
OpenEI (Open Energy)       owl:sameAs                 678           S
Revyu                      owl:sameAs                 6
Sider                      owl:sameAs                 2 000         S
TCMGeneDIT                 owl:sameAs                 904
UMBEL                      rdf:type                   896 400
US Census                  owl:sameAs                 12 600
WikiCompany                owl:sameAs                 8 300
WordNet                    dbp:wordnet_type           467 100
YAGO2                      rdf:type                   18 100 000
Sum                                                   27 211 732

(S: Silk, L: LIMES, C: custom script, missing: no regeneration)
DBpedia Links - Query Example
Compare funding per year (from FTS) and country with the gross domestic product of a country (from DBpedia)

SELECT * {
  { SELECT ?ftsyear ?dbpcountry (SUM(?amount) AS ?funding) {
      ?com rdf:type fts-o:Commitment .
      ?com fts-o:year ?year .
      ?year rdfs:label ?ftsyear .
      ?com fts-o:benefit ?benefit .
      ?benefit fts-o:detailAmount ?amount .
      ?benefit fts-o:beneficiary ?beneficiary .
      ?beneficiary fts-o:country ?ftscountry .
      ?ftscountry owl:sameAs ?dbpcountry .
  } }
  { SELECT ?dbpcountry ?gdpyear ?gdpnominal {
      ?dbpcountry rdf:type dbo:Country .
      ?dbpcountry dbp:gdpNominal ?gdpnominal .
      ?dbpcountry dbp:gdpNominalYear ?gdpyear .
  } }
  FILTER ((?ftsyear = str(?gdpyear)))
}
Infrastructure
DBpedia has two extraction modes:
Wikipedia-database-dump-based extraction
DBpedia Live synchronisation (more later)
DBpedia Dumps:
The DBpedia dump archive is located at http://downloads.dbpedia.org/
The latest downloads are described at http://dbpedia.org/Downloads
Official Endpoint (by OpenLink): http://dbpedia.org/sparql
Query Answering

Back to our Wikipedia questions:
What do Innsbruck and Leipzig have in common?
Who are the mayors of central European towns elevated more than 1000m?
Which movies star both Brad Pitt and Angelina Jolie?
All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
Using the data extracted from Wikipedia and the public SPARQL endpoint, DBpedia can answer these questions.
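As an illustration, the third question could be phrased as a SPARQL query and sent to the public endpoint as an HTTP GET request with a query parameter, as defined by the SPARQL protocol. The sketch below only builds the request URL and does not contact the server. The query is one possible formulation, assuming the dbo:starring property and the two resource names; the function name is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # official endpoint named on the Infrastructure slide

# Assumed formulation: films that list both actors under dbo:starring.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?film WHERE {
  ?film dbo:starring dbr:Brad_Pitt , dbr:Angelina_Jolie .
}
""".strip()


def build_request_url(query, endpoint=ENDPOINT):
    """URL-encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(QUERY)
```

The result format would normally be requested through the Accept header, for example application/sparql-results+json.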
DBpedia Live

DBpedia dumps are generated on a bi-annual basis
Wikipedia has around 100,000 to 150,000 page edits per day
DBpedia Live pulls page updates in real-time and extraction results update the triple store
In practice, a 5 minute update delay increases performance by 15%
Links
SPARQL Endpoint: http://live.dbpedia.org/sparql
Documentation: http://wiki.dbpedia.org/DBpediaLive
Statistics: http://live.dbpedia.org/LiveStats/
DBpedia Live - Overview

DBpedia Internationalization (I18n)

DBpedia Internationalization Committee founded: http://wiki.dbpedia.org/Internationalization
DBpedia language editions available in:
Korean, Greek, German, Polish, Russian, Dutch, Portuguese, Spanish, Italian, Japanese, French
Each edition uses the corresponding Wikipedia language edition as input

Mappings available for 23 languages
DBpedia I18n - Overview

Applications: Disambiguation

Named entity recognition and disambiguation tools such as: DBpedia Spotlight, AlchemyAPI, Semantic API, Open Calais, Zemanta and Apache Stanbol
Applications: Question Answering

DBpedia is the primary target for several QA systems in the Question
Answering over Linked Data (QALD) workshop series
IBM Watson also relied on DBpedia
Applications: Faceted Browsing

Neofonie Browser
gFacet
OpenLink faceted browser (fct)

Applications: Search and Querying

Query Builder
RelFinder
SemLens

Applications: Digital Libraries & Archives

Virtual International Authority File (VIAF) project as Linked Data
VIAF added a total of 250,000 reciprocal authority links to Wikipedia.

DBpedia can also provide:
Context information for bibliographic and archive records (e.g. an author's demographics, a film's homepage, an image etc.)
Stable and curated identifiers for linking.
The broad range of Wikipedia topics can form the basis for a thesaurus for subject indexing.
Applications: DBpedia Mobile

DBpedia Mobile is a location-centric DBpedia client application for mobile
devices consisting of a map view, the Marbles Linked Data Browser and a
GPS-enabled launcher application.

Applications: DBpedia Wiktionary

Wiktionary is a Wikimedia project: http://wiktionary.org
171 languages, 3M words for English.

Extracted using the DBpedia Information Extraction Framework
Easily configurable for every Wiktionary language edition
Pre-configured for German, Greek, English, Russian and French.
http://wiktionary.dbpedia.org
100 million triples
Lemon model
Other Applications

See http://wiki.dbpedia.org/Applications for a more complete list
Linked Data - Achievements and Challenges
Achievements:
1 Extension of the Web with a data commons (50B facts)
2 vibrant, global RTD community
3 Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly, NY Times, Facebook, Google, Yahoo)
4 Governmental adoption in sight
5 Establishing Linked Data as a deployment path for the Semantic Web.

Challenges:
1 Coherence: Relatively few, expensively maintained links
2 Quality: partly low quality data and inconsistencies
3 Performance: Still substantial penalties compared to relational
4 Data consumption: large-scale processing, schema mapping and data fusion still in its infancy
5 Usability: Missing direct end-user tools and network effect.

These issues are closely related and should ultimately lead to an ecosystem of interlinked knowledge!
[Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
Extraction

From unstructured sources
Formats: plain text
Methods: NLP, text mining, ontology learning

From semi-structured sources
Formats: wiki markup, tags
Tools: DBpedia framework (Wikipedia, Wiktionary)

From structured sources
Formats: databases, spreadsheets, XML
RDB2RDF tools: Sparqlify, D2R, Triplify
CSV converters: RDF extension of Google Refine
Extraction Challenges
From unstructured sources
Improve F-Measure of existing NLP approaches (OpenCalais, Ontos API)
Develop standardized, LOD enabled interfaces between NLP tools (NLP2RDF)
From semi-structured sources
Efficient bi-directional synchronization
From structured sources
Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)
Orthogonal challenges
Using LOD as background knowledge
Provenance
Storage and Querying
RDF Data Management
SPARQL/RDF access is still slower than relational data management by a factor of 2-10
Performance increases steadily
Comprehensive, well-supported open-source and commercial implementations are available:
OpenLink's Virtuoso (os+commercial)
OWLIM-Lite (free), OWLIM-SE, OWLIM-Enterprise
Talis (hosted)
Bigdata (distributed)
Allegrograph (commercial)
Mulgara (os)
Storage and Querying Challenges

Reduce the performance gap between relational and RDF data
management
SPARQL Query extensions: Spatial/semantic/temporal data
management
View maintenance / adaptive reorganization based on common access
patterns
More realistic benchmarks

Authoring

Integrated in Existing Environments: Tiki
Data oriented: RDFauthor, rdfEditor
Schema oriented: Protégé, TopBraid Composer, NeOn Toolkit,
Swoop, Neologism, Knoodl

Authoring: Semantic Wikis

1 Semantic (Text) Wikis
  Authoring of semantically annotated texts
  Semantic MediaWiki, KiWi, (Wikipedia+DBpedia)
2 Semantic Data Wikis
  Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
  OntoWiki
Linking
Interlinking

Data Web is an uncontrolled environment → proliferation of equivalent or similar entities → need for links / merging
Currently only few RDF triples are links
Manual Link Discovery:
Sindice Integration, LODStats, Semantic Pingback

Tool supported / Semi-Automatic:
SILK, LIMES, COMA, RDF-AI
Usually via mapping specifications / heuristics

Machine Learning / Automatic:
RAVEN, EAGLE, SILK GP
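A heuristic of the kind such tools work with can be as simple as "link two resources if their labels are nearly identical". The sketch below is a naive, quadratic illustration using Python's difflib; it is not how SILK or LIMES are implemented, and the function names, threshold and example URIs are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Compare every source label with every target label (naive O(n*m))."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links


# Two hypothetical datasets, each mapping URI -> label.
source = {"http://a.example.org/Leipzig": "Leipzig",
          "http://a.example.org/Dresden": "Dresden"}
target = {"http://b.example.org/city/1": "leipzig",
          "http://b.example.org/city/2": "Vienna"}

links = discover_links(source, target)
```

Comparing all pairs does not scale to large datasets, which is one reason efficient link discovery is listed as a research challenge.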
Interlinking Challenges

Apply work in the de-duplication/record linkage literature
Consider the open world nature of Linked Data
Use LOD background knowledge
Zero-configuration linking
Explore active learning approaches, which integrate users in a feedback loop
Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (http://latc-project.eu/)
Enrichment

Currently, lack of knowledge bases with sophisticated schema
information and instance data adhering to this schema
Goal: powerful reasoning, consistency checking and querying
Manual:
Via ontology editors, DBpedia mappings

(Semi-)Automatic:
DL-Learner, Statistical Schema Induction

Enrichment: Example
Given: knowledge base with property birthPlace (i.e. triples using that property) but no information on the semantics of birthPlace
Possible enrichment:

ObjectProperty: birthPlace
Characteristics: Functional
Domain: Person
Range: Place
SubPropertyOf: hasBeenAt
Benefits:
axioms serve as documentation for purpose and correct usage of schema elements
additional implicit information can be inferred
improve the applicability of schema debugging techniques
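The second benefit can be made concrete with a tiny rule-based sketch. It hard-codes the three axioms from the example (domain, range, sub-property) and applies them in a single pass; a real system would use an RDFS or OWL reasoner instead. The dictionaries, the function name and the sample triple are assumptions for illustration.

```python
# The learned axioms for birthPlace, written as simple lookup tables.
DOMAIN = {"birthPlace": "Person"}
RANGE = {"birthPlace": "Place"}
SUBPROPERTY_OF = {"birthPlace": "hasBeenAt"}


def infer(triples):
    """Return triples implied by the axioms that are not already asserted."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))   # subject is a Person
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))    # object is a Place
        if p in SUBPROPERTY_OF:
            inferred.add((s, SUBPROPERTY_OF[p], o))    # birthPlace implies hasBeenAt
    return inferred - set(triples)


data = [("Goethe", "birthPlace", "Frankfurt")]
new_facts = infer(data)
```

One asserted triple yields three implicit ones once the schema axioms are known.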
Repair

Ontology Debugging: OWL reasoning to detect inconsistencies and unsatisfiable classes + detect the most likely sources for the problems
basic task: provide feedback to user for resolving undesired entailments
a justification J ⊆ O of an entailment is a minimal set of axioms from which the entailment can be drawn
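The definition of a justification can be illustrated by brute force on a toy ontology. In the sketch below an axiom is a pair (sub, sup) meaning "sub is a subclass of sup", entailment is plain transitivity, and subsets are tried in order of increasing size. This is only meant to show what "minimal set of axioms" means; real debuggers use an OWL reasoner and far more efficient search. All names are assumptions, and the goal is assumed to relate two different classes.

```python
from itertools import combinations


def entails(axioms, goal):
    """Do the subclass axioms entail goal = (a, b) by transitivity?"""
    start, target = goal
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        for sub, sup in axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                frontier.append(sup)
    return False


def justifications(ontology, goal):
    """All minimal subsets J of the ontology that entail the goal."""
    found = []
    axioms = sorted(ontology)
    for size in range(1, len(axioms) + 1):
        for subset in combinations(axioms, size):
            candidate = set(subset)
            if any(j <= candidate for j in found):
                continue  # contains a smaller justification, so not minimal
            if entails(candidate, goal):
                found.append(candidate)
    return found


O = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
result = justifications(O, ("A", "C"))
```

Here the entailment that A is a subclass of C has two justifications: the directly asserted axiom, and the pair of axioms going through B. Removing the entailment requires breaking both.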
Quality Analysis
Linked Data Quality Analysis

Quality on the Data Web varies a lot
Hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. extracted from text or Web 2.0 sources (DBpedia)

Quality = Fitness for use
Often not necessary to fix all problems, but to know about them
30+ quality dimensions defined in recent survey
Research Challenge
Establish measures for assessing the authority, provenance, reliability of Data Web resources
Evolution
(Image: © CC-BY-SA by alasis on flickr)
KB Evolution
Tasks:
Performing knowledge base changes / refactoring
Ensuring consistency of related knowledge
Managing changes, e.g. undo operations
Update materialised inferred data upon changes
Update materialised links to other data upon changes
Tools:
Protégé - PROMPT and change management plugins
EvoPat - easily re-usable and sharable evolution patterns defined via SPARQL
PatOMat - ontology transformation framework
Exploration

RDF data can be complex (as discussed by Pascal Hitzler)
Exploration phase aims to make data accessible to non-experts
Options:
Faceted Browsing
Question Answering
Query Builders
Visualisation of statistical or geospatial data
...

Catalogus Professorum Lipsiensis

Visual Query Builder

Relationship Finder in CPL

[Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]

Make the Web a Linked Data Washing Machine

Tool Support for Life-Cycle?
Many SW tools support one or more life-cycle stages
Linked Data Stack (http://stack.linkeddata.org) provides a
consolidated repository of such tools
Each tool is a Debian package
Lightweight integration between tools via common vocabularies and
SPARQL
Demonstrator interfaces for showing tools in combination
Developed by LOD2 and GeoKnow EU projects

Knowledge Extraction

Knowledge Extraction is the creation of knowledge from structured
(relational databases, XML) and unstructured (text, documents, images)
sources.
Resulting knowledge needs to be in a machine-readable and
machine-interpretable format and facilitate inferencing
Similar to Information Extraction (NLP) and ETL (Data Warehouse),
but main difference: extraction result goes beyond the creation of
structured information or the transformation into a relational schema
Requires re-use of existing formal knowledge (reusing ontologies) or
the generation of a schema based on the source data

Categorisation of Approaches
Source - Examples: plain text, relational databases, XML, CSV
Exposition - How is the extracted knowledge made explicit? How can
you query and perform inference?

Synchronization - Is the knowledge extraction process executed once
to produce a dump or is the result synchronized with the source? Are
changes to the result written back (Bi-directional)?

Reuse of Vocabularies - Can popular ontologies (Good Relations,
FOAF, . . . ) be re-used to simplify global data integration?

Automatisation - manual, semi-automatic, automatic
Domain Ontology Required - Does the approach require a
predefined ontology or can it create a schema from the source
(e.g. ontology learning)?

Extraction from Structured Sources to RDF
Simple mappings from RDB tables/views to RDF
Direct mapping of the model of relational databases to RDF
Table → OWL class
Row → Instance s of this class
Cell with value o in column p → Triple (s, p, o)
Details: http://www.w3.org/TR/rdb-direct-mapping/

Complex mappings of relational databases to RDF
Additional refinements can be applied to the 1:1 mapping to improve the
usefulness of the RDF output

Extract or learn an OWL schema from the given database schema
Map the schema and its contents to a pre-existing domain ontology
Powerful mapping languages: R2RML, SML

XML
XML tree structure can be directly converted to RDF graph structure
Complex mappings possible, e.g. via XSLT processors
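
The simple table/row/cell mapping above can be sketched in a few lines of Python. This is only an illustration of the idea, loosely modelled on the direct mapping; the base IRI, the IRI layout and the function name direct_map are assumptions made for this example, not part of any tool mentioned in these slides.

```python
# Hypothetical base IRI, chosen only for illustration.
BASE = "http://example.org/db/"


def direct_map(table, primary_key, rows):
    """Yield (s, p, o) triples: table -> class, row -> instance, cell -> triple."""
    class_iri = f"<{BASE}{table}>"
    for row in rows:
        # One subject IRI per row, derived from the primary key value.
        subject = f"<{BASE}{table}/{primary_key}={row[primary_key]}>"
        yield (subject, "rdf:type", class_iri)
        for column, value in row.items():
            if value is None:
                continue  # NULL cells produce no triple
            predicate = f"<{BASE}{table}#{column}>"
            yield (subject, predicate, f'"{value}"')


rows = [{"id": 1, "geom": "POINT(0 0)"}, {"id": 2, "geom": "POINT(1 1)"}]
triples = list(direct_map("nodes", "id", rows))
for triple in triples:
    print(" ".join(triple), ".")
```

Every cell becomes a plain literal here; assigning datatypes, language tags or re-used vocabularies is exactly what the more complex mappings add.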

Extraction from Natural Language Sources
80% of the information in business documents is in unstructured
natural language [1]

(-) Increased complexity and decreased quality of extraction
(+) Potential for a massive acquisition of extracted knowledge
Traditional Information Extraction (IE)
Recognize and categorise elements in text
Techniques: Named Entity Recognition (NER), Coreference Resolution
(CO), . . .

Ontology Learning (OL) from Text
Learn whole ontologies from natural language text
Usually extracted (semi-)automatically

[1] Wimalasuriya, Dou. Ontology-based information extraction: [. . . ] Journal of Information Science

LinkedGeoData + Sparqlify

Example: LinkedGeoData Knowledge Extraction Project using Sparqlify

Structure
Motivation
OpenStreetMap
LGD Architecture
Mapping
Access (How LinkedGeoData is published)
Use Cases

Motivation

Ease information integration tasks that require spatial knowledge, such as
Offerings of bakeries next door
Map of distributed branches of a company
Historical sights along a bicycle track

LOD cloud contains data sets with spatial features
e.g. Geonames, DBpedia, US census, EuroStat

But: they are restricted to popular or large entities like countries, famous places etc. or specific regions

Therefore they lack buildings, roads, mailboxes, etc.

OpenStreetMap - Datamodel
Basic entities are:

Nodes: Latitude, Longitude.
Ways: Sequence of nodes.
Relations: Associations between any number of nodes, ways and
relations. Every member in a relation plays a certain role.
Each entity may be described with tags (= key-value pairs)

A way is closed if the ID of the last referenced node equals that of the first one.
Whether a closed way denotes a linear ring or a polygon (i.e. whether
the enclosed area is part of the respective OSM entity) depends on the
tags.
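
The closed-way rule can be made concrete with a small sketch. The Way class and both helper functions are hypothetical names introduced here; the use of the area tag as the deciding tag is a deliberate simplification, since the slides only state that the interpretation depends on the tags.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    """A way: an ordered sequence of node IDs plus key-value tags."""
    node_ids: list
    tags: dict = field(default_factory=dict)


def is_closed(way):
    """A way is closed if its last referenced node ID equals its first one."""
    return len(way.node_ids) > 1 and way.node_ids[0] == way.node_ids[-1]


def is_polygon(way):
    """Simplifying assumption: only an explicit area=yes tag marks a polygon."""
    return is_closed(way) and way.tags.get("area") == "yes"


ring = Way([1, 2, 3, 1])
square = Way([1, 2, 3, 1], {"area": "yes"})
print(is_closed(ring), is_polygon(ring), is_polygon(square))
```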

Example: Leipzig's Zoo

Comparison: Leipzig's Zoo (OpenStreetMap)

Comparison: Leipzig's Zoo (GoogleMaps)

LGD Architecture

Tag Mappings

Key-value pairs will be assigned to RDF resources
Each pair (k, v) can be annotated with datatypes, language tags, classes
Mappings are themselves tables
Example table: lgd_map_literal

k          property       lang
name       rdfs:label
name:en    rdfs:label     en
alt_label  skos:altLabel
note       rdfs:comment
...        ...            ...

View Definition

RDF mapping of the data from a
PostgreSQL database

Create View lgd_nodes As
Construct {
?n a lgdm:Node .
?n geom:geometry ?g .
?g ogc:asWKT ?o .
}
With
?n = uri(lgd:node, ?id)
?g = uri(lgd-geom:node, ?id)
?o = typedLiteral(?geom, ogc:wktLiteral)
From
nodes

Sparqlify

SPARQL-SQL Rewriter
Rewrites SPARQL queries according
to the view definition
Platform module offers SPARQL
endpoint and Linked Data interface

https://github.com/AKSW/Sparqlify

Rest-API

Offers REST methods for frequent
queries
Based on SPARQL (Virtuoso) endpoint

Downloads

RDF dataset for download
Generated using Construct { ?s ?p ?o }

http://downloads.linkedgeodata.org

Ontology

Enriched classes and properties with multilingual labels from TranslateWiki
http://translatewiki.net

Imported icons for 90 classes from the freely available icon collection from the SJJB Management
http://www.sjjb.co.uk/mapicons/

SML Mapping Examples

The following slides demonstrate how to map relational data to RDF
with the Sparqlification Mapping Language (SML).
The following prefixes are used:

Prefix    IRI
rdfs      http://www.w3.org/2000/01/rdf-schema#
ogc       http://www.opengis.net/ont/geosparql#
geom      http://geovocab.org/geometry#
lgd       http://linkedgeodata.org/triplify/
lgd-geom  http://linkedgeodata.org/geometry/

SML - Mapping Example I: The Goal (1/4)
How to map tables to RDF?
How to introduce the distinction, commonly used in GIS, between feature and geometry?

Input Table: nodes

id  geom
1   POINT(0 0)
2   POINT(1 1)

Aimed for RDF Output

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
...
lgd:node1 geom:geometry lgd-geom:node1 .
lgd:node2 geom:geometry lgd-geom:node2 .
lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral .
lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .

SML - Mapping Example I: SML Syntax Outline (2/4)

Create View myNodesView As
Construct {
...
}
With
...
From
...

SML - Mapping Example I: Construct and From (3/4)

Create View myNodesView As
Construct {
?n geom:geometry ?g .
?g ogc:asWKT ?o
}
With
...
From
nodes

SML - Mapping Example I: Complete! (4/4)

Create View myNodesView As
Construct {
?n geom:geometry ?g .
?g ogc:asWKT ?o
}
With
?n = uri(lgd:node, ?id)
?g = uri(lgd-geom:node, ?id)
?o = typedLiteral(?geom, ogc:wktLiteral)
From
nodes

Applied to the input table nodes (rows 1, POINT(0 0) and 2, POINT(1 1)), the view yields the aimed-for RDF output:

lgd:node1 geom:geometry lgd-geom:node1 .
lgd:node2 geom:geometry lgd-geom:node2 .
lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral .
lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .
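
To make the semantics of the view tangible, the following Python sketch emulates what the view computes for each row of nodes. It is not how Sparqlify works internally (Sparqlify rewrites SPARQL to SQL instead of materialising triples row by row); the helper names uri, typed_literal and my_nodes_view are assumptions that merely mirror the SML term constructors.

```python
def uri(prefixed_name, value):
    """Mimic SML's uri(...): append the column value to a prefixed name."""
    return f"{prefixed_name}{value}"


def typed_literal(lexical, datatype):
    """Mimic SML's typedLiteral(...): a literal with an explicit datatype."""
    return f'"{lexical}"^^{datatype}'


def my_nodes_view(rows):
    """Instantiate the Construct template once per row of the 'nodes' table."""
    for row in rows:
        n = uri("lgd:node", row["id"])        # ?n: the feature
        g = uri("lgd-geom:node", row["id"])   # ?g: its geometry
        o = typed_literal(row["geom"], "ogc:wktLiteral")
        yield (n, "geom:geometry", g)
        yield (g, "ogc:asWKT", o)


nodes = [{"id": 1, "geom": "POINT(0 0)"}, {"id": 2, "geom": "POINT(1 1)"}]
node_triples = list(my_nodes_view(nodes))
for s, p, o in node_triples:
    print(s, p, o, ".")
```

The separate ?n and ?g resources are what introduces the feature/geometry distinction asked for on slide (1/4).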
SML Mapping Examples

A more complex example, which demonstrates the use of an SQL
mapping table and an SQL helper view.

SML - Mapping Example II: The Goal (1/8)
Input Table: node_tags

id  k            v
1   name         Universitaet Leipzig
1   name:en      University of Leipzig
1   amenity      university
1   addr:street  Augustusplatz
1   addr:city    Leipzig

Aimed for RDF Output

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lgd: <http://linkedgeodata.org/triplify/> .
lgd:node1 rdfs:label "Universitaet Leipzig" .
lgd:node1 rdfs:label "University of Leipzig"@en .

SML - Mapping Example II: Source Data (2/8)

OSM Table: node_tags (same rows as the input table on slide 1/8)

SML - Mapping Example II: Mapping Table (3/8)

OSM Table: node_tags (as on slide 1/8)

RDF Mapping Table: lgd_map_literal

k          property       lang
name       rdfs:label
name:en    rdfs:label     en
alt_label  skos:altLabel
note       rdfs:comment
...        ...            ...

SML - Mapping Example II: Helper View (4/8)

The OSM table node_tags and the RDF mapping table lgd_map_literal are joined on k.

Helper View: lgd_node_tags_literal

id   property    v                      lang
1    rdfs:label  Universitaet Leipzig
1    rdfs:label  University of Leipzig  en
...  ...         ...                    ...

SELECT id, property, v, lang FROM node_tags, lgd_map_literal
WHERE node_tags.k = lgd_map_literal.k

SML - Mapping Example II: SML View (5/8 to 7/8)

Logical Table: lgd_node_tags_literal

id   property    v            lang
1    rdfs:label  Univ. L.
1    rdfs:label  Univ. of L.  en
...  ...         ...          ...

SML View, built up step by step: first the Construct template and the From clause, then the With clause binding ?s, ?p and ?o

Create View lgd_node_tags_text As
Construct {
?s ?p ?o .
}
With
...
From
lgd_node_tags_literal

SML - Mapping Example II: SML View (8/8)

Logical Table lgd_node_tags_literal + SML View = Resulting RDF

SML View

Create View lgd_node_tags_text As
Construct {
?s ?p ?o .
}
With
?s = uri(lgd:node, ?id)
?p = uri(?property)
?o = plainLiteral(?v, ?lang)
From
lgd_node_tags_literal

Resulting RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lgd: <http://linkedgeodata.org/triplify/> .
lgd:node1 rdfs:label "Universitaet Leipzig" .
lgd:node1 rdfs:label "University of Leipzig"@en .
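
The whole of Example II can be replayed end to end with an in-memory SQLite database. This is a sketch under stated assumptions: SQLite stands in for the PostgreSQL database used by LinkedGeoData, the property column holds prefixed names rather than full IRIs, and the Python functions plain_literal and lgd_node_tags_text are hypothetical stand-ins for the SML term constructors and the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node_tags (id INTEGER, k TEXT, v TEXT);
CREATE TABLE lgd_map_literal (k TEXT, property TEXT, lang TEXT);
INSERT INTO node_tags VALUES
  (1, 'name', 'Universitaet Leipzig'),
  (1, 'name:en', 'University of Leipzig'),
  (1, 'amenity', 'university');
INSERT INTO lgd_map_literal VALUES
  ('name', 'rdfs:label', NULL),
  ('name:en', 'rdfs:label', 'en');
-- Helper view from slide (4/8): join tags with their literal mapping.
CREATE VIEW lgd_node_tags_literal AS
  SELECT id, property, v, lang FROM node_tags, lgd_map_literal
  WHERE node_tags.k = lgd_map_literal.k;
""")


def plain_literal(value, lang):
    """Mimic SML's plainLiteral(...): add a language tag only if one is given."""
    return f'"{value}"@{lang}' if lang else f'"{value}"'


def lgd_node_tags_text(connection):
    """Emulate the SML view over the helper view, one triple per row."""
    rows = connection.execute(
        "SELECT id, property, v, lang FROM lgd_node_tags_literal ORDER BY v")
    return [(f"lgd:node{node_id}", prop, plain_literal(v, lang))
            for node_id, prop, v, lang in rows]


label_triples = lgd_node_tags_text(conn)
for s, p, o in label_triples:
    print(s, p, o, ".")
```

Tags without an entry in the mapping table (amenity here) drop out of the join, which is why they produce no label triple.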

Further Tag Mappings
lgd_map_datatype

k       datatype
seats   integer
unisex  boolean

lgd_map_property

k        property
website  foaf:homepage

lgd_map_resource_k

k        property  object
highway  rdf:type  lgdo:HighwayThing

lgd_map_resource_kv

k         v      property  object
waterway  river  rdf:type  lgdo:River

LGD Edit Tool
Multi User Tag Mapping WebApp

Resources

Sparqlify
http://sparqlify.org

LinkedGeoData
http://linkedgeodata.org

Tag Mappings
https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sql/Mappings.sql

SML View Definitions
https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sml/LinkedGeoData-Triplify-IndividualViews.sml

Statistics (15 August 2013)

Complete OSM planet file corresponds to ∼ 20,000,000,000 triples
Virtual access via Sparqlify

Downloads limited to selected classes: 292,780,188 triples
153,613,243 triples of Nodes
139,166,945 triples of Ways
Relations not yet available for download
Among them
532,812 PlaceOfWorship
82,788 RailwayStation
72,091 Toilets
71,613 Town
19,937 City

Access
Materialized Sparql Endpoint (based on Virtuoso DB, download
datasets loaded)

http://linkedgeodata.org/sparql
http://linkedgeodata.org/snorql

Virtual Sparql Endpoint (based on Sparqlify, access to 20B triples,
limited SPARQL 1.0 support)

http://linkedgeodata.org/vsparql
http://linkedgeodata.org/vsnorql

Rest Interface (based on the Virtual Sparql Endpoint)
Supports limited queries (e.g. circular/rectangular area, filtering by
labels)

Downloads

http://downloads.linkedgeodata.org
Monthly updates on the above datasets envisioned

Use Cases Augmented Reality

Use Cases Generic Browsing

Why Link Discovery?
1 Fourth Linked Data principle

2 Links are central for
Cross-ontology QA
Data Integration
Reasoning
Federated Queries
...

3 2011 topology of the LOD Cloud:
31+ billion triples
≈ 0.5 billion links
owl:sameAs in most cases

Why is it difficult?
1 Time complexity
Large number of triples
Quadratic a-priori runtime
69 days for mapping cities from DBpedia to Geonames (1ms per comparison)
decades for linking DBpedia and LGD
...

Definition (Link Discovery)
Given sets S and T of resources and relation R
Task: Find M = {(s, t) ∈ S × T : R(s, t)}

Common approaches:
Find M = {(s, t) ∈ S × T : σ(s, t) ≥ θ}
Find M = {(s, t) ∈ S × T : δ(s, t) ≤ θ}
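
The quadratic a-priori runtime follows directly from the definition: without any optimisation, every pair in S × T has to be compared. At 1 ms per comparison, 69 days correspond to roughly 69 × 86,400 × 1,000 ≈ 6 × 10^9 comparisons. The sketch below shows the naive distance-based approach and the runtime estimate; the function names and the toy data are illustrative assumptions.

```python
def brute_force_links(source, target, distance, theta):
    """Naive link discovery: compare all |S| * |T| pairs against threshold theta."""
    return [(s, t) for s in source for t in target if distance(s, t) <= theta]


def estimated_days(n_source, n_target, seconds_per_comparison=0.001):
    """Runtime of the naive approach in days, at a fixed cost per comparison."""
    return n_source * n_target * seconds_per_comparison / 86_400


# Toy example with numbers as "resources" and absolute difference as distance.
toy_links = brute_force_links([0, 5], [1, 9], lambda a, b: abs(a - b), theta=1)
print(toy_links)
print(estimated_days(1_000, 86_400))
```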

Why is it difficult?
2 Complexity of specifications
Combination of several attributes required for high precision
Tedious discovery of most adequate mapping
Dataset-dependent similarity functions

LIMES Framework

Runtime Optimization
Reduce the number of comparisons C(A) ≥ |M| (assuming we need all σ/θ values for links)
Maximize reduction ratio:
$RR(A) = 1 - \frac{C(A)}{|S||T|}$

Question
Can we devise lossless approaches with guaranteed RR?
Advantages
Space management
Runtime prediction
Resource scheduling

RR Guarantee

Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M|}{|S||T|}$
Approach H(α) fulfills the RR guarantee criterion, iff:
$\forall r < RR_{max}, \exists \alpha : RR(H(\alpha)) \geq r$
Here, we use the relative reduction ratio (RRR):
$RRR(A) = \frac{RR_{max}}{RR(A)}$

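The three ratios can be computed directly from the number of comparisons C(A), the number of matches |M| and the set sizes |S| and |T|. The following sketch only restates the formulas; the function names and the example numbers are made up for illustration.

```python
def reduction_ratio(comparisons, n_source, n_target):
    """RR(A) = 1 - C(A) / (|S| * |T|)."""
    return 1 - comparisons / (n_source * n_target)


def best_reduction_ratio(n_matches, n_source, n_target):
    """RRmax = 1 - |M| / (|S| * |T|): only the actual matches are compared."""
    return 1 - n_matches / (n_source * n_target)


def relative_reduction_ratio(comparisons, n_matches, n_source, n_target):
    """RRR(A) = RRmax / RR(A); equals 1 for an optimal approach, larger otherwise."""
    return (best_reduction_ratio(n_matches, n_source, n_target)
            / reduction_ratio(comparisons, n_source, n_target))


# Hypothetical run: |S| = |T| = 100, 50 matches, 2000 comparisons carried out.
print(reduction_ratio(2000, 100, 100))
print(relative_reduction_ratio(2000, 50, 100, 100))
```

Because C(A) ≥ |M|, RR(A) never exceeds RRmax, so RRR(A) ≥ 1 and values close to 1 are better.
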
Goal

Formal Goal
Devise H(α) : ∀r > 1, ∃α : RRR(H(α)) ≤ r

Restrictions
Minkowski Distance
$\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}, \quad p \geq 2$

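As a small illustration, the Minkowski distance of order p can be written as follows; the function name minkowski is chosen here for the example, and p = 2 gives the Euclidean distance.

```python
def minkowski(s, t, p=2):
    """Minkowski distance of order p between two points of equal dimension."""
    if len(s) != len(t):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1 / p)


print(minkowski((0, 0), (3, 4)))        # Euclidean case, p = 2
print(minkowski((0, 0), (3, 4), p=3))   # higher order weighs large gaps more
```
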
Space Tiling
HYPPO
δ(s, t) ≤ θ describes a hypersphere
Approximate hypersphere by using a hypercube
Easy to compute
No loss of recall (blocking)

Set width of single hypercube to Δ = θ/α
Tile Ω = S ∪ T into the adjacent cubes C
Coordinates: $(c_1, \ldots, c_n) \in \mathbb{N}^n$
Contains points $\omega \in \Omega : \forall i \in \{1 \ldots n\},\ c_i \Delta \leq \omega_i < (c_i + 1)\Delta$

HYPPO
Combine $(2\alpha + 1)^n$ hypercubes around C(ω) to approximate hypersphere

$RRR(HYPPO(\alpha)) = \frac{(2\alpha + 1)^n}{\alpha^n S(n)}$

$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{2^n}{S(n)}$

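The tiling idea can be sketched as a grid index: every target point is assigned to its hypercube of width Δ = θ/α, and each source point is compared only with points in the (2α + 1)^n cubes around its own cube. This is a simplified illustration of the blocking scheme described on the slides, not the LIMES implementation; all function names are assumptions.

```python
import itertools
import math
from collections import defaultdict


def minkowski(s, t, p=2):
    """Minkowski distance of order p."""
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1 / p)


def cube_of(point, delta):
    """Coordinates of the hypercube of width delta that contains the point."""
    return tuple(math.floor(x / delta) for x in point)


def hyppo_links(source, target, theta, alpha=4, p=2):
    """Return (links, comparisons) using hypercube blocking with width theta/alpha."""
    delta = theta / alpha
    grid = defaultdict(list)
    for t in target:
        grid[cube_of(t, delta)].append(t)
    offsets = range(-alpha, alpha + 1)
    links, comparisons = [], 0
    for s in source:
        centre = cube_of(s, delta)
        # Visit the (2 * alpha + 1) ** n cubes surrounding the cube of s.
        for shift in itertools.product(offsets, repeat=len(s)):
            neighbour = tuple(c + d for c, d in zip(centre, shift))
            for t in grid.get(neighbour, ()):
                comparisons += 1
                if minkowski(s, t, p) <= theta:
                    links.append((s, t))
    return links, comparisons


links, comparisons = hyppo_links([(0.0, 0.0)], [(0.5, 0.5), (3.0, 3.0)], theta=1.0)
print(links, comparisons)
```

Points farther away than α cubes along any dimension are more than θ apart in that dimension alone, so skipping them loses no links.
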
HYPPO
RRR(HYPPO) for p = 2, n = 2, 3, 4 and 2 ≤ α ≤ 50

$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{4}{\pi} \approx 1.27$ (n = 2)
$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{6}{\pi} \approx 1.91$ (n = 3)
$\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{32}{\pi^2} \approx 3.24$ (n = 4)

HR3 : Idea
$index(C, \omega) = \begin{cases} 0 & \text{if } \exists i : |c_i - c(\omega)_i| \leq 1,\ 1 \leq i \leq n, \\ \sum_{i=1}^{n} (|c_i - c(\omega)_i| - 1)^p & \text{else} \end{cases}$

Compare C(ω) with C iff $index(C, \omega) \leq \alpha^p$
[Figure: cubes compared with C(ω) for α = 4, p = 2]

HR3 : Idea

Lemma
$\forall s \in S : index(C, s) > \alpha^p$ implies that all t ∈ C are non-matches

Claims
No loss of recall
$\lim_{\alpha \to \infty} RRR(HR^3(\alpha)) = 1$

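The index and the resulting cube filter translate almost literally into code. The sketch below assumes that candidate cubes are taken from the (2α + 1)^n neighbourhood of the reference cube, as in the tiling described earlier; the function names hr3_index, should_compare and candidate_cubes are hypothetical.

```python
import itertools


def hr3_index(cube, reference, p=2):
    """index(C, w): 0 if some coordinate differs by at most 1, else sum of (|diff| - 1)^p."""
    diffs = [abs(c - r) for c, r in zip(cube, reference)]
    if any(d <= 1 for d in diffs):
        return 0
    return sum((d - 1) ** p for d in diffs)


def should_compare(cube, reference, alpha, p=2):
    """A cube is only compared with the reference cube if its index is at most alpha^p."""
    return hr3_index(cube, reference, p) <= alpha ** p


def candidate_cubes(reference, alpha, p=2):
    """Cubes of the (2 * alpha + 1)^n neighbourhood that survive the index filter."""
    offsets = range(-alpha, alpha + 1)
    for shift in itertools.product(offsets, repeat=len(reference)):
        cube = tuple(r + d for r, d in zip(reference, shift))
        if should_compare(cube, reference, alpha, p):
            yield cube


print(hr3_index((3, 4), (0, 0)))             # (3 - 1)^2 + (4 - 1)^2
print(should_compare((5, 5), (0, 0), alpha=4))
print(len(list(candidate_cubes((0, 0), alpha=4))))
```

Cubes near the corners of the neighbourhood are discarded, which is how the approximation moves from a hypercube towards the hypersphere.
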
HR3 : Lemma 3
Lemma
$\forall \alpha > 1 : RRR(HR^3(2\alpha)) < RRR(HR^3(\alpha))$

[Figures: p = 2 with α = 4, 8, 25 and 50]

HR3 : Idea

Theorem
$\lim_{\alpha \to \infty} RRR(HR^3(\alpha)) = 1$

Claims
No loss of recall
$\lim_{\alpha \to \infty} RRR(HR^3(\alpha)) = 1$

HR3 : Experiments

Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1

Experimental Setup:
Deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49m, 99m).
Geonames and LinkedGeoData by longitude and latitude (θ = 1°, 9°)
64-bit computer with a 2.8GHz i7 processor with 8GB RAM.

HR3 : Experiments (Comparisons)
Experiment 2: Deduplicating DBpedia places, θ = 99m
0.64 × 10^6 fewer comparisons

Experiment 4: Linking Geonames and LinkedGeoData, θ = 9°
4.3 × 10^6 fewer comparisons

HR3 : Experiments (Runtime)
Experiment 1, 2: DBpedia, θ = 49, 99m
Experiment 3, 4: Geonames and LGD, θ = 1, 9°
[Figure: runtime in seconds (logarithmic axis, 10^0 to 10^4) of HR3, HYPPO and SILK for Exp. 1 to Exp. 4]

HR3 : Summary

Mission
New category of algorithms for link discovery

Presented HR3
Link discovery in affine spaces with Minkowski measures
Outperforms the state of the art (runtime, comparisons)
Optimal reduction ratio
Integrated in LIMES

Learning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)
Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Learning Complex Specifications

Insight
Choice of right example is key for learning
So far, only use of informativeness
Question
Can we do better by using more information?

Higher F-measure
Often slower

Basic Idea
Use similarity of link candidates when selecting most informative
examples (intra + inter class similarity)

Similarity of Candidates

Link candidate x = (s, t) can be regarded as vector $(\sigma_1(x), \ldots, \sigma_n(x)) \in [0, 1]^n$.
Similarity of link candidates x and y:
$sim(x, y) = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (\sigma_i(x) - \sigma_i(y))^2}}$  (1)
Allows exploiting both intra- and inter-class similarity

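A direct transcription of the candidate similarity is given below. It assumes that the denominator is one plus the Euclidean distance between the two similarity vectors, which is how the flattened formula on the slide is read here; the function name is hypothetical.

```python
import math


def candidate_similarity(x, y):
    """sim(x, y) = 1 / (1 + Euclidean distance of the similarity vectors x and y)."""
    if len(x) != len(y):
        raise ValueError("similarity vectors must have the same length")
    return 1 / (1 + math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))


# Each link candidate is represented by its vector of similarity values in [0, 1].
print(candidate_similarity((0.9, 0.8), (0.9, 0.8)))   # identical candidates
print(candidate_similarity((1.0, 0.0), (0.0, 0.0)))   # distance 1
```

Identical candidates get similarity 1, and the value decreases towards 0 as the vectors move apart.
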
Graph Clustering
Rationale: Use intra-class similarity
Approach
Cluster elements of S+ and S− independently
Choose one element per cluster as representative
Present oracle with most informative representatives

[Figure: example similarity graphs over S+ and S− with nodes labelled a to l and edge weights 0.25, 0.8 and 0.9]

BorderFlow
G = (V, E, ω) with V = S+ or V = S−
ω(x, y) = sim(x, y)
Keep best ec edges for each x ∈ V

BorderFlow
Seed-based algorithm
Goal: Maximize borderflow ratio $bf(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))}$

http://sourceforge.net/projects/cugar-framework/

Conclusion

Can be combined with arbitrary active learning ML algorithms
Was experimentally combined with EAGLE (genetic programming) and
RAVEN (linear classifier) and shown to outperform the plain
informativeness function in terms of F-measure
Choice of example important to minimise user effort
Contact me for detailed experimental results
Longer runtimes (up to 2×)

Summary
Linking is a crucial task in the Web of Data
Two key problems
1 Efficient execution of link specifications
2 Creation of link specifications
Presented HR3 to handle the first problem
Presented COALA as building block for the second problem

Motivation

rise in the availability and usage of knowledge bases
still a lack of knowledge bases that consist of sophisticated schema information and instance data adhering to this schema
e.g. in the life sciences several knowledge bases only consist of schema information
to a large extent, a collection of facts without a clear structure (e.g. information extracted from databases)
combination of sophisticated schema and instance data would allow powerful reasoning, consistency checking, and improved querying
→ create schemata based on existing data

Example
dbr:Brad_Pitt :birthPlace dbr:Shawnee,_Oklahoma ;
    a :Person .
dbr:Angela_Merkel :birthPlace dbr:Hamburg ;
    a :Person .
dbr:Albert_Einstein :birthPlace dbr:Ulm ;
    a :Person .
dbr:Shawnee,_Oklahoma a :Place .
dbr:Hamburg a :Place .
dbr:Ulm a :Place .

Suggestions:

ObjectProperty: birthPlace
    Characteristics: Functional
    Domain: Person
    Range: Place
    SubPropertyOf: hasBeenAt

Benefits of an expressive schema

Axioms serve as documentation for the purpose and correct usage of schema elements
Additional implicit information can be inferred
Improve querying optimisations
Improve/allow the application of schema debugging techniques

Each person was only born at one place?!

[Figure: a resource with two birthPlace edges whose targets are marked as equal (=)]

birthPlace is functional

SELECT ?s WHERE {
  ?s dbo:birthPlace ?o1 .
  ?s dbo:birthPlace ?o2 .
  FILTER (?o1 != ?o2)
}

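The same check can be sketched outside a SPARQL endpoint. The snippet below is only an illustration on in-memory triples; the `ex:` resources are made up, and the function name is not part of any tool mentioned in the slides.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Return {subject: objects} for subjects with more than one distinct value of prop."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: objs for s, objs in values.items() if len(objs) > 1}


# Hypothetical data: ex:p2 has two different birth places.
TRIPLES = [
    ("ex:p1", "dbo:birthPlace", "ex:CityA"),
    ("ex:p2", "dbo:birthPlace", "ex:CityB"),
    ("ex:p2", "dbo:birthPlace", "ex:CityC"),
    ("ex:p2", "rdfs:label", "P2"),
]
VIOLATIONS = functional_violations(TRIPLES, "dbo:birthPlace")
```

Like the `FILTER (?o1 != ?o2)` in the query, this compares identifiers only, so two different URIs for the same place would still be reported.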
Where was Julia Nannie Wallace born?

Julia Nannie Wallace was born in Lacrosse?

No, Julia Nannie Wallace was born in La Crosse!

[Figure: the birthPlace target first has rdf:type Sport, which conflicts with the axioms below; in the corrected version it has rdf:type City and is therefore a Place]

birthPlace range Place
Place disjointWith Sport

SELECT ?s ?place WHERE {
  ?s dbo:birthPlace ?place .
  ?place rdf:type/rdfs:subClassOf* ?type1 .
  ?type2 rdfs:subClassOf* dbo:Place .
  ?type1 owl:disjointWith ?type2 .
}

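A minimal in-memory sketch of this range check follows. It simplifies the query by testing disjointness against the range class itself, and all `ex:` resources and the function names are hypothetical.

```python
def superclasses(cls, sub_class_of):
    """Reflexive-transitive closure of the subclass relation starting at cls."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(sub_class_of.get(c, ()))
    return seen


def range_violations(facts, types, sub_class_of, disjoint, range_cls):
    """Return (subject, object) pairs whose object has a type disjoint with range_cls."""
    result = []
    for s, o in facts:
        inferred = set()
        for t in types.get(o, ()):
            inferred |= superclasses(t, sub_class_of)
        if any(frozenset((t, range_cls)) in disjoint for t in inferred):
            result.append((s, o))
    return result


FACTS = [("ex:person1", "ex:Lacrosse"), ("ex:person2", "ex:La_Crosse")]
TYPES = {"ex:Lacrosse": ["dbo:Sport"], "ex:La_Crosse": ["dbo:City"]}
SUB_CLASS_OF = {"dbo:City": ["dbo:Place"]}
DISJOINT = {frozenset(("dbo:Place", "dbo:Sport"))}
BAD = range_violations(FACTS, TYPES, SUB_CLASS_OF, DISJOINT, "dbo:Place")
```

Only the pair pointing at the sport is reported; the city passes because its inferred types include the range class and nothing disjoint with it.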
3 Steps to get a schema

3-Phase Enrichment Learning Approach:
Input: Entity URI, Axiom Type, Knowledge Base (SPARQL Endpoint)
1. obtain schema information (only executed once per knowledge base); result: Background Knowledge, with optional Reasoner invocation
2. obtain axiom type and entity specific data (sample data if necessary); result: Background Knowledge + Relevant Instance Data
3. run machine learning algorithm (Learner, DL-Learner); result: List of Axiom Suggestions + Metadata, Enrichment Ontology
iterate over all axiom types and schema entities for full enrichment

Starting Point

SPARQL endpoint: http://dbpedia.org/sparql
Entity URI: http://dbpedia.org/ontology/author
Axiom Type: Object Property Domain

Step 1 - Obtaining Schema Information
CONSTRUCT WHERE {
  ?sub rdfs:subClassOf ?sup .
}
ORDER BY DESC(?sub) LIMIT 1000 OFFSET 1000

dbo:Disease      rdfs:subClassOf  owl:Thing .
dbo:Book         rdfs:subClassOf  dbo:WrittenWork .
dbo:WrittenWork  rdfs:subClassOf  dbo:Work .
dbo:Work         rdfs:subClassOf  owl:Thing .
dbo:Philosopher  rdfs:subClassOf  dbo:Person .
dbo:Person       rdfs:subClassOf  dbo:Agent .
dbo:Agent        rdfs:subClassOf  owl:Thing .
dbo:Sport        rdfs:subClassOf  dbo:Activity .
dbo:Activity     rdfs:subClassOf  owl:Thing .
dbo:Fish         rdfs:subClassOf  dbo:Animal .

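The LIMIT/OFFSET pair in the query suggests that the schema is fetched page by page. A possible sketch of such a loop is shown below; `run_query` stands for whatever function sends a query to the endpoint and returns a list of triples, and it is a placeholder supplied by the caller, not an API of any particular library.

```python
def paged_queries(pattern, order_var, page_size=1000):
    """Yield CONSTRUCT queries that walk through all matches page by page."""
    offset = 0
    while True:
        yield (f"CONSTRUCT WHERE {{ {pattern} }} "
               f"ORDER BY DESC({order_var}) LIMIT {page_size} OFFSET {offset}")
        offset += page_size


def fetch_all(run_query, pattern, order_var, page_size=1000):
    """Collect triples until a page comes back shorter than page_size."""
    triples = []
    for query in paged_queries(pattern, order_var, page_size):
        page = run_query(query)  # caller-supplied placeholder
        triples.extend(page)
        if len(page) < page_size:
            return triples


FIRST_QUERY = next(paged_queries("?sub rdfs:subClassOf ?sup .", "?sub"))
```

The ORDER BY clause matters here: without a stable ordering, consecutive pages are not guaranteed to be disjoint.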
Step 2 - Obtain axiom type and entity specific data
SELECT ?type (COUNT(DISTINCT ?s) AS ?cnt) WHERE {
  ?s dbo:author ?o .
  ?s a ?type .
} GROUP BY ?type ORDER BY DESC(?cnt)

type                 cnt
owl:Thing            30284
dbo:Work             30284
schema:CreativeWork  30284
dbo:WrittenWork      25730
dbo:Book             24673
schema:Book          24673
dbo:TelevisionShow   2567
dbo:Play             1057
...                  ...

CONSTRUCT WHERE {
  ?ind dbo:author ?o .
  ?ind a ?type .
}
ORDER BY DESC(?ind) LIMIT 1000 OFFSET 2000

...
dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ;
    rdf:type dbo:Book .
dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ;
    rdf:type dbo:WrittenWork .
dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ;
    rdf:type dbo:Book .
...

Step 3 - Scoring
dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ;
    rdf:type dbo:Book .
dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ;
    rdf:type dbo:WrittenWork .
dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ;
    rdf:type dbo:Book .

Score(Domain(dbo:author, dbo:Book)) = 2/3 ≈ 66.7%
Score(Domain(dbo:author, dbo:WrittenWork)) = 1/3 ≈ 33.3%

dbo:Book rdfs:subClassOf dbo:WrittenWork .

Score(Domain(dbo:author, dbo:WrittenWork)) = 3/3 = 100%

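The effect of the subclass axiom on the score can be reproduced with a few lines. This is only a sketch of the counting idea on the three sample resources, not the DL-Learner implementation; the function name is illustrative.

```python
def domain_scores(instances, sub_class_of=None):
    """instances: {subject: asserted types}. Returns {class: fraction of subjects typed with it}."""
    sub_class_of = sub_class_of or {}

    def closure(cls):
        # The class itself plus all of its (transitive) superclasses.
        seen, stack = set(), [cls]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(sub_class_of.get(c, ()))
        return seen

    counts = {}
    for types in instances.values():
        inferred = set()
        for t in types:
            inferred |= closure(t)
        for c in inferred:
            counts[c] = counts.get(c, 0) + 1
    return {c: n / len(instances) for c, n in counts.items()}


INSTANCES = {
    "dbpedia:The_Adventures_of_Tom_Sawyer": {"dbo:Book"},
    "dbpedia:The_Zombie_Survival_Guide": {"dbo:WrittenWork"},
    "dbpedia:Web_Therapy": {"dbo:Book"},
}
PLAIN = domain_scores(INSTANCES)
WITH_SCHEMA = domain_scores(INSTANCES, {"dbo:Book": ["dbo:WrittenWork"]})
```

Without the schema, dbo:WrittenWork covers one of three subjects; once dbo:Book is known to be a subclass, all three count and the score rises to 1.0.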
Step 3 - Scoring(2)
Problem:
support for axiom in KB not taken into account
→ no difference between 3 out of 3 and 100 out of 100

Solution:
Average of 95% confidence interval (Wald method)

$p = \frac{s+2}{m+4}$, with s = #success and m = #total

$\left[\max\left(0,\; p - 1.96 \cdot \sqrt{\frac{p \cdot (1-p)}{m+4}}\right),\; \min\left(1,\; p + 1.96 \cdot \sqrt{\frac{p \cdot (1-p)}{m+4}}\right)\right]$

In 95% of the intervals the true value is between ... and ...

Score(Domain(dbo:author, dbo:Book)) ≈ 57.3%
Score(Domain(dbo:author, dbo:WrittenWork)) ≈ 69.1%

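A direct transcription of the formula, as read from the slide, looks as follows. The function names are illustrative. Under this reading 2 of 3 gives roughly 0.571 and 3 of 3 roughly 0.690, close to but not identical with the percentages quoted above, so the original implementation presumably differs in some detail.

```python
import math


def wald_interval_95(success, total):
    """Adjusted 95% interval: two successes and two failures are added before estimating p."""
    p = (success + 2) / (total + 4)
    margin = 1.96 * math.sqrt(p * (1 - p) / (total + 4))
    return max(0.0, p - margin), min(1.0, p + margin)


def score(success, total):
    """Score of an axiom: the midpoint of the clipped interval."""
    low, high = wald_interval_95(success, total)
    return (low + high) / 2


SMALL = score(3, 3)      # little support
LARGE = score(100, 100)  # much more support for the same ratio
```

The point of the construction is visible in the last two lines: both cases have a success ratio of 100%, but the case with 100 supporting resources receives a clearly higher score than the case with 3.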
More Complex Axioms

Pattern Based Knowledge Base Enrichment, ISWC 2013

Outlook and Summary

Schema in the Linked Data Web often shallow → tools needed to support knowledge engineers
Showed some techniques for learning OWL axioms on large knowledge bases available as SPARQL endpoints
More complex axioms require:
OWL-SPARQL rewriting or
Fragment extraction
Small- and medium sized knowledge bases can be handled via techniques from Inductive Logic Programming
All algorithms implemented in DL-Learner framework (http://dl-learner.org)

Outline
1 Introduction to Linked Data
2 Linked Dataset Example: DBpedia
3 Linked Data Life-Cycle Overview
4 Knowledge Extraction
5 Data Integration / Linking
6 Enrichment
7 Repair
8 Knowledge Base Exploration / Querying
[Figure: Linked Data Lifecycle diagram with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]

Motivation

increasing number of knowledge bases in the
Semantic Web (see e.g. LOD cloud)
maintenance of knowledge bases with
expressive semantics is challenging

(Automatically) Detectable Ontology Problems

Common problems:
Syntactic Problems
Structural Problems
Semantic Problems (focus of talk)
Task Based Problems:
Reasoning Related Problems
Linked Data Related Problems

Syntactic Problems
Syntactic errors are mainly violations of conventions of the language in which the ontology is modelled.

Example (Validity of XML)
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/">
    <dc:title>World Wide Web Consortium</dc:title>
</rdf:RDF>

FatalError: The element type "rdf:Description" must be terminated by the matching end-tag "</rdf:Description>". [Line = 7, Column = 3]

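A well-formedness check of this kind can be sketched with Python's standard XML parser. The message wording differs from the one quoted above, which comes from a different parser; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

BROKEN = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/">
    <dc:title>World Wide Web Consortium</dc:title>
</rdf:RDF>
"""


def xml_error(document):
    """Return None if the document is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return str(err)
    return None


# Adding the missing end tag makes the document well-formed again.
FIXED = BROKEN.replace("</rdf:RDF>", "  </rdf:Description>\n</rdf:RDF>")
```

This only tests XML well-formedness; whether the result is valid RDF/XML is a separate question that needs an RDF parser.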
Structural Problems

Problems in the taxonomy

Example (Circularities)
A ⊑ B, B ⊑ C, C ⊑ A

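Circularities of this kind can be found with a depth-first search over the asserted subclass edges. The sketch below works on a plain dictionary rather than an ontology API, and the function name is made up for the example.

```python
def find_cycle(sub_class_of):
    """Return one cycle in the subclass graph as a list of classes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY          # node is on the current path
        path.append(node)
        for parent in sub_class_of.get(node, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:        # back edge: the path closes a cycle
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        colour[node] = BLACK         # fully explored, no cycle through node
        path.pop()
        return None

    for start in list(sub_class_of):
        if colour.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


CYCLE = find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})
```

For the example above the search reports the path from A through B and C back to A.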
Reasoning Related Problems
Problems which negatively affect the performance of reasoning over expressive knowledge bases

Example (A named concept is equivalent to an AllValues restriction)
A ≡ ∀r.C

Reasoning complexity:
Universal restriction does not require to have a property value but only restricts the values for existing property values
Any concept B for which instances cannot have r-fillers satisfies the restriction, i.e. B ⊑ ∀r.C, and becomes a subclass of A
Typically leads to unintended inferences and additional inferences may eventually slow down reasoning performance
Can be checked via Pellint (part of Pellet)

Linked Data Related Problems

Problems which are specific to publishing RDF using the Linked Data principles
Incorrect implementation of content negotiation
Mixing up information and non-information resources

Semantic Problems

Logical contradictions in the underlying knowledge base

Example (Unsatisfiable classes)
O = {A ⊑ B ⊓ C, C ⊑ ¬B} ⊨ A ⊑ ⊥

Example (Inconsistent ontology)
O = {A ⊑ B ⊓ C, C ⊑ ¬B, A(x)} ⊨ ⊥

Usually handled by Ontology Debugging

Ontology Debugging
Problem: We have undesirable entailments
Solution: Repair (Delete/Modify) responsible axioms
Question: Which axioms?

Justification
For an ontology O and an entailment η where O ⊨ η, a set of axioms J is a justification for η in O if J ⊆ O, J ⊨ η and if J′ ⊂ J then J′ ⊭ η.

Minimal subsets of an ontology that are sufficient for a given entailment to hold
Synonyms: MUPS (Minimal Unsatisfiability Preserving Sub-TBoxes), MinAs (Minimal Axiom sets), Kernels
Observations:
there can be multiple justifications for a single entailment
an axiom can be part of multiple justifications

Justification - Example

O = {
  B ⊑ ∃r.D      (1)
  B ⊑ ∀r.¬D     (2)
  A ⊑ B ⊓ C     (3)
  B ⊑ ¬C        (4)
  A ⊑ ¬E        (5)
  A ⊑ E ⊓ F     (6)
}
⊨ A ⊑ ⊥

J1 = {1, 2, 3}
J2 = {5, 6}
J3 = {3, 4}

Justification Based Repair

For a repair, at least one axiom from every justification needs to be removed.
For a repair plan, all justifications are needed.

Justification Algorithms

Single justification:
Glass Box: Modifying underlying reasoning algorithm (tableau tracing)
Black-Box: Using reasoner as oracle
All justifications:
Reiter's Hitting Set Tree Algorithm (HST)

Black-Box

Expansion-Contraction Strategy
Expansion: Add axioms to empty set until entailment holds
Contraction: Remove axioms from set such that set becomes minimal and entailment still can be derived.

[Figure 3.1: A Depiction of a Black-Box Expand-Contract Strategy; key: axiom, axiom in justification, selected axiom]
Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)

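The two phases can be sketched with the reasoner replaced by a toy oracle. Everything here is an assumption made for illustration: axioms are plain integers, `toy_entails` stands in for a real reasoner call, and the three sets it knows are the justifications J1 to J3 from the justification example.

```python
def single_justification(axioms, entails):
    """Return one minimal subset of axioms for which entails(subset) holds, or None."""
    candidate = []
    # Expansion: grow the set until the entailment holds.
    for axiom in axioms:
        candidate.append(axiom)
        if entails(set(candidate)):
            break
    else:
        return None  # the entailment never holds
    # Contraction: drop every axiom that is not needed for the entailment.
    for axiom in list(candidate):
        reduced = [a for a in candidate if a != axiom]
        if entails(set(reduced)):
            candidate = reduced
    return set(candidate)


KNOWN = [{1, 2, 3}, {5, 6}, {3, 4}]


def toy_entails(subset):
    """Stand-in for a reasoner: true if subset contains a known justification."""
    return any(j <= subset for j in KNOWN)


FIRST = single_justification([1, 2, 3, 4, 5, 6], toy_entails)
OTHER = single_justification([4, 3, 1, 2, 5, 6], toy_entails)
```

Which justification is found depends on the order in which axioms are added, which is why computing all justifications needs an additional search on top of this procedure.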
Hitting Set Tree Algorithm

from field of Model Based Diagnosis
given a faulty system (ontology), it constructs finite tree whose nodes are labelled with conflict sets (justifications), and whose edges are labelled with components (axioms)
finds all minimal hitting sets, which represent diagnoses for the conflict sets in the system
diagnosis = repair

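What "minimal hitting set" means can be shown with a brute-force enumeration. Note that this is deliberately not Reiter's tree construction, only a simple stand-in that yields the same sets on small inputs; the input reuses J1 = {1, 2, 3}, J2 = {5, 6} and J3 = {3, 4} from the justification example.

```python
from itertools import combinations


def minimal_hitting_sets(conflict_sets):
    """Enumerate all minimal sets that share at least one element with every conflict set."""
    universe = sorted(set().union(*conflict_sets))
    found = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            candidate = set(combo)
            if any(h <= candidate for h in found):
                continue  # contains a smaller hitting set, so not minimal
            if all(candidate & c for c in conflict_sets):
                found.append(candidate)
    return found


REPAIRS = minimal_hitting_sets([{1, 2, 3}, {5, 6}, {3, 4}])
```

Each resulting set is one possible repair: removing axioms 3 and 5, for example, breaks all three justifications at once. The enumeration is exponential in the number of axioms, which is the cost the tree algorithm tries to keep under control by pruning.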
Hitting Set Tree Algorithm - Example
O = {A ⊑ B, B ⊑ D, A ⊑ ∃R.C, ∃R.⊤ ⊑ D} ⊨ A ⊑ D
J1 = {A ⊑ B, B ⊑ D}
J2 = {A ⊑ ∃R.C, ∃R.⊤ ⊑ D}

[Figure 3.2: An Example of a Hitting Set Tree; nodes are labelled with J1, J2 or {}, edges with the axioms A ⊑ B, B ⊑ D, A ⊑ ∃R.C and ∃R.⊤ ⊑ D]
Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)

Justification Scenarios

A user can be faced with the following situations:
Small number of small justifications
Easy and pleasant to inspect
Small number of large justifications
Better than alternatives
Large number of justifications
Pretty hopeless with current mechanisms
Idea: Find source of unsatisfiability

Root Unsatisfiability - Definitions

A root unsatisfiable class (UC) is a class whose unsatisfiability does not depend on another class, otherwise it is a derived UC.
A derived UC for which there is some justification that is not a strict superset of a justification for another UC is a partial derived UC.

Root Unsatisfiable Class
A class A is a root unsatisfiable class if there is no justification J with J ⊨ A ⊑ ⊥ such that J is a strict superset of a justification for some other unsatisfiable class.

Root Unsatisfiability - Approaches

Approaches:
1: compute all justifications for each unsatisfiable class and apply the definition → computationally often too expensive
2: heuristics for structural analysis of axioms

Debugging Unsatisfiable Classes in OWL Ontologies, Kalyanpur, Parsia, Sirin, Hendler, J. Web Sem, 2005.

Root Unsatisfiability - Example

O = {
  B ⊑ ∃r.D      (1)
  B ⊑ ∀r.¬D     (2)
  A ⊑ B ⊓ C     (3)
  B ⊑ ¬C        (4)
  A ⊑ ¬E        (5)
  A ⊑ E ⊓ F     (6)
}

⊨ A ⊑ ⊥: J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4}  →  partial (J4 ⊂ J1)
⊨ B ⊑ ⊥: J4 = {1, 2}  →  root

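The first approach, computing all justifications and applying the definition, is easy to sketch once the justifications are given. The snippet below takes them as input instead of computing them; the labels and the function name are illustrative, and "derived" is used here for a class all of whose justifications depend on another class.

```python
def classify_unsatisfiable(justifications):
    """justifications: {class: list of axiom-id sets}. Returns {class: label}."""
    labels = {}
    for cls, own in justifications.items():
        others = [j for other, js in justifications.items()
                  if other != cls for j in js]
        # For each own justification: is it a strict superset of a foreign one?
        depends = [any(o < j for o in others) for j in own]
        if not any(depends):
            labels[cls] = "root"
        elif all(depends):
            labels[cls] = "derived"
        else:
            labels[cls] = "partial derived"
    return labels


LABELS = classify_unsatisfiable({
    "A": [{1, 2, 3}, {5, 6}, {3, 4}],
    "B": [{1, 2}],
})
```

For the example this marks B as root and A as partial derived, because only J1 contains J4 while J2 and J3 are independent of B. Repairing B first therefore removes one, but not all, reasons for A being unsatisfiable.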
Axiom Relevance

resolving a justification requires deleting or editing axioms
ranking methods highlight the most probable causes for problems
methods:
frequency
syntactic relevance
semantic relevance

Repair Consequences

after repairing process, axioms have been deleted or modified
→ desired entailments may be lost or new entailments obtained
→ user can decide to preserve them
(including inconsistencies!)

SPARQL Endpoint Support

Previously mentioned approaches are implemented in the ORE tool
(http://ore-tool.net)
ORE supports using SPARQL endpoints
implements an incremental load procedure
knowledge base is loaded in small chunks:
count number of axioms by type
priority based loading procedure

e.g. disjointness axioms have higher priority than class assertion axioms

uses Pellet incremental reasoning

Learning of OWL Class Descriptions on Very Large Knowledge Bases,
Hellmann, Lehmann, Auer, Int. Journal Semantic Web Inf. Syst, 2009

SPARQL Endpoint Support II

algorithm performs sanity checks, e.g. SPARQL queries which probe for typical inconsistent axiom sets
can fetch additional Linked Data
different termination criteria
overall:
ORE allows applying state-of-the-art ontology debugging methods on a larger scale than was possible previously
aims at stronger support for the "web" aspect of the Semantic Web and the high popularity of the Web of Data initiative

DBpedia Live Demo
Inconsistency in DBpedia Live:

Individual: dbr:Purify_(album)
    Facts: dbo:artist dbr:Axis_of_Advance
Individual: dbr:Axis_of_Advance
    Types: dbo:Organisation
Class: dbo:Organisation
    DisjointWith: dbo:Person
ObjectProperty: dbo:artist
    Range: dbo:Person

The Linked Data Lifecycle
LinkSUM: Using Link Analysis to Summarize Entity DataLinkSUM: Using Link Analysis to Summarize Entity Data
LinkSUM: Using Link Analysis to Summarize Entity Data
 
Linked Data for Czech Legislation
Linked Data for Czech LegislationLinked Data for Czech Legislation
Linked Data for Czech Legislation
 

Similaire à The Linked Data Lifecycle

Bourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesBourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesASIS&T
 
Adoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical DomainsAdoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical DomainsChris Bizer
 
The linked data value chain atif
The linked data value chain atifThe linked data value chain atif
The linked data value chain atifAtif Latif
 
Linked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for EntrepreneursLinked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for Entrepreneurs3 Round Stones
 
Linking Media and Data using Apache Marmotta (LIME workshop keynote)
Linking Media and Data using Apache Marmotta  (LIME workshop keynote)Linking Media and Data using Apache Marmotta  (LIME workshop keynote)
Linking Media and Data using Apache Marmotta (LIME workshop keynote)LinkedTV
 
Linked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIGLinked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIGChris Ewing
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Anja Jentzsch
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseRDTF-Discovery
 
Linked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaLinked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaSebastian Schaffert
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutionsOpen Data Support
 
Discovering Resume Information using linked data  
Discovering Resume Information using linked data  Discovering Resume Information using linked data  
Discovering Resume Information using linked data  dannyijwest
 
Linked Data and Users in Library - Does the library communicate efficiently?
Linked Data and Users in Library - Does the library communicate efficiently?Linked Data and Users in Library - Does the library communicate efficiently?
Linked Data and Users in Library - Does the library communicate efficiently?Hansung University
 
Online Presentation
Online PresentationOnline Presentation
Online Presentationnw13
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesOpen Data Support
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataSören Auer
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data introvafopoulos
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityunivTope Omitola
 
Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database  Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database dannyijwest
 

Similaire à The Linked Data Lifecycle (20)

Bourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesBourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication Repositories
 
RDAP 033111
RDAP 033111RDAP 033111
RDAP 033111
 
Adoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical DomainsAdoption of the Linked Data Best Practices in Different Topical Domains
Adoption of the Linked Data Best Practices in Different Topical Domains
 
The linked data value chain atif
The linked data value chain atifThe linked data value chain atif
The linked data value chain atif
 
Linked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for EntrepreneursLinked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for Entrepreneurs
 
Linking Media and Data using Apache Marmotta (LIME workshop keynote)
Linking Media and Data using Apache Marmotta  (LIME workshop keynote)Linking Media and Data using Apache Marmotta  (LIME workshop keynote)
Linking Media and Data using Apache Marmotta (LIME workshop keynote)
 
Linked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIGLinked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIG
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
Linked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache MarmottaLinked Media and Data Using Apache Marmotta
Linked Media and Data Using Apache Marmotta
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutions
 
Discovering Resume Information using linked data  
Discovering Resume Information using linked data  Discovering Resume Information using linked data  
Discovering Resume Information using linked data  
 
Linked Data and Users in Library - Does the library communicate efficiently?
Linked Data and Users in Library - Does the library communicate efficiently?Linked Data and Users in Library - Does the library communicate efficiently?
Linked Data and Users in Library - Does the library communicate efficiently?
 
Online Presentation
Online PresentationOnline Presentation
Online Presentation
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and Examples
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database  Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database
 

Plus de geoknow

Esta ld -exploring-spatio-temporal-linked-statistical-data
Esta ld -exploring-spatio-temporal-linked-statistical-dataEsta ld -exploring-spatio-temporal-linked-statistical-data
Esta ld -exploring-spatio-temporal-linked-statistical-datageoknow
 
Sdwwg experiences and outlook
Sdwwg experiences and outlookSdwwg experiences and outlook
Sdwwg experiences and outlookgeoknow
 
Spatial data web application in Suppliy Chain Management
Spatial data web application in Suppliy Chain ManagementSpatial data web application in Suppliy Chain Management
Spatial data web application in Suppliy Chain Managementgeoknow
 
Generator workbench
Generator workbenchGenerator workbench
Generator workbenchgeoknow
 
Geold2015 wauer
Geold2015 wauerGeold2015 wauer
Geold2015 wauergeoknow
 
Facete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with faceteFacete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with facetegeoknow
 
ESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical dataESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical datageoknow
 
Towards Transfer Learning of Link Specifications
Towards Transfer Learning of Link SpecificationsTowards Transfer Learning of Link Specifications
Towards Transfer Learning of Link Specificationsgeoknow
 
Can we crate better links playing games?
Can we crate better links playing games?Can we crate better links playing games?
Can we crate better links playing games?geoknow
 
LinkedGeoData and GeoKnow
LinkedGeoData and GeoKnowLinkedGeoData and GeoKnow
LinkedGeoData and GeoKnowgeoknow
 
LinkedGeodata (Deutsch)
LinkedGeodata (Deutsch)LinkedGeodata (Deutsch)
LinkedGeodata (Deutsch)geoknow
 
Geo know general presentation 2013
Geo know general presentation 2013Geo know general presentation 2013
Geo know general presentation 2013geoknow
 
Geo know odw13-presentation
Geo know odw13-presentationGeo know odw13-presentation
Geo know odw13-presentationgeoknow
 

Plus de geoknow (13)

Esta ld -exploring-spatio-temporal-linked-statistical-data
Esta ld -exploring-spatio-temporal-linked-statistical-dataEsta ld -exploring-spatio-temporal-linked-statistical-data
Esta ld -exploring-spatio-temporal-linked-statistical-data
 
Sdwwg experiences and outlook
Sdwwg experiences and outlookSdwwg experiences and outlook
Sdwwg experiences and outlook
 
Spatial data web application in Suppliy Chain Management
Spatial data web application in Suppliy Chain ManagementSpatial data web application in Suppliy Chain Management
Spatial data web application in Suppliy Chain Management
 
Generator workbench
Generator workbenchGenerator workbench
Generator workbench
 
Geold2015 wauer
Geold2015 wauerGeold2015 wauer
Geold2015 wauer
 
Facete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with faceteFacete - Exploring the web of spatial data with facete
Facete - Exploring the web of spatial data with facete
 
ESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical dataESTA-LD exploring spatio-temporal linked statistical data
ESTA-LD exploring spatio-temporal linked statistical data
 
Towards Transfer Learning of Link Specifications
Towards Transfer Learning of Link SpecificationsTowards Transfer Learning of Link Specifications
Towards Transfer Learning of Link Specifications
 
Can we crate better links playing games?
Can we crate better links playing games?Can we crate better links playing games?
Can we crate better links playing games?
 
LinkedGeoData and GeoKnow
LinkedGeoData and GeoKnowLinkedGeoData and GeoKnow
LinkedGeoData and GeoKnow
 
LinkedGeodata (Deutsch)
LinkedGeodata (Deutsch)LinkedGeodata (Deutsch)
LinkedGeodata (Deutsch)
 
Geo know general presentation 2013
Geo know general presentation 2013Geo know general presentation 2013
Geo know general presentation 2013
 
Geo know odw13-presentation
Geo know odw13-presentationGeo know odw13-presentation
Geo know odw13-presentation
 

Dernier

Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxElton John Embodo
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxruthvilladarez
 

Dernier (20)

Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 

The Linked Data Lifecycle

  • 1. The Linked Data Life-Cycle Jens Lehmann Quan Nguyen Sebastian Hellmann Claus Stadler Lorenz Bühmann contributors: Sören Auer Anja Jentzsch Christina Unger Richard Cyganiak Dimitris Kontokostas Daniel Gerber Axel Ngonga 2013-08-23 Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 1 / 252
  • 2. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle, with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 4. The Linked Data Principles. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. Linked Data principles: (1) Use URIs as names for things. (2) Use HTTP URIs, so that people can look up those names. (3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). (4) Include links to other URIs, so that they can discover more things.
  • 5. LOD Cloud
  • 6. Linked Data Principles Detailed: 1 + 2. (1) URI references identify not just Web documents and digital content, but also real-world objects and abstract concepts: tangible things (people, places) and abstract things (the relationship type of knowing somebody). (2) HTTP URIs enable re-use of the Web architecture: Linked Data gives emphasis to the "Web" in Semantic Web; resource dereferencing; re-use of standard tools for security, load balancing, etc.
  • 9. Principles Detailed: 3, Content Negotiation. Humans and machines should be able to retrieve appropriate representations of resources: HTML for humans, RDF for machines. This is achievable using an HTTP mechanism called content negotiation. Basic idea: an HTTP client sends HTTP headers with each request to indicate what kinds of documents it prefers; servers can inspect the headers and select an appropriate response. Two strategies: 303 URIs and hash URIs. Both ensure that objects and the documents that describe them are not confused, and that humans and machines can retrieve appropriate representations.
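The header inspection described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not a complete implementation of HTTP content negotiation: the function names, the list of available media types, and the HTML default are assumptions made for the example, and partial wildcards such as `text/*` are ignored.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header,
                          available=("text/html", "application/rdf+xml", "text/turtle"),
                          default="text/html"):
    """Pick the media type the server should respond with (hypothetical helper)."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    # Assumption: fall back to HTML; a stricter server could answer 406 instead
    return default


# A browser-like request gets HTML, an RDF-aware client gets RDF
print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
print(choose_representation("application/rdf+xml"))
```

The same URI thus yields different documents depending only on the request headers, which is what lets humans and machines share identifiers.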
  • 10. 303 URIs. 303 redirect: instead of sending the object itself over the network, the server responds to the client with the HTTP response code 303 See Other and the URI of a Web document that describes the real-world object. Second step: the client dereferences the new URI and gets a Web document describing the real-world object.
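The two-step lookup can be simulated without a network. The sketch below models a server as a plain function; the `/resource/`, `/page/` and `/data/` path layout and the document contents are hypothetical choices for the example, not something the slides prescribe.

```python
# Hypothetical documents that describe the real-world object "Leipzig"
DOCUMENTS = {
    "/page/Leipzig": "<html about Leipzig>",
    "/data/Leipzig": "<rdf about Leipzig>",
}


def handle(path, accept="text/html"):
    """Simulated server: return (status, headers, body) for a GET request."""
    if path.startswith("/resource/"):
        # The URI names a real-world object, so answer 303 See Other
        # and point at a document that describes it.
        name = path[len("/resource/"):]
        wants_rdf = "rdf" in accept or "turtle" in accept
        target = "/data/" if wants_rdf else "/page/"
        return 303, {"Location": target + name}, ""
    if path in DOCUMENTS:
        return 200, {}, DOCUMENTS[path]
    return 404, {}, ""


def dereference(path, accept="text/html"):
    """Simulated client: follow one 303 redirect and count the requests made."""
    requests_made = 1
    status, headers, body = handle(path, accept)
    if status == 303:
        requests_made += 1
        status, headers, body = handle(headers["Location"], accept)
    return status, body, requests_made


print(dereference("/resource/Leipzig"))
print(dereference("/resource/Leipzig", accept="application/rdf+xml"))
```

The request counter makes the cost of the strategy visible: every description of a real-world object takes two round-trips.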
  • 11. Hash URIs. The hash URI strategy builds on the characteristic that URIs may contain a special part (the fragment identifier), separated from their base part by a hash symbol (#). The HTTP protocol requires the fragment part to be stripped off before requesting the URI from the server → a URI that includes a hash cannot be retrieved directly and therefore does not necessarily identify a Web document.
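The fragment stripping can be illustrated with Python's standard `urllib.parse.urldefrag`. The example URIs and the helper names are made up for the sketch.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Split a URI into the part sent to the server and the fragment kept by the client."""
    document, fragment = urldefrag(uri)
    return document, fragment


def same_document(uri_a, uri_b):
    """True if both hash URIs are served by the same Web document."""
    return request_target(uri_a)[0] == request_target(uri_b)[0]


# Both identifiers are resolved by fetching http://example.org/people
print(request_target("http://example.org/people#alice"))
print(same_document("http://example.org/people#alice",
                    "http://example.org/people#bob"))
```

Because both identifiers resolve to one document, a single request returns the descriptions of every resource that shares the base URI.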
  • 12. Hash versus 303. Hash URIs: (+) Reduced number of necessary HTTP round-trips → reduces access latency. (-) Descriptions of all resources sharing the same non-fragment URI part are always returned to the client together → can lead to large amounts of data being unnecessarily transmitted to the client. 303 URIs: (+) Flexible, because the redirection target can be configured separately for each resource (it usually points to a single document for each resource, but could also summarise several resources). (-) Requires two HTTP requests to retrieve a single description of a real-world object.
  • 13. Principles Detailed: 4, Links. If an RDF triple connects URIs in different namespaces/datasets, it is called a link (no unique syntactic definition of link exists). Basic idea of Linked Data: apply the general hyperlink-based architecture of the World Wide Web to the task of sharing structured data on a global scale. Research challenge: efficient creation of links with high precision and recall.
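Since no unique syntactic definition of a link exists, any test for it rests on a convention. The sketch below assumes one such convention, namely that a namespace is the scheme plus host of a URI; the triples use made-up example.org and example.com URIs, apart from the real `owl:sameAs` property URI.

```python
from urllib.parse import urlparse


def namespace(uri):
    """Assumed convention: the namespace of a URI is its scheme and host."""
    parts = urlparse(uri)
    return (parts.scheme, parts.netloc.lower())


def is_link(triple):
    """True if subject and object are URIs from different namespaces."""
    subject, _predicate, obj = triple
    if not obj.startswith(("http://", "https://")):
        return False  # the object is a literal, not a URI
    return namespace(subject) != namespace(obj)


triples = [
    ("http://example.org/city/Leipzig",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://example.com/places/leipzig"),
    ("http://example.org/city/Leipzig",
     "http://example.org/ontology/country",
     "http://example.org/country/Germany"),
    ("http://example.org/city/Leipzig",
     "http://example.org/ontology/name",
     "Leipzig"),
]
# Only the first triple crosses a dataset boundary
print([is_link(t) for t in triples])
```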
  • 15. Why Linked Data? Problem: try to search for these things on the current Web: apartments near German-Russian bilingual childcare in Leipzig; ERP service providers with offices in Vienna and London; researchers working on multimedia topics in Eastern Europe. The information is available on the Web, but opaque to current Web search. Solution: complement text on Web pages with structured linked open data, and intelligently combine/integrate such structured information from different sources.
  • 16. How to get there?
  • 17. Tim Berners-Lee's 5-star plan for an open web of data. 1 star: make data available on the Web under an open license. 2 stars: make it available as structured data. 3 stars: use a non-proprietary format. 4 stars: use URIs to identify things. 5 stars: link your data to other people's data to provide context.
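Because each star builds on the previous one, the plan can be read as a cumulative checklist. The following sketch scores a dataset description that way; the dictionary keys are hypothetical names invented for the example, not an established vocabulary.

```python
# Criteria in star order; the key names are assumptions for this sketch
CRITERIA = [
    "on_web_open_license",      # 1 star
    "structured",               # 2 stars
    "non_proprietary_format",   # 3 stars
    "uses_uris",                # 4 stars
    "linked_to_other_data",     # 5 stars
]


def star_rating(dataset):
    """Count stars cumulatively: stop at the first criterion that is not met."""
    stars = 0
    for criterion in CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars


pdf_report = {"on_web_open_license": True}
csv_file = {"on_web_open_license": True, "structured": True,
            "non_proprietary_format": True}
print(star_rating(pdf_report), star_rating(csv_file))
```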
  • 18. The 0th star. Data catalog with good metadata. Make your data findable.
  • 19. Data on the Web, Open License
  • 20. Data on the Web, Open License. Open vs. closed: data used to be closed by default. In the future, it may be open by default.
  • 21. Data on the Web, Open License. Publishers: sharing data to make it more visible.
  • 22. Data on the Web, Open License. E-commerce: data sharing for increasing traffic.
  • 23. Data on the Web, Open License. Community: collaboratively created databases.
  • 24. Good reasons against opening data: privacy; competitive advantage; producing data and charging for it as a business model; can't get a license from upstream.
  • 25. Structured Data. Enabling re-use: delivering data to end users in different forms; combining data with other data; third-party analysis of data.
  • 26. Structured Data Formats. Good for re-use / structured: MS Excel, CSV, XML, JSON, Microdata. Not so good for re-use: pure websites, MS Word. Bad for re-use: PDF. Really bad for re-use: only charts/maps without numbers.
  • 28. Non-Proprietary Formats. Specialist tools (geospatial tools, statistics packages, etc.) often have specialist formats: few people have the tools; they are expensive; the data is difficult to re-use. Non-proprietary: CSV (dead simple), XML, JSON, RDF (good for 4+5 stars).
  • 31. URIs as Identifiers. URI design: prefer stable, implementation-independent URIs.
  • 32. URIs as Identiers Turning local identiers into URIsWhy? Make them globally unique Clarify auhority Make them resolvable Make them linkable Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 30 / 252
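Turning a local identifier into a URI can be sketched in a few lines of Python. The namespace `http://example.org/id/` and the helper name `mint_uri` are assumptions made for illustration; they do not come from the slides.

```python
from urllib.parse import quote

# Hypothetical namespace owned by the data publisher (assumption).
BASE = "http://example.org/id/"


def mint_uri(kind: str, local_id: str, base: str = BASE) -> str:
    """Build a global HTTP URI from a local identifier.

    The authority (host) makes ownership explicit, and percent-encoding
    keeps the URI valid for identifiers with spaces or slashes.
    """
    return f"{base}{quote(kind, safe='')}/{quote(str(local_id), safe='')}"


product_uri = mint_uri("product", "A 17/b")
```

The URI contains no file extension or technology name, which follows the advice to prefer stable, implementation-independent URIs.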
  • 33. Links to Other Data. Hyperlinks are the soul of the Web. The Web of Data is no different.
  • 36. Summary. Linked Data principles: 1. Use URIs to name things (not only documents, but also people, locations, concepts, etc.). 2. Use HTTP URIs to enable agents (human users and machine agents alike) to look up those names. 3. When someone looks up a URI, provide useful information (structured data in RDF, SPARQL). 4. Include links to other URIs, allowing agents to discover more things. 5-Star Data: five-star plan for realising an emerging web of data, dataset by dataset: 2 stars: re-usable data; 3 stars: open standards; 4+5 stars: connect data silos.
  • 37. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 38. DBpedia. Community effort to extract structured information from Wikipedia and to make this information available on the Web. Allows asking sophisticated queries against Wikipedia and linking other data sets on the Web to Wikipedia data. Semi-structured wiki markup → structured information.
  • 39. Wikipedia Limitations. Simple questions that are hard to answer with Wikipedia: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns at an elevation of more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? Which soccer players played as goalkeeper for a club whose stadium has more than 40,000 seats and were born in a country with more than 10 million inhabitants?
  • 40. Structure in Wikipedia: title; abstract; infoboxes; geo-coordinates; categories; images; links (to other language versions, to other Wikipedia pages, to the Web); redirects; disambiguation; ...
  • 41. DBpedia Information Extraction Framework (DIEF). Started in 2007. Hosted on Sourceforge and GitHub. Initially written in PHP, then fully rewritten in Scala and Java. Around 40 contributors; see https://www.ohloh.net/p/dbpedia for a detailed overview. Can potentially be adapted to other MediaWikis; currently Wiktionary: http://wiktionary.dbpedia.org
  • 42. DIEF - Overview
  • 43. DIEF - Raw Infobox Extractor. WikiText syntax: {{Infobox Korean settlement |title = Busan Metropolitan City ... |area_km2 = 763.46 |pop = 3635389 |region = [[Yeongnam]] }} RDF serialization: dbp:Busan dbp:title "Busan Metropolitan City" . dbp:Busan dbp:area_km2 "763.46"^^xsd:float . dbp:Busan dbp:pop "3635389"^^xsd:int . dbp:Busan dbp:region dbp:Yeongnam .
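The raw infobox extraction above can be imitated with a small Python sketch. This is not the DIEF implementation (which is written in Scala and Java); the function names `parse_infobox` and `to_triples` and the simple datatype guessing are assumptions for illustration only.

```python
import re

SAMPLE = """{{Infobox Korean settlement
|title = Busan Metropolitan City
|area_km2 = 763.46
|pop = 3635389
|region = [[Yeongnam]]
}}"""


def parse_infobox(wikitext: str) -> dict:
    """Collect '|key = value' lines of an infobox into a dict."""
    fields = {}
    for line in wikitext.splitlines():
        m = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def to_triples(subject: str, fields: dict) -> list:
    """Turn each infobox field into one (subject, predicate, object) triple."""
    triples = []
    for key, value in fields.items():
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if link:  # wiki link -> resource
            obj = "dbp:" + link.group(1).replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):  # integer literal
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):  # decimal literal
            obj = f'"{value}"^^xsd:float'
        else:  # plain string literal
            obj = f'"{value}"'
        triples.append((subject, "dbp:" + key, obj))
    return triples


triples = to_triples("dbp:Busan", parse_infobox(SAMPLE))
```

Property names are taken over from the infobox keys unchanged, which is why the raw extractor produces many near-duplicate properties across infobox templates.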
  • 44. DIEF - Raw Infobox Extractor / Diversity
  • 45. DIEF - Raw Infobox Extractor / Diversity
  • 46. DIEF - Mapping-Based Infobox Extractor. Cleaner data: combine what belongs together (birth_place, birthplace); separate what is different (bornIn, birthplace); correct handling of datatypes. Mappings Wiki: http://mappings.dbpedia.org. Everybody can contribute new mappings or improve existing ones; ≈ 170 editors.
  • 47. DIEF - Mapping-Based Infobox Extractor
  • 48. URI/IRI schemes. http://{lang.}dbpedia.org is the main domain. For every article there exists a DBpedia resource of the form http://{lang.}dbpedia.org/resource/{ArticleName}. Properties from the raw infobox extractor use the http://{lang.}dbpedia.org/property/ namespace. The ontology is global for all languages and lives under the http://dbpedia.org/ontology/ namespace. Note that no language code is used for English: http://dbpedia.org is the main domain, http://dbpedia.org/resource/{title} is used for articles, and http://dbpedia.org/property/{title} for properties.
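The URI scheme can be written down as a few Python helpers. The function names are hypothetical, and replacing spaces with underscores follows the Wikipedia title convention, which is an assumption here rather than something stated on the slide.

```python
def dbpedia_domain(lang: str = "en") -> str:
    """English uses no language code; other editions use a subdomain."""
    return "http://dbpedia.org" if lang == "en" else f"http://{lang}.dbpedia.org"


def resource_uri(title: str, lang: str = "en") -> str:
    """Resource URI for a Wikipedia article title."""
    return f"{dbpedia_domain(lang)}/resource/{title.replace(' ', '_')}"


def property_uri(name: str, lang: str = "en") -> str:
    """Property URI as produced by the raw infobox extractor."""
    return f"{dbpedia_domain(lang)}/property/{name}"


def ontology_uri(name: str) -> str:
    """Ontology terms are global, so there is no language parameter."""
    return f"http://dbpedia.org/ontology/{name}"
```

For example, `resource_uri("Dresden", "de")` yields a URI under `http://de.dbpedia.org/resource/`, while `ontology_uri("Country")` is the same for every language edition.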
  • 49. Linked Data Publication via 303 Redirects. http://dbpedia.org/resource/Dresden: URI of the city of Dresden. http://dbpedia.org/page/Dresden: information resource describing the city of Dresden in HTML format. http://dbpedia.org/data/Dresden: information resource describing the city of Dresden in RDF/XML format. Further formats are supported, e.g. http://dbpedia.org/data/Dresden.n3 for N3.
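The redirect decision behind this scheme can be sketched as a pure function. This is a simplified illustration, not the behaviour of the actual DBpedia server: the function name `redirect_target` is hypothetical, and real content negotiation also weighs `q` values and further media types.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def redirect_target(resource_uri: str, accept: str) -> tuple:
    """Return (status, location) for a lookup of a resource URI.

    The resource URI names the thing itself, so the server answers
    with 303 See Other and points to a document about the thing.
    """
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a DBpedia resource URI")
    name = resource_uri[len(RESOURCE_PREFIX):]
    # Simplification (assumption): only RDF/XML is recognised as a data request.
    doc = "data" if "application/rdf+xml" in accept else "page"
    return 303, f"http://dbpedia.org/{doc}/{name}"
```

A browser sending `Accept: text/html` is sent to the `/page/` document, while an RDF client asking for `application/rdf+xml` is sent to `/data/`.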
  • 50. DBpedia Links. Columns: data set, predicate, count, tool. Amsterdam Museum, owl:sameAs, 627, S; BBC Wildlife Finder, owl:sameAs, 444, S; Book Mashup, rdf:type and owl:sameAs, 9 100, -; Bricklink, dc:publisher, 10 100, -; CORDIS, owl:sameAs, 314, S; Dailymed, owl:sameAs, 894, S; DBLP Bibliography, owl:sameAs, 196, S; DBTune, owl:sameAs, 838, S; Diseasome, owl:sameAs, 2 300, S; Drugbank, owl:sameAs, 4 800, S; EUNIS, owl:sameAs, 3 100, S; Eurostat (Linked Stats), owl:sameAs, 253, S; Eurostat (WBSG), owl:sameAs, 137, -; CIA World Factbook, owl:sameAs, 545, S.
  • 51. DBpedia Links. Columns: data set, predicate, count, tool. flickr wrappr, dbp:hasPhotoCollection, 3 800 000, C; Freebase, owl:sameAs, 3 600 000, C; GADM, owl:sameAs, 1 900, -; GeoNames, owl:sameAs, 86 500, S; GeoSpecies, owl:sameAs, 16 000, S; GHO, owl:sameAs, 196, L; Project Gutenberg, owl:sameAs, 2 500, S; Italian Public Schools, owl:sameAs, 5 800, S; LinkedGeoData, owl:sameAs, 103 600, S; LinkedMDB, owl:sameAs, 13 800, S; MusicBrainz, owl:sameAs, 23 000, -; New York Times, owl:sameAs, 9 700, -; OpenCyc, owl:sameAs, 27 100, C; OpenEI (Open Energy), owl:sameAs, 678, S.
  • 52. DBpedia Links. Columns: data set, predicate, count. Revyu, owl:sameAs, 6; Sider, owl:sameAs, 2 000; TCMGeneDIT, owl:sameAs, 904; UMBEL, rdf:type, 896 400; US Census, owl:sameAs, 12 600; WikiCompany, owl:sameAs, 8 300; WordNet, dbp:wordnet_type, 467 100; YAGO2, rdf:type, 18 100 000; Sum: 27 211 732. Tool: S for one of these rows (which one is unclear). Legend for the tool column: S: Silk, L: LIMES, C: custom script, missing: no regeneration.
  • 53. DBpedia Links - Query Example. Compare funding per year (from FTS) and country with the gross domestic product of a country (from DBpedia): SELECT * { { SELECT ?ftsyear ?dbpcountry (SUM(?amount) AS ?funding) { ?com rdf:type fts-o:Commitment . ?com fts-o:year ?year . ?year rdfs:label ?ftsyear . ?com fts-o:benefit ?benefit . ?benefit fts-o:detailAmount ?amount . ?benefit fts-o:beneficiary ?beneficiary . ?beneficiary fts-o:country ?ftscountry . ?ftscountry owl:sameAs ?dbpcountry . } } { SELECT ?dbpcountry ?gdpyear ?gdpnominal { ?dbpcountry rdf:type dbo:Country . ?dbpcountry dbp:gdpNominal ?gdpnominal . ?dbpcountry dbp:gdpNominalYear ?gdpyear . } } FILTER ((?ftsyear = str(?gdpyear))) }
  • 54. Infrastructure. DBpedia has two extraction modes: Wikipedia-database-dump-based extraction and DBpedia Live synchronisation (more later). DBpedia dumps: the dump archive is located at http://downloads.dbpedia.org/; the latest downloads are described at http://dbpedia.org/Downloads. Official endpoint (by OpenLink): http://dbpedia.org/sparql
  • 55. Query Answering. Back to our Wikipedia questions: What do Innsbruck and Leipzig have in common? Who are the mayors of central European towns at an elevation of more than 1000 m? Which movies star both Brad Pitt and Angelina Jolie? Which soccer players played as goalkeeper for a club whose stadium has more than 40,000 seats and were born in a country with more than 10 million inhabitants? Using the data extracted from Wikipedia and the public SPARQL endpoint, DBpedia can answer these questions.
  • 56. DBpedia Live. DBpedia dumps are generated on a bi-annual basis. Wikipedia has around 100,000 to 150,000 page edits per day. DBpedia Live pulls page updates in real time, and the extraction results update the triple store. In practice, a 5-minute update delay increases performance by 15%. Links: SPARQL endpoint: http://live.dbpedia.org/sparql; documentation: http://wiki.dbpedia.org/DBpediaLive; statistics: http://live.dbpedia.org/LiveStats/
  • 57. DBpedia Live - Overview
  • 58. DBpedia Internationalization (I18n). DBpedia Internationalization Committee founded: http://wiki.dbpedia.org/Internationalization. DBpedia language editions are available in Korean, Greek, German, Polish, Russian, Dutch, Portuguese, Spanish, Italian, Japanese and French. Each uses the corresponding Wikipedia language edition as input. Mappings are available for 23 languages.
  • 59. DBpedia I18n - Overview
  • 60. Applications: Disambiguation. Named entity recognition and disambiguation. Tools such as DBpedia Spotlight, AlchemyAPI, Semantic API, Open Calais, Zemanta and Apache Stanbol.
  • 61. Applications: Question Answering. DBpedia is the primary target for several QA systems in the Question Answering over Linked Data (QALD) workshop series. IBM Watson also relied on DBpedia.
  • 62. Applications: Faceted Browsing. Neofonie Browser; gFacet; OpenLink faceted browser (fct).
  • 63. Applications: Search and Querying. Query Builder; RelFinder; SemLens.
  • 64. Applications: Digital Libraries and Archives. Virtual International Authority File (VIAF) project as Linked Data; VIAF added a total of 250,000 reciprocal authority links to Wikipedia. DBpedia can also provide: context information for bibliographic and archive records (e.g. an author's demographics, a film's homepage, an image, etc.); stable and curated identifiers for linking. The broad range of Wikipedia topics can form the basis of a thesaurus for subject indexing.
  • 65. Applications: DBpedia Mobile. DBpedia Mobile is a location-centric DBpedia client application for mobile devices consisting of a map view, the Marbles Linked Data Browser and a GPS-enabled launcher application.
  • 66. Applications: DBpedia Wiktionary. Wiktionary is a Wikimedia project: http://wiktionary.org. 171 languages, 3M words for English. Extracted using the DBpedia Information Extraction Framework. Easily configurable for every Wiktionary language edition; pre-configured for German, Greek, English, Russian and French. http://wiktionary.dbpedia.org: 100 million triples, lemon model.
  • 67. Other Applications. See http://wiki.dbpedia.org/Applications for a more complete list.
  • 68. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 69. Linked Data - Achievements and Challenges. Achievements: 1. Extension of the Web with a data commons (50B facts). 2. Vibrant, global RTD community. 3. Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly, NY Times, Facebook, Google, Yahoo). 4. Governmental adoption in sight. 5. Establishing Linked Data as a deployment path for the Semantic Web. Challenges: 1. Coherence: relatively few, expensively maintained links. 2. Quality: partly low-quality data and inconsistencies. 3. Performance: still substantial penalties compared to relational. 4. Data consumption: large-scale processing, schema mapping and data fusion still in their infancy. 5. Usability: missing direct end-user tools and network effect. These issues are closely related; addressing them should ultimately lead to an ecosystem of interlinked knowledge!
  • 70. [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 71. Extraction
  • 72. Extraction. From unstructured sources: formats: plain text; methods: NLP, text mining, ontology learning. From semi-structured sources: formats: wiki markup, tags; tools: DBpedia framework (Wikipedia, Wiktionary). From structured sources: formats: databases, spreadsheets, XML; RDB2RDF tools: Sparqlify, D2R, Triplify; CSV converters: RDF extension of Google Refine.
  • 73. Extraction Challenges. From unstructured sources: improve the F-measure of existing NLP approaches (OpenCalais, Ontos API); develop standardized, LOD-enabled interfaces between NLP tools (NLP2RDF). From semi-structured sources: efficient bi-directional synchronization. From structured sources: declarative syntax and semantics of data model transformations (W3C RDB2RDF WG). Orthogonal challenges: using LOD as background knowledge; provenance.
  • 74. Storage and Querying
  • 75. RDF Data Management. SPARQL/RDF access is still slower than relational data management by a factor of 2-10. Performance increases steadily. Comprehensive, well-supported open-source and commercial implementations are available: OpenLink's Virtuoso (open source + commercial); OWLIM-Lite (free), OWLIM-SE, OWLIM-Enterprise; Talis (hosted); Bigdata (distributed); AllegroGraph (commercial); Mulgara (open source).
  • 76. Storage and Querying Challenges. Reduce the performance gap between relational and RDF data management. SPARQL query extensions: spatial/semantic/temporal data management. View maintenance / adaptive reorganization based on common access patterns. More realistic benchmarks.
  • 77. Authoring
  • 78. Authoring. Integrated in existing environments: Tiki. Data oriented: RDFauthor, rdfEditor. Schema oriented: Protégé, TopBraid Composer, NeOn Toolkit, Swoop, Neologism, Knoodl.
  • 80. Authoring: Semantic Wikis. 1. Semantic (text) wikis: authoring of semantically annotated texts; Semantic MediaWiki, KiWi, (Wikipedia+DBpedia). 2. Semantic data wikis: direct authoring of structured information (i.e. RDF, RDF-Schema, OWL); OntoWiki.
  • 81. Linking
  • 82. Interlinking. The Data Web is an uncontrolled environment: proliferation of equivalent or similar entities, hence a need for links / merging. Currently only few RDF triples are links. Manual link discovery: Sindice integration, LODStats, Semantic Pingback. Tool-supported / semi-automatic: SILK, LIMES, COMA, RDF-AI; usually via mapping specifications / heuristics. Machine learning / automatic: RAVEN, EAGLE, SILK GP.
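A heuristic link specification of the kind mentioned above can be illustrated with a naive Python sketch: compare labels of two datasets and emit an `owl:sameAs` link when their similarity exceeds a threshold. This is not how SILK or LIMES are implemented (they support richer metrics and avoid comparing every pair); the function names, the example URIs and the threshold of 0.9 are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Compare every source label with every target label (quadratic)."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            if similarity(s_label, t_label) >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links


# Hypothetical toy datasets: URI -> label
source = {"ex:a": "Leipzig", "ex:b": "Dresden"}
target = {"ex2:x": "leipzig", "ex2:y": "Innsbruck"}
links = discover_links(source, target)
```

The quadratic comparison is the part that dedicated link discovery frameworks optimise, and the threshold is the part that active learning approaches try to tune with user feedback.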
  • 83. Interlinking Challenges. Apply work from the de-duplication / record linkage literature. Consider the open-world nature of Linked Data. Use LOD background knowledge. Zero-configuration linking. Explore active learning approaches, which integrate users in a feedback loop. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (http://latc-project.eu/).
  • 84. Enrichment
  • 85. Enrichment. Currently, there is a lack of knowledge bases with sophisticated schema information and instance data adhering to this schema. Goal: powerful reasoning, consistency checking and querying. Manual: via ontology editors, DBpedia mappings. (Semi-)automatic: DL-Learner, Statistical Schema Induction.
  • 86. Enrichment: Example. Given: a knowledge base with property birthPlace (i.e. triples using that property) but no information on the semantics of birthPlace. Possible enrichment: ObjectProperty: birthPlace; Characteristics: Functional; Domain: Person; Range: Place; SubPropertyOf: hasBeenAt. Benefits: axioms serve as documentation for the purpose and correct usage of schema elements; additional implicit information can be inferred; improved applicability of schema debugging techniques.
  • 87. Repair. Ontology debugging: OWL reasoning to detect inconsistencies and unsatisfiable classes, plus detection of the most likely sources of the problems. Basic task: provide feedback to the user for resolving undesired entailments. A justification J ⊆ O of an entailment is a minimal set of axioms from which the entailment can be drawn.
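The notion of a justification can be made concrete with a toy Python sketch. It assumes an ontology that consists only of subclass axioms `(sub, sup)` and checks entailment by reachability; real ontology debuggers call an OWL reasoner instead. The shrinking loop is a generic deletion-based strategy, named here plainly as a stand-in, not the method of any particular tool.

```python
def entails(axioms, sub, sup) -> bool:
    """Toy entailment check: is `sup` reachable from `sub` via subclass axioms?"""
    seen, stack = set(), [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(b for a, b in axioms if a == current)
    return False


def justification(axioms, sub, sup):
    """Shrink the axiom set until no axiom can be dropped without losing the entailment."""
    if not entails(axioms, sub, sup):
        return None
    just = list(axioms)
    for axiom in list(axioms):
        rest = [a for a in just if a != axiom]
        if entails(rest, sub, sup):  # axiom is not needed
            just = rest
    return set(just)


# Hypothetical toy ontology O
O = [("Cat", "Mammal"), ("Mammal", "Animal"), ("Cat", "Pet"), ("Dog", "Mammal")]
J = justification(O, "Cat", "Animal")
```

The result is one minimal subset J ⊆ O; an entailment can have several justifications, and this loop returns only the one it reaches first.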
  • 88. Quality Analysis
  • 89. Linked Data Quality Analysis. Quality on the Data Web varies a lot: hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. those extracted from text or Web 2.0 sources (DBpedia). Quality = fitness for use. Often it is not necessary to fix all problems, but to know about them. 30+ quality dimensions are defined in a recent survey. Research challenge: establish measures for assessing the authority, provenance and reliability of Data Web resources.
  • 90. Evolution (image: © CC-BY-SA by alasis on flickr)
  • 91. KB Evolution. Tasks: performing knowledge base changes / refactoring; ensuring consistency of related knowledge; managing changes, e.g. undo operations; updating materialized inferred data upon changes; updating materialized links to other data upon changes. Tools: Protégé (PROMPT and change management plugins); EvoPat (easily re-usable and sharable evolution patterns defined via SPARQL); PatOMat (ontology transformation framework).
  • 92. Exploration
  • 93. Exploration. RDF data can be complex (as discussed by Pascal Hitzler). The exploration phase aims to make data accessible to non-experts. Options: faceted browsing; question answering; query builders; visualisation of statistical or geospatial data; ...
  • 94. Catalogus Professorum Lipsiensis
  • 95. Visual Query Builder
  • 96. Relationship Finder in CPL
  • 97. [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 98. Make the Web a Linked Data Washing Machine
  • 99. Tool Support for Life-Cycle? Many SW tools support one or more life-cycle stages. The Linked Data Stack (http://stack.linkeddata.org) provides a consolidated repository of such tools. Each tool is a Debian package. Lightweight integration between tools via common vocabularies and SPARQL. Demonstrator interfaces for showing tools in combination. Developed by the LOD2 and GeoKnow EU projects.
  • 100. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 101. Knowledge Extraction. Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must facilitate inferencing. Similar to Information Extraction (NLP) and ETL (data warehouse), but with one main difference: the extraction result goes beyond the creation of structured information or the transformation into a relational schema. Requires re-use of existing formal knowledge (reusing ontologies) or the generation of a schema based on the source data.
  • 102. Categorisation of Approaches. Source: examples are plain text, relational databases, XML, CSV. Exposition: how is the extracted knowledge made explicit? How can you query and perform inference? Synchronization: is the knowledge extraction process executed once to produce a dump, or is the result synchronized with the source? Are changes to the result written back (bi-directional)? Reuse of vocabularies: can popular ontologies (Good Relations, FOAF, ...) be re-used to simplify global data integration? Automatisation: manual, semi-automatic, automatic. Domain ontology required: does the approach require a pre-defined ontology, or can it create a schema from the source (e.g. ontology learning)?
  • 103. Extraction from Structured Sources to RDF. Simple mappings from RDB tables/views to RDF: direct mapping of the model of relational databases to RDF: table → OWL class; row → instance s of this class; cell with value o in column p → triple (s, p, o). Details: http://www.w3.org/TR/rdb-direct-mapping/. Complex mappings of relational databases to RDF: additional refinements can be applied to the 1:1 mapping to improve the usefulness of the RDF output; extract or learn an OWL schema from the given database schema; map the schema and its contents to a pre-existing domain ontology; powerful mapping languages: R2RML, SML. XML: the XML tree structure can be directly converted to an RDF graph structure; complex mappings are possible, e.g. via XSLT processors.
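The table/row/cell rule of the direct mapping can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the base IRI `http://example.org/` and the function name are hypothetical, every value is emitted as a plain string literal, and foreign keys and tables without a primary key, which the W3C specification also covers, are ignored.

```python
def direct_mapping(table: str, primary_key: str, rows: list,
                   base: str = "http://example.org/") -> list:
    """Map each row to an instance of the table class and each cell to a triple."""
    cls = f"<{base}{table}>"
    triples = []
    for row in rows:
        # Row -> instance s, identified by its primary key value
        s = f"<{base}{table}/{primary_key}={row[primary_key]}>"
        triples.append((s, "rdf:type", cls))
        for column, value in row.items():
            if value is None:  # NULL cells produce no triple
                continue
            # Cell with value o in column p -> triple (s, p, o)
            p = f"<{base}{table}#{column}>"
            triples.append((s, p, f'"{value}"'))
    return triples


triples = direct_mapping("People", "ID", [{"ID": 7, "fname": "Bob", "addr": None}])
```

The output vocabulary mirrors the database schema one to one, which is why the slide lists refinements such as mapping to a pre-existing domain ontology as the next step.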
  • 104. Extraction from Natural Language Sources. 80% of the information in business documents is in unstructured natural language [1]. (-) Increased complexity and decreased quality of extraction. (+) Potential for a massive acquisition of extracted knowledge. Traditional Information Extraction (IE): recognize and categorise elements in text; techniques: Named Entity Recognition (NER), Coreference Resolution (CO), ... Ontology Learning (OL) from text: learn whole ontologies from natural language text; usually (semi-)automatically extracted. [1] Wimalasuriya, Dou. Ontology-based information extraction: [...] Journal of Information Science.
  • 105. LinkedGeoData + Sparqlify. Example: LinkedGeoData, a knowledge extraction project using Sparqlify. Structure: Motivation; OpenStreetMap; LGD Architecture; Mapping; Access (how LinkedGeoData is published); Use Cases.
  • 106. Motivation. Ease information integration tasks that require spatial knowledge, such as: offerings of bakeries next door; map of distributed branches of a company; historical sights along a bicycle track. The LOD cloud contains data sets with spatial features, e.g. Geonames, DBpedia, US Census, EuroStat. But they are restricted to popular or large entities like countries, famous places etc., or to specific regions. Therefore they lack buildings, roads, mailboxes, etc.
  • 107. OpenStreetMap - Data Model. Basic entities are: nodes (latitude, longitude); ways (sequence of nodes); relations (associations between any number of nodes, ways and relations; every member in a relation plays a certain role). Each entity may be described with tags (= key-value pairs). A way is closed if the ID of the last referenced node equals that of the first one. Whether a closed way denotes a linear ring or a polygon (i.e. whether the enclosed area is part of the respective OSM entity) depends on the tags.
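The closed-way rule translates directly into code. In the Python sketch below, `is_closed` follows the definition on the slide; the tag heuristic in `geometry_kind` (an `area=yes` tag or one of a few area-implying keys) is only an illustrative assumption, not the rule set used by LinkedGeoData.

```python
def is_closed(way_node_ids: list) -> bool:
    """A way is closed if its last referenced node ID equals the first one."""
    return len(way_node_ids) >= 2 and way_node_ids[0] == way_node_ids[-1]


# Hypothetical, incomplete set of keys that imply an area (assumption).
AREA_KEYS = {"building", "landuse", "leisure"}


def geometry_kind(way_node_ids: list, tags: dict) -> str:
    """Decide from the tags whether a closed way is a ring or a polygon."""
    if not is_closed(way_node_ids):
        return "linestring"
    if tags.get("area") == "yes" or AREA_KEYS & tags.keys():
        return "polygon"
    return "linear ring"
```

A roundabout (closed way tagged as a highway) stays a linear ring, while a closed way tagged as a building is treated as a polygon whose interior belongs to the entity.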
  • 108. Example: Leipzig's Zoo
  • 109. Comparison: Leipzig's Zoo (OpenStreetMap)
  • 110. Comparison: Leipzig's Zoo (GoogleMaps)
  • 111. LGD Architecture
  • 112. Tag Mappings. Key-value pairs are assigned to RDF resources. Each pair (k, v) can be annotated with datatypes, language tags and classes. Mappings are themselves tables. Example table lgd_map_literal (k, property, lang): (name, rdfs:label, -); (name:en, rdfs:label, en); (alt_label, skos:altLabel, -); (note, rdfs:comment, -); ...
  • 113. View Definition. RDF mapping of the data from a PostgreSQL database: Create View lgd_nodes As Construct { ?n a lgdm:Node . ?n geom:geometry ?g . ?g ogc:asWKT ?o . } With ?n = uri(lgd:node, ?id) ?g = uri(lgd-geom:node, ?id) ?o = typedLiteral(?geom, ogc:wktLiteral) From nodes
  • 114. Sparqlify. SPARQL-SQL rewriter: rewrites SPARQL queries according to the view definition. The platform module offers a SPARQL endpoint and a Linked Data interface. https://github.com/AKSW/Sparqlify
  • 115. REST API. Offers REST methods for frequent queries. Based on the SPARQL (Virtuoso) endpoint.
  • 116. Downloads. RDF dataset for download, generated using Construct { ?s ?p ?o }. http://downloads.linkedgeodata.org
  • 117. Ontology. Enriched classes and properties with multilingual labels from TranslateWiki (http://translatewiki.net). Imported icons for 90 classes from the freely available SJJB Management icon collection (http://www.sjjb.co.uk/mapicons/).
  • 118. SML Mapping Examples. The following slides demonstrate how to map relational data to RDF with the Sparqlification Mapping Language (SML). These prefixes are used (prefix: IRI): rdfs: http://www.w3.org/2000/01/rdf-schema#; ogc: http://www.opengis.net/ont/geosparql#; geom: http://geovocab.org/geometry#; lgd: http://linkedgeodata.org/triplify/; lgd-geom: http://linkedgeodata.org/geometry/
  • 119. SML - Mapping Example I: The Goal (1/4). Input table nodes (id, geom): (1, POINT(0 0)), (2, POINT(1 1)). Questions: How to map tables to RDF? How to introduce the distinction, commonly used in GIS, between feature and geometry? Aimed-for RDF output: @prefix rdfs: http://www.w3.org/2000/01/rdf-schema# . ... lgd:node1 geom:geometry lgd-geom:node1 . lgd:node2 geom:geometry lgd-geom:node2 . lgd-geom:node1 ogc:asWKT "POINT(0 0)"^^ogc:wktLiteral . lgd-geom:node2 ogc:asWKT "POINT(1 1)"^^ogc:wktLiteral .
  • 120. SML - Mapping Example I: SML Syntax Outline (2/4). Input table and aimed-for RDF output as in (1/4). SML view: Create View myNodesView As Construct { ... } With ... From ...
  • 121. SML - Mapping Example I: Construct and From (3/4). Input table and aimed-for RDF output as in (1/4). SML view: Create View myNodesView As Construct { ?n geom:geometry ?g . ?g ogc:asWKT ?o } With ... From nodes
  • 122. SML - Mapping Example I: Complete! (4/4). Input table and aimed-for RDF output as in (1/4). SML view: Create View myNodesView As Construct { ?n geom:geometry ?g . ?g ogc:asWKT ?o } With ?n = uri(lgd:node, ?id) ?g = uri(lgd-geom:node, ?id) ?o = typedLiteral(?geom, ogc:wktLiteral) From nodes
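The effect of the completed view can be mimicked in a few lines of Python. This is only a minimal sketch of what the mapping computes, not Sparqlify code: the helper names `uri`, `typed_literal` and `my_nodes_view` are hypothetical, and IRIs are kept as plain strings for brevity.

```python
# Hypothetical stand-in for the SML view "myNodesView" (not Sparqlify itself).
LGD = "http://linkedgeodata.org/triplify/"
LGD_GEOM = "http://linkedgeodata.org/geometry/"
GEOM = "http://geovocab.org/geometry#"
OGC = "http://www.opengis.net/ont/geosparql#"


def uri(namespace, local, value):
    """Mimic uri(prefix:local, ?column): concatenate namespace, local part and value."""
    return f"{namespace}{local}{value}"


def typed_literal(value, datatype):
    """Mimic typedLiteral(?column, datatype) as a quoted lexical form plus datatype IRI."""
    return f'"{value}"^^{datatype}'


def my_nodes_view(rows):
    """Produce the two triples per row described by the Construct template."""
    triples = []
    for row in rows:
        n = uri(LGD, "node", row["id"])        # ?n = uri(lgd:node, ?id)
        g = uri(LGD_GEOM, "node", row["id"])   # ?g = uri(lgd-geom:node, ?id)
        o = typed_literal(row["geom"], OGC + "wktLiteral")
        triples.append((n, GEOM + "geometry", g))   # ?n geom:geometry ?g
        triples.append((g, OGC + "asWKT", o))       # ?g ogc:asWKT ?o
    return triples


nodes = [{"id": 1, "geom": "POINT(0 0)"}, {"id": 2, "geom": "POINT(1 1)"}]
for s, p, o in my_nodes_view(nodes):
    print(s, p, o, ".")
```

Each input row yields one feature-to-geometry triple and one geometry-to-WKT triple, which mirrors the feature/geometry distinction the slides aim for.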
  • 123. SML Mapping Examples: A more complex example, which demonstrates the use of an SQL mapping table and an SQL helper view.
  • 124. SML - Mapping Example II: The Goal (1/8). Input table node_tags (id, k, v): (1, name, Universitaet Leipzig), (1, name:en, University of Leipzig), (1, amenity, university), (1, addr:street, Augustusplatz), (1, addr:city, Leipzig). Aimed-for RDF output: @prefix rdfs: http://www.w3.org/2000/01/rdf-schema# . @prefix lgd: http://linkedgeodata.org/triplify/ . lgd:node1 rdfs:label "Universitaet Leipzig" . lgd:node1 rdfs:label "University of Leipzig"@en .
  • 125. SML - Mapping Example II: Source Data (2/8). OSM table node_tags (id, k, v): the five rows shown in (1/8).
  • 126. SML - Mapping Example II: Mapping Table (3/8). OSM table node_tags as in (1/8). RDF mapping table lgd_map_literal (k, property, lang): (name, rdfs:label, no lang), (name:en, rdfs:label, en), (alt_label, skos:altLabel, ...), (note, rdfs:comment, ...), ...
  • 127. SML - Mapping Example II: Helper View (4/8). Helper view lgd_node_tags_literal (id, property, v, lang): (1, rdfs:label, Universitaet Leipzig, no lang), (1, rdfs:label, University of Leipzig, en), ... Defined as: SELECT id, property, v, lang FROM node_tags, lgd_map_literal WHERE node_tags.k = lgd_map_literal.k
  • 128. SML - Mapping Example II: SML View (5/8). Logical table lgd_node_tags_literal (id, property, v, lang): (1, rdfs:label, Univ. L., no lang), (1, rdfs:label, Univ. of L., en), ... SML view: Create View lgd_node_tags_text As Construct {
  • 129. SML - Mapping Example II: SML View (6/8). Logical table as in (5/8). SML view: Create View lgd_node_tags_text As Construct { ?s ?p ?o . } With ... From lgd_node_tags_literal
  • 130. SML - Mapping Example II: SML View (7/8). Logical table as in (5/8). SML view: Create View lgd_node_tags_text As Construct { ?s ?p ?o . } With ?s = uri(lgd:node, ?id) ?p = uri(?property) ?o = plainLiteral(?v, ?lang) From lgd_node_tags_literal
  • 131. SML - Mapping Example II: SML View (8/8). Logical table + SML view as in (7/8). Resulting RDF: @prefix rdfs: http://www.w3.org/2000/01/rdf-schema# . @prefix lgd: http://linkedgeodata.org/triplify/ . lgd:node1 rdfs:label "Universitaet Leipzig" . lgd:node1 rdfs:label "University of Leipzig"@en .
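The second example chains an SQL join and an SML view. The following Python sketch walks through the same two stages under simplifying assumptions: only the two mapping rows whose language column is visible on the slides are included, and the function names (`helper_view`, `expand`, `plain_literal`, `sml_view`) are hypothetical illustrations rather than Sparqlify functions.

```python
LGD = "http://linkedgeodata.org/triplify/"
PREFIXES = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}

# OSM table node_tags (id, k, v) as shown on the slides.
node_tags = [
    (1, "name", "Universitaet Leipzig"),
    (1, "name:en", "University of Leipzig"),
    (1, "amenity", "university"),
    (1, "addr:street", "Augustusplatz"),
    (1, "addr:city", "Leipzig"),
]

# Mapping table lgd_map_literal (k, property, lang); only rows fully visible on the slides.
lgd_map_literal = [
    ("name", "rdfs:label", None),
    ("name:en", "rdfs:label", "en"),
]


def helper_view(tags, mapping):
    """Inner join on k, analogous to the SQL helper view lgd_node_tags_literal."""
    return [
        (node_id, prop, v, lang)
        for (node_id, k, v) in tags
        for (mapped_k, prop, lang) in mapping
        if k == mapped_k
    ]


def expand(curie):
    """Expand a prefixed name such as rdfs:label to a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def plain_literal(value, lang):
    """Mimic plainLiteral(?v, ?lang): attach a language tag only when one is given."""
    return f'"{value}"@{lang}' if lang else f'"{value}"'


def sml_view(rows):
    """One triple per row of the logical table: ?s ?p ?o."""
    return [
        (f"{LGD}node{node_id}", expand(prop), plain_literal(v, lang))
        for (node_id, prop, v, lang) in rows
    ]


for triple in sml_view(helper_view(node_tags, lgd_map_literal)):
    print(*triple, ".")
```

Tags without an entry in the mapping table (amenity, addr:street, addr:city) are dropped by the join, which is why only the two label triples remain.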
  • 133. LGD Edit Tool: Multi User Tag Mapping WebApp
  • 134. Resources: Sparqlify: http://sparqlify.org. LinkedGeoData: http://linkedgeodata.org. Tag Mappings: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sql/Mappings.sql. SML View Definitions: https://github.com/GeoKnow/LinkedGeoData/blob/master/linkedgeodata-core/src/main/resources/org/aksw/linkedgeodata/sml/LinkedGeoData-Triplify-IndividualViews.sml
  • 135. Statistics (15 August 2013): The complete OSM planet file corresponds to ∼ 20,000,000,000 triples (virtual access via Sparqlify). Downloads are limited to selected classes: 292,780,188 triples in total, of which 153,613,243 triples of Nodes and 139,166,945 triples of Ways; Relations are not yet available for download. Among them: 532,812 PlaceOfWorship, 82,788 RailwayStation, 72,091 Toilets, 71,613 Town, 19,937 City.
  • 136. Access: Materialized SPARQL endpoint (based on Virtuoso DB, download datasets loaded): http://linkedgeodata.org/sparql and http://linkedgeodata.org/snorql. Virtual SPARQL endpoint (based on Sparqlify, access to 20B triples, limited SPARQL 1.0 support): http://linkedgeodata.org/vsparql and http://linkedgeodata.org/vsnorql. REST interface (based on the virtual SPARQL endpoint): supports limited queries (e.g. circular/rectangular area, filtering by labels). Downloads: http://downloads.linkedgeodata.org. Monthly updates on the above datasets envisioned.
  • 137. Use Cases: Augmented Reality
  • 138. Use Cases: Generic Browsing
  • 140. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle diagram with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 141. Why Link Discovery? (1) Fourth Linked Data principle. (2) Links are central for cross-ontology QA, data integration, reasoning, federated queries, ... (3) 2011 topology of the LOD Cloud: 31+ billion triples, ≈ 0.5 billion links, owl:sameAs in most cases.
  • 142. Why is it difficult? (1) Time complexity: large number of triples; quadratic a-priori runtime; 69 days for mapping cities from DBpedia to Geonames (1 ms per comparison); decades for linking DBpedia and LGD; ... Definition (Link Discovery): Given sets $S$ and $T$ of resources and a relation $R$, the task is to find $M = \{(s, t) \in S \times T : R(s, t)\}$. Common approaches: find $M = \{(s, t) \in S \times T : \sigma(s, t) \geq \theta\}$ or find $M = \{(s, t) \in S \times T : \delta(s, t) \leq \theta\}$.
  • 143. Why is it difficult? (2) Complexity of specifications: a combination of several attributes is required for high precision; tedious discovery of the most adequate mapping; dataset-dependent similarity functions.
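To make the quadratic cost concrete, here is a minimal sketch of the brute-force baseline for the distance-based definition, together with a back-of-the-envelope runtime estimate. The function names and the set sizes in the usage line are hypothetical; the sizes are chosen only to show how a figure in the range of 69 days can arise at 1 ms per comparison and are not taken from the slides.

```python
from itertools import product


def minkowski(s, t, p=2):
    """Minkowski distance of order p between two equally long coordinate tuples."""
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1.0 / p)


def brute_force_links(S, T, theta, p=2):
    """Return M = {(s, t) : delta(s, t) <= theta} using |S| * |T| comparisons."""
    return [(s, t) for s, t in product(S, T) if minkowski(s, t, p) <= theta]


def brute_force_days(size_s, size_t, ms_per_comparison=1.0):
    """Estimated runtime in days when every pair is compared once."""
    return size_s * size_t * ms_per_comparison / 1000 / 86400


print(brute_force_links([(0, 0)], [(0, 1), (3, 4)], theta=1))
# Hypothetical sizes: 100,000 source and 60,000 target resources.
print(round(brute_force_days(100_000, 60_000), 1), "days")
```

Because the number of comparisons is the product of both set sizes, doubling each dataset quadruples the estimate, which is what motivates the reduction-ratio techniques that follow.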
  • 144. LIMES Framework
  • 145. Runtime Optimization: Reduce the number of comparisons $C(A) \geq |M|$ (assuming we need all σ/θ values for links). Maximize the reduction ratio: $RR(A) = 1 - \frac{C(A)}{|S||T|}$
  • 146. Runtime Optimization: Question: Can we devise lossless approaches with guaranteed RR? Advantages: space management, runtime prediction, resource scheduling.
  • 147. RR Guarantee: Best achievable reduction ratio: $RR_{max} = 1 - \frac{|M|}{|S||T|}$
  • 148. RR Guarantee: Approach $H(\alpha)$ fulfills the RR guarantee criterion iff $\forall r < RR_{max}, \exists \alpha : RR(H(\alpha)) \geq r$
  • 149. RR Guarantee: Here, we use the relative reduction ratio (RRR): $RRR(A) = \frac{RR_{max}}{RR(A)}$
  • 150. Goal: Formal goal: devise $H(\alpha)$ such that $\forall r > 1, \exists \alpha : RRR(H(\alpha)) \leq r$
  • 151. Restrictions: Minkowski distance $\delta(s, t) = \sqrt[p]{\sum_{i=1}^{n} |s_i - t_i|^p}$, $p \geq 2$
  • 152. Space Tiling (HYPPO): $\delta(s, t) \leq \theta$ describes a hypersphere. Approximate the hypersphere by using a hypercube: easy to compute, no loss of recall (blocking).
  • 153. Space Tiling: Set the width of a single hypercube to $\Delta = \theta/\alpha$.
  • 154. Space Tiling: Set the width of a single hypercube to $\Delta = \theta/\alpha$. Tile $\Omega = S \cup T$ into the adjacent cubes $C$. Coordinates: $(c_1, \ldots, c_n) \in \mathbb{N}^n$. A cube contains the points $\omega \in \Omega$ with $\forall i \in \{1 \ldots n\} : c_i \Delta \leq \omega_i < (c_i + 1)\Delta$.
  • 156. HYPPO: Combine $(2\alpha + 1)^n$ hypercubes around $C(\omega)$ to approximate the hypersphere. $RRR(HYPPO(\alpha)) = \frac{(2\alpha + 1)^n}{\alpha^n S(n)}$ and $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{2^n}{S(n)}$
  • 157. HYPPO: [Figure: RRR(HYPPO) for $p = 2$, $n = 2, 3, 4$ and $2 \leq \alpha \leq 50$]
  • 158. HYPPO: RRR(HYPPO) for $p = 2$, $n = 2, 3, 4$ and $2 \leq \alpha \leq 50$. Limits: $\lim_{\alpha \to \infty} RRR(HYPPO(\alpha)) = \frac{4}{\pi} \approx 1.27$ ($n = 2$), $\frac{6}{\pi} \approx 1.91$ ($n = 3$), $\frac{32}{\pi^2} \approx 3.24$ ($n = 4$)
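The tiling idea can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the LIMES code: points are coordinate tuples, only the target set is indexed, and the names `cube_of`, `hyppo_links` and `reduction_ratio` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def minkowski(s, t, p=2):
    return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1.0 / p)


def cube_of(point, delta):
    """Coordinates (c_1, ..., c_n) of the cube of width delta that contains the point."""
    return tuple(math.floor(x / delta) for x in point)


def hyppo_links(S, T, theta, alpha, p=2):
    """Compare each s only with targets in the (2*alpha + 1)^n cubes around C(s)."""
    delta = theta / alpha
    index = defaultdict(list)
    for t in T:
        index[cube_of(t, delta)].append(t)
    n = len(S[0])
    offsets = list(product(range(-alpha, alpha + 1), repeat=n))
    links, comparisons = [], 0
    for s in S:
        c = cube_of(s, delta)
        for offset in offsets:
            neighbour = tuple(ci + oi for ci, oi in zip(c, offset))
            for t in index.get(neighbour, ()):
                comparisons += 1
                if minkowski(s, t, p) <= theta:
                    links.append((s, t))
    return links, comparisons


def reduction_ratio(comparisons, size_s, size_t):
    """RR = 1 - C / (|S| * |T|)."""
    return 1 - comparisons / (size_s * size_t)


S = [(0.0, 0.0)]
T = [(0.5, 0.5), (5.0, 5.0)]
links, comparisons = hyppo_links(S, T, theta=1.0, alpha=2)
print(links, comparisons, reduction_ratio(comparisons, len(S), len(T)))
```

No match is lost because two points within distance θ differ by at most θ = αΔ in every dimension, so their cube coordinates differ by at most α.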
  • 159. HR3: Idea: $index(C, \omega) = 0$ if $\exists i : |c_i - c(\omega)_i| \leq 1$ with $1 \leq i \leq n$, and $index(C, \omega) = \sum_{i=1}^{n} (|c_i - c(\omega)_i| - 1)^p$ otherwise.
  • 160. HR3: Idea: Compare $C(\omega)$ with $C$ iff $index(C, \omega) \leq \alpha^p$. [Figure: $\alpha = 4$, $p = 2$]
  • 161. HR3: Idea: Lemma: $\forall s \in S : index(C, s) > \alpha^p$ implies that all $t \in C$ are non-matches. Claims: no loss of recall; $\lim_{\alpha \to \infty} RRR(HR^3(\alpha)) = 1$
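The index function and the comparison rule from these slides translate almost literally into code. The sketch below follows the definition as given on the slides; the function names `hr3_index`, `must_compare` and `kept_cubes` are hypothetical, and cubes are represented by their integer coordinate tuples.

```python
from itertools import product


def hr3_index(c, c_omega, p=2):
    """index(C, omega): 0 if some coordinate differs by at most 1, else sum of (|diff| - 1)^p."""
    diffs = [abs(a - b) for a, b in zip(c, c_omega)]
    if any(d <= 1 for d in diffs):
        return 0
    return sum((d - 1) ** p for d in diffs)


def must_compare(c, c_omega, alpha, p=2):
    """Compare C(omega) with C iff index(C, omega) <= alpha^p."""
    return hr3_index(c, c_omega, p) <= alpha ** p


def kept_cubes(alpha, n, p=2):
    """Cubes within alpha steps per dimension of the origin cube that pass the index test."""
    origin = (0,) * n
    candidates = product(range(-alpha, alpha + 1), repeat=n)
    return [c for c in candidates if must_compare(c, origin, alpha, p)]


print(len(kept_cubes(alpha=4, n=2)), "of", (2 * 4 + 1) ** 2, "cubes are kept")
```

For α = 4 and n = 2 only the four corner cubes at offsets (±4, ±4) are discarded, since their index is 3² + 3² = 18 > 16; every other cube in the 9 × 9 block has an index of at most 13.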
  • 162. HR3: Lemma 3: $\forall \alpha > 1 : RRR(HR^3(2\alpha)) < RRR(HR^3(\alpha))$. [Figure: $p = 2$, $\alpha = 4$]
  • 163. HR3: Proof: Lemma: $\forall \alpha > 1 : RRR(HR^3(2\alpha)) < RRR(HR^3(\alpha))$. [Figure: $p = 2$, $\alpha = 8$]
  • 164. HR3: Proof: Lemma: $\forall \alpha > 1 : RRR(HR^3(2\alpha)) < RRR(HR^3(\alpha))$. [Figure: $p = 2$, $\alpha = 25$]
  • 165. HR3: Proof: Lemma: $\forall \alpha > 1 : RRR(HR^3(2\alpha)) < RRR(HR^3(\alpha))$. [Figure: $p = 2$, $\alpha = 50$]
  • 166. HR3: Idea: Theorem: $\lim_{\alpha \to \infty} RRR(HR^3(\alpha)) = 1$. Claims: no loss of recall; $\lim_{\alpha \to \infty} RRR(HR^3(\alpha)) = 1$
  • 167. HR3: Experiments: Compare HR3 with LIMES 0.5's HYPPO and SILK 2.5.1. Experimental setup: deduplicating DBpedia places by minimum elevation, elevation and maximum elevation (θ = 49 m, 99 m); linking Geonames and LinkedGeoData by longitude and latitude (θ = 1°, 9°); 64-bit computer with a 2.8 GHz i7 processor and 8 GB RAM.
  • 168. HR3: Experiments (Comparisons): Experiment 2: Deduplicating DBpedia places, θ = 99 m: $0.64 \times 10^6$ fewer comparisons.
  • 169. HR3: Experiments (Comparisons): Experiment 4: Linking Geonames and LinkedGeoData, θ = 9°: $4.3 \times 10^6$ fewer comparisons.
  • 170. HR3: Experiments (Runtime): Experiments 1, 2: DBpedia, θ = 49 m, 99 m. Experiments 3, 4: Geonames and LGD, θ = 1°, 9°. [Figure: runtime in seconds (log scale, $10^0$ to $10^4$) of HR3, HYPPO and SILK for Exp. 1 to Exp. 4]
  • 171. HR3: Summary: Mission: new category of algorithms for link discovery.
  • 172. HR3: Summary: Mission: new category of algorithms for link discovery. Presented HR3: link discovery in affine spaces with Minkowski measures; outperforms the state of the art (runtime, comparisons); optimal reduction ratio; integrated in LIMES.
  • 173. Learning Complex Specifications: Supervised (mostly active, e.g., RAVEN, EAGLE, SILK). Unsupervised (e.g., KnoFuss, EUCLID, EAGLE).
  • 179. Learning Complex Specifications: Insight: the choice of the right examples is key for learning. So far, only informativeness has been used.
  • 180. Learning Complex Specifications: Question: Can we do better by using more information?
  • 181. Learning Complex Specifications: Result: higher F-measure, often slower.
  • 182. Basic Idea: Use similarity of link candidates when selecting most informative examples (intra + inter class similarity).
  • 185. Similarity of Candidates: A link candidate $x = (s, t)$ can be regarded as a vector $(\sigma_1(x), \ldots, \sigma_n(x)) \in [0, 1]^n$. Similarity of link candidates $x$ and $y$: $sim(x, y) = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (\sigma_i(x) - \sigma_i(y))^2}}$ (1). Allows exploiting both intra- and inter-class similarity.
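Equation (1) is straightforward to compute. A minimal sketch, assuming each link candidate has already been turned into its vector of similarity values (the example vectors are made up for illustration):

```python
import math


def sim(x, y):
    """Similarity of two link candidates given as vectors of sigma values in [0, 1]."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + distance)


# Hypothetical candidates described by two similarity values each.
x = (0.9, 0.8)
y = (0.9, 0.8)
z = (0.1, 0.2)
print(sim(x, y), sim(x, z))
```

Identical vectors get similarity 1, and the value decreases towards 0 as the Euclidean distance between the vectors grows.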
  • 186. Graph Clustering: Rationale: use intra-class similarity. Approach: cluster the elements of $S^+$ and $S^-$ independently; choose one element per cluster as representative; present the oracle with the most informative representatives. [Figure: example similarity graphs over $S^+$ and $S^-$ with labelled nodes and edge weights 0.25, 0.8 and 0.9]
  • 187. BorderFlow: $G = (V, E, \omega)$ with $V = S^+$ or $V = S^-$; $\omega(x, y) = sim(x, y)$; keep the best $ec$ edges for each $x \in V$.
  • 188. BorderFlow: Seed-based algorithm. Goal: maximize the borderflow ratio $bf(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))}$. http://sourceforge.net/projects/cugar-framework/
  • 193. Conclusion: Can be combined with arbitrary active learning ML algorithms. Was experimentally combined with EAGLE (genetic programming) and RAVEN (linear classifier) and shown to outperform the plain informativeness function in terms of F-measure. Choice of example is important to minimise user effort. Longer runtimes (up to 2×). Contact me for detailed experimental results.
  • 194. Summary: Linking is a crucial task in the Web of Data. Two key problems: (1) efficient execution of link specifications, (2) creation of link specifications. Presented HR3 to handle the first problem. Presented COALA as a building block for the second problem.
  • 195. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle diagram with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 196. Motivation: Rise in the availability and usage of knowledge bases. Still a lack of knowledge bases that consist of sophisticated schema information and instance data adhering to this schema: e.g. in the life sciences, several knowledge bases consist only of schema information, while others are, to a large extent, a collection of facts without a clear structure (e.g. information extracted from databases). The combination of sophisticated schema and instance data would allow powerful reasoning, consistency checking, and improved querying → create schemata based on existing data.
  • 197. Example: dbr:Brad_Pitt :birthPlace dbr:Shawnee,_Oklahoma ; a :Person . dbr:Angela_Merkel :birthPlace dbr:Hamburg ; a :Person . dbr:Albert_Einstein :birthPlace dbr:Ulm ; a :Person . dbr:Shawnee,_Oklahoma a :Place . dbr:Hamburg a :Place . dbr:Ulm a :Place . Suggestions: ObjectProperty: birthPlace; Characteristics: Functional; Domain: Person; Range: Place; SubPropertyOf: hasBeenAt.
  • 198. Benefits of an expressive schema: Axioms serve as documentation for the purpose and correct usage of schema elements. Additional implicit information can be inferred. Improve query optimisation. Improve/allow the application of schema debugging techniques.
  • 199. Each person was only born at one place?!
  • 205. Axiom: birthPlace is functional. Subjects with two different birthPlace values can be found with: SELECT ?s WHERE { ?s dbo:birthPlace ?o1 . ?s dbo:birthPlace ?o2 . FILTER(?o1 != ?o2) }
  • 206. Where was Julia Nannie Wallace born?
  • 207. Julia Nannie Wallace was born in Lacrosse?
  • 208. No, Julia Nannie Wallace was born in La Crosse!
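The functionality check behind this example can be mirrored on a small in-memory set of triples. The sketch below is only an illustration: the data is hypothetical (the slides do not show the actual DBpedia triples), and `functional_violations` is an invented helper name.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Return subjects that have more than one distinct object for the given property."""
    objects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            objects[s].add(o)
    return {s: sorted(values) for s, values in objects.items() if len(values) > 1}


# Hypothetical data: two competing birth places for the same person.
triples = [
    ("dbr:Julia_Nannie_Wallace", "dbo:birthPlace", "dbr:Lacrosse"),
    ("dbr:Julia_Nannie_Wallace", "dbo:birthPlace", "dbr:La_Crosse"),
    ("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm"),
]
print(functional_violations(triples, "dbo:birthPlace"))
```

A subject reported here is a candidate for repair: either one of the values is wrong, or the property should not be treated as functional.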
  • 217. Axioms: birthPlace range Place; Place disjointWith Sport. [Figure: a birthPlace edge to a resource with rdf:type City and rdf:type Place] Query: SELECT ?s ?place WHERE { ?s dbo:birthPlace ?place . ?place rdf:type/rdfs:subClassOf* ?type1 . ?type2 rdfs:subClassOf* dbo:Place . ?type1 owl:disjointWith ?type2 . }
  • 218. 3 Steps to get a schema: 3-Phase Enrichment Learning Approach. Input: entity URI, axiom type, knowledge base (SPARQL endpoint).
  • 219. 3 Steps to get a schema: Step 1 (only executed once per knowledge base): obtain schema information from the SPARQL endpoint → background knowledge.
  • 220. 3 Steps to get a schema: Step 2: obtain axiom type and entity specific data from the SPARQL endpoint (sample data if necessary; optional reasoner invocation) → background knowledge + relevant instance data.
  • 221. 3 Steps to get a schema: Step 3: run machine learning algorithm (learner: DL-Learner) → enrichment ontology: list of axiom suggestions + metadata.
  • 222. 3 Steps to get a schema: Iterate over all axiom types and schema entities for full enrichment.
  • 223. Starting Point: SPARQL endpoint: http://dbpedia.org/sparql. Entity URI: http://dbpedia.org/ontology/author. Axiom type: Object Property Domain.
  • 224. Step 1 - Obtaining Schema Information
  • 225. Step 1 - Obtaining Schema Information: CONSTRUCT WHERE { ?sub rdfs:subClassOf ?sup . } ORDER BY DESC(?sub) LIMIT 1000 OFFSET 1000
  • 226. Step 1 - Obtaining Schema Information: Result of the query: dbo:Disease rdfs:subClassOf owl:Thing . dbo:Book rdfs:subClassOf dbo:WrittenWork . dbo:WrittenWork rdfs:subClassOf dbo:Work . dbo:Work rdfs:subClassOf owl:Thing . dbo:Philosopher rdfs:subClassOf dbo:Person . dbo:Person rdfs:subClassOf dbo:Agent . dbo:Agent rdfs:subClassOf owl:Thing . dbo:Sport rdfs:subClassOf dbo:Activity . dbo:Activity rdfs:subClassOf owl:Thing . dbo:Fish rdfs:subClassOf dbo:Animal .
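The LIMIT/OFFSET pair in the query suggests that the schema is fetched page by page. A minimal sketch of how such paged query strings could be generated, assuming a fixed page size of 1000 as on the slide; `paged_construct_queries` is a hypothetical helper and no endpoint is contacted here.

```python
def paged_construct_queries(pattern, order_var, page_size=1000, pages=3):
    """Yield CONSTRUCT WHERE queries that walk through the result set page by page."""
    for page in range(pages):
        offset = page * page_size
        yield (
            f"CONSTRUCT WHERE {{ {pattern} }} "
            f"ORDER BY DESC({order_var}) LIMIT {page_size} OFFSET {offset}"
        )


for query in paged_construct_queries("?sub rdfs:subClassOf ?sup .", "?sub"):
    print(query)
```

In practice a client would stop requesting pages once a page comes back with fewer than `page_size` triples; the fixed `pages` argument is a simplification for this sketch.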
  • 227. Step 2 - Obtain axiom type and entity specific data
  • 228. Step 2 - Obtain axiom type and entity specific data: SELECT ?type (COUNT(DISTINCT ?s) AS ?cnt) WHERE { ?s dbo:author ?o . ?s a ?type . } GROUP BY ?type ORDER BY DESC(?cnt)
  • 229. Step 2 - Obtain axiom type and entity specific data: Result of the query (type, cnt): (owl:Thing, 30284), (dbo:Work, 30284), (schema:CreativeWork, 30284), (dbo:WrittenWork, 25730), (dbo:Book, 24673), (schema:Book, 24673), (dbo:TelevisionShow, 2567), (dbo:Play, 1057), ...
  • 230. Step 2 - Obtain axiom type and entity specific data: CONSTRUCT WHERE { ?ind dbo:author ?o . ?ind a ?type . } ORDER BY DESC(?ind) LIMIT 1000 OFFSET 2000
  • 231. Step 2 - Obtain axiom type and entity specific data: Result of the query: ... dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ; rdf:type dbo:Book . dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ; rdf:type dbo:WrittenWork . dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ; rdf:type dbo:Book . ...
  • 232. Step 3 - Scoring: Sample data: dbpedia:The_Adventures_of_Tom_Sawyer dbo:author dbpedia:Mark_Twain ; rdf:type dbo:Book . dbpedia:The_Zombie_Survival_Guide dbo:author dbpedia:Max_Brooks ; rdf:type dbo:WrittenWork . dbpedia:Web_Therapy dbo:author dbpedia:Lisa_Kudrow ; rdf:type dbo:Book .
  • 233. Step 3 - Scoring: Score(Domain(dbo:author, dbo:Book)) = $\frac{2}{3} \approx 66.7\%$. Score(Domain(dbo:author, dbo:WrittenWork)) = $\frac{1}{3} \approx 33.3\%$.
  • 234. Step 3 - Scoring: Taking the schema axiom dbo:Book rdfs:subClassOf dbo:WrittenWork . into account:
  • 235. Step 3 - Scoring: Score(Domain(dbo:author, dbo:WrittenWork)) = $\frac{3}{3} = 100\%$.
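The scores on these slides can be recomputed with a short sketch. It assumes a single-parent class hierarchy stored as a dictionary, which is a simplification; `superclasses` and `domain_score` are hypothetical names, not DL-Learner API.

```python
# Asserted types of the three sample subjects of dbo:author.
types = {
    "dbpedia:The_Adventures_of_Tom_Sawyer": {"dbo:Book"},
    "dbpedia:The_Zombie_Survival_Guide": {"dbo:WrittenWork"},
    "dbpedia:Web_Therapy": {"dbo:Book"},
}
# Schema knowledge from step 1 (child -> parent), reduced to the relevant axiom.
sub_class_of = {"dbo:Book": "dbo:WrittenWork"}


def superclasses(cls, hierarchy):
    """The class itself plus all classes reachable via subClassOf."""
    result = {cls}
    while cls in hierarchy and hierarchy[cls] not in result:
        cls = hierarchy[cls]
        result.add(cls)
    return result


def domain_score(candidate, types, hierarchy=None):
    """Fraction of subjects that are (directly or via subclasses) instances of candidate."""
    hierarchy = hierarchy or {}
    hits = sum(
        1
        for asserted in types.values()
        if any(candidate in superclasses(t, hierarchy) for t in asserted)
    )
    return hits / len(types)


print(domain_score("dbo:Book", types))
print(domain_score("dbo:WrittenWork", types))
print(domain_score("dbo:WrittenWork", types, sub_class_of))
```

Without the hierarchy the two candidates score 2/3 and 1/3; once dbo:Book is known to be a subclass of dbo:WrittenWork, the latter covers all three subjects.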
  • 236. Step 3 - Scoring(2) Problem: support for axiom in KB not taken into account → no dierence between 3 out of 3 and 100 out of 100 Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 191 / 252
  • 237. Step 3 - Scoring(2) Problem: support for axiom in KB not taken into account → no dierence between 3 out of 3 and 100 out of 100 Solution: Average of 95% condence interval (Wald method) s p = m+2 +4 min(1, p + 1.96 · p ·(1−p ) ) max(0, p − 1.96 · m +4 − #success m − #total s p ·(1−p ) ) m +4 In 95% of the intervals the true value is between ... and ... Lehmann, Bühmann (Univ. Leipzig) The Linked Data Life-Cycle 2013-08-23 191 / 252
  • 238. Step 3 - Scoring (2). Problem: the support for an axiom in the KB is not taken into account → no difference between 3 out of 3 and 100 out of 100. Solution: average of the 95% confidence interval (Wald method): $p' = \frac{m_{success} + 2}{m_{total} + 4}$, with lower bound $\max\left(0,\; p' - 1.96 \cdot \sqrt{\frac{p' \cdot (1 - p')}{m_{total} + 4}}\right)$ and upper bound $\min\left(1,\; p' + 1.96 \cdot \sqrt{\frac{p' \cdot (1 - p')}{m_{total} + 4}}\right)$, where $m_{success}$ = #success and $m_{total}$ = #total. In 95% of the intervals the true value is between ... and ... Score(Domain(dbo:author, dbo:Book)) ≈ 57.3%, Score(Domain(dbo:author, dbo:WrittenWork)) ≈ 69.1%.
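A minimal sketch of this score follows. One assumption is made explicit: the adjustment terms are written as z²/2 and z² with z = 1.96 rather than the rounded constants 2 and 4, because this variant reproduces the two percentages quoted on the slide (the rounded constants give roughly 57.1% and 69.0%). The function name `interval_score` is hypothetical.

```python
import math


def interval_score(successes, total, z=1.96):
    """Average of the clipped bounds of the adjusted 95% interval."""
    n_adj = total + z * z                     # roughly total + 4
    p_adj = (successes + z * z / 2) / n_adj   # roughly (successes + 2) / (total + 4)
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    lower = max(0.0, p_adj - margin)
    upper = min(1.0, p_adj + margin)
    return (lower + upper) / 2
```

With this score, 100 confirmations out of 100 rank above 3 out of 3, which is the behaviour the plain ratio could not provide.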
  • 239. More Complex Axioms: Pattern Based Knowledge Base Enrichment, ISWC 2013
  • 240. Outlook and Summary. Schema in the Linked Data Web is often shallow → tools needed to support knowledge engineers. Showed some techniques for learning OWL axioms on large knowledge bases available as SPARQL endpoints. More complex axioms require OWL-SPARQL rewriting or fragment extraction. Small and medium-sized knowledge bases can be handled via techniques from Inductive Logic Programming. All algorithms are implemented in the DL-Learner framework (http://dl-learner.org).
  • 241. Outline: 1 Introduction to Linked Data; 2 Linked Dataset Example: DBpedia; 3 Linked Data Life-Cycle Overview; 4 Knowledge Extraction; 5 Data Integration / Linking; 6 Enrichment; 7 Repair; 8 Knowledge Base Exploration / Querying. [Figure: Linked Data Lifecycle with the stages Extraction, Storage/Querying, Manual revision/Authoring, Interlinking/Fusing, Classification/Enrichment, Quality Analysis, Evolution/Repair, Search/Browsing/Exploration]
  • 242. Motivation: increasing number of knowledge bases in the Semantic Web (see e.g. the LOD cloud); maintenance of knowledge bases with expressive semantics is challenging.
  • 243. (Automatically) Detectable Ontology Problems. Common problems: Syntactic Problems; Structural Problems; Semantic Problems (focus of talk). Task-based problems: Reasoning Related Problems; Linked Data Related Problems.
  • 244. Syntactic Problems. Syntactic errors are mainly violations of conventions of the language in which the ontology is modelled. Example (Validity of XML): <?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://www.w3.org/"> <dc:title>World Wide Web Consortium</dc:title> </rdf:RDF> FatalError: The element type "rdf:Description" must be terminated by the matching end-tag "</rdf:Description>". [Line = 7, Column = 3]
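A syntactic check of this kind can be sketched with Python's standard XML parser. The document string below is a reconstruction of the slide's example with the missing `</rdf:Description>` end tag; the helper name `syntax_error` is hypothetical, and the exact wording of the error message depends on the parser, so it will differ from the message quoted on the slide.

```python
import xml.etree.ElementTree as ET

BROKEN_RDF = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/">
    <dc:title>World Wide Web Consortium</dc:title>
</rdf:RDF>
"""


def syntax_error(document):
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return str(err)
    return None
```

This only tests XML well-formedness; whether the document is also valid RDF/XML would need an RDF parser.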
  • 245. Structural Problems: problems in the taxonomy. Example (Circularities): A ⊑ B, B ⊑ C, C ⊑ A
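Circular subclass chains like the one in this example can be found with a depth-first search over the asserted taxonomy. The sketch below assumes the hierarchy is available as a plain dictionary from each class to its direct superclasses; the function name `find_cycle` is hypothetical.

```python
def find_cycle(subclass_of):
    """Return one cycle of class names in a subclass graph, or None.

    subclass_of maps each class to the list of its direct superclasses.
    """
    visiting, done = [], set()

    def visit(cls):
        if cls in done:
            return None
        if cls in visiting:
            # cls is already on the current path: the path closes a cycle.
            return visiting[visiting.index(cls):] + [cls]
        visiting.append(cls)
        for parent in subclass_of.get(cls, []):
            cycle = visit(parent)
            if cycle:
                return cycle
        visiting.pop()
        done.add(cls)
        return None

    for cls in subclass_of:
        cycle = visit(cls)
        if cycle:
            return cycle
    return None
```

For the slide's example, `find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})` reports the chain from A back to A.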
  • 246. Reasoning Related Problems: problems which negatively affect the performance of reasoning over expressive knowledge bases. Example (a named concept is equivalent to an AllValues restriction): A ≡ ∀r.C. Reasoning complexity: a universal restriction does not require a property value to exist, it only restricts the values of existing property assertions. Any concept B whose instances cannot have r-fillers satisfies the restriction, i.e. B ⊑ ∀r.C, and becomes a subclass of A. This typically leads to unintended inferences, and the additional inferences may eventually slow down reasoning. Can be checked via Pellint (part of Pellet).
  • 247. Linked Data Related Problems: problems which are specific to publishing RDF using the Linked Data principles. Incorrect implementation of content negotiation. Mixing up information and non-information resources.
  • 248. Semantic Problems: logical contradictions in the underlying knowledge base. Example (Unsatisfiable classes): O = {A ⊑ B ⊓ C, C ⊑ ¬B} ⊨ A ⊑ ⊥. Example (Inconsistent ontology): O = {A ⊑ B ⊓ C, C ⊑ ¬B, A(x)} ⊨ ⊥. Usually handled by Ontology Debugging.
  • 251. Ontology Debugging. Problem: we have undesirable entailments. Solution: repair (delete/modify) the responsible axioms. Question: which axioms?
  • 253. Justification. Definition (Justification): for an ontology O and an entailment η where O ⊨ η, a set of axioms J is a justification for η in O if J ⊆ O, J ⊨ η, and J′ ⊭ η for every J′ ⊂ J. Justifications are minimal subsets of an ontology that are sufficient for a given entailment to hold. Synonyms: MUPS (Minimal Unsatisfiability Preserving Sub-TBoxes), MinAs (Minimal Axiom sets), Kernels. Observations: there can be multiple justifications for a single entailment; an axiom can be part of multiple justifications.
  • 258. Justification - Example. O = { B ⊑ ∃r.D (1), B ⊑ ∀r.¬D (2), A ⊑ B ⊓ C (3), B ⊑ ¬C (4), A ⊑ ¬E (5), A ⊑ E ⊓ F (6) } ⊨ A ⊑ ⊥, with justifications J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4}.
  • 259. Justification Based Repair. For a repair, at least one axiom from every justification needs to be removed. For a repair plan, all justifications are needed.
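Read as a set problem, a repair is a hitting set of the justifications. The sketch below enumerates all minimal repairs for the justifications J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4} of the preceding example by brute force over axiom numbers. This is an illustration of the definition only, not the algorithm used by any particular tool, and the function name `minimal_repairs` is hypothetical.

```python
from itertools import combinations


def minimal_repairs(justifications):
    """Enumerate all minimal axiom sets that intersect every justification."""
    axioms = sorted(set().union(*justifications))
    repairs = []
    # Candidates are tried by increasing size, so every smaller repair is
    # already known when a larger candidate is checked for minimality.
    for size in range(1, len(axioms) + 1):
        for candidate in map(set, combinations(axioms, size)):
            hits_all = all(candidate & j for j in justifications)
            is_minimal = not any(r <= candidate for r in repairs)
            if hits_all and is_minimal:
                repairs.append(candidate)
    return repairs


JUSTIFICATIONS = [{1, 2, 3}, {5, 6}, {3, 4}]
```

For these three justifications the smallest repairs remove axiom 3 together with either axiom 5 or axiom 6, since axiom 3 occurs in both J1 and J3.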
  • 260. Justification Algorithms. Single justification: Glass-Box (modifying the underlying reasoning algorithm, i.e. tableau tracing) or Black-Box (using the reasoner as an oracle). All justifications: Reiter's Hitting Set Tree Algorithm (HST).
  • 261. Black-Box Expansion-Contraction Strategy. Expansion: add axioms to an empty set until the entailment holds. Contraction: remove axioms from the set such that the set becomes minimal and the entailment can still be derived. [Figure 3.1: A Depiction of a Black-Box Expand-Contract Strategy. Key: axiom, axiom in justification, selected axiom.] Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
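The two phases can be sketched as follows. The reasoner is replaced by a toy oracle that answers "entailed" whenever the axiom set contains one of a fixed list of justifications; `make_oracle` and `single_justification` are hypothetical names, and a real implementation would call an OWL reasoner and would usually select axioms more cleverly than in list order.

```python
def make_oracle(justifications):
    """Toy stand-in for a reasoner: entailed iff some justification is contained."""
    return lambda axioms: any(j <= set(axioms) for j in justifications)


def single_justification(ontology, entails):
    """Black-box expand-contract over a list of axioms; one justification or None."""
    # Expansion: add axioms one by one until the entailment holds.
    candidate = []
    for axiom in ontology:
        candidate.append(axiom)
        if entails(candidate):
            break
    else:
        return None  # even the full ontology does not entail the target

    # Contraction: drop every axiom that is not needed for the entailment.
    for axiom in list(candidate):
        reduced = [a for a in candidate if a != axiom]
        if entails(reduced):
            candidate = reduced
    return set(candidate)
```

Which justification is returned depends on the order in which axioms are added: with the axiom numbers of the earlier example, the order 1, 2, 3, ... yields {1, 2, 3}, while starting with 4, 3 yields {3, 4}.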
  • 262. Hitting Set Tree Algorithm. From the field of Model Based Diagnosis. Given a faulty system (ontology), it constructs a finite tree whose nodes are labelled with conflict sets (justifications) and whose edges are labelled with components (axioms). Finds all minimal hitting sets, which represent diagnoses for the conflict sets in the system. Diagnosis = repair.
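The tree construction can be sketched without any of the pruning and reuse optimisations of the full algorithm. Each node corresponds to a set of removed axioms (the edge labels on its path); it is labelled with a justification found in the remaining ontology, or closed if the entailment no longer holds. The names `all_justifications`, `toy_find_one` and `KNOWN` are hypothetical, and the toy search function stands in for a reasoner-backed single-justification procedure.

```python
KNOWN = [{1, 2, 3}, {5, 6}, {3, 4}]  # used only by the toy search below


def toy_find_one(axioms):
    """Stand-in for a reasoner-backed search: one justification within axioms."""
    present = set(axioms)
    return next((j for j in KNOWN if j <= present), None)


def all_justifications(ontology, find_one):
    """Explore a hitting set tree without pruning optimisations.

    Returns (justifications, repairs). The repairs are the removed-axiom
    sets of closed branches and may include non-minimal hitting sets.
    """
    justifications, repairs = [], []
    stack = [frozenset()]  # axioms removed on the path from the root
    seen = set()
    while stack:
        removed = stack.pop()
        if removed in seen:
            continue
        seen.add(removed)
        just = find_one([a for a in ontology if a not in removed])
        if just is None:
            repairs.append(set(removed))  # entailment gone: branch is closed
            continue
        if just not in justifications:
            justifications.append(just)
        for axiom in just:  # one child per axiom of the node label
            stack.append(removed | {axiom})
    return justifications, repairs
```

Filtering the returned repairs for subset-minimality leaves the minimal hitting sets, i.e. the minimal repairs.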
  • 263. Hitting Set Tree Algorithm - Example. O = {A ⊑ B, B ⊑ D, A ⊑ ∃R.C, ∃R.⊤ ⊑ D} ⊨ A ⊑ D, with J1 = {A ⊑ B, B ⊑ D} and J2 = {A ⊑ ∃R.C, ∃R.⊤ ⊑ D}. [Figure 3.2: An Example of a Hitting Set Tree] Source: M. Horridge: Justification Based Explanation in Ontologies (PhD Thesis)
  • 264. Justification Scenarios. A user can be faced with the following situations: a small number of small justifications (easy and pleasant to inspect); a small number of large justifications (better than the alternatives); a large number of justifications (pretty hopeless with current mechanisms). Idea: find the source of unsatisfiability.
  • 265. Root Unsatisfiability - Definitions. A root unsatisfiable class (UC) is a class whose unsatisfiability does not depend on another class; otherwise it is a derived UC. A derived UC for which there is some justification that is not a strict superset of a justification for another UC is a partial derived UC. Definition (Root Unsatisfiable Class): a class A is a root unsatisfiable class if there is no justification J ⊨ A ⊑ ⊥ such that J is a strict superset of a justification for some other unsatisfiable class.
  • 266. Root Unsatisfiability - Approaches. 1: compute all justifications for each unsatisfiable class and apply the definition → computationally often too expensive. 2: heuristics for structural analysis of axioms. Reference: Debugging Unsatisfiable Classes in OWL Ontologies, Kalyanpur, Parsia, Sirin, Hendler, J. Web Sem., 2005.
  • 274. Root Unsatisfiability - Example. O = { B ⊑ ∃r.D (1), B ⊑ ∀r.¬D (2), A ⊑ B ⊓ C (3), B ⊑ ¬C (4), A ⊑ ¬E (5), A ⊑ E ⊓ F (6) }. O ⊨ A ⊑ ⊥ with J1 = {1, 2, 3}, J2 = {5, 6}, J3 = {3, 4}: A is a partial derived unsatisfiable class (J4 ⊂ J1). O ⊨ B ⊑ ⊥ with J4 = {1, 2}: B is a root unsatisfiable class.
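Applying the definition directly, i.e. the first and usually too expensive approach, looks roughly as follows once all justifications are known. Justifications are given as sets of axiom numbers; the function name `classify_unsatisfiable` is hypothetical, and the label "derived" for classes whose justifications are all strict supersets is chosen here for illustration.

```python
def classify_unsatisfiable(justs_by_class):
    """Label each unsatisfiable class as 'root', 'partial derived' or 'derived'."""
    labels = {}
    for cls, justs in justs_by_class.items():
        others = [
            j
            for other, other_justs in justs_by_class.items()
            if other != cls
            for j in other_justs
        ]
        # A justification depends on another class if it strictly contains
        # one of that class's justifications.
        depends = [any(o < j for o in others) for j in justs]
        if not any(depends):
            labels[cls] = "root"
        elif all(depends):
            labels[cls] = "derived"
        else:
            labels[cls] = "partial derived"
    return labels
```

For the example, `classify_unsatisfiable({"A": [{1, 2, 3}, {5, 6}, {3, 4}], "B": [{1, 2}]})` labels B as root and A as partial derived, because only J1 strictly contains J4.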
  • 275. Axiom Relevance. Resolving a justification requires deleting or editing axioms. Ranking methods highlight the most probable causes of problems. Methods: frequency, syntactic relevance, semantic relevance.
  • 276. Repair Consequences. After the repair process, axioms have been deleted or modified → desired entailments may be lost or new entailments obtained → the user can decide to preserve them (including inconsistencies!).
  • 277. SPARQL Endpoint Support. The previously mentioned approaches are implemented in the ORE tool (http://ore-tool.net). ORE supports using SPARQL endpoints and implements an incremental load procedure: the knowledge base is loaded in small chunks; the number of axioms is counted by type; loading is priority based, e.g. disjointness axioms have higher priority than class assertion axioms; Pellet incremental reasoning is used. Reference: Learning of OWL Class Descriptions on Very Large Knowledge Bases, Hellmann, Lehmann, Auer, Int. Journal Semantic Web Inf. Syst., 2009.
  • 279. SPARQL Endpoint Support II. The algorithm performs sanity checks, e.g. SPARQL queries which probe for typical inconsistent axiom sets; it can fetch additional Linked Data; it supports different termination criteria. Overall: ORE allows state-of-the-art ontology debugging methods to be applied on a larger scale than was previously possible, and aims at stronger support for the web aspect of the Semantic Web and the high popularity of the Web of Data initiative.
  • 280. DBpedia Live Demo. Inconsistency in DBpedia Live: Individual: dbr:Purify_(album) Facts: dbo:artist dbr:Axis_of_Advance. Individual: dbr:Axis_of_Advance Types: dbo:Organisation. Class: dbo:Organisation DisjointWith: dbo:Person. ObjectProperty: dbo:artist Range: dbo:Person.
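The pattern behind this inconsistency (a property value whose type is disjoint with the property's range) can be probed for mechanically. The sketch below does so over an in-memory set of triples that restates the four axioms of the slide. It is an illustration only: it ignores subclass reasoning, the function name `range_violations` is hypothetical, and against a real endpoint the same pattern would be expressed as a SPARQL query.

```python
TRIPLES = {
    ("dbr:Purify_(album)", "dbo:artist", "dbr:Axis_of_Advance"),
    ("dbr:Axis_of_Advance", "rdf:type", "dbo:Organisation"),
    ("dbo:Organisation", "owl:disjointWith", "dbo:Person"),
    ("dbo:artist", "rdfs:range", "dbo:Person"),
}


def range_violations(triples):
    """Find (s, p, o) where o has a type disjoint with the declared range of p."""
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    disjoint = {(s, o) for s, p, o in triples if p == "owl:disjointWith"}
    disjoint |= {(b, a) for a, b in disjoint}  # disjointness is symmetric
    return [
        (s, p, o)
        for s, p, o in triples
        for prop, rng in ranges if prop == p
        for ind, cls in types if ind == o and (cls, rng) in disjoint
    ]
```

On the four triples above, the only reported statement is the dbo:artist assertion on dbr:Purify_(album).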