Linked Data and Services
1. KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association
Institute AIFB
www.kit.edu
Linked Data and Services
Andreas Harth and Barry Norton
2.
Outline
Motivation
Linked Data Principles
Query Processing over Linked Data
Linked Data Services (LIDS) and Linked Open
Services (LOS)
Conclusion
3.
Motivation
Semantic Web/Linked Data technologies are well-suited
for data integration
30.01.2015
[Figure: Data Integration, Interactive Data Exploration, Common Data Format/Access Protocol]
4.
Linked Data Principles*
1. Use URIs to name things; not only documents, but
also people, locations, concepts, etc.
2. To enable agents (human users and machine agents
alike) to look up those names, use HTTP URIs
3. When someone looks up a URI we provide useful
information; with 'useful' in the strict sense we usually
mean structured data in RDF.
4. Include links to other URIs allowing agents (machines
and humans) to discover more things
(*) http://www.w3.org/DesignIssues/LinkedData.html
5.
Correspondence between thing-URI and
source-URI
[Figure: a User Agent looks up the thing-URI http://www.polleres.net/foaf.rdf#me; an HTTP GET to the Web Server for the source-URI http://www.polleres.net/foaf.rdf returns RDF]
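For hash URIs like the one on this slide, the source-URI is the thing-URI without its fragment. A minimal sketch of that correspondence, assuming only the Python standard library (the function name source_uri is illustrative, not from the slides):

```python
from urllib.parse import urldefrag


def source_uri(thing_uri: str) -> str:
    """Return the document URI that is fetched when a hash URI is looked up."""
    # urldefrag splits "doc#frag" into (doc, frag); HTTP never sends the fragment.
    document, _fragment = urldefrag(thing_uri)
    return document


print(source_uri("http://www.polleres.net/foaf.rdf#me"))
```

A URI without a fragment is returned unchanged, so the same helper can be applied to any thing-URI before issuing the HTTP GET.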
6.
Correspondence between thing-URI and
source-URI
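[Figure: a User Agent issues an HTTP GET for the thing-URI http://dbpedia.org/resource/Gordon_Brown; the Web Server answers 303, and a second HTTP GET retrieves RDF from http://dbpedia.org/data/Gordon_Brown (the HTML page is at http://dbpedia.org/page/Gordon_Brown)]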
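The 303 case can be sketched without touching the network. The SERVER table below is a toy stand-in for the Web server (an assumption for illustration, not DBpedia's actual behaviour in every detail); it only shows how a client ends up at the source-URI after following the redirect.

```python
# Toy stand-in for the Web server: requested URI -> (status, payload).
# For 303 the payload plays the role of the Location header; for 200, the body.
SERVER = {
    "http://dbpedia.org/resource/Gordon_Brown": (303, "http://dbpedia.org/data/Gordon_Brown"),
    "http://dbpedia.org/data/Gordon_Brown": (200, "<RDF about Gordon Brown>"),
}


def dereference(uri, max_redirects=5):
    """Follow 303 redirects and return (source_uri, body)."""
    for _ in range(max_redirects + 1):
        status, payload = SERVER[uri]
        if status == 200:
            return uri, payload
        if status == 303:
            uri = payload  # continue with the redirect target
            continue
        raise ValueError(f"unexpected status {status} for {uri}")
    raise RuntimeError("too many redirects")


source, body = dereference("http://dbpedia.org/resource/Gordon_Brown")
print(source)
```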
7.
8.
Queries over Linked Data
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
?f ?n
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
9.
Querying Data Across Sources
15.03.2010
Data warehousing or materialisation-based approaches (MAT): CRAWL, INDEX, SERVE
Distributed query processing approaches (DQP): SELECT * FROM… over relations R and S
10.
DQP on Linked Data
[Figure: relational DQP answers SELECT * FROM… over relations R and S via ODBC; DQP on Linked Data answers SELECT ?s WHERE… over triple patterns (TP) via HTTP GET]
11.
Query Processing Overview
Andreas Harth
Data Summaries for On-Demand Queries over Linked Data
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
TP (an:f#ah foaf:knows ?f): select source(s)
TP (?f foaf:name ?n): select source(s)
Result:
?f ?n
http://danbri.org/foaf.rdf#danbri Dan Brickley
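Once sources have been selected and fetched, the two triple patterns are joined on ?f. The sketch below is a naive nested-loop evaluation over an in-memory list of triples; the two sample triples are assumptions reconstructed from the result row on this slide, and the function names are illustrative.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must agree with any value it was bound to earlier.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def evaluate(patterns, triples):
    """Join the triple patterns left to right with nested loops."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


# Assumed sample data, consistent with the result shown on the slide.
TRIPLES = [
    ("an:f#ah", "foaf:knows", "http://danbri.org/foaf.rdf#danbri"),
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
]
QUERY = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
print(evaluate(QUERY, TRIPLES))
```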
12.
Problem: Source Selection for Triple Patterns
(?s ?p ?o)
(#s ?p ?o)
(?s #p ?o)
(?s ?p #o)
(#s #p ?o)
(#s ?p #o)
(?s #p #o)
(#s #p #o)
Given a triple pattern, which source can contribute bindings
for the triple pattern?
13.
Schema-Level Indices [Stuckenschmidt et al. 2004]
Keep index of properties and/or classes contained in sources
(?s #p ?o), (?s rdf:type #o)
Covers only queries containing schema-level elements
Commonly used properties select potentially too many sources
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
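A schema-level index can be sketched as two inverted maps, one from properties and one from classes to the sources that mention them. This is a simplified illustration, not the index structure of Stuckenschmidt et al.; the source URIs and sample triples are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_schema_index(sources):
    """sources maps a source URI to its list of (s, p, o) triples."""
    by_property, by_class = defaultdict(set), defaultdict(set)
    for source, triples in sources.items():
        for _s, p, o in triples:
            by_property[p].add(source)
            if p == RDF_TYPE:
                by_class[o].add(source)
    return by_property, by_class


def select_sources(pattern, by_property, by_class):
    """Return candidate sources, or None if the index cannot help."""
    _s, p, o = pattern
    if p.startswith("?"):
        return None  # no schema-level element in the pattern
    if p == RDF_TYPE and not o.startswith("?"):
        return set(by_class.get(o, set()))
    return set(by_property.get(p, set()))


# Hypothetical sources for illustration.
SOURCES = {
    "http://example.org/a.rdf": [("ex:a", "foaf:knows", "ex:b"), ("ex:a", RDF_TYPE, "foaf:Person")],
    "http://example.org/b.rdf": [("ex:b", "foaf:name", "B")],
}
PROPS, CLASSES = build_schema_index(SOURCES)
print(select_sources(("?x", "foaf:knows", "?y"), PROPS, CLASSES))
```

A pattern such as (#s ?p ?o) yields None, which reflects the limitation that only queries with schema-level elements are covered.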
14.
Direct Lookup (DL) [Hartig et al. 2009]
Exploits correspondence between thing-URI and source-URI
Linked Data sources (aka RDF files) typically return triples with a subject corresponding to the source
(#s ?p ?o), (#s #p ?o), (#s #p #o)
Sometimes the sources return triples with an object corresponding to the source
(?s ?p #o), (?s #p #o)
Incomplete with respect to patterns, and also with respect to URI reuse across sources
Limited parallelism, unclear how to schedule lookups
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
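Direct lookup needs no index: candidate sources are derived from the URIs bound in the subject or object position. The sketch below handles only the hash-URI case (303 redirects would need an actual lookup) and treats any term starting with "http" as a dereferenceable URI, which is a simplifying assumption.

```python
from urllib.parse import urldefrag


def direct_lookup_sources(pattern):
    """Return source URIs derived from the bound subject and object."""
    s, _p, o = pattern
    candidates = []
    for term in (s, o):
        if term.startswith("http"):  # bound URI, not a variable or literal
            candidates.append(urldefrag(term).url)
    # Drop duplicates while keeping the order of discovery.
    return list(dict.fromkeys(candidates))


print(direct_lookup_sources(("http://danbri.org/foaf.rdf#danbri", "foaf:name", "?n")))
print(direct_lookup_sources(("?s", "foaf:knows", "?o")))
```

The second call returns an empty list: a pattern with only the predicate bound gives direct lookup nothing to dereference, which is the incompleteness noted on the slide.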
15.
Approximate Data Summaries
Combined description of schema-level and instance-level elements
Use approximation to reduce index size (incurs false positives)
Possible to use entire query for source selection
Parallel lookups since sources can be determined for the entire query
(?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o)
and combinations of triple patterns
SELECT ?f ?n WHERE {
an:f#ah foaf:knows ?f.
?f foaf:name ?n.
}
SELECT ?x1 ?x2 WHERE {
dblppub:HoganHP08 dc:creator ?a1.
?x1 owl:sameAs ?a1.
?x2 foaf:knows ?x1.
}
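The idea of an approximate summary can be illustrated with hashing: each triple is reduced to a tuple of three small bucket numbers, and a source is a candidate if some summarised tuple agrees with the pattern on every bound position. This hashed-bucket scheme is a stand-in chosen for brevity, not the summary structure used by the authors; it shares the key property that false positives are possible but false negatives are not.

```python
import zlib


def bucket(term, n=64):
    """Map a term to one of n buckets with a stable hash."""
    return zlib.crc32(term.encode("utf-8")) % n


def summarise(triples, n=64):
    """Summarise a source as the set of bucketed (s, p, o) tuples."""
    return {tuple(bucket(term, n) for term in triple) for triple in triples}


def may_contribute(pattern, summary, n=64):
    """True if the summary cannot rule the source out for this pattern."""
    wanted = [None if term.startswith("?") else bucket(term, n) for term in pattern]
    return any(
        all(w is None or w == b for w, b in zip(wanted, entry))
        for entry in summary
    )


def select_sources(patterns, summaries, n=64):
    """Select every source that may contribute to at least one pattern."""
    return {
        source
        for source, summary in summaries.items()
        if any(may_contribute(p, summary, n) for p in patterns)
    }


# Hypothetical sources for illustration.
SUMMARIES = {
    "http://example.org/a.rdf": summarise([("ex:a", "foaf:knows", "ex:b")]),
    "http://example.org/b.rdf": summarise([("ex:b", "foaf:name", "B")]),
}
QUERY = [("ex:a", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
print(select_sources(QUERY, SUMMARIES))
```

Because all patterns of the query are checked against the summaries up front, the selected sources can then be fetched in parallel.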
16.
Implementation
Deploy wrappers "in the cloud"
Google App Engine: hosting of Java and Python
webapps on Google’s Cloud infrastructure
Limited amount of processing time (6hrs/day)
Single-threaded applications
Suited for deploying wrappers
e.g. http://twitter2foaf.appspot.com/ converts Twitter
user data to RDF
17.
Linking Open Data Cloud 2007
18.
Linking Open Data Cloud 2008
19.
Linking Open Data Cloud 2009
20.
Linking Open Data Cloud 2010
21.
Geonames Services
22.
Geonames Services
23.
Geonames Services
{"weatherObservation":
{"clouds":"broken clouds",
"weatherCondition":"drizzle",
"observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010",
"windDirection":30,
"ICAO":"LESO", ...
24.
Geonames Services
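A wrapper that lifts the weather observation JSON shown above to RDF could look roughly like the sketch below. The vocabulary namespace EX and the subject URI are hypothetical, the sample document contains only a few of the fields from the slide, and every value is emitted as a plain literal for simplicity.

```python
import json

EX = "http://example.org/weather#"  # hypothetical vocabulary namespace


def observation_to_triples(doc: str, subject: str) -> list[str]:
    """Turn a weatherObservation JSON document into N-Triples-style lines."""
    obs = json.loads(doc)["weatherObservation"]
    lines = []
    for key, value in obs.items():
        # json.dumps quotes and escapes the string form of each value.
        literal = json.dumps(value if isinstance(value, str) else str(value))
        lines.append(f"<{subject}> <{EX}{key}> {literal} .")
    return lines


DOC = '{"weatherObservation": {"clouds": "broken clouds", "windDirection": 30, "ICAO": "LESO"}}'
SUBJECT = "http://example.org/airport/LESO/weather"  # hypothetical resource URI
for line in observation_to_triples(DOC, SUBJECT):
    print(line)
```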
25.
Linked Open Service Principles
REST Principles
1. Application state and functionality are divided into resources
2. Every resource is uniquely addressable
3. All resources share a uniform interface:
a) A constrained set of well-defined operations
b) A constrained set of content types
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using
the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
Linked Open Service Principles
1. Describe services as LOD prosumers with input and output
descriptions as SPARQL graph patterns
2. Communicate RDF by RESTful content negotiation
3. The output should make explicit its relation with the input
26.
LOS Weather Service
27.
LOS Geo Resources
28.
Resource-Based Linked Open Services
Linked Data
GET with Accept: text/html returns 303 REDIRECT /page
GET with Accept: application/rdf+xml (or text/n3) returns 303 REDIRECT /data
Linked Service
GET /weather with Accept: application/rdf+xml (or text/n3) returns 200 <rdf:Description>
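The server-side decision on this slide can be sketched as a small function. The resource path /airport/LESO is hypothetical, the Accept header is treated as a single media type (real headers carry lists and q-values), and answering 406 for unsupported types is an assumption rather than something the slide states.

```python
RDF_TYPES = {"application/rdf+xml", "text/n3"}


def respond(path, accept):
    """Return (status, value) for a GET; value is a Location or a body marker."""
    if path.endswith("/weather"):
        # Linked Service: the computation result is returned directly as RDF.
        if accept in RDF_TYPES:
            return 200, "<rdf:Description>"
        return 406, None  # assumption: no other representation is offered
    # Linked Data: the resource is a real-world thing, so redirect to a document.
    if accept == "text/html":
        return 303, path + "/page"
    if accept in RDF_TYPES:
        return 303, path + "/data"
    return 406, None


print(respond("/airport/LESO", "text/html"))
print(respond("/airport/LESO", "application/rdf+xml"))
print(respond("/airport/LESO/weather", "text/n3"))
```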
29.
Interlinking Data with Data from Services?
30.
Linked Data Services
We’d like to integrate data services with Linked Data
1. LIDS need to adhere to Linked Data principles
We’d like to use data services in software programs
2. LIDS need machine-readable descriptions of input and
output
Compared to naïve approach: assign URI to service output
Relationship between input and output is explicitly
described
Dynamicity is supported
Multiple or no output resources can be linked to input
31.
Interlink LIDS and Linked Data
Generate service URIs with input bindings, from evaluating: select Xi where Ti
sameAs: binding for i
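One possible reading of this step: each solution of the input pattern becomes a set of query parameters on the service endpoint, and the input entity is linked to the resulting URI with owl:sameAs. The endpoint, the parameter names, and the use of the entity variable as a fragment are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def service_uri(endpoint, params, fragment):
    """Build a service URI from an endpoint and parameter bindings."""
    query = urlencode(sorted(params.items()))  # sorted for a stable URI
    return f"{endpoint}?{query}#{fragment}"


def interlink(endpoint, entity_var, bindings):
    """Return (entity, owl:sameAs, service URI) triples, one per binding."""
    links = []
    for binding in bindings:
        params = {k.lstrip("?"): v for k, v in binding.items() if k != entity_var}
        uri = service_uri(endpoint, params, entity_var.lstrip("?"))
        links.append((binding[entity_var], "owl:sameAs", uri))
    return links


# Hypothetical bindings, as if obtained from "select Xi where Ti".
BINDINGS = [{"?point": "http://example.org/place/ka", "?lat": "49.01", "?lng": "8.41"}]
print(interlink("http://example.org/lids/findNearby", "?point", BINDINGS))
```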
32.
Query Answering using LIDS and Linked Data
Query execution resolves URIs => enlarges data set
LIDS are interlinked
Query is executed again on new data set
Repeat until no new links or no new data
Combine results
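The loop on this slide is a fixpoint computation, which can be sketched as follows. The WEB table stands in for HTTP lookups, the sample URIs are hypothetical, and the interlinking step is assumed to have already put an owl:sameAs link to a service URI into the data.

```python
def answer_query(evaluate, fetch, seed_uris):
    """Resolve URIs, re-evaluate, and repeat until nothing new turns up."""
    data, seen, frontier = set(), set(), set(seed_uris)
    results = []
    while frontier:
        for uri in frontier:
            data |= fetch(uri)  # resolving URIs enlarges the data set
        seen |= frontier
        results = evaluate(data)  # the query is executed again on the new data
        mentioned = {t for triple in data for t in triple if t.startswith("http")}
        frontier = mentioned - seen  # stop when no unresolved links remain
    return results


# Toy Web: URI -> triples returned on lookup (hypothetical data).
WEB = {
    "http://example.org/a": {("http://example.org/a", "owl:sameAs", "http://example.org/svc?id=a")},
    "http://example.org/svc?id=a": {("http://example.org/svc?id=a", "foaf:name", "Alice")},
}


def fetch(uri):
    return WEB.get(uri, set())


def names(data):
    return sorted(o for _s, p, o in data if p == "foaf:name")


print(answer_query(names, fetch, ["http://example.org/a"]))
```

Since the data set only grows, the last evaluation already contains the results of the earlier rounds, so combining results reduces to returning the final answer set in this sketch.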
33.
Experiment: Query Answering
Input:
List of 562 (potential) universities from Facebook Graph API
Output:
Facebook fans and DBpedia student numbers for 104 universities
PREFIX u: <http://openlids.org/universities.rdf#>
SELECT ?n ?f ?s WHERE {
u:list foaf:topic ?u .
?u foaf:name ?n .
?u og:fan_count ?f .
?u d:numberOfStudents ?s }
34.
Linked * Services and PlanetData
Several areas seem likely to produce services:
Stream, inc. Sensor, resources (latest values)
Any others exposing dynamic resources
Dynamic computations, inc. on-the-fly quality
assessments
Other areas seem likely to consider service
technologies and move towards more service-like
HTTP interactions
Access control (OpenID, OAuth, etc.)
Finally, remaining areas could serve to complement
LIDS/LOS alignment
Provenance
Editor's notes
collect the data from all known sources in advance
preprocess the combined data
store the results in a central database;
queries are evaluated using the local database
parse, normalise, and split the query into subqueries
determine the sources containing results for subqueries
and evaluate the subqueries against the sources directly
Match with later architecture overview animation
MAT:
excellent query response times due to the large amount of preprocessing carried out during the load and indexing steps
aggregated data is never current as collecting and indexing vast amounts of data is time-consuming
from the viewpoint of a single requester with a particular query, there is a large amount of unnecessary data gathering and storage
due to the replicated data storage the data providers have to give up their sole sovereignty on their data (e.g., they cannot restrict or log access any more since queries are answered against a copy of the data)
DQP:
system is more dynamic with up-to-date data
new sources can be added easily without time lag for indexing and integrating the data
the systems require less storage and processing resources at the query issuing site
DQP systems cannot give strict guarantees about query performance since the integration system relies on a large number of potentially unreliable sources
Source selection affects efficiency of query execution
@Juergen: join processing as scan (DL) or in Jena (QTree)?
We do not want to materialise; instead, distributed lookups over Linked Data use the Web architecture (which also differs from distributed SPARQL)
Traditional approaches assume a few data sources with full query processing capabilities (three giant blobs rather than 100 small sources)
Linked Data: very large number of relatively small sources (kilobytes to megabytes)
HTTP GET is sole operation
We assume relatively stable source URIs
Focus on tree-shaped conjunctive queries, full SPARQL can be layered on top
The upper right is standard application of Linked Data principles: if you request (state, in the request header, that you accept) HTML, you are redirected to a 'page' URI; if you request RDF, you are redirected to a 'data' URI (i.e. page/data is, in our implementation, appended to the end of the resource's URI). This is because the original URI actually identifies the airport but, since the airport is a real thing, not an information resource, you can't actually retrieve it in itself, only a related information resource.
The bottom right is how we extend in LOS – under the same URI scheme you can ask for a computation relative to the resource by POSTing to a URI representing the weather under it (the airport).