A single interface for accessing life sciences (LS) data is a natural way to master the data deluge in this domain. LS data requires integration, and current integrative solutions increasingly rely on federating queries over distributed resources. We introduce BioFed, a federated query processing system customised for LS-LOD. BioFed federates SPARQL queries over more than 130 public SPARQL endpoints.
Federated Query Formulation and Processing through BioFed
1. Semantic Web Solutions for Large-Scale Biomedical Data Analytics (SEWEBMEDA)
Workshop at ESWC2017, Portoroz, Slovenia
May 28th, 2017
Ali Hasnain, Syeda Sana E Zainab, Dure Zehra, Qaiser Mehmood, Muhammad Saleem and Dietrich Rebholz-Schuhmann
4. INTRODUCTION: EXAMPLE
Return the party membership and news pages about all US presidents.
[Figure: one source holds the party memberships of US presidents; a second source holds news pages about US presidents.]
Computing the results requires data from both sources.
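Why neither source suffices on its own can be shown with a toy, in-memory version of the example: one hypothetical source maps presidents to parties, the other links NYT-style topics to news pages and, via owl:sameAs, to the same presidents. All identifiers and URLs below are made-up placeholders, and a federated engine would evaluate this join over SPARQL endpoints rather than Python dictionaries.

```python
# Toy stand-ins for two independent sources; every identifier and URL is a hypothetical placeholder.

# Source A (DBpedia-like): president IRI -> party IRI
source_a = {
    "dbr:President_A": "dbr:Party_X",
    "dbr:President_B": "dbr:Party_Y",
}

# Source B (NYT-like): (topic IRI, news page URL, owl:sameAs target)
source_b = [
    ("nyt:topic_1", "http://example.org/news/page1", "dbr:President_A"),
    ("nyt:topic_2", "http://example.org/news/page2", "dbr:President_C"),
]


def join_sources(parties, topics):
    """Join party memberships with news pages on the shared president IRI."""
    results = []
    for _topic, page, president in topics:
        party = parties.get(president)
        if party is not None:  # the president must appear in both sources
            results.append((president, party, page))
    return results


print(join_sources(source_a, source_b))
# [('dbr:President_A', 'dbr:Party_X', 'http://example.org/news/page1')]
```

Only the president present in both sources yields an answer, which is the join a federated engine has to compute across endpoints.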
6. BIOFED: SOURCE SELECTION
Two-step, triple pattern-wise source selection:
1. Road Map lookup for the predicate of each triple pattern
   Select those sources that contain the predicate
   Select all sources if the predicate is unbound
2. If the subject or object of a triple pattern is bound
   Send a SPARQL ASK query for the complete triple pattern to each source selected in step 1
   Prune the sources that return false for the SPARQL ASK query
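The two steps above can be sketched in Python as follows. This is a minimal illustration, not BioFed's implementation: the Road Map is assumed to be a plain dictionary from predicate to source names, and `ask` is a hypothetical callback standing in for sending a SPARQL ASK query to an endpoint.

```python
from typing import Callable, Dict, List, Set, Tuple

# (subject, predicate, object); terms starting with "?" are variables
TriplePattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def select_sources(
    patterns: List[TriplePattern],
    road_map: Dict[str, Set[str]],              # assumed format: predicate -> sources containing it
    all_sources: Set[str],
    ask: Callable[[str, TriplePattern], bool],  # hypothetical: True if the source answers ASK { s p o } with true
) -> List[Set[str]]:
    """Return the set of relevant sources for each triple pattern, in order."""
    selected = []
    for s, p, o in patterns:
        # Step 1: Road Map lookup on the predicate; an unbound predicate selects all sources
        candidates = set(all_sources) if is_var(p) else set(road_map.get(p, set()))
        # Step 2: with a bound subject or object, keep only sources whose ASK query returns true
        if not is_var(s) or not is_var(o):
            candidates = {src for src in candidates if ask(src, (s, p, o))}
        selected.append(candidates)
    return selected


# Usage with mock data: only S1 answers ASK queries with true
road_map = {"rdf:type": {"S1", "S2", "S3", "S4"}, "owl:sameAs": {"S1", "S2", "S4"}}
patterns = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?x", "owl:sameAs", "?president"),
]
result = select_sources(patterns, road_map, {"S1", "S2", "S3", "S4"}, lambda src, tp: src == "S1")
print([sorted(r) for r in result])
# [['S1'], ['S1', 'S2', 'S4']]
```

The rdf:type pattern is pruned by the mock ASK answers, while the owl:sameAs pattern, whose subject and object are both variables, skips step 2 and keeps every source the mock Road Map lists.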
7. BIOFED: SOURCE SELECTION
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
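PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX nyt: <http://data.nytimes.com/elements/>
SELECT ?president ?party ?page
WHERE {
  ?president rdf:type dbpedia:President .             # TP1
  ?president dbpedia:nationality dbr:United_States .  # TP2
  ?president dbpedia:party ?party .                   # TP3
  ?x nyt:topicPage ?page .                            # TP4
  ?x owl:sameAs ?president .                          # TP5
}
Source Selection Algorithm: triple pattern-wise source selection
Sources: S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT (all RDF)
Step 1: Road Map lookup for rdf:type
TP1 = {S1, S2, S3, S4}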
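One way to picture the Road Map used in step 1 is as an inverted index from predicates to the sources that use them. The sketch below builds such an index from per-source predicate lists; the lists are hypothetical and drastically reduced (chosen only so that rdf:type occurs in all four sources, as on the slide), and BioFed's actual Road Map format may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_road_map(source_predicates: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert a source -> predicates listing into a predicate -> sources index."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for source, predicates in source_predicates.items():
        for predicate in predicates:
            index[predicate].add(source)
    return dict(index)


# Hypothetical, heavily reduced predicate lists for the four sources on the slide
source_predicates = {
    "S1": ["rdf:type", "dbpedia:nationality", "dbpedia:party", "owl:sameAs"],  # DBpedia
    "S2": ["rdf:type", "owl:sameAs"],                                          # KEGG
    "S3": ["rdf:type"],                                                        # ChEBI
    "S4": ["rdf:type", "nyt:topicPage", "owl:sameAs"],                         # NYT
}

road_map = build_road_map(source_predicates)
print(sorted(road_map["rdf:type"]))       # ['S1', 'S2', 'S3', 'S4']
print(sorted(road_map["nyt:topicPage"]))  # ['S4']
```

Because rdf:type is used everywhere, the lookup alone cannot narrow TP1, which is why step 2 is needed for that pattern.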
8. BIOFED: SOURCE SELECTION
Step 2: Prune the step-1 sources using SPARQL ASK queries, sent to each of S1, S2, S3 and S4
ASK { ?president rdf:type dbpedia:President }
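Step 2 sends each remaining source an ASK query that wraps the complete triple pattern. A minimal sketch using only the Python standard library, assuming the endpoints follow the SPARQL 1.1 Protocol (query passed as the `query` parameter of an HTTP GET) and return the SPARQL 1.1 JSON results format; the helper names are hypothetical and the network call is not exercised here.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Tuple

TriplePattern = Tuple[str, str, str]


def build_ask_query(tp: TriplePattern, prefixes: Dict[str, str]) -> str:
    """Wrap one complete triple pattern in a SPARQL ASK query."""
    header = "".join(f"PREFIX {name}: <{iri}>\n" for name, iri in prefixes.items())
    s, p, o = tp
    return f"{header}ASK {{ {s} {p} {o} }}"


def parse_ask_result(body: str) -> bool:
    """Extract the boolean from a SPARQL 1.1 JSON results document."""
    return bool(json.loads(body)["boolean"])


def ask_endpoint(endpoint: str, query: str, timeout: float = 10.0) -> bool:
    """Send the query with HTTP GET as in the SPARQL 1.1 Protocol and return the answer."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ask_result(response.read().decode("utf-8"))


query = build_ask_query(
    ("?president", "rdf:type", "dbpedia:President"),
    {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "dbpedia": "http://dbpedia.org/ontology/",
    },
)
print(query)
```

A source would be pruned for this pattern when `ask_endpoint(url, query)` returns False; the endpoint URLs are not given in the slides.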
9. BIOFED: SOURCE SELECTION
Triple pattern-wise source selection after step 2: TP1 = {S1}
10. MOTIVATION: SOURCE SELECTION
Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}
11. MOTIVATION: SOURCE SELECTION
Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}
12. MOTIVATION: SOURCE SELECTION
Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}, TP4 = {S4}
13. MOTIVATION: SOURCE SELECTION
Triple pattern-wise source selection: TP1 = {S1}, TP2 = {S1}, TP3 = {S1}, TP4 = {S4}, TP5 = {S1, S2, S4}
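Read per source, the final selection tells the engine which patterns each endpoint must evaluate. The sketch below inverts the per-pattern selection into a per-source plan, assuming the slide's result reads TP1 to TP3 = {S1}, TP4 = {S4} and TP5 = {S1, S2, S4}; `patterns_per_source` is a hypothetical helper, not part of BioFed.

```python
from typing import Dict, List, Set

# Per-pattern relevant sources, as read from the final slide
selection: Dict[str, Set[str]] = {
    "TP1": {"S1"},
    "TP2": {"S1"},
    "TP3": {"S1"},
    "TP4": {"S4"},
    "TP5": {"S1", "S2", "S4"},
}


def patterns_per_source(sel: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert pattern -> sources into source -> patterns that source must evaluate."""
    plan: Dict[str, List[str]] = {}
    for tp in sorted(sel):
        for src in sorted(sel[tp]):
            plan.setdefault(src, []).append(tp)
    return {src: plan[src] for src in sorted(plan)}


plan = patterns_per_source(selection)
print(plan)  # {'S1': ['TP1', 'TP2', 'TP3', 'TP5'], 'S2': ['TP5'], 'S4': ['TP4', 'TP5']}
requests = sum(len(tps) for tps in plan.values())
print(requests, "of", len(selection) * 4, "pattern/source pairs")  # 7 of 20 pattern/source pairs
```

Under this reading, S3 (ChEBI) is never contacted, and 7 pattern/source pairs remain instead of the 5 × 4 = 20 obtained by sending every pattern to every source.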