XSPARQL is a language for transforming data between XML and RDF. XML is a widely used format for data exchange. RDF is a data format based on directed graphs, primarily used to represent Semantic Web data. XSPARQL combines the strengths of the two corresponding query languages: XQuery for XML and SPARQL for RDF. In this talk we present two new XSPARQL enhancements, Constructed Dataset and Dataset Scoping, the XDEP dependent join optimization, and a new XSPARQL implementation. Constructed Dataset allows intermediary RDF graphs to be created while querying data sources. Dataset Scoping provides an optional fix for unintended results that may occur when evaluating complex XSPARQL queries containing nested SPARQL queries. The XSPARQL implementation first rewrites an XSPARQL query into XQuery expressions containing interleaved calls to a SPARQL engine; the resulting query is then evaluated by standard XQuery and SPARQL engines. The dependent join optimization XDEP is designed to reduce evaluation time for queries that require repeated evaluation of embedded SPARQL query parts. XDEP minimizes the number of interactions between the XQuery and SPARQL engines by bundling similar queries and letting the XQuery engine select the relevant data on its own. We evaluated our approach experimentally using an adapted version of the XQuery benchmark suite XMark. We show that the XDEP optimization reduces the evaluation time of all compatible benchmark queries; with this optimization, certain XSPARQL queries were evaluated two orders of magnitude faster than with unoptimized XSPARQL.
See also http://stefanbischof.at/masterthesis/ for the full text.
Implementation and Optimization of Queries in XSPARQL
1. Towards XSPARQL 1.1: New Features and Optimization
XML XSPARQL RDF
New Features and Faster Query Evaluation for XSPARQL
Master’s Thesis Presentation of Stefan Bischof
October 1, 2010
2. Data Representation
XML (Extensible Markup Language) is a markup
language designed for data exchange over the internet.
Documents and other data are represented as trees.
RDF (Resource Description Framework) is a
framework for describing arbitrary resources.
Resources and their relations are represented as
directed graphs. Mainly used for Semantic Web data.
3. Example Data
[Figure: Alice, Bob, and Charles connected by directed ‘knows’ edges]
The ‘knows’ relation is directed; Charles knows
nobody in this example.
4. RDF and XML
relations.xml
<relations>
  <person name="Alice">
    <knows>Bob</knows>
    <knows>Charles</knows>
  </person>
  <person name="Bob">
    <knows>Charles</knows>
  </person>
  <person name="Charles"/>
</relations>

relations.rdf
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:b1 a foaf:Person;
     foaf:name "Alice";
     foaf:knows _:b2;
     foaf:knows _:b3.
_:b2 a foaf:Person; foaf:name "Bob";
     foaf:knows _:b3.
_:b3 a foaf:Person; foaf:name "Charles".

[Figure: the XML tree (relations with three person elements, each with a name attribute and knows children) next to the RDF graph (blank nodes _:b1, _:b2, _:b3, each with rdf:type foaf:Person and a foaf:name, linked by foaf:knows edges)]
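The step from the XML tree to the RDF graph can be sketched in a few lines of Python. This is only an illustration, not how XSPARQL or the thesis performs the translation: the function name xml_to_triples, the blank-node labelling _:b1, _:b2, ... (chosen to mirror relations.rdf), and the representation of triples as tuples of prefixed-name strings are all our own assumptions.

```python
import xml.etree.ElementTree as ET

# relations.xml from the slide, inlined so the sketch is self-contained.
RELATIONS_XML = """<relations>
  <person name="Alice"><knows>Bob</knows><knows>Charles</knows></person>
  <person name="Bob"><knows>Charles</knows></person>
  <person name="Charles"/>
</relations>"""


def xml_to_triples(xml_text):
    """Lift the XML tree into (subject, predicate, object) triples.

    Every <person> element becomes a blank node; each <knows> child is
    resolved by name to the blank node of the person it refers to.
    """
    root = ET.fromstring(xml_text)
    people = root.findall("person")
    # First pass: one blank-node label per person, in document order.
    node_of = {p.get("name"): f"_:b{i}" for i, p in enumerate(people, start=1)}
    triples = []
    for p in people:
        subject = node_of[p.get("name")]
        triples.append((subject, "rdf:type", "foaf:Person"))
        triples.append((subject, "foaf:name", p.get("name")))
        for k in p.findall("knows"):
            triples.append((subject, "foaf:knows", node_of[k.text]))
    return triples


for triple in xml_to_triples(RELATIONS_XML):
    print(triple)
```

Parent/child edges of the tree become graph edges between nodes; Charles's person element has no knows children and therefore yields no foaf:knows triple.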
5. XQuery + SPARQL = XSPARQL
XQuery is a functional query language designed for processing XML data. It has a large function library and is a superset of XPath 2.0.
SPARQL is a query language for RDF. It uses graph patterns to filter RDF data.
6. What’s XSPARQL?
Simplifies data transformation by combining the advantages of XQuery and SPARQL:
XQuery gains RDF graph pattern matching ➔ serialization-format-agnostic access to RDF data
SPARQL gains access to XQuery's large function library and to subqueries
7. XSPARQL Example
Convert from RDF to XML
(: For each person P: :)
for $name from <people.rdf>
where { $person a foaf:Person .
        $person foaf:name $name . }
return
  (: Print P's name :)
  <person name="{$name}"> {
    (: For each friend F of P: :)
    for $fname from <people.rdf>
    where { $pers1 foaf:name $name .
            $pers1 foaf:knows $friend .
            $friend foaf:name $fname . }
    return
      (: Print F's name :)
      <knows name="{$fname}"/>
  }
  </person>
10. Implementation
Input: XSPARQL query, XML data, RDF data
Query processing: XSPARQL rewriter ➔ XQuery query ➔ XQuery engine ➔ SPARQL engine (over HTTP)
Output: XML or RDF
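The evaluation trace that follows sends the inner SPARQL query once per outer binding, with the current value of $name written into the query text. A minimal Python sketch of that substitution step, assuming plain string templating: the names sparql_literal, INNER_TEMPLATE and rewrite_inner are hypothetical, the PREFIX line is added so each query is complete, and the actual rewriter (which generates XQuery code, not Python) may work differently.

```python
# Hypothetical template for the inner query of the example; {name} is the hole
# that the outer XQuery binding of $name fills in at run time.
INNER_TEMPLATE = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/> "
    "SELECT $fname FROM <people.rdf> "
    "WHERE {{ $pers1 foaf:name {name} . "
    "$pers1 foaf:knows $friend . "
    "$friend foaf:name $fname . }}"
)


def sparql_literal(value: str) -> str:
    """Serialize a string as a double-quoted SPARQL literal, escaping
    backslashes, quotes and line breaks so the value cannot break the query."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rewrite_inner(name: str) -> str:
    """Build the inner SPARQL query for one value of the outer variable $name."""
    return INNER_TEMPLATE.format(name=sparql_literal(name))


for name in ["Alice", "Bob", "Charles"]:
    print(rewrite_inner(name))
```

In the architecture above each such query travels to the SPARQL engine over HTTP, so the number of outer iterations directly determines the number of round trips.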
11. Query Evaluation
for $name from <people.rdf>
where { $person a foaf:Person .
        $person foaf:name $name . }
return
  <person name="{$name}"> {
    for $fname from <people.rdf>
    where { $pers1 foaf:name $name .
            $pers1 foaf:knows $friend .
            $friend foaf:name $fname . }
    return <knows name="{$fname}"/>
  }
  </person>
12. Query Evaluation
Query sent to the SPARQL engine:
SELECT $name
FROM <people.rdf>
WHERE { $person a foaf:Person .
        $person foaf:name $name . }
13. Query Evaluation
Result:
# $name
1 "Alice"
2 "Bob"
3 "Charles"
15. Query Evaluation
Inner query for $name = "Alice":
SELECT $fname
FROM <people.rdf>
WHERE { $pers1 foaf:name "Alice" .
        $pers1 foaf:knows $friend .
        $friend foaf:name $fname . }
16. Query Evaluation
Result:
# $fname
1 "Bob"
2 "Charles"
18. Query Evaluation
Inner query for $name = "Bob":
SELECT $fname
FROM <people.rdf>
WHERE { $pers1 foaf:name "Bob" .
        $pers1 foaf:knows $friend .
        $friend foaf:name $fname . }
19. Query Evaluation
Result:
# $fname
1 "Charles"
21. Query Evaluation
Inner query for $name = "Charles":
SELECT $fname
FROM <people.rdf>
WHERE { $pers1 foaf:name "Charles" .
        $pers1 foaf:knows $friend .
        $friend foaf:name $fname . }
22. Query Evaluation
Result:
# $fname
(no results: Charles knows nobody)
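The trace above issues one SPARQL query for the outer loop and one more per outer result: four calls for this data. The following Python sketch replays that strategy against a toy in-memory triple store. SparqlEngineStub, unify, match and evaluate_nested are made-up names, the stub only understands basic graph patterns, and variables are written ?x (SPARQL accepts both ?x and $x).

```python
# Toy stand-in for people.rdf; blank nodes are written "_:b1" etc.
TRIPLES = [
    ("_:b1", "rdf:type", "foaf:Person"), ("_:b1", "foaf:name", "Alice"),
    ("_:b1", "foaf:knows", "_:b2"), ("_:b1", "foaf:knows", "_:b3"),
    ("_:b2", "rdf:type", "foaf:Person"), ("_:b2", "foaf:name", "Bob"),
    ("_:b2", "foaf:knows", "_:b3"),
    ("_:b3", "rdf:type", "foaf:Person"), ("_:b3", "foaf:name", "Charles"),
]


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    b = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable: bind it or check its binding
            if b.setdefault(term, value) != value:
                return None
        elif term != value:               # constant: must match exactly
            return None
    return b


def match(patterns, triples, binding=None):
    """Yield every binding under which all triple patterns occur in `triples`."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        b = unify(patterns[0], triple, binding)
        if b is not None:
            yield from match(patterns[1:], triples, b)


class SparqlEngineStub:
    """Stand-in for the external SPARQL engine; counts the queries it receives."""

    def __init__(self, triples):
        self.triples = triples
        self.calls = 0

    def select(self, variables, patterns):
        self.calls += 1
        return [tuple(b[v] for v in variables) for b in match(patterns, self.triples)]


OUTER = [("?person", "rdf:type", "foaf:Person"), ("?person", "foaf:name", "?name")]


def evaluate_nested(engine):
    """Unoptimized strategy: one call for the outer loop, one per outer result."""
    result = {}
    for (name,) in engine.select(["?name"], OUTER):
        # The current value of $name is substituted as a constant.
        inner = [("?pers1", "foaf:name", name),
                 ("?pers1", "foaf:knows", "?friend"),
                 ("?friend", "foaf:name", "?fname")]
        result[name] = [fname for (fname,) in engine.select(["?fname"], inner)]
    return result


engine = SparqlEngineStub(TRIPLES)
print(evaluate_nested(engine))  # {'Alice': ['Bob', 'Charles'], 'Bob': ['Charles'], 'Charles': []}
print(engine.calls)             # 4
```

With N outer results this strategy needs N + 1 SPARQL calls, which is the cost the XDEP optimization targets.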
24. RDF to XML
relations.xml
<relations>
  <person name="Alice">
    <knows>Bob</knows>
    <knows>Charles</knows>
  </person>
  <person name="Bob">
    <knows>Charles</knows>
  </person>
  <person name="Charles"/>
</relations>

relations.rdf
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:b1 a foaf:Person;
     foaf:name "Alice";
     foaf:knows _:b2;
     foaf:knows _:b3.
_:b2 a foaf:Person; foaf:name "Bob";
     foaf:knows _:b3.
_:b3 a foaf:Person; foaf:name "Charles".
25. XDEP Join Optimization
Goal: Improve performance for nested queries
Reduce number of SPARQL calls: N ➔ 1
a single SPARQL call for the whole inner loop
the join is then performed in XQuery
Constraint: applicable only to dependent joins
the join variable must always be bound
Saves communication and repeated evaluation time
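One way to realize the idea on this slide, sketched in Python against a toy in-memory stand-in for the SPARQL engine: the inner query is issued once with the join variable ?name left unbound and projected, and the host language (standing in for XQuery) picks the matching rows for each outer binding. All names are hypothetical and the thesis implementation generates XQuery, not Python; the sketch only illustrates the N ➔ 1 reduction in SPARQL calls.

```python
# Toy stand-in for people.rdf; blank nodes are written "_:b1" etc.
TRIPLES = [
    ("_:b1", "rdf:type", "foaf:Person"), ("_:b1", "foaf:name", "Alice"),
    ("_:b1", "foaf:knows", "_:b2"), ("_:b1", "foaf:knows", "_:b3"),
    ("_:b2", "rdf:type", "foaf:Person"), ("_:b2", "foaf:name", "Bob"),
    ("_:b2", "foaf:knows", "_:b3"),
    ("_:b3", "rdf:type", "foaf:Person"), ("_:b3", "foaf:name", "Charles"),
]


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    b = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


def match(patterns, triples, binding=None):
    """Yield every binding under which all triple patterns occur in `triples`."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        b = unify(patterns[0], triple, binding)
        if b is not None:
            yield from match(patterns[1:], triples, b)


class SparqlEngineStub:
    """Stand-in for the external SPARQL engine; counts the queries it receives."""

    def __init__(self, triples):
        self.triples = triples
        self.calls = 0

    def select(self, variables, patterns):
        self.calls += 1
        return [tuple(b[v] for v in variables) for b in match(patterns, self.triples)]


OUTER = [("?person", "rdf:type", "foaf:Person"), ("?person", "foaf:name", "?name")]


def evaluate_xdep(engine):
    """XDEP-style strategy (sketch): one call for the outer loop and ONE bundled
    call for all inner-loop iterations; the join happens in the host language."""
    names = [name for (name,) in engine.select(["?name"], OUTER)]
    # The join variable ?name stays unbound in the bundled query and is projected.
    inner = [("?pers1", "foaf:name", "?name"),
             ("?pers1", "foaf:knows", "?friend"),
             ("?friend", "foaf:name", "?fname")]
    rows = engine.select(["?name", "?fname"], inner)
    # Each outer iteration selects its relevant rows (the part XQuery would do).
    return {name: [fname for (n, fname) in rows if n == name] for name in names}


engine = SparqlEngineStub(TRIPLES)
print(evaluate_xdep(engine))  # {'Alice': ['Bob', 'Charles'], 'Bob': ['Charles'], 'Charles': []}
print(engine.calls)           # 2
```

For this data the result equals that of the nested evaluation walked through earlier, but only two SPARQL calls are made instead of four. The bundling is only valid because ?name is always bound by the outer loop, which is the constraint stated on the slide.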
26. XDEP Join Optimization
for $name from <people.rdf>
where { $person a foaf:Person .
        $person foaf:name $name . }
return
  <person name="{$name}"> {
    for $fname from <people.rdf>
    where { $pers1 foaf:name $name .
            $pers1 foaf:knows $friend .
            $friend foaf:name $fname . }
    return <friend name="{$fname}"/>
  }
  </person>
28. Practical Performance Evaluation
Use the common XQuery benchmark suite XMark
Generate test documents of different sizes (5 to 100 MB)
Automatically translate the test documents to RDF
Manually translate the queries from XQuery to XSPARQL
Compare standard XSPARQL with optimized XSPARQL
31. XDEP Performance Increase
[Figure: performance gain factor (unoptimized time / optimized time, log scale, 1 to 1000) for Queries 8, 9 and 10, plotted against dataset size (0 to 20 MB)]
32. XDEP Performance Increase
[Figure: performance gain factor (log scale, 1 to 1000) for Queries 8, 9 and 10, plotted against the number of outer loop iterations (log scale, 10 to 10000), which equals the number of saved SPARQL calls]
33. New Features
Dataset Scoping
Fixes unintended behavior of nested queries
Constructed Dataset
Create and query temporary data source
34. New Features Dataset Scoping
Fixes unintended behavior of nested SPARQL parts
Problem: a variable bound to a blank node in the outer query and reused in an inner query behaves as a free (unbound) variable again, because
the scope of a blank node is limited to one dataset/SPARQL query, and
blank nodes in graph patterns behave like unbound variables
Dataset Scoping extends the scope of a dataset over subqueries and thereby allows joins on blank nodes
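The blank-node effect can be reproduced with a toy pattern matcher in Python. It is not a SPARQL engine; it only mimics the rule that a blank node label in a query pattern acts like a variable. The flag bnodes_as_vars and the function names are our own; scoped=True models what Dataset Scoping aims for, namely that _:b1 keeps denoting the node the outer query returned.

```python
# Toy stand-in for people.rdf; blank nodes are written "_:b1" etc.
TRIPLES = [
    ("_:b1", "rdf:type", "foaf:Person"), ("_:b1", "foaf:name", "Alice"),
    ("_:b1", "foaf:knows", "_:b2"), ("_:b1", "foaf:knows", "_:b3"),
    ("_:b2", "rdf:type", "foaf:Person"), ("_:b2", "foaf:name", "Bob"),
    ("_:b2", "foaf:knows", "_:b3"),
    ("_:b3", "rdf:type", "foaf:Person"), ("_:b3", "foaf:name", "Charles"),
]


def unify(pattern, triple, binding, bnodes_as_vars):
    """Match one pattern against one triple. Terms starting with '?' are
    variables; if bnodes_as_vars is True, blank node labels ('_:...') in the
    pattern are treated as variables too, mimicking unscoped SPARQL."""
    b = dict(binding)
    for term, value in zip(pattern, triple):
        is_var = term.startswith("?") or (bnodes_as_vars and term.startswith("_:"))
        if is_var:
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


def match(patterns, triples, bnodes_as_vars, binding=None):
    """Yield every binding under which all triple patterns occur in `triples`."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        b = unify(patterns[0], triple, binding, bnodes_as_vars)
        if b is not None:
            yield from match(patterns[1:], triples, bnodes_as_vars, b)


def inner_friends(person_node, scoped):
    """Inner query after the outer $person (a blank node) was substituted."""
    patterns = [(person_node, "foaf:knows", "?friend"),
                ("?friend", "foaf:name", "?fname")]
    return [b["?fname"] for b in match(patterns, TRIPLES, bnodes_as_vars=not scoped)]


print(inner_friends("_:b1", scoped=False))  # ['Bob', 'Charles', 'Charles']
print(inner_friends("_:b1", scoped=True))   # ['Bob', 'Charles']
```

The unscoped result returns the friend of every foaf:knows edge in the graph rather than only Alice's, which is the duplicated output shown in the Dataset Scoping example later in the talk.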
35. New Features Constructed Dataset
Create intermediary RDF graphs that can be used as a data source in the same query
Use cases
Query aggregated data using XQuery’s built-in
functions.
Manually optimize queries by preselecting relevant
parts of a data source.
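To illustrate the first use case, the following Python sketch builds an intermediary graph of aggregated values and then queries it like any other data source. It only models the idea: ex:friendCount is a made-up predicate, collections.Counter stands in for XQuery's fn:count, and the temporary graph is a plain list of tuples rather than an XSPARQL constructed dataset.

```python
from collections import Counter

# Toy stand-in for people.rdf; blank nodes are written "_:b1" etc.
TRIPLES = [
    ("_:b1", "rdf:type", "foaf:Person"), ("_:b1", "foaf:name", "Alice"),
    ("_:b1", "foaf:knows", "_:b2"), ("_:b1", "foaf:knows", "_:b3"),
    ("_:b2", "rdf:type", "foaf:Person"), ("_:b2", "foaf:name", "Bob"),
    ("_:b2", "foaf:knows", "_:b3"),
    ("_:b3", "rdf:type", "foaf:Person"), ("_:b3", "foaf:name", "Charles"),
]


def construct_friend_counts(triples):
    """Construct a temporary graph: one (person, ex:friendCount, n) triple per person."""
    knows = Counter(s for (s, p, o) in triples if p == "foaf:knows")
    persons = [s for (s, p, o) in triples if p == "rdf:type" and o == "foaf:Person"]
    # Counter returns 0 for persons without any foaf:knows triple.
    return [(s, "ex:friendCount", knows[s]) for s in persons]


def names_with_friends(triples):
    """Query the constructed graph: names of persons who know at least one person."""
    temp_graph = construct_friend_counts(triples)
    name_of = {s: o for (s, p, o) in triples if p == "foaf:name"}
    return [name_of[s] for (s, p, n) in temp_graph if n >= 1]


print(construct_friend_counts(TRIPLES))
# [('_:b1', 'ex:friendCount', 2), ('_:b2', 'ex:friendCount', 1), ('_:b3', 'ex:friendCount', 0)]
print(names_with_friends(TRIPLES))  # ['Alice', 'Bob']
```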
36. Conclusions
Enhance XSPARQL Capabilities
Constructed Dataset and Dataset Scoping features are
valuable additions to XSPARQL
Increase Evaluation Performance
The dependent join optimization XDEP gives an experimentally confirmed performance increase for nested XSPARQL queries
More info and demo http://xsparql.deri.org
37. References
Akhtar W., Kopecký J., Krennwallner T., and Polleres A. XSPARQL: Traveling between the XML and RDF worlds - and Avoiding the XSLT pilgrimage. In 5th European Semantic Web Conference (ESWC2008), pages 432–447, 2008.
Bischof S. Implementation and Optimization of Queries in XSPARQL. Master’s Thesis, Vienna University of Technology, 2010.
Bray T., Paoli J., Sperberg-McQueen C. M., Maler E., and Yergeau F. Extensible Markup Language (XML) 1.0 (Fifth Edition). http://www.w3.org/TR/xml/, November 2008. W3C Recommendation.
Boag S., Chamberlin D., Fernández M. F., Florescu D., Robie J., and Siméon J. XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery/, January 2007. W3C Recommendation.
Manola F. and Miller E. RDF Primer. http://www.w3.org/TR/rdf-primer/, February 2004. W3C Recommendation.
Prud’hommeaux E. and Seaborne A. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/, January 2008. W3C Recommendation.
38. Example Dataset Scoping
for $name $person from <people.rdf>
where { $person a foaf:Person .
        $person foaf:name $name . }
return
  <person name="{$name}"> {
    for $fname from <people.rdf>
    where { $pers1 foaf:name $name .
            $person foaf:knows $friend .
            $friend foaf:name $fname . }
    return <friend name="{$fname}"/>
  }
  </person>
40. Example Dataset Scoping
Result of the outer SPARQL query:
# $name $person
1 "Alice" _:b1
2 "Bob" _:b2
3 "Charles" _:b3
41. Example Dataset Scoping
Inner query for $person = _:b1:
SELECT $fname
FROM <people.rdf>
WHERE { _:b1 foaf:knows $friend .
        $friend foaf:name $fname . }
42. Example Dataset Scoping
Result:
# $fname
1 "Bob"
2 "Charles"
3 "Charles"
45. Example Dataset Scoping
Output:
<person name="Alice">
  <friend name="Bob" />
  <friend name="Charles" />
  <friend name="Charles" />
</person>
<person name="Bob">
  <friend name="Bob" />
  <friend name="Charles" />
  <friend name="Charles" />
</person>
<person name="Charles">
  <friend name="Bob" />
  <friend name="Charles" />
  <friend name="Charles" />
</person>
Problem:
1. Blank nodes are scoped to a single SPARQL query
2. Blank nodes in graph patterns are like variables
➔ A variable bound to a blank node behaves as a free variable
46. Example Dataset Scoping
Solution: Extend the scope of the blank node/dataset
48. Example Dataset Scoping
Output with Dataset Scoping:
<person name="Alice">
  <friend name="Bob" />
  <friend name="Charles" />
</person>
<person name="Bob">
  <friend name="Charles" />
</person>
<person name="Charles">
</person>
Speaker notes
Explain the layers.
Rewrite the XSPARQL query into an XQuery query, then evaluate that XQuery query.
If the original query contains SPARQL graph patterns, the SPARQL engine is used to evaluate them.
Rely on standard XQuery and SPARQL engines.
Communication over HTTP optimization