RDF Processing for Java:
A comparative study
Ioanid Tibuleac, Cristian Turlica
Facultatea de Informatica, Universitatea „Al. I. Cuza“, Iasi, Romania
{ioanid.tibuleac, cristian.turlica}@info.uaic.ro
Abstract. This paper introduces two RDF processing APIs for Java developers,
Jena and Sesame, and describes each briefly. We consider RDF storage
capabilities, RDF access through SPARQL queries and overall programmer support.
We ran tests to estimate which of the two executes SPARQL queries faster on an
in-memory graph read from a file. We conclude that Jena generally runs slower
than Sesame when executing a single query, but that its optimizations allow it
to perform better when executing a sequence of queries.
Keywords: API, RDF, SPARQL, Java, Jena, Sesame.
1 Introduction
This paper discusses certain aspects of RDF processing APIs for Java developers. We
have chosen Jena and Sesame, which in our opinion are two of the most widely used.
Both offer RDF data access, storage in files, SQL or native RDF databases, querying
and inferencing, which is why we selected them as our test case.
2 Jena RDF API
Jena is an open source Semantic Web framework for Java developed by researchers
from the HP Labs Semantic Web Programme [1]. It provides support for RDF
manipulation, from creation and storage of statements, to SPARQL queries and RDF
graph operations. Besides the RDF API, the Jena framework also contains the OWL
API, a component for processing ontologies, and a rule-based inference engine.
In Jena, RDF elements are modeled as Java types. The RDF graph concept is called
a model and is handled through the Model interface. Other concepts like resource,
property and literal are represented by the Resource, Property and Literal
interfaces.
These interfaces are contained in the com.hp.hpl.jena.rdf.model package, together
with a ModelFactory that creates models with various storage methods. The model
is built as a set of statements, which eliminates duplicates, and supports the
union, intersection and difference graph operations. The
com.hp.hpl.jena.rdf.model.impl package offers the implementations of the RDF
element interfaces that the model uses.
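The set semantics described above can be illustrated without Jena itself. The
following is a minimal sketch using only the Java standard library; the Triple
record and the StatementSetSketch class are hypothetical names chosen for this
illustration and are not part of the Jena API.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: models a graph as a set of statements, as the text describes.
class StatementSetSketch {
    // Hypothetical stand-in for an RDF statement; not a Jena type.
    record Triple(String subject, String predicate, String object) {}

    // Statements present in either graph.
    static Set<Triple> union(Set<Triple> a, Set<Triple> b) {
        Set<Triple> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Statements present in both graphs.
    static Set<Triple> intersection(Set<Triple> a, Set<Triple> b) {
        Set<Triple> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Statements present in the first graph but not in the second.
    static Set<Triple> difference(Set<Triple> a, Set<Triple> b) {
        Set<Triple> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Triple> g1 = new HashSet<>();
        g1.add(new Triple("ex:paper", "dc:title", "RDF Processing for Java"));
        // Adding an equal statement again leaves the set unchanged.
        g1.add(new Triple("ex:paper", "dc:title", "RDF Processing for Java"));
        g1.add(new Triple("ex:paper", "dc:creator", "ex:author1"));

        Set<Triple> g2 = new HashSet<>();
        g2.add(new Triple("ex:paper", "dc:creator", "ex:author1"));
        g2.add(new Triple("ex:paper", "dc:creator", "ex:author2"));

        System.out.println("g1 size: " + g1.size());
        System.out.println("union size: " + union(g1, g2).size());
        System.out.println("intersection size: " + intersection(g1, g2).size());
        System.out.println("difference size: " + difference(g1, g2).size());
    }
}
```

Because a record compares by value, two statements with the same subject,
predicate and object count as one element of the set.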
The Jena framework offers several ways to store RDF triples. Besides memory and
file storage, Jena comes with two systems designed to persist RDF information,
TDB and SDB.
File-level storage is obtained using Java InputStreams and OutputStreams. Though
the API also contains methods to read or write triples with a Java Reader or
Writer, the documentation strongly warns against using these methods when writing
files, because problems may appear due to the character encoding of the output
file. Supported RDF formats are RDF/XML, RDF/XML-ABBREV, N3, N-TRIPLE and TURTLE.
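The kind of encoding problem a Writer can introduce is easy to reproduce with the
Java standard library alone. The sketch below makes no Jena calls; the class and
method names are hypothetical, and it assumes the reader of the file expects
UTF-8 while the Writer was bound to a different charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only: shows how the charset bound to a Writer changes the bytes written.
class WriterEncodingSketch {
    // Writes the text through a Writer using the given charset and returns the bytes.
    static byte[] writeWith(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A literal containing one non-ASCII character (e with acute accent).
        String literal = "Universit\u00e9";

        byte[] latin1 = writeWith(literal, StandardCharsets.ISO_8859_1);
        byte[] utf8 = writeWith(literal, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes: " + utf8.length);

        // A consumer that assumes UTF-8 does not recover the original text
        // from the ISO-8859-1 bytes.
        String misread = new String(latin1, StandardCharsets.UTF_8);
        System.out.println("round trip intact: " + literal.equals(misread));
    }
}
```

Writing to an OutputStream leaves the choice of encoding to the serializer, which
avoids this mismatch.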
SDB offers RDF persistence using conventional SQL databases. As a result,
database-specific tools can be used to improve and secure data access, while
support for SPARQL queries is retained. Many database management systems can be
used, including Microsoft SQL Server 2005, Oracle 10g and PostgreSQL.
TDB offers native support for triples and SPARQL queries, allowing custom
indexing and storage. This Java engine uses both static and dynamic optimization
of SPARQL queries, taking into account partially retrieved data. These features
make the TDB engine faster than SDB, according to the developers.
The Jena framework contains an implementation of the W3C SPARQL specification,
the ARQ query engine. The access given by the Model interface is limited to
iterating over statements that satisfy certain conditions, but this is extended
by the com.hp.hpl.jena.query package. The supported SPARQL query forms are
SELECT, CONSTRUCT, DESCRIBE and ASK.
The Jena framework comes with documentation and tutorials that allow
programmers to easily test its capabilities. In-depth information is also
available for more experienced users. Community material can be found on various
sites, like [5], showing that the Jena framework is in use and that its
development is likely to continue.
3 Sesame
Sesame is an open source framework for storing, inferencing over and querying
RDF data [2]. The RDF API may be used to manipulate statements in a standalone
Java application, or as part of a client-server application. The Sesame framework
also contains an HTTP server that can be addressed using the SPARQL protocol.
The Sesame framework has a more complex architecture than Jena. At its base is
the RDF Model, where the basic RDF concepts, like literal or statement, are
defined as interfaces. There are other specialized components, like Rio (RDF
I/O), which manages reading and writing RDF in various file formats, and the Sail
API (Storage and Inference API), which gives uniform access to an RDF store
regardless of how it is implemented. The API used to manipulate RDF data at a
higher level is the Repository API, which offers access via the Sail API or via
HTTP to a remote repository. Sesame offers in-memory, native and remote access to
RDF data.
The Sesame framework has its own query language, SeRQL (Sesame RDF Query
Language). Apparently this language is very similar to SPARQL, and features have
been adopted back and forth between the two. Though we did not examine the
differences between them in detail, a language that partially departs from the
standard may require additional time to get used to.
The Sesame framework comes with a lot of documentation, but unfortunately it
may prove too difficult for less experienced users. Running a simple program
was, at first, a little difficult for us because of the additional libraries used
by the Repository API (for example the Simple Logging Facade for Java). As a
result we turned to online help like [4]. Overall, the documentation is perhaps
more detailed than the one for Jena, but simple examples are scarce.
4 SPARQL Tests
We ran several tests using the two APIs and two RDF files of different sizes.
The development environment used was Eclipse. We used a code sample for
Sesame available at [4]. Our main focus was testing SPARQL query execution speed,
using files as storage for the RDF statements.
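A timing loop of this kind can be sketched as follows. This is only an
illustration under assumptions: the class and method names are hypothetical,
wall-clock time is measured around each run, and a placeholder workload stands in
for loading the RDF file and executing the query through Jena or Sesame.

```java
// Sketch only: times repeated runs of a task; no Jena or Sesame calls are made.
class QueryTimerSketch {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder for "read the file into memory and run the query".
        Runnable placeholderQuery = () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        // Five executions, one line of output per run.
        for (int run = 1; run <= 5; run++) {
            System.out.println(run + " " + timeMillis(placeholderQuery));
        }
    }
}
```

Each iteration repeats the whole task, so costs such as parsing the file are
included in every measurement unless the task is written to exclude them.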
The first RDF file is a larger file containing information about sessions and
speakers at a conference [3]. The SPARQL query selects information about distinct
presentations:
SELECT DISTINCT ?title ?presenter ?description
WHERE
{
  ?session rdf:type svcc:Session .
  ?session dc:title ?title .
  ?session svcc:presenter ?presenter .
  ?session dc:description ?description .
}
Execution times clearly favor Sesame over Jena (as shown in the table below).
The documentation for Jena explains that it checks for reuse of rdf:ID values,
and this may cause a slower response when reading large files.
Query 1 execution    Jena    Sesame
1                    2172    656
2                    2094    625
3                    2125    687
4                    2062    625
5                    2031    641
The same situation occurs for the second query that searches in a file containing
information about semantic web tools [6], though the timing difference is reduced.
SELECT ?nume ?url ?limbaj
WHERE {
  [ g:label ?nume ;
    g:URL ?url ;
    g:FOSS ?foss ;
    g:Category ?categ ;
    g:Language ?limbaj ] .
  FILTER( ?foss = 'Yes' &&
          ?categ = 'Database/Datastore' &&
          (?limbaj = 'PHP' || regex(?limbaj, '^C')) ) .
} ORDER BY ?limbaj
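The FILTER condition of this query can be read as an ordinary boolean predicate.
The sketch below restates it in plain Java; the class and method names are
hypothetical, and the values are treated as simple strings. SPARQL's regex
matches anywhere in the string unless the pattern is anchored, which corresponds
to Matcher.find() rather than Matcher.matches().

```java
import java.util.regex.Pattern;

// Sketch only: the FILTER of the second query restated as a Java predicate.
class ToolFilterSketch {
    // '^C' anchors the match at the start, so it selects values beginning with "C".
    private static final Pattern STARTS_WITH_C = Pattern.compile("^C");

    // True when a tool is FOSS, is a database/datastore, and is written
    // in PHP or in a language whose name starts with "C".
    static boolean accepts(String foss, String category, String language) {
        return foss.equals("Yes")
                && category.equals("Database/Datastore")
                && (language.equals("PHP") || STARTS_WITH_C.matcher(language).find());
    }

    public static void main(String[] args) {
        System.out.println(accepts("Yes", "Database/Datastore", "PHP"));
        System.out.println(accepts("Yes", "Database/Datastore", "C++"));
        System.out.println(accepts("Yes", "Database/Datastore", "Java"));
        System.out.println(accepts("No", "Database/Datastore", "PHP"));
    }
}
```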
Query 2 execution    Jena    Sesame
1                    1860    656
2                    1875    672
3                    1891    672
4                    1875    688
5                    1813    688
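The single-query measurements can be summarized by their means. The sketch below
uses the figures from the two tables above; the class and method names are
hypothetical, and only the standard library is used.

```java
import java.util.Arrays;
import java.util.Locale;

// Sketch only: averages the five runs reported for each query and API.
class TimingSummarySketch {
    // Arithmetic mean of the samples; NaN for an empty array.
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        long[] jenaQ1 = {2172, 2094, 2125, 2062, 2031};
        long[] sesameQ1 = {656, 625, 687, 625, 641};
        long[] jenaQ2 = {1860, 1875, 1891, 1875, 1813};
        long[] sesameQ2 = {656, 672, 672, 688, 688};

        // Locale.ROOT keeps the decimal separator a dot on every system.
        System.out.printf(Locale.ROOT, "Q1: Jena %.1f, Sesame %.1f, ratio %.2f%n",
                mean(jenaQ1), mean(sesameQ1), mean(jenaQ1) / mean(sesameQ1));
        System.out.printf(Locale.ROOT, "Q2: Jena %.1f, Sesame %.1f, ratio %.2f%n",
                mean(jenaQ2), mean(sesameQ2), mean(jenaQ2) / mean(sesameQ2));
    }
}
```

The means are 2096.8 against 646.8 for the first query and 1862.8 against 675.2
for the second, so Jena takes roughly 3.2 and 2.8 times as long as Sesame
respectively.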
Running both tests together shows that Jena’s execution speed increases as more
queries are made, getting close to the performance of Sesame.
Combined execution    Jena Q1    Jena Q2    Sesame Q1    Sesame Q2
1                     2110       234        765          188
2                     2782       250        985          265
3                     3063       406        1156         250
4                     2156       187        719          187
5                     2251       265        735          203
Our initial tests were somewhat different because we used a Sesame repository
object with inferencing, although there was no need for it. In this case, Sesame’s
performance decreased but it still managed to outrun Jena on single query
execution. However, multiple query execution confirmed that Jena can perform
better in such cases.
In conclusion, we see the Jena RDF API as an easier starting point for most
programmers, though it might not be as complex as the Sesame framework.
References
1. Jena website, http://jena.sourceforge.net/documentation.html
2. Sesame website, http://www.openrdf.org/documentation.jsp
3. Hewett Research, http://www.hewettresearch.com/svcc2009/
4. “How to use the Sesame Java API to power a Web or Client-Server Application”,
http://answers.oreilly.com/topic/447-how-to-use-the-sesame-java-api-to-power-a-web-or-client-server-application/
5. “Jena, A Java API for RDF”, http://www.docstoc.com/docs/13042314/Jena-----A-Java-API-for-RDF
6. Sweet rdf file, http://profs.info.uaic.ro/~busaco/teach/courses/wade/demos/sparql/sparql.zip