Comparative study on the processing of RDF in PHP
Gabriel-Ştefan Munteanu, Nicu-Cosmin Ursache
Faculty of Computer Science Iasi
stefan.munteanu@infoiasi.ro, nicu.ursache@infoiasi.ro
Abstract. Content can already be shared on the Web through technologies such as
FTP, and data can already be exchanged in formats such as annotated relational
databases that other systems can reuse. It is therefore not obvious why a single
Web-based format is needed. The answer is that putting information into RDF
files makes it possible for computer programs to search, discover, pick up,
collect, analyze and process information from the Web. Using RDF, a Web browser
should be able to reuse the data with no additional work on the part of users.
The tricky part is making it easier for web programmers to work with RDF, which
is where RDF libraries come in.
Keywords: RDF, API, RAP, Raptor, ARC, SPARQL
1. Introduction
To help web programmers choose an RDF library when a project requires one, a
comparative study of the existing PHP-based RDF APIs is needed. The W3C publishes
a lot of information about RDF parsing tests run against several APIs, but these
results do not cover every point of view, so this article gives a short
comparative presentation of three packages that offer RDF support. These packages
are RDF API for PHP (from now on RAP for short), the ARC API and the Raptor RDF
Parser Library. Raptor is written in C rather than PHP, but it is included in
this study because it can be used from PHP very easily through the Redland RDF
Language Bindings and because it performs very well.
2. API Description
2.1. RAP - RDF API for PHP
RAP[1] is a software package for parsing, querying, manipulating, serializing and serving RDF
models.
Its features include:
• statement-centric methods for manipulating an RDF model as a set of RDF
triples
• resource-centric methods for manipulating an RDF model as a set of
resources
• ontology-centric methods for manipulating an RDF model through
vocabulary specific methods
• quad- and named graph-centric methods for manipulating RDF datasets
• integrated RDF/XML, N3, N-TRIPLE, TriX, GRDDL, RSS parser
• integrated RDF/XML, N3, N-TRIPLE, TriX serializer
• in-memory or database model storage
• SPARQL query engine supporting all features of the W3C SPARQL
Recommendation
• SPARQL client library
• RDQL query engine
• inference engine supporting RDF-Schema reasoning and some OWL
entailments
• integrated RDF server providing similar functionality as the Joseki RDF
server
• integrated linked data frontend
• graphical user-interface for managing database-backed RDF models
• support for common vocabularies
• drawing graph visualizations
2.2. Raptor
Raptor[2] is a free software / Open Source C library that provides a set of parsers and
serializers: the parsers generate Resource Description Framework (RDF) triples from a
syntax, and the serializers write triples out in a syntax. The supported parsing syntaxes are
RDF/XML, N-Triples, TriG, Turtle, RSS tag soup including all versions of RSS,
Atom 1.0 and 0.3, GRDDL and microformats for HTML, XHTML and XML, and
RDFa. The serializing syntaxes are RDF/XML (regular and abbreviated), Atom 1.0,
GraphViz, JSON, N-Triples, RSS 1.0 and XMP.
Raptor (the RDF Parser Toolkit for Redland) was designed to work closely with the
Redland RDF library but is entirely separate. It is a portable library that works across
many POSIX systems (Unix, GNU/Linux, BSDs, OSX, cygwin, win32). Raptor has
no memory leaks and is fast.
This is a mature and stable library:
• Parses content on the web if libcurl, libxml2 or BSD libfetch is available.
• Supports all RDF terms including datatyped and XML literals
• Optional features including parsers and serialisers can be selected at
configure time.
• Language bindings to Perl, PHP, Python and Ruby when used via Redland
• No memory leaks
• Fast
• Standalone rapper RDF parser utility program
2.3. ARC
ARC[3] is a flexible RDF system for semantic web and PHP practitioners.
Components & Features
• ConNeg-capable Web Reader - Support for proxies, redirects, and Content
Negotiation
• Various parsers - RDF/XML, Turtle, SPARQL + SPOG, Legacy XML,
HTML tag soup, RSS 2.0, Google Social Graph API JSON
• Serializers - N-Triples, RDF/JSON, RDF/XML, Turtle, SPOG dumps
• 2 internal structures - resource-centric processing, statement-centric
processing
• RDF Storage (using MySQL) - SPARQL SELECT, ASK, DESCRIBE,
CONSTRUCT, + aggregates, LOAD, INSERT, and DELETE
• SPARQL Endpoint Class - Set up a compliant SPARQL endpoint with 3
lines of code
• SemHTML RDF extractors - DC, eRDF, microformats, OpenID, RDFa
• RemoteStore Class - Query remote SPARQL endpoints as if they were local
stores (results are returned as native PHP arrays)
• Turtle templating - Generate dynamic graphs
• Plugins - Extend ARC with your own, custom extensions
• Triggers - Register event handlers for selected SPARQL Query types
• SPARQLScript - SPARQL-based scripting and output templating
3. RDF triples storage
The RAP classes are split into three main packages: model, syntax and util.
The model package includes all the classes to create or read specific elements of an
RDF model, including reading or creating complete statements from a model or their
individual components.
These classes are:
- BlankNode - used to create a blank node, to get the bnode identifier,
or check equality between two bnodes
- Literal - support for model literals
- Model - contains methods to build or read a specific RDF model
- Node - an abstract RDF node
- Resource - support for model resources
- Statement - creating or manipulating a complete RDF triple
The util class Object is another abstract class with some general methods
overloaded in classes built on it, so it's of no interest for our purposes. However, the
RDFUtil class provides some handy methods, including the method
writeHTMLTable to output an RDF/XML document in nice tabular form.
Raptor is a collection of functions, separated into files according to their
functionality:
Parsers:
- RDF/XML Parser,
- N-Triples Parser,
- Turtle Parser,
- TriG Parser,
- RSS "tag soup" parser,
- RDFa parser.
Serializers:
- RDF/XML Serializer,
- N-Triples Serializer,
- Atom 1.0 Serializer,
- JSON Serializers,
- RSS 1.0 Serializer,
- Turtle Serializer,
- XMP Serializer.
Raptor stores a triple in the following structure:
typedef struct {
const void *subject;
raptor_identifier_type subject_type;
const void *predicate;
raptor_identifier_type predicate_type;
const void *object;
raptor_identifier_type object_type;
raptor_uri *object_literal_datatype;
const unsigned char *object_literal_language;
} raptor_statement;
Starting with the second version of Raptor, a wrapper was added that pairs the
statement with its world object:
typedef struct {
raptor_world* world;
raptor_statement *s;
} raptor_statement_v2;
ARC uses object-oriented code for its components and methods, but the processed
data structures consist of simple associative arrays, which leads to faster operations
and less memory consumption. Apart from a few special formats returned by the
SPARQL engine (e.g. from SELECT or INSERT queries), ARC is built around two
core structures: triple sets and resource indexes.
Triple sets
A triple set is a flat array that contains (associative) triple arrays. Triple sets can be
processed with a simple loop:
...
$triples = $parser->getTriples();
for ($i = 0, $i_max = count($triples); $i < $i_max;
$i++) {
$triple = $triples[$i];
...
}
A single triple array can contain the following keys:
s - the subject value (a URI, Bnode ID, or Variable)
p - the property URI (or a Variable)
o - the object value (a URI, Bnode ID, Literal, or Variable)
s_type - "uri", "bnode", or "var"
o_type - "uri", "bnode", "literal", or "var"
o_datatype - a datatype URI
o_lang - a language identifier, e.g. "en-us"
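As a minimal sketch of how these keys can be used, the snippet below builds a small triple set by hand and keeps only the triples whose object is a literal. The sample values and the helper name literal_triples are assumptions made for illustration; they are not part of ARC, and in practice the array would come from a parser's getTriples() call.

```php
<?php
// Hypothetical sample data shaped like an ARC triple set. The values are
// made up for illustration and do not come from a real parser run.
$triples = array(
    array(
        's' => '_:john', 's_type' => 'bnode',
        'p' => 'http://xmlns.com/foaf/0.1/knows',
        'o' => '_:bill', 'o_type' => 'bnode',
    ),
    array(
        's' => '_:john', 's_type' => 'bnode',
        'p' => 'http://xmlns.com/foaf/0.1/name',
        'o' => 'John', 'o_type' => 'literal', 'o_lang' => 'en-us',
    ),
);

// Hypothetical helper (not an ARC function): keep only the triples whose
// object is a literal, using the o_type key described above.
function literal_triples(array $triples) {
    $result = array();
    foreach ($triples as $triple) {
        if ($triple['o_type'] === 'literal') {
            $result[] = $triple;
        }
    }
    return $result;
}

// Print each literal triple, appending the language tag when one is set.
foreach (literal_triples($triples) as $t) {
    $lang = isset($t['o_lang']) ? '@' . $t['o_lang'] : '';
    echo $t['s'] . ' ' . $t['p'] . ' "' . $t['o'] . '"' . $lang . "\n";
}
```

Because every triple is a plain associative array, filtering like this needs no library classes at all, which is the point the text makes about ARC's data structures.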
Resource Indexes
A resource index is an associative array of triples indexed by subject -> predicates ->
objects.
$index = array(
'_:john' => array(
'http://xmlns.com/foaf/0.1/knows' => array(
'_:bill',
'_:bob',
'_:mary',
),
),
'_:mary' => ...
);
echo
$index['_:john']['http://xmlns.com/foaf/0.1/knows'][0];
ARC supports two index forms. The one above uses flat objects, which can be handy
for simplified access operations, but can lead to information loss (e.g. when the object
type is not clear, or when a datatype was present in the original triples). The second,
slightly extended index structure keeps the object details:
$index = array(
'_:john' => array(
'http://xmlns.com/foaf/0.1/knows' => array(
array('value' => '_:bill', 'type' => 'bnode'),
array('value' => '_:bob', 'type' => 'bnode'),
...
),
),
);
echo
$index['_:john']['http://xmlns.com/foaf/0.1/knows'][0][
'value'];
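The relation between a triple set and the two index forms can be sketched in plain PHP. The helper build_index below is hypothetical and is not ARC's own implementation; the 'datatype' and 'lang' keys on extended objects are likewise an assumption, since the text above only shows 'value' and 'type'.

```php
<?php
// Hypothetical helper, not part of ARC: group a triple set by subject and
// predicate. With $flat = true only the object value is kept; otherwise
// each object keeps its value and type (plus datatype/lang when present).
function build_index(array $triples, $flat = true) {
    $index = array();
    foreach ($triples as $t) {
        if ($flat) {
            $object = $t['o'];
        } else {
            $object = array('value' => $t['o'], 'type' => $t['o_type']);
            if (!empty($t['o_datatype'])) {
                $object['datatype'] = $t['o_datatype']; // assumed key name
            }
            if (!empty($t['o_lang'])) {
                $object['lang'] = $t['o_lang'];         // assumed key name
            }
        }
        // PHP creates the nested subject and predicate arrays on demand.
        $index[$t['s']][$t['p']][] = $object;
    }
    return $index;
}

// Made-up sample triples for illustration.
$knows = 'http://xmlns.com/foaf/0.1/knows';
$triples = array(
    array('s' => '_:john', 'p' => $knows, 'o' => '_:bill', 'o_type' => 'bnode'),
    array('s' => '_:john', 'p' => $knows, 'o' => '_:bob', 'o_type' => 'bnode'),
);

$flat = build_index($triples, true);
$full = build_index($triples, false);

echo $flat['_:john'][$knows][0], "\n";          // object value only
echo $full['_:john'][$knows][0]['type'], "\n";  // object type is preserved
```

The flat form cannot tell whether '_:bill' was a blank node or a literal string, which is the information loss described above; the extended form keeps that distinction at the cost of one more array level.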
4. SPARQL query support
RAP's SPARQL client executes SPARQL queries against remote SPARQL endpoints using
the SPARQL protocol. Depending on the type of query, the results are returned as an
array of variable bindings, a RAP MemModel or a boolean.
Raptor itself does not execute queries. SPARQL and RDQL support comes from the
separate Rasqal RDF Query Library, so anyone who needs queries has to use Rasqal.
ARC supports all SPARQL Query Language features, and its developers are working on a
number of pragmatic extensions such as aggregates (AVG / COUNT / MAX / MIN /
SUM) and write mechanisms.
5. Developer support
5.1. Documentation
Redland provides developers with extensive documentation and many examples.
This documentation addresses both advanced programmers and, especially, junior
developers. Each of the higher-level language APIs contains a mapping to the core C
API and also includes extra documentation describing the native APIs along with
examples of use.
RAP's documentation is based on tutorials, usage examples and implementation
notes. The API documentation covers all classes and methods, but only briefly. The
implementation notes cover the database backend and the RDQL engine.
ARC provides brief documentation for version 1 and slightly more detailed
documentation for version 2.
5.2. IDE integration
RAP and ARC can be used very easily in any PHP framework, simply by copying the
library files into the project and including them from the project code.
RAP usage example (how to parse an RDF file and print its content as an HTML table):
// Include RAP
define("RDFAPI_INCLUDE_DIR", "./../api/");
include(RDFAPI_INCLUDE_DIR . "RDFAPI.php");
// Filename of an RDF document
$base="example1.rdf";
// Create a new MemModel
$model = ModelFactory::getDefaultModel();
// Load and parse document
$model->load($base);
// Output model as HTML table
$model->writeAsHtmlTable();
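ARC usage example (how to extract RDF triples from an HTML file):
// Include ARC (adjust the path to where ARC2.php is installed)
include_once("path/to/arc/ARC2.php");
// Disable automatic extraction so the extractor can be chosen explicitly
$config = array('auto_extract' => 0);
$parser = ARC2::getSemHTMLParser($config);
// Fetch and parse the HTML page
$parser->parse('http://example.com/home.html');
// Extract only the RDFa triples
$parser->extractRDF('rdfa');
$triples = $parser->getTriples();
// Serialize the extracted triples as RDF/XML
$rdfxml = $parser->toRDFXML($triples);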
Because Raptor is written in C, it cannot be used in a PHP development
environment without the Redland RDF Language Bindings - PHP Interface, which offers a
PHP interface for PHP programmers.
Raptor usage example (written in PHP using the Redland RDF Language Bindings, parsing
the content of an RDF file into an RDF model):
// Open the Redland world
$world = librdf_php_get_world();
// Create a new in-memory storage
$storage = librdf_new_storage($world, 'hashes', 'dummy', "new=yes,hash-type='memory'");
// Create the model
$model = librdf_new_model($world, $storage, '');
// Create the parser
$parser = librdf_new_parser($world, 'rdfxml', 'application/rdf+xml', null);
// Create a new URI for the RDF file
$uri = librdf_new_uri($world, 'file:../data/dc.rdf');
// Parse the content of the RDF file into the model
librdf_parser_parse_into_model($parser, $uri, $uri, $model);
// Free memory
librdf_free_uri($uri);
librdf_free_parser($parser);
6. Performance
6.1. Processing speed and reliability
The RDF Interest Group and other members of the RDF community have identified
issues and ambiguities in the [RDFMS] Specification and the [RDF-SCHEMA] Candidate
Recommendation. These issues have been collected and categorized in the RDF Core
Working Group Issue Tracking document. The RDF Core Working Group uses this
issue list to guide its work. The goal is to create a comprehensive test
suite for RDF that covers all of the rules in the Formal Grammar for RDF.
The test manifest file consists of a simple header [MANIFEST-HEAD], individual descriptions of
the test cases, and a closing footer [MANIFEST-TAIL].
The test cases are divided into the following categories:
Positive Parser Tests
These tests consist of one (or more) input documents in RDF/XML as
revised in [RDF-SYNTAX]. The expected result is defined using the N-Triples
syntax. A parser is considered to pass the test if it produces a graph equivalent to the
graph described by the N-triples output document, according to the definition of graph
equivalence given in [RDF-CONCEPTS].
Negative Parser Tests
These tests consist of one input document. The document is not legal
RDF/XML. A parser is considered to pass the test if it correctly holds the input
document to be in error.
Test group                                              Tests   RAP    ARC    Raptor
Positive Parser Tests (approved)                          128    96%    98%    100%
Negative Parser Tests (approved)                           41    97%    98%    100%
Tests with 3 passes (approved negative parser tests)        1     0%     0%    100%
Tests with 4 passes (approved positive parser tests)       11    81%    90%    100%
Tests with 4 passes (approved negative parser tests)       40   100%   100%    100%
Tests with 5 passes (approved positive parser tests)       33    93%    95%    100%
Tests with 6 passes (approved positive parser tests)       77   100%   100%    100%
Tests with 7 passes (approved positive parser tests)        7   100%   100%    100%
Tests with 0 fails (approved positive parser tests)       107   100%   100%    100%
Tests with 0 fails (approved negative parser tests)         6   100%   100%    100%
Tests with 1 fail (approved positive parser tests)         18    88%   100%    100%
Tests with 1 fail (approved negative parser tests)         34   100%   100%    100%
Tests with 2 fails (approved positive parser tests)         3    33%    66%    100%
Tests with 2 fails (approved negative parser tests)         1     0%     0%    100%
Table 1. Core tests (percentage of tests passed in each group)
Considering the test results, the Raptor RDF parser tends to be more reliable than RAP.
A small test that parses and reserializes approximately 500 statements 100 times with
Raptor, ARC and RAP gives:
• Raptor: 6 seconds
• RAP: 41 seconds
• ARC: 18 seconds
Raptor turns out to be about 3 times as fast as ARC, which is about twice as fast as
RAP.
A more realistic benchmark, doing only the parsing, no serialising:
• Raptor: 2.7 seconds
• RAP: 21 seconds
• ARC: 14 seconds
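The relative speeds follow directly from the timings. The sketch below recomputes them; the helper name slowdown is hypothetical, and the only inputs are the six numbers reported above.

```php
<?php
// Timings in seconds, copied from the two benchmarks above.
$roundTrip = array('Raptor' => 6.0, 'ARC' => 18.0, 'RAP' => 41.0);
$parseOnly = array('Raptor' => 2.7, 'ARC' => 14.0, 'RAP' => 21.0);

// Hypothetical helper: for each library, how many times slower it is than
// the fastest library in the same benchmark, rounded to one decimal.
function slowdown(array $timings) {
    $fastest = min($timings);
    $ratios = array();
    foreach ($timings as $name => $seconds) {
        $ratios[$name] = round($seconds / $fastest, 1);
    }
    return $ratios;
}

print_r(slowdown($roundTrip));
print_r(slowdown($parseOnly));
```

In the parse-only benchmark the gap between Raptor and the two PHP libraries widens (14 / 2.7 is roughly 5.2 for ARC), while the gap between ARC and RAP narrows to 21 / 14 = 1.5.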
6.2. Query efficiency
To evaluate SPARQL query language implementations, the RDF Data Access
Working Group (DAWG) uses a test-driven process: it has produced an easy-to-use suite of
test cases that SPARQL query language implementors can use to evaluate and report
on their implementation.
The test manifest files define three vocabularies to express tests:
1. manifest vocabulary
2. query-evaluation test vocabulary
3. DAWG test approval vocabulary
Test category                                             RAP (RDF API   Raptor/   ARC
                                                          for PHP)       Rasqal
ASK query form                                            0              0.41      0.53
Basic graph pattern matching. Triple pattern
constructs. Blank node scoping                            0.41           0.64      0.67
Compliance with SPARQL Grammar                            0.73           0.99      1
CONSTRUCT query form                                      0              1         1
Core bits of SPARQL. Prefixed names, variables,
blank nodes, graph terms                                  0.41           0.69      0.68
FILTER clauses and expressions                            0.37           0.57      0.55
OPTIONAL pattern matching                                 0.28           0.36      0.8
RDF datasets. Default and named graphs. GRAPH keyword     0              0.45      0.14
SELECT query form                                         0.41           0.69      0.71
Sorting (ORDER BY) and slicing (LIMIT, OFFSET)            0.5            1         0.9
UNION pattern matching                                    0.25           0         0.46
Table 2. SPARQL Implementation
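One way to read Table 2 at a glance is to average each column. The sketch below does that with an unweighted mean over the eleven categories, which is a simplification: the categories contain different numbers of tests, so this is not an official DAWG score. The helper name mean_scores and the shortened row labels are assumptions made for illustration, and the column order is taken to be RAP, Raptor/Rasqal, ARC.

```php
<?php
// Scores copied from Table 2, one row per test category, in the column
// order RAP, Raptor/Rasqal, ARC. Row labels are shortened for readability.
$scores = array(
    'ASK'            => array(0.00, 0.41, 0.53),
    'Basic patterns' => array(0.41, 0.64, 0.67),
    'Grammar'        => array(0.73, 0.99, 1.00),
    'CONSTRUCT'      => array(0.00, 1.00, 1.00),
    'Core bits'      => array(0.41, 0.69, 0.68),
    'FILTER'         => array(0.37, 0.57, 0.55),
    'OPTIONAL'       => array(0.28, 0.36, 0.80),
    'RDF datasets'   => array(0.00, 0.45, 0.14),
    'SELECT'         => array(0.41, 0.69, 0.71),
    'Sorting'        => array(0.50, 1.00, 0.90),
    'UNION'          => array(0.25, 0.00, 0.46),
);
$names = array('RAP', 'Raptor/Rasqal', 'ARC');

// Hypothetical helper: unweighted mean of each column, rounded to 2 decimals.
function mean_scores(array $scores, array $names) {
    $means = array();
    foreach ($names as $col => $name) {
        $column = array();
        foreach ($scores as $row) {
            $column[] = $row[$col];
        }
        $means[$name] = round(array_sum($column) / count($column), 2);
    }
    return $means;
}

print_r(mean_scores($scores, $names));
```

Under this simplification RAP scores lowest, and the other two columns differ most in the OPTIONAL, UNION and RDF dataset categories, so the per-category rows matter more than any single summary number.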
The interface between the programming language (PHP in our case) and the
database query language (SPARQL) is an application programming interface (API). A
few PHP-based open-source RDF APIs are available, and RAP (RDF API for PHP) is
one of the most mature among them. One limitation of RAP is its SPARQL engine,
which is built to work on any RDF model that can be loaded into memory. Querying
a database with SPARQL therefore requires loading the complete database into
memory and executing the query on it. While this works well with a few dozen RDF
triples, it cannot be used for databases with millions of triples: main memory
is one limitation and execution time is another, since code implemented in a
scripting language such as PHP is slower than a pure C implementation of the same
code. Here Raptor wins because of the speed of Rasqal.
7. License
Raptor RDF Library is licensed under the following licenses as alternatives. If one
license is selected, that one alone applies.
1. The GNU Lesser General Public License (LGPL) Version 2.1
See http://www.gnu.org/copyleft/lesser.html or COPYING.LIB for the full
license text.
Copyright (C) 2000-2005 University of Bristol. All Rights Reserved.
2. The Apache License V2.0
See LICENSE-2.0.txt for the full license text.
Copyright (C) 2000-2005 University of Bristol.
RAP can be used under the terms of the GNU LESSER GENERAL PUBLIC
LICENSE (LGPL).
ARC is available under the W3C Software License and, since 2009, also under the
GPL (version 2 and 3).
8. Conclusions
A common complaint about the RDF/XML syntax in the XML-literate communities
is the lack of a simple PHP parser. While Raptor does the job perfectly, it almost
demands root access to install, and doesn’t run on the Windows platform without
cygwin.
ARC tends to be a bit faster and easier to use than RAP, but it lacks some features.
The best alternative for PHP is RAP, but it is often said to be too slow, or its API
is found hard to understand and use.