Semantic CMS Community
Semantic Data Access
Lecturer
Organization
Date of presentation
Co-funded by the European Union
Copyright IKS Consortium
Part I: Foundations
(1) Introduction of Content Management
(2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
www.iks-project.eu Copyright IKS Consortium
What is this Lecture about?
We have learned ...
... which languages can be used to model knowledge.
... how to extract knowledge from content in an automatic way (semantic lifting).
We need a way ...
... to store the extracted knowledge technically in an accessible way.
[Figure: Part II: Semantic Content Management, with modules (3) Knowledge Interaction and Presentation, (4) Knowledge Representation and Reasoning, (5) Semantic Lifting, (6) Storing and Accessing Semantic Data]
Outline
Semantic Data
Semantic Web
RDF
Semantic Data Storage
Triple Stores
Semantic Data Access
SPARQL
RQL
API Calls
Semantic Data
Stands for machine-understandable information
Allows computers to interpret the data without user intervention
Allows computers to act intelligently without being programmed for each task
Semantic Data
Provides infrastructure for obtaining practical results
Applications can derive further information by following existing relations (e.g. Eiffel Tower -> Paris -> France)
Enables reasoning
Allows extraction of related information that is not directly linked
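A minimal Java sketch of the Eiffel Tower -> Paris -> France idea, assuming a hypothetical `locatedIn` relation held in memory (the class and resource names are illustrative only). Following the relation transitively yields France, although no statement links the Eiffel Tower to France directly.

```java
import java.util.*;

/** Sketch: derive indirect facts by following a hypothetical locatedIn relation transitively. */
class TransitiveLookup {
    // Direct statements only, e.g. (Eiffel_Tower locatedIn Paris).
    private final Map<String, List<String>> locatedIn = new HashMap<>();

    void add(String subject, String object) {
        locatedIn.computeIfAbsent(subject, k -> new ArrayList<>()).add(object);
    }

    /** Returns every place reachable from start, direct or indirect, in breadth-first order. */
    Set<String> allLocations(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : locatedIn.getOrDefault(current, List.of())) {
                if (seen.add(next)) {
                    queue.add(next); // each place is expanded once, so cycles terminate
                }
            }
        }
        return seen;
    }
}
```

With `add("Eiffel_Tower", "Paris")` and `add("Paris", "France")`, `allLocations("Eiffel_Tower")` returns Paris and France; a reasoner generalizes this by declaring the property transitive instead of hard-coding the traversal.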
Semantic Web
A classical generic description:
“Web of data”
Extends the World Wide Web
By encouraging:
A common language for representing data
Transformable to/from disparate sources such as relational databases, XML, etc. (RDF)
A common, reusable data model to represent data from different domains in common terms (RDFS, OWL, etc.)
Rules that enable applications to reason over the information (SWRL)
Semantic Web Layer Cake
Semantic Web Layer Cake, Image source: http://www.w3.org/2007/03/layerCake.svg
Semantic Web
Many organizations publish their data in different domains:
Media
Geographic
Government
…
The whole set contains approximately 30 billion triples
One of the largest collections is DBpedia
A semantified version of Wikipedia
Example:
Find the cities in China with a population over 20 million
This requires efficient storage and querying of semantic data
Representation of Semantic Data
RDF
The common data format
An abstract model with several serialization formats
Consists of statements, referred to as triples, of the form (subject, predicate, object), where:
Subject: any resource identifier
Predicate: the resource identifier of a property
Object: either a resource identifier or a literal value
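The triple structure can be captured in a few Java types. This is a sketch under simplifying assumptions: literals are plain strings (no datatypes or language tags), and the type names are illustrative, not any library's API. The types enforce the rules above (the predicate is always an IRI, the subject is never a literal), and `toNTriples()` shows one of the serialization formats.

```java
import java.util.Objects;

/** An RDF term in this sketch: an IRI, a blank node, or a plain (untyped) literal. */
sealed interface Term permits Iri, Blank, Literal {}

record Iri(String value) implements Term {}

record Blank(String label) implements Term {}

record Literal(String lexical) implements Term {}

/** A triple; the predicate is typed as an IRI and the constructor rejects literal subjects. */
record Triple(Term subject, Iri predicate, Term object) {
    Triple {
        Objects.requireNonNull(subject);
        Objects.requireNonNull(predicate);
        Objects.requireNonNull(object);
        if (subject instanceof Literal) {
            throw new IllegalArgumentException("the subject of a triple cannot be a literal");
        }
    }

    /** One N-Triples line; only backslashes and double quotes are escaped (enough for single-line literals). */
    String toNTriples() {
        return format(subject) + " " + format(predicate) + " " + format(object) + " .";
    }

    private static String format(Term t) {
        if (t instanceof Iri iri) return "<" + iri.value() + ">";
        if (t instanceof Blank b) return "_:" + b.label();
        String escaped = ((Literal) t).lexical().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }
}
```

For example, `new Triple(new Iri("http://somewhere/JohnSmith"), new Iri("http://www.w3.org/2001/vcard-rdf/3.0#FN"), new Literal("John Smith")).toNTriples()` produces `<http://somewhere/JohnSmith> <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" .`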
Storing Semantic Data
Collections of triples need specialized storage designs
Two modalities:
Relational databases
Triple stores
Most commonly used for storage
Many implementations exist
Can themselves be built on relational databases
Triple Store
A purpose-built database for the storage and retrieval of
RDF data.
Optimized for adding, removing and querying triples
Each triple in the store has the form (subject, predicate, object)
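The three operations a triple store is optimized for (add, remove, query) can be sketched in memory as follows. This is an illustration under stated assumptions, not how any particular store is built: terms are plain strings, a `null` position in a query acts as a wildcard, and only a subject index is kept (real stores usually maintain several index orderings).

```java
import java.util.*;
import java.util.stream.Collectors;

/** One (subject, predicate, object) statement; all terms are plain strings in this sketch. */
record Triple(String s, String p, String o) {}

/** Minimal in-memory triple store: add, remove and pattern queries (null = wildcard). */
class TripleStore {
    private final Set<Triple> triples = new LinkedHashSet<>();
    // Secondary index from subject to its triples, so subject-bound lookups avoid a full scan.
    private final Map<String, Set<Triple>> bySubject = new HashMap<>();

    boolean add(Triple t) {
        if (!triples.add(t)) return false; // the store is a set: duplicates are ignored
        bySubject.computeIfAbsent(t.s(), k -> new LinkedHashSet<>()).add(t);
        return true;
    }

    boolean remove(Triple t) {
        if (!triples.remove(t)) return false;
        Set<Triple> forSubject = bySubject.get(t.s());
        forSubject.remove(t);
        if (forSubject.isEmpty()) bySubject.remove(t.s());
        return true;
    }

    /** Returns all triples matching the pattern; a null position matches anything. */
    List<Triple> find(String s, String p, String o) {
        Collection<Triple> candidates = (s == null) ? triples : bySubject.getOrDefault(s, Set.of());
        return candidates.stream()
                .filter(t -> p == null || t.p().equals(p))
                .filter(t -> o == null || t.o().equals(o))
                .collect(Collectors.toList());
    }

    int size() { return triples.size(); }
}
```

For instance, `find(null, "ex:locatedIn", null)` returns every `ex:locatedIn` statement, while `find("ex:Paris", null, null)` uses the subject index (the `ex:` names are hypothetical).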
Considering XML Databases
XML databases are existing storage systems for semi-structured data
Idea: transform RDF to XML and store it in XML databases
However, the XML data model does not exactly match semantic data
The XML data model is a tree-like structure
RDF data is represented as a graph without a hierarchy
Considering XML Databases
XML databases are not suitable for storing and querying RDF
Only simple manipulations can be handled through XML query
languages
RDF Schema processing and inference is not possible
Standard RDF/XML mapping is unsuitable
Monolithic Approach for DB-Based Triple Stores
Generic representation for all RDF schemas
Only two tables are used
Resources table
Triples table
Monolithic Approach for DB-Based Triple Stores
Triples table
predid  subid  objid  objvalue
6       2      1
5       3      7
5       1      8
5       9      2
3       9             Sunscale
Resources table
id  uri
1   http://www.iks.og/topics.rdfs#Hotel
2   http://www.iks.og/topics.rdfs#HotelDirections
3   http://www.oclc.org/dublincore.rdfs#title
4   http://www.iks.og/schema.rdf#Ext.Resource
5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6   http://www.w3.org/2000/01/rdf-schema#subClassOf
7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8   http://www.w3.org/2000/01/rdf-schema#Class
9   rl
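A rough Java sketch of the monolithic layout, holding the two tables in memory (class and method names are hypothetical; a real implementation would issue an SQL join). Each triples row refers to the resources table by id, and a literal object is kept inline in `objvalue`. Decoding the row (5, 1, 8) from the table above, for instance, yields `<http://www.iks.og/topics.rdfs#Hotel> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class>`.

```java
import java.util.*;

/** Sketch of the monolithic layout: a resources table (id, uri) and a triples table. */
class MonolithicStore {
    // Triples-table row: objId is null when the object is a literal kept in objValue.
    record Row(int predId, int subId, Integer objId, String objValue) {}

    private final Map<Integer, String> resources = new HashMap<>();
    private final List<Row> triples = new ArrayList<>();

    void addResource(int id, String uri) { resources.put(id, uri); }

    void addTriple(Row row) { triples.add(row); }

    /** Joins each triples row with the resources table to rebuild readable statements. */
    List<String> decode() {
        List<String> out = new ArrayList<>();
        for (Row r : triples) {
            String object = (r.objId() != null)
                    ? "<" + resources.get(r.objId()) + ">"
                    : "\"" + r.objValue() + "\"";
            out.add("<" + resources.get(r.subId()) + "> <" + resources.get(r.predId()) + "> " + object);
        }
        return out;
    }
}
```

Every decoded statement needs up to three lookups into the same resources table; with SQL this means repeated self-joins, which is one reason the speaker notes call this layout hard to scale.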
Triple Stores
Can be categorized into three categories:
In-memory triple stores
Used for certain operations such as benchmarking, caching, etc.
Native triple stores
Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
Non-memory, non-native triple stores
Built on third-party databases (Jena SDB, Kaon, …)
Functionalities Provided by Triple Stores
RDBMS support
General RDF model access
Query language support in the store, such as RQL and SPARQL
Some stores provide:
Provenance: tracking of who said what
APIs for accessing the triple store over the network
Very few stores provide:
Full-text search
Inference and rule languages
Example Triple Store implementations
RDF Suite
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Based on an ORDBMS model
Sesame
http://www.openrdf.org/
Relational databases (MySQL, PostgreSQL, Oracle)
Jena
http://www.hpl.hp.com/semweb/jena2.htm
Relational databases (MySQL, PostgreSQL, Oracle)
Virtuoso
http://virtuoso.openlinksw.com/
Native RDF Quad Storage (Physical Quads)
How Triples Are Stored and Accessed in RDF Suite
Separate tables are created to store resources:
Properties, subclasses, subproperties and instances
Indices on attributes such as URI, source and target
Querying is possible through RQL
How Triples Are Stored and Accessed in RDF Suite
[Figure: RDF Suite storage schema, from *]
*Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Sesame Architecture
DBMS-independent API for accessing triple repositories
SAIL API
A set of Java interfaces between the other modules and the repository
Abstracts from the actual storage mechanism
Query module
RQL support
Different ways to communicate with clients
Through protocol handlers
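The idea behind a storage abstraction layer can be sketched in a few lines of Java. The interface below is hypothetical and is NOT Sesame's actual SAIL API; it only illustrates the design: the query module depends on an interface, so an in-memory back end, a PostgreSQL back end or a MySQL back end can be swapped without touching the modules above it.

```java
import java.util.*;

/**
 * Hypothetical storage abstraction in the spirit of Sesame's SAIL layer.
 * NOT the real SAIL API: names and methods are illustrative only.
 */
interface TripleRepository {
    void add(String s, String p, String o);

    List<String[]> match(String s, String p, String o); // null = wildcard
}

/** One possible back end: a plain in-memory list. A JDBC back end would implement the same interface. */
class InMemoryRepository implements TripleRepository {
    private final List<String[]> rows = new ArrayList<>();

    public void add(String s, String p, String o) { rows.add(new String[] {s, p, o}); }

    public List<String[]> match(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] r : rows) {
            if ((s == null || s.equals(r[0])) && (p == null || p.equals(r[1])) && (o == null || o.equals(r[2]))) {
                out.add(r);
            }
        }
        return out;
    }
}

/** A "query module" that depends only on the interface, never on the storage mechanism. */
class QueryModule {
    private final TripleRepository repo;

    QueryModule(TripleRepository repo) { this.repo = repo; }

    /** Returns the subjects typed with the given class. */
    List<String> instancesOf(String cls) {
        List<String> out = new ArrayList<>();
        for (String[] r : repo.match(null, "rdf:type", cls)) out.add(r[0]);
        return out;
    }
}
```

Swapping `new InMemoryRepository()` for another implementation of `TripleRepository` leaves `QueryModule` unchanged, which is the property the slide attributes to the SAIL layer.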
*Jeen Broekstra and Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
SAIL API over PostgreSQL
PostgreSQL
Object-relational DBMS
Supports sub-table relations between its tables, providing RDF Schema class and property subsumption
Individuals are represented in separate tables created for resources
Difficult to add tables
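The subsumption that PostgreSQL sub-tables provide (querying a superclass also returns the instances of its subclasses) can be sketched in application code. This is a minimal illustration of the RDFS semantics, with hypothetical class names, not how Sesame implements it; note that here adding a new `subClassOf` link is just a map update, whereas with sub-tables it forces the hierarchy to be rebuilt.

```java
import java.util.*;

/** Sketch of RDFS class subsumption: instances of a class include instances of its subclasses. */
class ClassHierarchy {
    private final Map<String, Set<String>> subClasses = new HashMap<>(); // superclass -> direct subclasses
    private final Map<String, Set<String>> directInstances = new HashMap<>();

    void subClassOf(String sub, String sup) {
        subClasses.computeIfAbsent(sup, k -> new HashSet<>()).add(sub);
    }

    void type(String individual, String cls) {
        directInstances.computeIfAbsent(cls, k -> new HashSet<>()).add(individual);
    }

    /** Collects instances of cls and, transitively, of all its subclasses; the visited set guards against cycles. */
    Set<String> instancesOf(String cls) {
        Set<String> result = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(cls));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (!visited.add(c)) continue;
            result.addAll(directInstances.getOrDefault(c, Set.of()));
            todo.addAll(subClasses.getOrDefault(c, Set.of()));
        }
        return result;
    }
}
```

With `ex:Hotel` and `ex:Hostel` declared as subclasses of `ex:Accommodation`, `instancesOf("ex:Accommodation")` returns the hotel and hostel instances together.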
*Jeen Broekstra and Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
SAIL API over MySQL
MySQL
The database schema does not change when the RDFS changes
An advantage where the RDFS is unstable
*Jeen Broekstra and Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Jena2 Architecture
*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases
Property Tables
Triple Store Only
Subject   Property   Object
person1   name       Alice
person1   age        32
person1   twinOf     person2
person1   faxPhone   x1234
person1   adminPh    x5678
person2   name       Bob
person2   age        35
person2   adopteeOf  person6
person2   friendOf   person8
person2   gender     male
Person Property Table
ID   name   age   gender
p1   Alice  32    -
p2   Bob    35    male
Triple Store
Subject   Property   Object
person1   twinOf     person2
person1   faxPhone   x1234
person1   adminPh    x5678
person2   adopteeOf  person6
person2   friendOf   person8
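The split shown in the tables above can be reproduced with a short Java sketch (class and method names are hypothetical, not Jena2's code). Statements whose property is one of the chosen columns (`name`, `age`, `gender`) go into one row per subject; everything else stays in the triple table. For the ten statements on this slide, this gives 2 property-table rows plus 5 remaining triples, i.e. 7 stored records instead of 10, and a name-by-age lookup touches a single row.

```java
import java.util.*;

/** Sketch of Jena2-style property tables: frequent single-valued properties go into a per-subject row. */
class PropertyTableSplit {
    record Triple(String s, String p, String o) {}

    final Map<String, Map<String, String>> propertyTable = new LinkedHashMap<>(); // subject -> column -> value
    final List<Triple> leftover = new ArrayList<>();                              // the remaining triple store

    PropertyTableSplit(List<Triple> triples, Set<String> columns) {
        for (Triple t : triples) {
            Map<String, String> row = propertyTable.get(t.s());
            boolean isColumn = columns.contains(t.p());
            // Each statement lives in exactly one place; a second value for a column falls back to the triple store.
            if (isColumn && (row == null || !row.containsKey(t.p()))) {
                propertyTable.computeIfAbsent(t.s(), k -> new LinkedHashMap<>()).put(t.p(), t.o());
            } else {
                leftover.add(t);
            }
        }
    }

    /** Rows in the property table plus statements left in the triple store. */
    int storedRecords() { return propertyTable.size() + leftover.size(); }

    /** Name lookup by age reads whole rows, with no second lookup by subject. */
    List<String> namesWithAge(String age) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> row : propertyTable.values()) {
            if (age.equals(row.get("age")) && row.containsKey("name")) out.add(row.get("name"));
        }
        return out;
    }
}
```

Choosing the columns is the design decision: it pays off only for properties that most subjects have and that hold a single value.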
*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases
Jena Persistence Options
SDB
Scalable storage and query for RDF
Specifically designed for SPARQL support
Supports: MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
Scales to graphs of 100 million triples
Jena Persistence Options
TDB
Provides large-scale storage and query of RDF datasets using a pure Java engine
Supports SPARQL
A non-transactional, faster database solution for use by a single system
Scales well beyond SDB and is simpler to set up
Virtuoso
General-purpose RDBMS with extensive RDF adaptations
RDF data is stored as quads, i.e. (graph, subject, predicate, object) tuples, so RDF with named graphs is supported
The columns are G for graph, S for subject, P for predicate and O for object
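The speaker notes describe one quad table with two covering indices, G, S, P, O and O, G, P, S. The Java sketch below mimics that idea with two sorted sets over the same quads; it is an illustration under that assumption, not Virtuoso's implementation. Because each index sorts by its leading column first, all quads of a graph (or all quads with a given object) form one contiguous range that can be scanned without touching the rest.

```java
import java.util.*;
import java.util.stream.Collectors;

/** A (graph, subject, predicate, object) tuple, as in a quad store with named graphs. */
record Quad(String g, String s, String p, String o) {}

/** Sketch: the same quads kept under two sort orders, mirroring GSPO and OGPS covering indices. */
class QuadStore {
    private final NavigableSet<Quad> gspo = new TreeSet<>(Comparator.comparing(Quad::g)
            .thenComparing(Quad::s).thenComparing(Quad::p).thenComparing(Quad::o));
    private final NavigableSet<Quad> ogps = new TreeSet<>(Comparator.comparing(Quad::o)
            .thenComparing(Quad::g).thenComparing(Quad::p).thenComparing(Quad::s));

    void add(Quad q) {
        gspo.add(q);
        ogps.add(q);
    }

    /** All quads of one named graph: a contiguous range scan on the GSPO index. */
    List<Quad> inGraph(String g) {
        return gspo.tailSet(new Quad(g, "", "", ""), true).stream()
                .takeWhile(q -> q.g().equals(g))
                .collect(Collectors.toList());
    }

    /** All quads with a given object value: a contiguous range scan on the OGPS index. */
    List<Quad> withObject(String o) {
        return ogps.tailSet(new Quad("", "", "", o), true).stream()
                .takeWhile(q -> q.o().equals(o))
                .collect(Collectors.toList());
    }
}
```

The empty strings in the lower bounds sort before any real value, so each `tailSet` starts exactly at the first matching quad; `takeWhile` stops at the first quad outside the range.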
Querying Semantic Data
Semantic data can be queried from triple stores through:
Various query languages
SPARQL
Offered through various endpoints
RQL
RDQL
SeRQL
…
API Calls
Through proprietary APIs of different projects
Linked Data
SPARQL
Is an RDF query language
Standardized by the W3C
Plays a role for RDF similar to that of SQL for relational databases
Syntactically resembles SQL
Queries RDF graphs instead of relational databases
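At its core, a SPARQL `WHERE` clause is a basic graph pattern: a conjunction of triple patterns whose shared variables act as join conditions. The toy Java evaluator below illustrates only that idea (names are hypothetical; no `OPTIONAL`, `FILTER`, `UNION` or indexing). Matching the patterns `?x ex:locatedIn ?city` and `?city ex:locatedIn ?country`, as in `SELECT ?x ?country WHERE { ?x ex:locatedIn ?city . ?city ex:locatedIn ?country }`, against the Eiffel Tower example yields the single binding `?x = ex:EiffelTower, ?country = ex:France`.

```java
import java.util.*;

/**
 * Toy evaluator for a SPARQL-style basic graph pattern (a conjunction of triple patterns).
 * Terms starting with '?' are variables. Illustration only: no OPTIONAL, FILTER, UNION, etc.
 */
class PatternMatcher {
    record Triple(String s, String p, String o) {}

    private final List<Triple> graph;

    PatternMatcher(List<Triple> graph) { this.graph = graph; }

    /** Returns one variable binding per solution, found by backtracking over the patterns. */
    List<Map<String, String>> match(List<Triple> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solve(patterns, 0, new HashMap<>(), solutions);
        return solutions;
    }

    private void solve(List<Triple> patterns, int i, Map<String, String> binding, List<Map<String, String>> out) {
        if (i == patterns.size()) {
            out.add(new HashMap<>(binding));
            return;
        }
        Triple pat = patterns.get(i);
        for (Triple t : graph) {
            Map<String, String> next = new HashMap<>(binding);
            if (unify(pat.s(), t.s(), next) && unify(pat.p(), t.p(), next) && unify(pat.o(), t.o(), next)) {
                solve(patterns, i + 1, next, out); // shared variables act as the join condition
            }
        }
    }

    /** Binds a variable or checks a constant; an already-bound variable must agree with the value. */
    private static boolean unify(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) return term.equals(value);
        String bound = binding.putIfAbsent(term, value);
        return bound == null || bound.equals(value);
    }
}
```

A real SPARQL engine reorders patterns and uses the store's indices instead of scanning the whole graph for each pattern, but the join semantics are the same.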
SPARQL Endpoints
Provides functionality to query the knowledge base via the SPARQL language
Accepts queries and returns results over HTTP
Query results can be in different formats such as
RDF
XML
HTML
JSON
CSV
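A Java sketch of how a client talks to such an endpoint with the JDK's `java.net.http` classes: the query travels URL-encoded in the `query` parameter of a GET request, and the `Accept` header selects the result format. The query encodes the earlier DBpedia example (cities in China with over 20 million inhabitants); the vocabulary terms `dbo:City`, `dbo:country` and `dbo:populationTotal` are assumptions about DBpedia's ontology, and the class name is hypothetical. The code only builds the request; it does not contact the server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds (but does not send) a SPARQL Protocol GET request. */
class SparqlRequests {
    // The DBpedia vocabulary terms below are assumptions about that dataset's ontology.
    static final String CHINA_CITIES = String.join("\n",
            "PREFIX dbo: <http://dbpedia.org/ontology/>",
            "PREFIX dbr: <http://dbpedia.org/resource/>",
            "SELECT ?city ?population WHERE {",
            "  ?city a dbo:City ;",
            "        dbo:country dbr:China ;",
            "        dbo:populationTotal ?population .",
            "  FILTER (?population > 20000000)",
            "}");

    /** The query goes URL-encoded in the "query" parameter; the Accept header picks the result format. */
    static HttpRequest build(String endpoint, String query, String mediaType) {
        String url = endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", mediaType)
                .GET()
                .build();
    }
}
```

Passing `SparqlRequests.build("http://dbpedia.org/sparql", SparqlRequests.CHINA_CITIES, "application/sparql-results+json")` to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` would execute the query and return JSON results; changing the media type (for example to `text/csv`) requests one of the other formats listed above, if the endpoint supports it.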
Semantic Data Access with API Calls
Open source projects provide APIs to manipulate RDF data
Jena
Apache Clerezza
Sesame
JRDF
Jena
Jena provides a rich API to manipulate the RDF stored in the underlying triple store.
Model to represent graphs
CRUD methods for triples
Querying methods for existing resources
See the next slide for the code snippet…
Jena Code Snippet
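// Apache Jena 3+ package names; Jena 2 used com.hp.hpl.jena.* instead
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

public class JenaExample {
    public static void main(String[] args) {
        String personURI = "http://somewhere/JohnSmith";
        String givenName = "John";
        String familyName = "Smith";
        String fullName = givenName + " " + familyName;
        // create an empty Model which represents an RDF graph
        Model model = ModelFactory.createDefaultModel();
        // create the resource which will produce the triples in the next slide
        Resource johnSmith = model.createResource(personURI)
                .addProperty(VCARD.FN, fullName)
                .addProperty(VCARD.N, model.createResource() // blank node for the structured name
                        .addProperty(VCARD.Given, givenName)
                        .addProperty(VCARD.Family, familyName));
        // print the resulting triples in N-Triples syntax
        model.write(System.out, "N-TRIPLES");
    }
}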
Jena
Triples created by the code snippet in the previous slide:
(<http://somewhere/JohnSmith>, VCARD.FN, "John Smith")
(<http://somewhere/JohnSmith>, VCARD.N, _)
(_, VCARD.Given, "John")
(_, VCARD.Family, "Smith")
• Note that the _ symbol represents a blank node
Apache Clerezza
Provides an API that is independent of the different triple stores it supports
Its API provides a model to represent and manipulate RDF graphs
Also provides a SPARQL endpoint to query the stored knowledge
Apache Clerezza Code Snippet
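Simple code snippet adding two triples to the graph:
// Clerezza rdf.core classes (package names as in the rdf.core module)
import org.apache.clerezza.rdf.core.LiteralFactory;
import org.apache.clerezza.rdf.core.MGraph;
import org.apache.clerezza.rdf.core.UriRef;
import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
import org.apache.clerezza.rdf.core.impl.TripleImpl;

String base = "http://www.example.org#";
MGraph g = new SimpleMGraph();
// (JohnSmith, rdf:type, foaf:Person), written with full URIs
g.add(new TripleImpl(
        new UriRef(base + "JohnSmith"),
        new UriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
        new UriRef("http://xmlns.com/foaf/0.1/Person")));
// (JohnSmith, vcard:FN, "John")
g.add(new TripleImpl(
        new UriRef(base + "JohnSmith"),
        new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
        LiteralFactory.getInstance().createTypedLiteral("John")));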
Linked Data
Datasets on the Web that are interlinked so that computers can explore them
Uses standard formats for access and management
Enables integration of, and reasoning over, huge amounts of data on the Web
Linked Data
Four well-known principles of Linked Data, formulated by Tim Berners-Lee:
Use URIs as names for things
Use HTTP URIs so that people can look up (dereference) those names
When someone dereferences a URI, provide useful information using the standards (RDF, SPARQL)
Include links to other URIs so that related data can be discovered
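Principles 2 and 3 in practice: a client dereferences an HTTP URI and uses content negotiation to ask for RDF rather than an HTML page. The Java sketch below builds such a request with the JDK's `java.net.http` classes; the DBpedia resource URI is only an example, the assumption is that the server offers Turtle, and the class name is hypothetical. The client follows redirects because Linked Data servers commonly answer a URI that names a thing with `303 See Other`, pointing to a document about it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

/** Sketch of dereferencing a Linked Data URI with content negotiation. */
class Dereference {
    /** GET request for the URI that asks for Turtle (RDF) instead of HTML via the Accept header. */
    static HttpRequest rdfRequest(String thingUri) {
        return HttpRequest.newBuilder(URI.create(thingUri))
                .header("Accept", "text/turtle")
                .GET()
                .build();
    }

    /** Client that follows redirects such as 303 See Other (but not HTTPS-to-HTTP downgrades). */
    static HttpClient client() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }
}
```

Calling `Dereference.client().send(Dereference.rdfRequest("http://dbpedia.org/resource/Paris"), HttpResponse.BodyHandlers.ofString())` would perform the lookup (it needs network access and throws the checked `IOException` and `InterruptedException`); the returned triples contain further URIs to dereference, which is principle 4.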
Linking Open Data Project
Is a W3C SWEO project
Aims to make data freely available to everyone
Aims to publish open data sets as RDF and to set semantic relationships between them
Serves information in a machine-readable format
Enriches content
Reduces duplication
The number of linked datasets is increasing rapidly
A large number of datasets are already linked
Access Data In The Cloud
Follow the RDF links representing the "things"
SPARQL endpoints
Ready-to-use software to discover linked data (see the next slide)
Linked Data Applications
Many applications are built on top of Linked Data
Tabulator
Marbles
OpenLink RDF Browser
…
Search the Web for:
RDF crawlers
RDF browsers
Also see the following page listing a number of Linked Data applications:
http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications
Available SPARQL Endpoints
http://dbpedia.org/sparql
http://www4.wiwiss.fu-berlin.de/dblp/
To find SPARQL endpoints that provide data about a given URI, see
http://void.rkbexplorer.com/endpoint-search/
See also a list of live SPARQL endpoints:
http://www.w3.org/wiki/SparqlEndpoints
References
http://www.w3.org/TR/rdf-sparql-query
http://jena.sourceforge.net/tutorial/RDF_API/index.html
http://www.slideshare.net/ldodds/sparql-tutorial
http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semantic-web?src=related_normal&rel=1702851
http://www.cambridgesemantics.com/2008/09/sparql-by-example
http://linkeddata-specs.info/
http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Jeen Broekstra, Arjohn Kampman and Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases
http://jena.sourceforge.net/DB/index.html
http://virtuoso.openlinksw.com/
Speaker notes
Web of data refers to interconnected, structured datasets distributed all over the world. It enables machines to traverse the links between these datasets without noise. The noise referred to here results from web sites containing both metadata and the actual data.
The figure illustrates the different layers of the Semantic Web stack. This lecture covers the querying, data interchange, syntax and identifier layers. The overall figure shows the standardized technologies that form the Semantic Web. Identifiers are used to identify Semantic Web resources; URIs identify resources in a dereferenceable way. In the syntax layer, Semantic Web resources are represented in different formats, e.g. XML. In the data interchange layer, RDF is the language used to represent Semantic Web resources. Different serialization formats for RDF are available, e.g. RDF/XML, Turtle, etc. The querying layer provides methods to obtain Semantic Web resources; SPARQL is the most common query language.
An RDF triple contains three components: the subject, which is an RDF URI reference or a blank node; the predicate, which is an RDF URI reference; and the object, which is an RDF URI reference, a literal or a blank node. An RDF triple is conventionally written in the order subject, predicate, object. The predicate is also known as the property of the triple. From Wikipedia: The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
An XML model can be used to store triple-like data by rewriting the triples into simple three-part XML element structures and then using existing XML query systems. However, the XML data model is a tree of elements and attributes, whereas RDF data forms a directed graph, possibly containing cycles, that has no inherent hierarchical structure.
Storing and querying semantic data through XML databases and XML query languages would not work, since: only simple manipulations can be handled through XML query languages; RDF Schema processing and inference are not possible; and the standard RDF/XML mapping is unsuitable, since multiple XML serializations are possible for the same RDF graph, which makes retrieval complex.
In the monolithic approach, two tables store the data: the triples table and the resources table. The resources table stores only the URIs and the identifiers associated with them. The triples table stores one reference for the subject and one for the predicate. If the object value is also a URI, it is represented by a reference as well. These references are used to fetch the corresponding URIs from the resources table. If the object value is not a URI, i.e. it is a literal, its value is stored directly in the triples table. However, collecting all data in two tables is not scalable and is ill-suited to complex operations such as reasoning and querying.
Overall architecture of RDFSuite. It separates the logical from the physical data representation by allowing queries through a high-level query language (RQL) over the stored semantic data. For storage, RDFSuite uses an ORDBMS. Resources are loaded into the system by exploiting the available RDF Schema knowledge. Database representations can be customized according to the employed schemas.
A non-monolithic approach is used: separate tables are created to store classes. Indices are constructed on attributes such as the URI, source and target of the created tables in order to speed up joins and the selection of specific tuples.
An example database structure formed from an RDF schema. The core schema is represented by four tables, namely Class, Property, SubClass and SubProperty. This approach is more flexible than the monolithic approach, since the physical representation of the data in the underlying database can be customized.
The most prominent feature of Sesame is that it offers an application programming interface on top of the actual data storage. This makes it possible to implement the interface on top of different repositories. The other components are clients of the SAIL API.
Difficult to add tables in PostgreSQL: when adding a new subClassOf relation between two existing classes, the complete class hierarchy starting from the subclass needs to be broken down and rebuilt, because subtable relations cannot be added to an existing table; they have to be specified when a table is created. Once created, the subtable relations are fixed.
Jena provides a simple, minimalist view of the RDF graph that exposes the data as triples. Users interact with the abstract Model, whose interface delegates high-level operations to low-level operations on the triples stored in an RDF graph. Jena2 storage provides three graph operations, namely add, delete and find.
As mentioned, the persistence layer presents a Graph interface to the higher levels of Jena. Each logical graph is implemented using an ordered list of specialized graphs. An operation on the entire logical graph, such as add, delete or find, is processed by invoking add, delete or find on each specialized graph.
Jena2 uses denormalized schemas, because with normalized schemas every find operation required multiple joins between the resources table and the triples table. In denormalized schemas, URIs and simple literal values are stored directly in the statement tables. They are exemplified in the next slide. There are also multiple statement tables, because a single statement table is not scalable for large data sets and cannot benefit from the locality among subjects and predicates. Jena2 also uses property tables, which store patterns of RDF statements. They are database tables independent of the actual triple store framework. Each statement is stored either in the triple store or in a property table, but not in both.
Let's compare the triple store and an application-specific schema with an example. Suppose we want to store information about people, each of whom has properties such as name, age and so on. The triple store approach needs to store 10 records. For an application-specific schema, if we know that most people have a name, age and gender, we group these three properties into one table, called a property table. The remaining properties, e.g. multi-valued ones, are still stored in the triple store. This way, we reduce the number of stored records from 10 to 7. Also, if users often query people's names by their age, then with a property table the name can be retrieved immediately once the age matches. In the triple store approach, we first need to find the subject with the given age value and then use that subject to look up the name, which is less efficient.
Provides for scalable storage and query of RDF datasets using conventional SQL databases for use in standalone applications, J2EE and other application frameworks.
All quads are in one table, which may be indexed differently depending on the expected query load. A triple should be locatable given its S or its O value. There are two covering indices: G, S, P, O and O, G, P, S. Any triple store that supports named graph functionality is more than likely a quad store. Many triple stores are in fact quad stores, due to the need to maintain RDF data provenance within the data.
SPARQL is the de facto query language used to express queries over RDF data sources. It allows querying RDF graph patterns together with their conjunctions and disjunctions. Other languages are more proprietary and used in narrower scopes. Several open source projects, such as Apache Clerezza and Jena, provide knowledge management functionality, including APIs for storing and accessing semantic data. As organizations publish their data in RDF format, opportunities arise to interlink related content. As a result, once users obtain a resource from the Linked Data cloud, they can traverse related data through its links.
Different organizations provide querying services over their RDF data through SPARQL endpoints. SPARQL endpoints are machine-friendly interfaces to the underlying knowledge bases. See http://www.w3.org/wiki/SparqlEndpoints for several SPARQL endpoints.
This figure represents the 4 design principles of linked data in a stack like architecture. URIs are used as names of the resources on the web. HTTP URIs are used so that others can access the actual data represented by the URI. RDF is the actual representation of the resources represented by URIs and lastly SPARQL is used to obtain desired information over the RDF data.
SWEO: Semantic Web Education and Outreach. This was an interest group within the W3C. The SWEO Interest Group was established to develop strategies and materials to increase awareness in the Web community of the need for and benefits of the Semantic Web, and to educate the Web community about related solutions and technologies.