Lecture semantic dataaccess_presentation

Semantic Data Access
Semantic CMS Community

Lecturer
Organization

Date of presentation

Co-funded by the European Union
Copyright IKS Consortium

Page:

Part I: Foundations
(1) Introduction of Content Management
(2) Foundations of Semantic Web Technologies

Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data

Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS

www.iks-project.eu Copyright IKS Consortium

Page: 3

What is this Lecture about?
 We have learned ...
 ... which languages can be used to model knowledge.
 ... how to extract knowledge from content in an automatic way (semantic lifting).
 We need a way ...
 ... to store the extracted knowledge technically in an accessible way.

Page: 4

Outline
 Semantic Data
 Semantic Web
 RDF
 Semantic Data Storage
 Triple Stores
 Semantic Data Access
 SPARQL
 RQL
 API Calls


Page: 5

Semantic Data
 Stands for machine-understandable information
 Allows computers to interpret the data without user intervention
 Allows computers to act intelligently without being programmed for each task


Page: 6

Semantic Data
 Provides the infrastructure to get practical results
 Applications derive further information by following existing relations (e.g. Eiffel Tower -> Paris -> France)
 Enables reasoning
 Allows extraction of related information that is not directly linked
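
The Eiffel Tower -> Paris -> France chain can be sketched in a few lines of plain Java. This is only an illustration: the relation name `locatedIn`, the class name and the plain string identifiers are assumptions made for the example, and a real semantic application would run a reasoner or a query over RDF data rather than a map lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TransitiveLookupSketch {
    /** Follows a single-valued relation from start until no further link exists. */
    static List<String> follow(Map<String, String> relation, String start) {
        List<String> chain = new ArrayList<>();
        String current = relation.get(start);
        // Stop at the end of the chain, or when a value repeats (cycle guard).
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            current = relation.get(current);
        }
        return chain;
    }

    public static void main(String[] args) {
        // Hypothetical "locatedIn" facts; only the two direct links are stated.
        Map<String, String> locatedIn = Map.of("EiffelTower", "Paris", "Paris", "France");
        // France is never linked to the Eiffel Tower directly, yet it is found.
        System.out.println(follow(locatedIn, "EiffelTower")); // prints [Paris, France]
    }
}
```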


Page: 7

Semantic Web
 A classical generic description:
 “Web of data”
 Extends the World Wide Web by encouraging:
 A common language for representing data
 Transformable to/from disparate sources such as relational databases, XML, etc. (RDF)
 A common reusable data model to represent data from different domains in common terms (RDFS, OWL, etc.)
 Rules that enable applications to reason over the information (SWRL)


Page: 8

Semantic Web Layer Cake

Semantic Web Layer Cake, Image source: http://www.w3.org/2007/03/layerCake.svg

Page: 9

Semantic Web
 Many organizations publish their data in different domains
 Media
 Geographic
 Government
 …
 The whole set contains approximately 30 billion triples
 One of the largest collections is DBpedia
 A semantified version of Wikipedia
 Example:
 Obtain the cities of China that have a population over 20 million
 Such queries need efficient storage and querying of semantic data


Page: 10

Representation of Semantic Data
 RDF
 The common data format
 An abstract model with several serialization formats
 Consists of statements, referred to as triples, of the form (subject, predicate, object) where
 Subject: any resource identifier
 Predicate: a resource identifier of any property
 Object: either a resource identifier or a literal value
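
As a minimal sketch of this triple model in plain Java (not the API of any RDF library), the types below mirror the three positions of a statement. The names `Term`, `Resource`, `Literal` and `Triple` and the `http://example.org/` URIs are assumptions chosen for illustration, and blank nodes are left out.

```java
import java.util.List;

final class RdfTripleSketch {
    /** An RDF term: either a resource identifier or a literal value. */
    interface Term {}
    record Resource(String uri) implements Term {}
    record Literal(String value) implements Term {}

    /** A statement: subject and predicate are resources, the object is any term. */
    record Triple(Resource subject, Resource predicate, Term object) {}

    static List<Triple> example() {
        Resource tower = new Resource("http://example.org/EiffelTower");
        return List.of(
            // Object is another resource.
            new Triple(tower, new Resource("http://example.org/locatedIn"),
                       new Resource("http://example.org/Paris")),
            // Object is a literal value.
            new Triple(tower, new Resource("http://example.org/name"),
                       new Literal("Eiffel Tower")));
    }

    public static void main(String[] args) {
        example().forEach(System.out::println);
    }
}
```

Typing the subject and predicate as `Resource` makes the compiler reject a literal in those positions, which matches the definition above.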


Page: 11

Storing Semantic Data
 Triple collections need specialized storage designs
 Two options:
 Relational databases
 Triple stores
 The most common choice for storage
 Many implementations exist
 They can also be built on relational databases


Page: 12

Triple Store
 A purpose-built database for the storage and retrieval of RDF data
 Optimized for adding, removing and querying triples
 Each triple in the triple store has the form (subject, predicate, object)
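
The three operations named above (add, remove, query) can be sketched with a toy in-memory store. This is an assumption-laden illustration, not how any production triple store works: terms are plain strings, `null` is used as a wildcard, and every query is a linear scan where real stores rely on indexes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class InMemoryTripleStoreSketch {
    record Triple(String subject, String predicate, String object) {}

    // A set, so adding the same statement twice has no effect.
    private final Set<Triple> triples = new LinkedHashSet<>();

    boolean add(String s, String p, String o) {
        return triples.add(new Triple(s, p, o));
    }

    boolean remove(String s, String p, String o) {
        return triples.remove(new Triple(s, p, o));
    }

    /** Returns all triples matching the pattern; null acts as a wildcard. */
    List<Triple> match(String s, String p, String o) {
        return triples.stream()
            .filter(t -> (s == null || t.subject().equals(s))
                      && (p == null || t.predicate().equals(p))
                      && (o == null || t.object().equals(o)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        InMemoryTripleStoreSketch store = new InMemoryTripleStoreSketch();
        store.add("ex:EiffelTower", "ex:locatedIn", "ex:Paris");
        store.add("ex:Paris", "ex:locatedIn", "ex:France");
        // All statements that use the ex:locatedIn predicate.
        System.out.println(store.match(null, "ex:locatedIn", null));
    }
}
```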


Page: 13

Considering XML Databases
 XML databases are existing storage systems for semi-structured data
 Idea: Transform RDF to XML and store it in XML databases
 Yet, the XML data model is not exactly the same as the semantic data model
 The XML data model is a tree-like structure
 RDF data is represented as a graph without a hierarchy

Page: 14

Considering XML Databases
 XML databases are not suitable for storing and querying RDF
 Only simple manipulations can be handled through XML query languages
 RDF Schema processing and inference are not possible
 The standard RDF/XML mapping is unsuitable


Page: 15

Monolithic Approach for DB-Based Triple Stores
 Generic representation for all RDF schemas
 Only two tables are used
 Resources table
 Triples table


Page: 16

Monolithic Approach for DB-Based Triple Stores

Triples table:
predid  subid  objid  objvalue
6       2      1
5       3      7
5       1      8
5       9      2
3       9             Sunscale

Resources table:
id  uri
1   http://www.iks.og/topics.rdfs#Hotel
2   http://www.iks.og/topics.rdfs#HotelDirections
3   http://www.oclc.org/dublincore.rdfs#title
4   http://www.iks.og/schema.rdf#Ext.Resource
5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6   http://www.w3.org/2000/01/rdf-schema#subClassOf
7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8   http://www.w3.org/2000/01/rdf-schema#Class
9   rl
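
The two-table layout can be imitated in memory to show the idea: each URI is stored once in a resources table, and the triples table holds only integer ids. The sketch below is a simplification under stated assumptions: the class and method names are invented, ids are handed out in insertion order, and the literal column (objvalue) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoTableStoreSketch {
    /** Row of the triples table: three ids pointing into the resources table. */
    record Row(int predId, int subId, int objId) {}

    private final Map<String, Integer> idByUri = new HashMap<>(); // resources table, by URI
    private final List<String> uriById = new ArrayList<>();       // resources table, by id
    private final List<Row> triples = new ArrayList<>();          // triples table

    /** Returns the id of a URI, inserting it into the resources table if new. */
    int intern(String uri) {
        return idByUri.computeIfAbsent(uri, u -> {
            uriById.add(u);
            return uriById.size(); // ids start at 1
        });
    }

    void add(String subject, String predicate, String object) {
        triples.add(new Row(intern(predicate), intern(subject), intern(object)));
    }

    /** Looks up the ids, scans the triples table, and resolves object ids back to URIs. */
    List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        Integer s = idByUri.get(subject);
        Integer p = idByUri.get(predicate);
        if (s == null || p == null) {
            return result; // unknown resource, so no triple can match
        }
        for (Row r : triples) {
            if (r.subId() == s && r.predId() == p) {
                result.add(uriById.get(r.objId() - 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TwoTableStoreSketch store = new TwoTableStoreSketch();
        store.add("ex:HotelDirections", "rdfs:subClassOf", "ex:Hotel");
        System.out.println(store.objects("ex:HotelDirections", "rdfs:subClassOf"));
    }
}
```

Because the schema never changes, any RDF schema fits into the same two tables; the price is that every query has to go through the id lookups.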


Page: 17

Triple Stores
 Can be grouped into three categories:
 In-memory triple stores
 Used for certain operations like benchmarking, caching, etc.
 Native triple stores
 Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
 Non-memory, non-native triple stores
 Are built on third-party databases (Jena SDB, Kaon, …)


Page: 18

Functionalities Provided by Triple Stores
 RDBMS-support
 General RDF model access
 Query language support in the store such as RQL,
SPARQL
 Some stores provide:
 Provenance - tracking of who-said-what
 APIs for accessing triple store over network
 Very few stores provide:
 Full text search
 Inference and rule languages


Page: 19

Example Triple Store implementations

 RDF Suite
 Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
 Based on an ORDBMS model
 Sesame
 http://www.openrdf.org/
 Relational databases (MySQL, PostgreSQL, Oracle)
 Jena
 http://www.hpl.hp.com/semweb/jena2.htm
 Relational databases (MySQL, PostgreSQL, Oracle)
 Virtuoso
 http://virtuoso.openlinksw.com/
 Native RDF Quad Storage (Physical Quads)


Page: 20

RDFSuite (ICS-Forth)*

* IST-1999-13479 C-Web, IST-2000-26074 Mesmuses


Page: 21

How triples are stored and accessed in RDF Suite
 Separate tables are created to store resources
 Properties, subClasses, subProperties and instances
 Indices on attributes like URI, source and target
 Querying is possible through RQL


Page: 22

How triples are stored and accessed in RDF Suite

[Figure from *]


*Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases , SemWeb, 2001

Page: 23

Sesame Architecture
 DBMS-independent API for accessing triple repositories
 SAIL API
 A set of Java interfaces between other modules and repository
 Abstract from the actual storage mechanism
 Query Module
 RQL support
 Different ways to communicate with clients
 Through Protocol handlers
*Jeen Broekstra and Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002

Page: 24

SAIL API over PostgreSQL
 PostgreSQL
 Object-relational DBMS
 Supports sub-table relations between its tables for providing RDF Schema class and property subsumption
 Individuals are represented under separate tables created for resources
 Difficult to add tables, which is needed whenever the RDF Schema changes


Page: 25

SAIL API over MySQL
 MySQL
 The database schema does not change when the RDFS changes
 Has an advantage where the RDFS is unstable


Page: 26

Jena2 Architecture


Page: 27

Jena2 Architecture

*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on
Semantic Web and Databases

Page: 28

Jena2

 Jena2
 Denormalized schema
 Avoids unnecessary joins by storing URIs and literals directly in the statement table
 Multiple statement tables
 Better locality and caching
 Property Tables


Page: 29

Normalized vs Denormalized Tables


Page: 30

Property Tables

Triple Store Only:
Subject  Property   Object
person1  name       Alice
person1  age        32
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  name       Bob
person2  age        35
person2  adopteeOf  person6
person2  friendOf   person8
person2  gender     male

Person Property Table:
ID  name   age  gender
p1  Alice  32   -
p2  Bob    35   male

Triple Store (remaining triples):
Subject  Property   Object
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  adopteeOf  person6
person2  friendOf   person8

*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
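
The split between a property table and the remaining triple store can be sketched as follows. This is not Jena2's implementation: the class and method names are hypothetical, the set of clustered properties is passed in by the caller, and each clustered property is assumed to be single-valued per subject.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PropertyTableSketch {
    record Triple(String subject, String property, String object) {}

    /** One row per subject for the clustered properties, everything else as triples. */
    record Split(Map<String, Map<String, String>> propertyTable, List<Triple> remaining) {}

    static Split split(List<Triple> triples, Set<String> clustered) {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        List<Triple> remaining = new ArrayList<>();
        for (Triple t : triples) {
            if (clustered.contains(t.property())) {
                // Clustered property: becomes a column value in the subject's row.
                table.computeIfAbsent(t.subject(), s -> new LinkedHashMap<>())
                     .put(t.property(), t.object());
            } else {
                // Everything else stays in the generic triple store.
                remaining.add(t);
            }
        }
        return new Split(table, remaining);
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
            new Triple("person1", "name", "Alice"),
            new Triple("person1", "age", "32"),
            new Triple("person1", "twinOf", "person2"),
            new Triple("person2", "name", "Bob"));
        Split result = split(triples, Set.of("name", "age", "gender"));
        System.out.println(result.propertyTable());
        System.out.println(result.remaining());
    }
}
```

Reading a person's name and age then touches a single row instead of one triple per property, which is the locality benefit the property table is meant to give.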

Page: 31

Jena Persistence Options
 SDB
 Scalable storage and query for RDF
 Specifically designed for SPARQL support
 Supports: MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
 Scales to graphs of 100 million triples


Page: 32

Jena Persistence Options
 TDB
 Provides large-scale storage and querying of RDF datasets using a pure Java engine
 Supports SPARQL
 A non-transactional, faster database solution for use by a single system
 Scales well beyond SDB and is simpler to set up

Page: 33

Virtuoso
 General purpose RDBMS with extensive RDF
adaptations
 RDF data is stored as RDF quads, i.e. it supports RDF
with named graphs
 i.e. graph, subject, predicate, object tuples
 The columns are G for graph, P for predicate, S for subject
and O for object
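
A quad is simply a triple with one extra column for the graph it belongs to. The sketch below shows that shape in plain Java; it says nothing about Virtuoso's actual table layout or indexes, and the class name, graph names and sample data are made up for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

final class QuadSketch {
    /** G, S, P, O: the graph column tells which named graph a statement belongs to. */
    record Quad(String graph, String subject, String predicate, String object) {}

    /** Selects the statements of a single named graph. */
    static List<Quad> inGraph(List<Quad> quads, String graph) {
        return quads.stream()
            .filter(q -> q.graph().equals(graph))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Quad> quads = List.of(
            new Quad("ex:graphA", "ex:Paris", "ex:locatedIn", "ex:France"),
            new Quad("ex:graphB", "ex:Paris", "ex:name", "Paris"));
        System.out.println(inGraph(quads, "ex:graphA"));
    }
}
```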


Page: 34

Querying Semantic Data
 Semantic data can be queried from triple stores by
 Various query languages
 SPARQL
 Different endpoints provided
 RQL
 RDQL
 SeRQL
…

 API Calls
 Through proprietary APIs of different projects
 Linked Data

Page: 35

SPARQL
 Is an RDF query language
 Standardized by the W3C
 Plays a role similar to that of SQL for relational databases
 Syntactically resembles SQL
 Operates on RDF graphs instead of relational databases


Page: 36

SPARQL Endpoints
 Provide the functionality to query a knowledge base via the SPARQL language
 Accept queries and return results over the HTTP protocol
 Query results can be in different formats such as
 RDF
 XML
 HTML
 JSON
 CSV
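
A query can be sent to an endpoint as an ordinary HTTP GET request, with the query text in the `query` URL parameter and the desired result format in the `Accept` header. The sketch below only builds such a request with the JDK's `java.net.http` classes and does not send it. The query revisits the "cities of China" example from earlier in this lecture; the DBpedia vocabulary terms in it (`dbo:City`, `dbo:country`, `dbo:populationTotal`, `dbr:China`) are assumptions about DBpedia's schema and may need adjusting.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SparqlEndpointRequestSketch {
    /** Builds a GET request that carries the query in the "query" URL parameter. */
    static HttpRequest build(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
            .header("Accept", "application/sparql-results+json") // ask for JSON results
            .GET()
            .build();
    }

    public static void main(String[] args) {
        // Assumed DBpedia terms; check them against the endpoint before relying on them.
        String query =
            "PREFIX dbo: <http://dbpedia.org/ontology/> "
            + "PREFIX dbr: <http://dbpedia.org/resource/> "
            + "SELECT ?city WHERE { "
            + "?city a dbo:City ; dbo:country dbr:China ; dbo:populationTotal ?pop . "
            + "FILTER(?pop > 20000000) }";
        HttpRequest request = build("http://dbpedia.org/sparql", query);
        System.out.println(request.uri());
        // To execute it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

Changing the `Accept` header is how a client would ask for one of the other result formats listed above, provided the endpoint supports it.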


Page: 37

Semantic Data Access With API Calls
 Open source projects provide APIs to manipulate RDF data
 Jena
 Apache Clerezza
 Sesame
 JRDF


Page: 38

Jena
 Jena provides a rich API to manipulate the RDF stored in the underlying triple store.
 Model to represent graphs
 CRUD methods for triples
 Querying methods for existing resources
 See the next slide for the code snippet…


Page: 39

Jena Code Snippet
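// Package names as in current Apache Jena releases (Jena 2 used com.hp.hpl.jena.*)
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

public class JenaExample {
    public static void main(String[] args) {
        String personURI = "http://somewhere/JohnSmith";
        String givenName = "John";
        String familyName = "Smith";
        String fullName = givenName + " " + familyName;

        // create an empty Model which represents an RDF graph
        Model model = ModelFactory.createDefaultModel();

        // create the resource which will produce the triples in the next slide;
        // the nested createResource() call creates a blank node for the name parts
        Resource johnSmith = model.createResource(personURI)
            .addProperty(VCARD.FN, fullName)
            .addProperty(VCARD.N,
                model.createResource()
                    .addProperty(VCARD.Given, givenName)
                    .addProperty(VCARD.Family, familyName));
    }
}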


Page: 40

Jena
 Triples created by the code snippet on the previous slide:

(<http://somewhere/JohnSmith>, VCARD.FN, "John Smith")
(<http://somewhere/JohnSmith>, VCARD.N, _)
(_, VCARD.Given, "John")
(_, VCARD.Family, "Smith")

 Note that the _ symbol represents a blank node


Page: 41

Apache Clerezza
 Provides a uniform API that is independent of the different triple stores it supports
 Its API provides a model to represent RDF graphs and manipulate those graphs
 Also provides a SPARQL endpoint to query the stored knowledge


Page: 42

Apache Clerezza Code Snippet

 Simple code snippet adding two triples to the graph:

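// Classes come from org.apache.clerezza.rdf.core and org.apache.clerezza.rdf.core.impl
String base = "http://www.example.org#";
// Full URIs spelled out for the shorthand rdf:type, foaf:Person and VCARD:FN
String rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
String foafPerson = "http://xmlns.com/foaf/0.1/Person";
String vcardFn = "http://www.w3.org/2001/vcard-rdf/3.0#FN";
MGraph g = new SimpleMGraph();
g.add(new TripleImpl(
    new UriRef(base + "JohnSmith"),
    new UriRef(rdfType),
    new UriRef(foafPerson)));
g.add(new TripleImpl(
    new UriRef(base + "JohnSmith"),
    new UriRef(vcardFn),
    LiteralFactory.getInstance().createTypedLiteral("John")));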


Page: 43

Linked Data
 Interrelated datasets on the Web so that computers can
explore them
 Has a standard format to be accessed and managed
 Provides integration and reasoning on a huge amount
of data on the Web


Page: 44

Linked Data
 Four famous principles of Linked Data, presented by Tim Berners-Lee
 Use URIs as names of things
 Use HTTP URIs so that people can look up (dereference) those names
 When a URI is dereferenced, provide useful information using the standards (RDF, SPARQL)
 Provide links to other URIs so that related data can be discovered


Page: 45

Linked Data


Page: 46

Linking Open Data Project
 Is a W3C SWEO project
 Aims to make data freely available to everyone
 Aims to publish open data sets as RDF and set semantic relationships between them
 Serves information in a machine-readable format
 Enriches content
 Reduces duplication
 The number of linked datasets is increasing rapidly
 A large number of datasets are linked already


Page: 47

Linked Datasets As of October 2008


Page: 48

Linked Datasets As of September 2010


Page: 49

Linked Datasets As of 2011

Page: 50

Access Data In The Cloud
 Follow the RDF links representing the “things”
 SPARQL Endpoints
 Ready-to-use software to discover linked data (see the next slide)


Page: 51

Linked Data Applications
 Many applications are built on top of Linked Data
 Tabulator
 Marbles
 Openlink RDF Browser
 …
 Search the Web for:
 RDF Crawlers
 RDF Browsers
 Also see the following link containing a number of linked data applications:
 http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications


Page: 52

Available SPARQL Endpoints
 http://dbpedia.org/sparql
 http://www4.wiwiss.fu-berlin.de/dblp/
 To find SPARQL endpoints that provide a certain URI, see
 http://void.rkbexplorer.com/endpoint-search/
 See also a list of live SPARQL endpoints
 http://www.w3.org/wiki/SparqlEndpoints


Page: 53

References
 http://www.w3.org/TR/rdf-sparql-query
 http://jena.sourceforge.net/tutorial/RDF_API/index.html
 http://www.slideshare.net/ldodds/sparql-tutorial
 http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semantic-web?src=related_normal&rel=1702851
 http://www.cambridgesemantics.com/2008/09/sparql-by-example
 http://linkeddata-specs.info/
 http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
 http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
 Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
 Jeen Broekstra, Arjohn Kampman, Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
 Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
 http://jena.sourceforge.net/DB/index.html
 http://virtuoso.openlinksw.com/

