Currently a large number of Web sites are driven by Content Management Systems (CMS) which manage textual and multimedia content but also - inherently - carry valuable information about a site's structure and content model. Exposing this structured information to the Web of Data has so far required considerable expertise in RDF and OWL modelling and additional programming effort. In this paper we tackle one of the most popular CMS: Drupal. We enable site administrators to export their site content model and data to the Web of Data without requiring extensive knowledge of Semantic Web technologies. Our modules create RDFa annotations and - optionally - a SPARQL endpoint for any Drupal site out of the box. Likewise, we add the means to map the site data to existing ontologies on the Web, with a search interface to find commonly used ontology terms. We also allow a Drupal site administrator to include existing RDF data from remote SPARQL endpoints on the Web in the site. Brought together, these features enable networked RDF Drupal sites that reuse and enrich Linked Data. Finally, we discuss the adoption of our modules and report on a use case in the biomedical field and the current status of its deployment.
Produce and Consume Linked Data with Drupal!
1. Digital Enterprise Research Institute www.deri.ie
Stéphane Corlosquet, Renaud Delbru, Tim Clark,
Axel Polleres and Stefan Decker
ISWC 2009
scorlosquet@gmail.com
DERI NUI Galway, MGH
October 27th, 2009
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
2. Loads of Data on the Web in CMS...
3. Some Motivations...
Status of the current web
Data contained in millions of documents
Disparate platforms and systems
Wide range of topics (personal blogs, news, etc.)
Various types of resources (text, pictures, video, etc.)
Note: Lots of Structured data in Content Management Systems
Problem
Not possible to reuse this data outside the CMS (except via RSS)
Not available in a unified machine-readable format
4. So, here’s our idea of CMS:
Figure 3.5: Extended example in a typical Linked Data eco-system.
Figure labels: PROJECT BLOGS, DBLP and REMOTE DRUPAL SITE, each with a SPARQL endpoint.
Example query shown in the figure:
SELECT ?name ?title
WHERE {
  ?person foaf:made ?pub.
  ?person rdfs:label ?name.
  ?pub dc:title ?title.
  FILTER regex(?title, "knowledge", "i")
}
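The query in the figure could be sent to any of the SPARQL endpoints over HTTP. The following is a minimal sketch of how such a request URL might be built with the SPARQL Protocol's `query` parameter. The endpoint address `http://example.org/sparql` is hypothetical, and the prefix declarations are an assumption about which namespaces the figure's `foaf:`, `rdfs:` and `dc:` prefixes stand for.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The SELECT query from the figure. The PREFIX lines are an assumption:
# the slide does not spell out the namespaces behind foaf:, rdfs: and dc:.
QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
SELECT ?name ?title
WHERE {
  ?person foaf:made ?pub .
  ?person rdfs:label ?name .
  ?pub dc:title ?title .
  FILTER regex(?title, "knowledge", "i")
}
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL that carries the query text."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Hypothetical endpoint address, used for illustration only.
url = sparql_get_url("http://example.org/sparql", QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so that the sketch runs without network access.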
5. Approach
Our Goal
Integrate "any" CMS site into the Web of Data
A challenging task
Little incentive for users to annotate their data manually
Site owners do not have the resources to convert their data to RDF
Per-site schema: each site is different and its structure cannot be predefined
Solutions
Expose the CMS site structure in a unified format AUTOMATICALLY!
Use Semantic Web standards (RDFa, SPARQL)
6. Approach
Implementation in Drupal
Why? One of the most popular CMS out there
Modules to take the burden off the site users
What our modules allow:
1. Automatic site vocabulary generation
2. Mapping Content Models to existing ontologies
3. Data endpoint for SPARQL querying
4. Lazy loading of external data (data import)
7. Pre-Existing work
“Semantic Content Management Systems”
Ontology-based CMS:
– Semantic community Web portals (2000)
– OntoWebber: Model-Driven Ontology-Based Web Site Management (2001)
Our approach is the reverse: from existing CMS structure to ontologies
8. The Drupal CMS
Drupal*
Easy to use
Large community
Popular on the Web
Hundreds of thousands of sites
Modular design
Drupal site workflow
Site administrator: sets up the site and installs the modules they like or need
Site editors: create the content of the site following the schema defined by the site administrator
* http://drupal.org/
9. Drupal: Content Construction Kit
Content Construction Kit (CCK) module
GUI for extending the internal schema of a Drupal site
Used on many Drupal sites
Can build new types of pages, known as content types
Can create fields for each content type. Fields can be of various types: plain text fields, dates, email addresses, file uploads, references to other pages
10. Drupal: Content Construction Kit
Demo use case: project blogs site*
Community site
Various content:
– People
– Organizations
– Projects
– Blogs
* http://drupal.deri.ie/projectblogs/
Figure 3.5: Extended example in a typical Linked Data eco-system.
[...] one for bridging the DBLP SPARQL endpoint to the project blogs website, and a second for bridging the Science Collaboration Framework website. When visiting Tim’s profile page, the relevant publication information will be fetched from both DBLP and SCF websites, and either new nodes will be created on the site or older ones will be updated if necessary.
3.4 Neologism: Easy RDFS vocabulary publishing
Neologism is a web-based vocabulary editor and publishing platform designed to [...]
12. Drupal: Content Construction Kit
CCK User Interface
The fields form for the Person content type is displayed in Figure 2.11. This form makes it easy to reorder the fields by a “drag and drop” technique, add new fields, remove existing fields or access the configuration form for a field.
Figure 2.12: Defining constraints on the gender field in Drupal’s CCK.
13. Drupal: Content Construction Kit
CCK User Interface
Figures 2.9, 2.10, 2.11 and 2.12 show the typical look and feel of a Drupal page and administrative interface for the Person content type, without our extensions installed. This content type offers fields such as name, homepage, email, colleagues, blog url, current project, past projects, publications, contributions.
Figure 2.9: User profile page built with Drupal’s CCK.
An example of node (page) of the type Person is depicted on Figure 2.9 where all [...]
14. What do we add?
1, 2
15. 1. Site Vocabulary
Automatic site vocabulary in RDFS/OWL from CCK
Describes the content types and fields
Content type <=> RDF class
Field <=> RDF property
RDFa output on site
http://siteurl/ns#
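The correspondence above (content type to RDF class, field to RDF property) can be illustrated with a small sketch that emits Turtle for a site vocabulary. This is not the RDF CCK module's code. The function name `site_vocabulary`, the use of `rdfs:domain` and `rdfs:label`, and the example fields are assumptions made for illustration; only the `http://siteurl/ns#` namespace pattern comes from the slide.

```python
from typing import Dict, List

# Namespace pattern taken from the slide; "siteurl" is a placeholder.
SITE_NS = "http://siteurl/ns#"


def site_vocabulary(content_types: Dict[str, List[str]]) -> str:
    """Render content types as RDFS classes and fields as RDF properties."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix site: <{SITE_NS}> .",
        "",
    ]
    for type_name, fields in content_types.items():
        # Content type <=> RDF class
        lines.append(f"site:{type_name} a rdfs:Class ;")
        lines.append(f'    rdfs:label "{type_name}" .')
        for field in fields:
            # Field <=> RDF property (the rdfs:domain link is an assumption)
            lines.append(f"site:{field} a rdf:Property ;")
            lines.append(f"    rdfs:domain site:{type_name} .")
    return "\n".join(lines)


# Hypothetical content model with one content type and two fields.
vocab = site_vocabulary({"Person": ["name", "homepage"]})
print(vocab)
```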
16. 1. Site Vocabulary
Automatic site vocabulary in RDFS/OWL
Field constraints
Example with cardinalities:
– the name of a Person is required
– max. 5 projects per person
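One common way to express such cardinalities in OWL is a restriction on the class. The sketch below shows what the two example constraints might look like in Turtle. How the modules actually serialize field constraints is not stated on the slide, so the helper `cardinality_restriction` and the local names `site:name` and `site:projects` are assumptions.

```python
def cardinality_restriction(cls: str, prop: str, kind: str, n: int) -> str:
    """Return a Turtle snippet stating an OWL cardinality restriction."""
    if kind not in ("minCardinality", "maxCardinality"):
        raise ValueError(f"unsupported restriction kind: {kind}")
    return (
        f"site:{cls} rdfs:subClassOf [\n"
        f"    a owl:Restriction ;\n"
        f"    owl:onProperty site:{prop} ;\n"
        f'    owl:{kind} "{n}"^^xsd:nonNegativeInteger\n'
        f"] ."
    )


# "The name of a Person is required" read as: at least one name.
name_required = cardinality_restriction("Person", "name", "minCardinality", 1)
# "Max. 5 projects per person" read as: at most five values.
max_projects = cardinality_restriction("Person", "projects", "maxCardinality", 5)

print(name_required)
print(max_projects)
```

Each constraint is emitted as its own restriction so that a required, bounded field simply yields two statements.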
17. 2. Mapping Content Models to existing ontologies
Import of any vocabulary published online
External ontology search service
Local terms are subclasses/subproperties of public terms
Ensure “safe” vocabulary re-use:
– only subclassing/subproperty avoids “redefinition”
– adding cardinalities might introduce inconsistencies; still, possible to avoid in the user interface
Search examples are shown in Figure 3.2. Details on improving the ran[...] search algorithm can be found in [45].
3.2.3 Mapping process
The terms suggested by both of the import service and the ontology search [...] be mapped to each content type and their fields. For mapping content types [...] choose among the classes of the imported ontologies and for fields, one [...] among the properties. The local terms will be linked with rdfs:subClassOf [...] rdfs:subPropertyOf statements, e.g. site:Person rdfs:subClassOf foaf:Person to the mapped [...] site vocabulary; wherever a mapping is defined, extra triples using the m[...] are exposed in the RDFa of the page.
Additionally, we allow inverse reuse of existing properties. E.g., [...] administrator imports a vocabulary ex: that defines a relation between [...] regions and goods that this region/country produces via the property ex:produces. [...] user interface also allows to relate fields to the inverse of imported properties. For instance, the origin field could be related to ex:produces in such an inverse [...] resulting in
site:origin rdfs:subPropertyOf [...]
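As a rough illustration of the subclass/subproperty linking described on this slide, the sketch below turns a mapping table into triples. The `site:Person` to `foaf:Person` mapping is the example from the text. The `site:name` to `foaf:name` mapping and the function name `mapping_triples` are assumptions added for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def mapping_triples(
    class_map: Dict[str, str], property_map: Dict[str, str]
) -> List[Triple]:
    """Link local terms to public terms without redefining the public ones."""
    triples: List[Triple] = []
    for local, public in class_map.items():
        # Local classes become subclasses of the mapped public class.
        triples.append((f"site:{local}", "rdfs:subClassOf", public))
    for local, public in property_map.items():
        # Local fields become subproperties of the mapped public property.
        triples.append((f"site:{local}", "rdfs:subPropertyOf", public))
    return triples


triples = mapping_triples(
    {"Person": "foaf:Person"},  # example given in the text
    {"name": "foaf:name"},      # hypothetical field mapping
)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Because only the local term appears as the subject, nothing is asserted about the public vocabulary itself, which is the "safe" re-use the slide refers to.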
18. 2. Mapping Content Models to existing ontologies
RDF mappings page
Figure 3.2: RDF mappings management through the Drupal interface
19. 2. Mapping Content Models to existing ontologies
RDF mappings page
Figure 3.2: RDF mappings management through the Drupal interface: RDF class map[...]
20. What do we add?
1, 2
3
21. 3. Data endpoint for complex querying
Local RDF data exposed in a SPARQL endpoint
Enables interoperability across sites
Built on the PHP ARC2 library
All RDF data indexed in the endpoint
Each page stored as a graph and kept up to date
Figure 3.6: A list of SPARQL results (left) and an RDF SPARQL Proxy profile form
22. 3. Data endpoint for complex querying
Local RDF data exposed in a SPARQL endpoint
Enables interoperability across sites
Built on the PHP ARC2 library
All RDF data indexed in the endpoint
Each page stored as a graph and kept up to date
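The idea of storing each page as its own graph and keeping it up to date can be sketched with a small in-memory store. This is not the ARC2 API. `PageGraphStore`, its methods and the example page URI are hypothetical stand-ins meant only to show why one graph per page makes updates simple.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class PageGraphStore:
    """In-memory stand-in for a store that keeps one named graph per page."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = {}

    def save_page(self, page_uri: str, triples: List[Triple]) -> None:
        # Replace the whole graph so triples from older revisions disappear.
        self._graphs[page_uri] = set(triples)

    def delete_page(self, page_uri: str) -> None:
        # Dropping the graph removes everything the page contributed.
        self._graphs.pop(page_uri, None)

    def all_triples(self) -> Set[Triple]:
        # A query over the endpoint would see the union of all page graphs.
        return set().union(*self._graphs.values())


store = PageGraphStore()
page = "http://siteurl/node/1"  # hypothetical page URI
store.save_page(page, [(page, "site:name", '"Alice"')])
store.save_page(page, [(page, "site:name", '"Alice B."')])  # page edited
print(store.all_triples())
```

After the second save only the newer triple remains, which is the "kept up to date" behaviour in miniature.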
23. What do we add?
4
1, 2
3
24. 4. Lazy loading of external data
Lazy loading (caching) of distant RDF resources
Enables interoperability across sites
Built on the PHP ARC2 library
CONSTRUCT query to map distant schema to local schema
Figure 3.6: A list of SPARQL results (left) and an RDF SPARQL Proxy profile form
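A minimal sketch of the lazy-loading idea: a remote resource is fetched only the first time it is requested, through a CONSTRUCT query that rewrites the remote schema into local terms, and is served from a cache afterwards. The class `LazyRdfCache`, the `site:title` property, the example URI and the stubbed fetch function are all assumptions. The real module is built on ARC2 and is not shown here.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical CONSTRUCT query: rewrites remote dc:title into local site:title.
CONSTRUCT_QUERY = """\
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX site: <http://siteurl/ns#>
CONSTRUCT { ?pub site:title ?title }
WHERE     { ?pub dc:title ?title }
"""


class LazyRdfCache:
    """Fetch a distant resource on first access, then serve it from cache."""

    def __init__(self, fetch: Callable[[str, str], List[Triple]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[Triple]] = {}

    def get(self, resource_uri: str) -> List[Triple]:
        if resource_uri not in self._cache:
            # First visit: run the CONSTRUCT query against the remote source.
            self._cache[resource_uri] = self._fetch(resource_uri, CONSTRUCT_QUERY)
        return self._cache[resource_uri]


calls: List[str] = []


def fake_fetch(resource_uri: str, query: str) -> List[Triple]:
    # Stub standing in for a remote SPARQL request; records each call.
    calls.append(resource_uri)
    return [(resource_uri, "site:title", '"Example title"')]


cache = LazyRdfCache(fake_fetch)
first = cache.get("http://example.org/pub/1")
second = cache.get("http://example.org/pub/1")
print(len(calls))
```

Passing the fetch function in as a parameter keeps the caching logic separate from whatever library performs the remote query.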
25. 4. Lazy loading of external data
Lazy loading of distant RDF resources
27. Science Collaboration Framework
Web application toolkit based on Drupal
Enables online scientific collaboration
– publishing, annotating, sharing and discussing any content
– articles, papers, reviews, perspectives, interviews, news, biographies
– profile information on community members
Targets biomedical communities, but generic in essence
Networked sites producing Linked Data
28. SCF collaborating sites
Stembook (Stem Cell articles and reviews)
– http://www.stembook.org/
29. SCF collaborating sites
Michael J. Fox Foundation (Parkinson's disease)
– http://www.pdonlineresearch.org/
31. Conclusion
The structure of CMS sites contains valuable schema information
Our suggested “workflow”:
site vocabulary from the local structure (RDF CCK)
enables out-of-the-box RDF export: expose your Drupal site
to the Web of Data without any additional effort from site
admin or content editors (RDF CCK)
mapping to existing RDF vocabularies improves integration in
the LOD cloud (evoc)
SPARQL endpoint
Lazy loading of RDF resources (RDF Proxy)
32. Conclusion
Drupal 6 modules available for download
– http://drupal.org/project/rdfcck
– http://drupal.org/project/evoc
– http://drupal.org/project/sparql_ep
– http://drupal.org/project/rdfproxy
Online prototype
– http://drupal.deri.ie/projectblogs/
33. Good news from Drupal 7:
RDF mapping feature committed to Drupal 7 core
RDFa output by default (blogs, forums, comments, etc.)
using FOAF, SIOC, DC, SKOS.
Download development snapshot
– http://ftp.drupal.org/files/projects/drupal-7.x-dev.tar.gz
Currently more than 200,000* sites on Drupal 6
waiting to make the switch to Drupal 7
waiting to massively increase the amount of RDF data
on the Web
Discussion
http://groups.drupal.org/semantic-web
* http://drupal.org/project/usage/drupal