2. GRDDL: The Acronym
Gleaning
Resource
Descriptions (from)
Dialects (of)
Language
Rather long and intimidating
3. GRDDL: By Deconstruction
Wordnet Definition of Glean:
◦ (gather, as of natural products)
◦ Synonyms: reap, harvest.
Resource Description Framework (RDF)
◦ Logical assertions
Dialects of Language
◦ XML document families (XHTML, for instance)
4. GRDDL: By Analogy
GRDDL can be thought of
as a protocol for sowing
semantics in web content
for later harvest.
5. The Why
Vast amount of latent semantics in markup
<span>Chimezie Ogbuji</span>
Web content today is primarily built for
human consumption
Text indexing will only get you so far for
document retrieval
If machines are meant to harvest RDF from
documents, reproducible protocols are
needed
6. The Why (Cont.)
Microformats, eRDF, and RDFa
Specific to a particular family of
documents
XHTML and HTML
If the goal is machine consumption, the
bar needs to be raised beyond XHTML
7. The Why (Cont.)
It seems easy to forget that XHTML is
indeed an XML dialect
You would think the (X) would make
that obvious
What was needed was a standard way to
harvest RDF that is applicable to all XML
dialects
8. The What
Faithful rendition
Transformations
GRDDL result
Source documents
GRDDL-aware Agents
9. Faithful Rendition
“By specifying a GRDDL transformation, the author of a document
states that the transformation will provide a faithful rendition in
RDF of information (or some portion of the information)
expressed through the XML dialect used in the source document.”
Licenses an author-certified interpretation of
an XML document
A powerful paradigm for messaging
See David Booth's “RDF and SOA”
http://www.w3.org/2007/01/wos-papers/booth
10. GRDDL Transformations
Functions that take an XML document and
return an RDF graph
Transformations can be written in any
language
The “reference” transformation language is
XSLT
“[XSLT1] is the format most widely supported by GRDDL-aware
agents as of this writing […] is specifically designed to
express XML to XML transformations and has some good
safety characteristics”
11. Other Transformation Languages
“.. technically Javascript, C, or virtually any
other programming language may be used to
express transformations for GRDDL”
However, these transformations need to be
deterministic in order to ensure the result is
a faithful rendition
Hence, they must be functions
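A minimal sketch of what "transformations must be functions" means in practice, written in Python rather than XSLT. The input dialect (`<person name="…"/>`), the `EX` vocabulary, and the name `transform` are hypothetical and chosen only for illustration; triples are modeled as plain tuples rather than a real RDF library's types.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, not part of GRDDL or of the original slides.
EX = "http://example.org/vocab#"


def transform(xml_text, base_uri):
    """Pure function: XML text in, immutable set of (s, p, o) triples out.

    No clock, randomness, network access, or global state is consulted,
    so the same input always yields the same graph.
    """
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.iter("person"):
        name = person.get("name")
        if name is not None:
            triples.add((base_uri, EX + "mentions", name))
    return frozenset(triples)
```

Because the output depends only on the arguments, two agents applying the same transformation to the same document obtain the same graph, which is what lets the author vouch for the result.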
12. GRDDL Result
The result of applying the transformation is
an RDF serialization
The RDF graph that corresponds to the
serialization is a GRDDL result of the
original document
The “reference” result format is RDF/XML
Other formats can be used (Turtle, N3, etc.)
13. GRDDL Source Documents
The class of documents for which GRDDL
defines a way to extract a result graph:
XML Documents
XML Namespace Documents
Valid XHTML
XHTML Profiles
15. GRDDL: XML Documents
GRDDL Namespace (grddl prefix)
http://www.w3.org/2003/g/data-view#
transformation attribute
<?xml version="1.0" encoding="UTF-8"?>
<root
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
grddl:transformation=".. path to transform ..">
… XML content ..
</root>
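How a GRDDL-aware agent might discover the transformations named on the root element can be sketched with the Python standard library. The function name `root_transformations` is hypothetical, and the sketch assumes the attribute value is a whitespace-separated list of URI references resolved against the document's base URI.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def root_transformations(xml_text, base_uri):
    """Return absolute transformation URIs declared on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as "{namespace}local".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # Assumption: several transformations may be listed, separated by spaces.
    return [urljoin(base_uri, ref) for ref in value.split()]
```

A document without the attribute simply yields an empty list, leaving the agent to try the other source-document classes.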
16. Namespace Documents
“Transformations can be associated not only with individual
documents but also with whole dialects that share an XML
namespace”
A GRDDL source document lives at the
location of the namespace URI of the root
element (the namespace document)
The GRDDL result of the namespace
document has a statement of the form:
?nsDoc grddl:namespaceTransformation ?txDoc
• txDoc is the location of a transformation
applicable to such XML documents
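The namespace-document lookup can be sketched in two small steps: find the namespace URI of the root element, then look for `grddl:namespaceTransformation` statements in the GRDDL result of the namespace document. Both function names are hypothetical, and the namespace document's result is assumed to be already available as a collection of `(subject, predicate, object)` tuples; fetching and gleaning that document is left out.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
NS_TRANSFORMATION = GRDDL_NS + "namespaceTransformation"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        # ElementTree encodes qualified names as "{namespace}local".
        return tag[1:].split("}", 1)[0]
    return None


def namespace_transformations(ns_uri, ns_doc_result):
    """Pick the ?txDoc values out of '?nsDoc grddl:namespaceTransformation ?txDoc'."""
    return sorted(o for (s, p, o) in ns_doc_result
                  if s == ns_uri and p == NS_TRANSFORMATION)
```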
17. Valid XHTML Documents
<html xmlns="http://www.w3.org/1999/xhtml">
<head
profile="http://www.w3.org/2003/g/data-view">
<title>Some Document</title>
<link rel="transformation"
href=".. path to transformation .." />
...
</head>
…
</html>
Refers to the GRDDL XHTML profile
Licenses the interpretation of
rel="transformation" links
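A rough sketch of the XHTML case, using only the Python standard library: the links are honored only when the head's profile attribute names the GRDDL profile. The function name `xhtml_transformations` is hypothetical, and the sketch covers only `link` elements in the head as shown on this slide; it also assumes the page parses as well-formed XML without undeclared entities.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def xhtml_transformations(xhtml_text, base_uri):
    """Return absolute URIs of rel="transformation" links, if licensed."""
    root = ET.fromstring(xhtml_text)
    head = root.find("{%s}head" % XHTML_NS)
    # Without the GRDDL profile, the links carry no GRDDL meaning.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    found = []
    for link in head.iter("{%s}link" % XHTML_NS):
        rel_tokens = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rel_tokens and href:
            found.append(urljoin(base_uri, href))
    return found
```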
18. XHTML Profiles
“Adding a GRDDL profileTransformation assertion to a profile
document is much like adding a namespaceTransformation
assertion to a namespace document”
A GRDDL source document lives at the
location of the profile URI of an XHTML
document
The GRDDL result of the profile document
has a statement of the form:
?profileDoc grddl:profileTransformation ?txDoc
• txDoc is the location of a transformation
applicable to such XML documents
19. The How
GRDDL builds on existing XML & RDF
standards
An implementation mostly needs to
orchestrate:
Parsing of data representations
Resolving representations from web locations
The necessary XML processing to peek into and
harvest RDF from the various sources
The highly recursive nature of GRDDL
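The recursion mentioned above can be illustrated with a small sketch: to learn which transformations apply to a document, an agent may first need the GRDDL result of a related document (its namespace or profile document), which is itself a GRDDL source. This is not GRDDL.py's code. Every name here (`grddl_result`, `fetch`, `local_transforms`, `related_docs`, `apply_transform`) is hypothetical, and the four helpers are passed in as plain callables so the control flow can be read on its own.

```python
def grddl_result(uri, fetch, local_transforms, related_docs, apply_transform,
                 in_progress=frozenset()):
    """Return the set of (s, p, o) triples gleaned from the document at uri.

    fetch(uri)                      -> document text
    local_transforms(text, uri)     -> transformation URIs named in the document
    related_docs(text, uri)         -> (doc_uri, predicate) pairs, e.g. the
                                       namespace document and the predicate
                                       that links it to a transformation
    apply_transform(t, text, uri)   -> set of triples
    """
    if uri in in_progress:
        # A document that (indirectly) depends on itself adds nothing new.
        return set()
    in_progress = in_progress | {uri}

    text = fetch(uri)
    transforms = list(local_transforms(text, uri))

    # Recursive step: glean each related document, then read the
    # transformation links out of its result.
    for doc_uri, predicate in related_docs(text, uri):
        related = grddl_result(doc_uri, fetch, local_transforms,
                               related_docs, apply_transform, in_progress)
        transforms.extend(o for (s, p, o) in related
                          if s == doc_uri and p == predicate)

    result = set()
    for transform_uri in transforms:
        result |= apply_transform(transform_uri, text, uri)
    return result
```

The `in_progress` set is one simple way to keep the recursion from looping; a real agent would also need caching and error handling for unreachable documents.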
21. Anatomy of a GRDDL
Implementation: GRDDL.py
A reference implementation from scratch
650 LOC
RDFLib, 4Suite-XML, and Python control logic
A layered approach
Core module that handles transformations
One module per source type stacked on top of the
core
A top layer that orchestrates the recursion and
identification of which ‘class’ a source document
belongs to
24. The Where
GRDDL services online:
http://triplr.org/ (Stuff in, triples out)
http://www.w3.org/2007/08/grddl/ (W3C GRDDL
Service)
Primary GRDDL implementations:
Redland
GRDDL.py
Virtuoso
GRDDL Reader for Jena
RDFa is the most common GRDDL source
content format in the wild
25. Hidden Value Proposition
Supports separation of concerns:
XML for messaging, data collection,
structural validation
RDF for expressive assertions, inference,
etc.
A way to invest in data richness and
accessibility
26. GRDDL Use Cases
Embedding scheduling assertions on
personal pages
Using GRDDL for extracting RDF from XML
medical record documents
Cleveland Clinic use case (clinical
research)
Aggregating web-based product reviews
Embedding web service descriptions
Adding semantic assertions to XML schemas
Embedding semantic assertions in wikis