4. the three syntaxes
• several solutions for embedding semantic data in Web
pages
• three syntaxes known (by Google) as “rich snippets”
- microformats
- RDFa
- HTML microdata
• all three are supported by Google, but microdata is
the “recommended” syntax
5. First came microformats
• microformats emerged around 2005
• some key principles
- start by solving simple, specific problems
- design for humans first, machines second
• wide deployment
- used on billions of Web pages
- usage share was at 94% vis-a-vis competing
formats (before microdata, anyway)
• formats exist for marking up Atom feeds, calendars,
addresses and contact info, geo-location, multimedia,
news, products, recipes, reviews, resumes, social
relationships, etc.
7. then came RDFa
• RDFa aims to bridge the gap between human-
oriented HTML and machine-oriented RDF
documents
• provides XHTML attributes to indicate machine-
understandable information
• uses the RDF data model and Semantic Web
vocabularies directly
9. last but not least, microdata
• microdata syntax is based on nested groups of name-
value pairs
• HTML microdata specification includes
- an unambiguous parsing model
- an algorithm to convert microdata to RDF
• compatible with the Semantic Web via mappings
11. microdata properties
• annotate an item with text-valued properties using
the “itemprop” attribute
<div itemscope>
  <p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
12. multiple values are OK
• as in RDF, an item (subject) can have several properties
with the same name, each with a different value (object)
<div itemscope>
<p>Flavors in my favorite ice cream:</p>
<ul>
<li itemprop="flavor">Lemon sorbet</li>
<li itemprop="flavor">Apricot sorbet</li>
</ul>
</div>
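The extraction of repeated properties can be sketched in Python. This is a minimal illustration, not the parsing model from the microdata specification: the class name FlatMicrodataParser is hypothetical, and the sketch assumes un-nested items, explicit end tags, text-valued properties only, and a single name per itemprop attribute.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class FlatMicrodataParser(HTMLParser):
    """Hypothetical helper: collects text-valued itemprop values per item."""

    def __init__(self):
        super().__init__()
        self.items = []   # one dict of name -> list of values per itemscope
        self._open = []   # stack of (property name, depth, text parts)
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self._depth += 1
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({})
        if "itemprop" in attrs and self.items:
            self._open.append((attrs["itemprop"], self._depth, []))

    def handle_data(self, data):
        # Text belongs to every property element that is currently open.
        for _name, _depth, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self._open and self._open[-1][1] == self._depth:
            name, _depth, parts = self._open.pop()
            text = "".join(parts).strip()
            # Repeated names accumulate instead of overwriting each other.
            self.items[-1].setdefault(name, []).append(text)
        self._depth -= 1


ICE_CREAM = """
<div itemscope>
<p>Flavors in my favorite ice cream:</p>
<ul>
<li itemprop="flavor">Lemon sorbet</li>
<li itemprop="flavor">Apricot sorbet</li>
</ul>
</div>
"""

parser = FlatMicrodataParser()
parser.feed(ICE_CREAM)
print(parser.items)
```

Storing each property as a list of values, rather than a single value, is what lets the second "flavor" sit alongside the first.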
13. item types
• these correspond to classes in RDF
<section itemscope itemtype="http://example.org/animals#cat">
  <h1 itemprop="name">Hedral</h1>
  <p itemprop="desc">Hedral is a male american domestic
  shorthair, with a fluffy black fur with white paws
  and belly.</p>
  <img itemprop="img" src="hedral.jpeg" alt=""
       title="Hedral, age 18 months">
</section>
14. global IDs
• items may be given global identifiers, which are URLs
• they may be, but do not need to be, Semantic Web
URIs
<dl itemscope
    itemtype="http://vocab.example.net/book"
    itemid="urn:isbn:0-330-34032-8">
  <dt>Title
  <dd itemprop="title">The Reality Dysfunction
  <dt>Author
  <dd itemprop="author">Peter F. Hamilton
  <dt>Publication date
  <dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
</dl>
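Because an item with an itemid has a global identifier, it can serve directly as the subject of RDF triples. The Python sketch below shows that idea on the book item. It is a simplification, not the specification's conversion algorithm: the helper names vocab_of and item_to_triples are hypothetical, and building property URIs by cutting the type URL after its last "/" or "#" is an assumption made for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def vocab_of(type_uri):
    """Assumed rule: keep everything up to the last '/' or '#' of the type."""
    cut = max(type_uri.rfind("/"), type_uri.rfind("#"))
    return type_uri[: cut + 1]


def item_to_triples(item_id, item_type, properties):
    """Return (subject, predicate, object) tuples for one flat item."""
    # Items without an itemid would become blank nodes; "_:item" stands in.
    subject = item_id if item_id else "_:item"
    triples = [(subject, RDF_TYPE, item_type)]
    vocab = vocab_of(item_type)
    for name, values in properties.items():
        for value in values:
            triples.append((subject, vocab + name, value))
    return triples


book = item_to_triples(
    "urn:isbn:0-330-34032-8",
    "http://vocab.example.net/book",
    {
        "title": ["The Reality Dysfunction"],
        "author": ["Peter F. Hamilton"],
        "pubdate": ["1996-01-26"],
    },
)
for triple in book:
    print(triple)
```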
16. the schema.org vocabulary
• schema.org is one of a number of microdata
vocabularies
• it is a shared collection of microdata schemas for use
by webmasters
• includes a type hierarchy, like an RDFS schema
- starts with top-level Thing and DataType types
- properties are inherited by descendant types
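Property inheritance along the type hierarchy can be sketched in Python as a walk from a type up to Thing. The HIERARCHY table is a small hand-picked excerpt for illustration only, not the full vocabulary, and properties_of is a hypothetical helper; consult schema.org for the actual type definitions.

```python
# type name -> (parent type, properties declared on that type)
# Illustrative excerpt only; the real vocabulary is much larger.
HIERARCHY = {
    "Thing": (None, {"name", "description", "url", "image"}),
    "CreativeWork": ("Thing", {"author", "publisher"}),
    "Dataset": ("CreativeWork", {"distribution"}),
}


def properties_of(type_name):
    """Collect a type's own properties plus those of all its ancestors."""
    props = set()
    while type_name is not None:
        parent, own = HIERARCHY[type_name]
        props |= own
        type_name = parent
    return props


print(sorted(properties_of("Dataset")))
```

In this excerpt a Dataset can carry a name and a publisher without declaring either itself, since both come from its ancestors.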
17. Why should you use schema.org?
There are several reasons.
23. schema.rdfs.org
• maintains schema.org ↔ RDF mappings
- there are mappings for BIBO, DBpedia, Dublin
Core, FOAF, GoodRelations, SIOC, and WordNet
• also provides examples, tutorials, and data dumps
See: http://schema.rdfs.org/mappings.html
24. schema.org tools
• Google’s Rich Snippets Testing Tool
• schema.org libraries are available in Java,
JavaScript, Perl, PHP, Python, and Ruby
• there are schema.org modules for Drupal, Joomla!,
WordPress, and Virtuoso
• online tools include microdata extractors, generators
and validators
• sindice.com supports microdata
See: http://schema.rdfs.org/tools.html
25. schema.org extensions
• there are dozens of schema.org community proposals
- they extend existing schema.org vocabulary
• several have already been accepted into schema.org,
including
- Job Postings
- IPTC/rNews integration
- User Comments
• others: Comics, Learning Resources, TV and Radio,
Software Application, etc.
28. the Dataset vocabulary: types
• DataCatalog
- a collection of datasets
- e.g. the International Open Government Data catalog
• Dataset
- an individual, abstract data set
- e.g. a data set about seismic hazard zones near San
Francisco
• DataDownload
- a dataset in downloadable form
- e.g. an RDF/XML dump of the seismic hazard zones
data set
29. the Dataset vocabulary: properties
• catalog
- the catalog containing a dataset
• dataset
- a dataset contained in a catalog
• distribution
- a data download for a dataset
• keyword
- the topic of a dataset
• spatial
- the spatial extent of a data set (e.g. United States)
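How the three types and five properties relate can be sketched with Python dataclasses. This is only a reading aid under stated assumptions: the field types (plain strings and lists), the add helper, and the keyword value are illustrative choices, not part of the vocabulary itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataDownload:
    name: str


@dataclass
class Dataset:
    name: str
    keyword: List[str] = field(default_factory=list)
    spatial: Optional[str] = None
    distribution: List[DataDownload] = field(default_factory=list)
    # Back-reference to the containing catalog; excluded from repr and
    # comparison so the catalog <-> dataset cycle causes no recursion.
    catalog: Optional["DataCatalog"] = field(
        default=None, repr=False, compare=False)


@dataclass
class DataCatalog:
    name: str
    dataset: List[Dataset] = field(default_factory=list)

    def add(self, ds):
        """Hypothetical helper keeping 'dataset' and 'catalog' in sync."""
        self.dataset.append(ds)
        ds.catalog = self


zones = Dataset(
    name="Seismic Hazard Zones",
    keyword=["seismic hazard"],  # illustrative value
    spatial="United States",
    distribution=[DataDownload(name="RDF/XML dump")],
)
catalog = DataCatalog(name="International Open Government Data catalog")
catalog.add(zones)
print(catalog)
```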
30. Dataset extension RDF
• the Dataset extension maps to a subset of the Data
Catalog Vocabulary (DCAT)
• many other types and properties are inherited from
schema.org
• collectively, they cover
- around 2/3 of DCAT, and
- around half of the Asset Description Metadata
Schema (ADMS)
33. Google extracts this data
Item
Type: http://schema.org/dataset
name = Seismic Hazard Zones
url = http://www.datasf.org/story.php?title=seismic-hazard-zones-
description = The dataset represents the Liquefaction and Landslide Zones [...]
spatial = Item( 1 )
publisher = Item( 2 )
Item 1
Type: http://schema.org/country
name = United States
Item 2
Type: http://schema.org/organization
name = Department of Technology
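The listing above is a tree: the spatial and publisher properties point at further items. A Python sketch of that shape follows, with a hypothetical flatten helper that turns the tree into dotted paths. The dict layout is an assumption chosen for illustration, and the url and description properties are left out because they are truncated in the listing.

```python
extracted = {
    "type": "http://schema.org/dataset",
    "properties": {
        "name": "Seismic Hazard Zones",
        "spatial": {
            "type": "http://schema.org/country",
            "properties": {"name": "United States"},
        },
        "publisher": {
            "type": "http://schema.org/organization",
            "properties": {"name": "Department of Technology"},
        },
    },
}


def flatten(item, prefix=""):
    """Yield (dotted.path, value) pairs for every text-valued property."""
    for name, value in item["properties"].items():
        path = prefix + name
        if isinstance(value, dict):
            # A nested item: descend and extend the path.
            yield from flatten(value, path + ".")
        else:
            yield path, value


flat = dict(flatten(extracted))
for path, value in flat.items():
    print(path, "=", value)
```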
34. Resources
• HTML microdata
- http://www.w3.org/TR/microdata
• Schema.RDFS.org
- http://schema.rdfs.org
• W3C Web Schemas group (public-vocabs@w3.org)
- http://lists.w3.org/Archives/Public/public-vocabs
• The Dataset proposal
- http://www.w3.org/wiki/WebSchemas/Datasets
• Rich Snippets Testing Tool
- http://google.com/webmasters/tools/richsnippets
35. Credits
• word clouds by
- http://wordle.net
• deployment statistics discovered using Sindice and
Sindice4j
- http://sindice.com
- http://sindice4j.googlecode.com