
semantic markup using schema.org


A basic intro to microdata and schema.org, along with a new schema.org extension for datasets and data catalogs. "TWed" talk April 4, 2012.

Published in: Education, Technology


  1. Joshua Shinavier
     Wednesday Nights in the Tetherless World (TWed)
     April 4th, 2012
  2. Outline
     • rich snippets
       • microformats
       • RDFa
       • microdata
     • microdata syntax
     • schema.org
       • deployment
       • mappings, tools, extensions
     • the Dataset extension
  4. the three syntaxes
     • several solutions for embedding semantic data in Web pages
     • three syntaxes known (by Google) as “rich snippets”
       - microformats
       - RDFa
       - HTML microdata
     • all three are supported by Google, while
       - microdata is the “recommended” syntax
  5. First came microformats
     • microformats emerged around 2005
     • some key principles
       - start by solving simple, specific problems
       - design for humans first, machines second
     • wide deployment
       - used on billions of Web pages
       - usage share was at 94% vis-a-vis competing formats (before microdata, anyway)
     • formats exist for marking up Atom feeds, calendars, addresses and contact info, geo-location, multimedia, news, products, recipes, reviews, resumes, social relationships, etc.
  6. microformats example
     <div class="vcard">
       <a class="fn org url" href="http://www.commerce.net/">CommerceNet</a>
       <div class="adr">
         <span class="type">Work</span>:
         <div class="street-address">169 University Avenue</div>
         <span class="locality">Palo Alto</span>,
         <abbr class="region" title="California">CA</abbr>&nbsp;&nbsp;
         <span class="postal-code">94301</span>
         <div class="country-name">USA</div>
       </div>
       <div class="tel">
         <span class="type">Work</span> +1-650-289-4040
       </div>
       <div>Email:
         <span class="email">info@commerce.net</span>
       </div>
     </div>
  7. then came RDFa
     • RDFa aims to bridge the gap between human-oriented HTML and machine-oriented RDF documents
     • provides XHTML attributes to indicate machine-understandable information
     • uses the RDF data model, and Semantic Web vocabularies directly
  8. RDFa example
     <div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/">
       <p property="foaf:name">
         Alice Birpemswick
       </p>
       <p>
         Email: <a rel="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a>
       </p>
       <p>
         Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
       </p>
     </div>
  9. last but not least, microdata
     • microdata syntax is based on nested groups of name-value pairs
     • HTML microdata specification includes
       - an unambiguous parsing model
       - an algorithm to convert microdata to RDF
     • compatible with the Semantic Web via mappings
  11. microdata properties
      • annotate an item with text-valued properties using the “itemprop” attribute
      <div itemscope>
        <p>My name is <span itemprop="name">Daniel</span>.</p>
      </div>
  12. multiple values are OK
      • as in RDF, an item (subject) can have several properties with the same name, each with its own value (object)
      <div itemscope>
        <p>Flavors in my favorite ice cream:</p>
        <ul>
          <li itemprop="flavor">Lemon sorbet</li>
          <li itemprop="flavor">Apricot sorbet</li>
        </ul>
      </div>
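The ice-cream example can be read mechanically. The following is a minimal sketch, not the parsing model from the microdata specification: it assumes a single, non-nested item, explicit end tags, and text-only values. The class name `FlatMicrodataParser` is hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Collect text-valued itemprop values from one non-nested item."""

    def __init__(self):
        super().__init__()
        self.properties = {}  # property name -> list of values, in document order
        self._open = []       # itemprop name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        name = self._open[-1] if self._open else None
        if text and name:
            # Repeated names accumulate instead of overwriting each other.
            self.properties.setdefault(name, []).append(text)


SAMPLE = """
<div itemscope>
  <p>Flavors in my favorite ice cream:</p>
  <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
  </ul>
</div>
"""

parser = FlatMicrodataParser()
parser.feed(SAMPLE)
parser.close()
flavors = parser.properties["flavor"]
print(flavors)
```

Storing each property as a list is the design choice that matters here: a plain name-to-value mapping would silently keep only the last flavor. Void elements written without a closing slash (such as a bare `<br>`) would unbalance the simple stack used in this sketch.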
  13. item types
      • these correspond to classes in RDF
      <section itemscope itemtype="http://example.org/animals#cat">
        <h1 itemprop="name">Hedral</h1>
        <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.</p>
        <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
      </section>
  14. global IDs
      • items may be given global identifiers, which are URLs
      • they may be, but do not need to be Semantic Web URIs
      <dl itemscope itemtype="http://vocab.example.net/book" itemid="urn:isbn:0-330-34032-8">
        <dt>Title <dd itemprop="title">The Reality Dysfunction
        <dt>Author <dd itemprop="author">Peter F. Hamilton
        <dt>Publication date
        <dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
      </dl>
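To picture how an item with a type and a global ID lines up with RDF, here is a deliberately simplified sketch. It is not the conversion algorithm from the microdata specification: the rule in `property_uri` (place each property in the same namespace as the item type) is an assumption made for illustration, and the item is written by hand as a dictionary rather than parsed from HTML. The function names are hypothetical.

```python
from urllib.parse import urldefrag

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def property_uri(item_type, name):
    """Assumed rule: put the property in the same namespace as the item type."""
    base, fragment = urldefrag(item_type)
    if fragment:
        return f"{base}#{name}"
    return item_type.rsplit("/", 1)[0] + "/" + name


def item_to_triples(item):
    """Turn one typed item into (subject, predicate, object) tuples."""
    subject = item.get("id", "_:item")  # fall back to a blank node label without itemid
    triples = [(subject, RDF_TYPE, item["type"])]
    for name, values in item["properties"].items():
        for value in values:
            triples.append((subject, property_uri(item["type"], name), value))
    return triples


# The book item from the slide, transcribed by hand.
BOOK = {
    "type": "http://vocab.example.net/book",
    "id": "urn:isbn:0-330-34032-8",
    "properties": {
        "title": ["The Reality Dysfunction"],
        "author": ["Peter F. Hamilton"],
        "pubdate": ["1996-01-26"],
    },
}

triples = item_to_triples(BOOK)
for triple in triples:
    print(triple)
```

The `itemid` becomes the subject of every triple, which is what lets statements about the same book on different pages be merged. Without an `itemid`, the item can only be treated as an anonymous node.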
  16. the schema.org vocabulary
      • schema.org is one of a number of microdata vocabularies
      • it is a shared collection of microdata schemas for use by webmasters
      • includes a type hierarchy, like an RDFS schema
        - starts with top-level Thing and DataType types
        - properties are inherited by descendant types
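Property inheritance along the type hierarchy can be sketched with two small tables. The types and properties below are a hand-picked subset chosen for illustration, not the full schema.org vocabulary, and placing Dataset under CreativeWork is stated here as an assumption; the names `PARENT`, `OWN_PROPERTIES`, and `all_properties` are hypothetical.

```python
# Each type points to its parent; Thing is the root of this small subset.
PARENT = {
    "Thing": None,
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
    "Dataset": "CreativeWork",  # assumption for this sketch
}

# Properties declared directly on each type (illustrative subset only).
OWN_PROPERTIES = {
    "Thing": ["name", "description", "url", "image"],
    "CreativeWork": ["author", "publisher", "datePublished"],
    "Book": ["isbn", "numberOfPages"],
    "Dataset": ["distribution", "spatial"],
}


def all_properties(type_name):
    """Return own and inherited properties, most general type first."""
    props = []
    while type_name is not None:
        props = OWN_PROPERTIES.get(type_name, []) + props
        type_name = PARENT[type_name]
    return props


book_props = all_properties("Book")
dataset_props = all_properties("Dataset")
print(book_props)
print(dataset_props)
```

Walking up the parent chain is why a Dataset can carry `name` and `publisher` without declaring them itself, while `isbn` stays specific to Book.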
  17. Why should you use schema.org? There are several reasons.
  18. current schema.org types (there are around 300 of them)
  19. In terms of deployment... ...a few key types stand out.
  20. Top types
      type              occurrences  relative
      Product               5001966  0.27689260175
      PostalAddress         1437388  0.07956913403
      WebPage               1402426  0.07763375119
      Offer                 1267545  0.07016717684
      Book                  1111463  0.06152698395
      Person                 968737  0.05362613587
      AggregateRating        780967  0.04323179816
      GeoCoordinates         546586  0.03025722678
      LocalBusiness          544662  0.03015072039
      Article                525487  0.02908925463
      Place                  490433  0.02714877897
      Residence              451652  0.02500198869
      ItemPage               421911  0.02335562347
      Organization           405876  0.02246797792
      Blog                   268582  0.01486782772
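The slide does not define the "relative" column. One plausible reading, treated here as an assumption, is that it is each type's share of all typed items found. Under that reading, dividing occurrences by the relative value should give the same total for every row, which is easy to check. The three rows below are copied from the table; `implied_total` is a hypothetical helper.

```python
# (type, occurrences, relative) copied from three rows of the table.
ROWS = [
    ("Product", 5001966, 0.27689260175),
    ("PostalAddress", 1437388, 0.07956913403),
    ("Blog", 268582, 0.01486782772),
]


def implied_total(occurrences, relative):
    """Total item count implied by one row, assuming relative = occurrences / total."""
    return occurrences / relative


totals = [implied_total(occurrences, relative) for _, occurrences, relative in ROWS]
for (name, _, _), total in zip(ROWS, totals):
    print(f"{name}: implied total of {total:,.0f} items")
```

The rows agree on a total of roughly 18 million items, which supports the share-of-total reading. It also shows that the 15 listed types do not add up to the whole sample.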
  21. Who’s using it?
      Over 1,000 domains found (through Sindice)
  22. Some early adopters
      domain                      occurrences  relative
      www.couponcabin.com                3662  0.04400596
      www.digifotopro.nl                 2852  0.034272255
      www.weg.de                         2336  0.028071525
      futpedia.globo.com                 2003  0.02406989
      www.the-plug.com                   2001  0.024045857
      www.virtualtourist.com             1953  0.023469044
      gdgt.com                           1857  0.02231542
      www.notasdeprensa.es               1564  0.018794463
      www.libreriadelsanto.it            1294  0.015549894
      liriklaguindonesia.net             1274  0.015309556
      www.direct2florist.com             1080  0.012978273
      www.bluefountainmedia.com          1065  0.01279802
      www.alphabetsigns.com              1059  0.012725918
      www.tasit.com                      1004  0.012064988
      www.teachstreet.com                1001  0.012028937
  23. schema.rdfs.org
      • maintains schema.org ↔ RDF mappings
        - there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet
      • also provides examples, tutorials, and data dumps
      See: http://schema.rdfs.org/mappings.html
  24. schema.org tools
      • Google’s Rich Snippets Testing Tool
      • schema.org libraries are available in Java, JavaScript, Perl, PHP, Python, and Ruby
      • there are schema.org modules for Drupal, Joomla!, WordPress, and Virtuoso
      • online tools include microdata extractors, generators and validators
      • sindice.com supports microdata
      See: http://schema.rdfs.org/tools.html
  25. schema.org extensions
      • there are dozens of schema.org community proposals
        - they extend existing schema.org vocabulary
      • several have already been accepted into schema.org, incl.
        - Job Postings
        - IPTC/rNews integration
        - User Comments
      • others: Comics, Learning Resources, TV and Radio, Software Application, etc.
  27. motivation: open government data
  28. the Dataset vocabulary: types
      • DataCatalog
        - a collection of datasets
        - e.g. the International Open Government Data catalog
      • Dataset
        - an individual, abstract data set
        - e.g. a data set about seismic hazard zones near San Francisco
      • DataDownload
        - a dataset in downloadable form
        - e.g. an RDF/XML dump of the seismic hazard zones data set
  29. the Dataset vocabulary: properties
      • catalog - the catalog containing a dataset
      • dataset - a dataset contained in a catalog
      • distribution - a data download for a dataset
      • keyword - the topic of a dataset
      • spatial - the spatial extent of a data set (e.g. United States)
  30. Dataset extension RDF
      • the Dataset extension maps to a subset of the Data Catalog Vocabulary (DCAT)
      • many other types and properties are inherited from schema.org
      • collectively, they cover
        - around 2/3 of DCAT, and
        - around half of the Asset Description Metadata Schema (ADMS)
  31. Dataset example (microdata)
      <div itemscope="itemscope"
           itemid="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89"
           itemtype="http://schema.org/Dataset">
        <a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-"><span itemprop="name">
          <b>Seismic Hazard Zones</b>
        </span></a>
        <div><meta itemprop="url" content="http://www.datasf.org/story.php?title=seismic-hazard-zones-"/>
          <span itemprop="description">The dataset represents the Liquefaction and Landslide Zones [...]</span></div>
        <div><i>Country:</i>
          <a href="http://dbpedia.org/resource/United_States"><span itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country">
            <span itemprop="name">United States</span>
          </span>
          </a></div>
        <div><i>Publisher:</i>
          <span itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
            <span itemprop="name">Department of Technology</span>
          </span>
        </div>
      </div>
  32. Dataset example (RDFa)
      <div about="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89"
           typeof="dcat:Dataset">
        <div><b><a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-">
          <span property="dcterms:title">Seismic Hazard Zones</span>
        </a></b></div>
        <div property="dcterms:description">The dataset represents the Liquefaction and Landslide Zones [...]</div>
        <div rel="dcterms:spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
          <a href="http://dbpedia.org/resource/United_States">
            <span about="http://dbpedia.org/resource/United_States" typeof="adms:Country">
              <span property="dcterms:title">United States</span>
            </span>
          </a>
        </div>
        <div rel="dcterms:publisher"><i>Publisher:</i>
          <span typeof="foaf:Organization">
            <span property="dcterms:title">Department of Technology</span>
          </span>
        </div>
      </div>
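Comparing the microdata and RDFa versions of the same record suggests a term-by-term correspondence. The sketch below encodes only the pairs that can be read off those two examples; it is not the official mapping, and `TERM_MAP` and `to_dcat` are hypothetical names.

```python
# Correspondences read off the two Dataset examples (not an official mapping).
TERM_MAP = {
    "http://schema.org/Dataset": "dcat:Dataset",
    "name": "dcterms:title",
    "description": "dcterms:description",
    "spatial": "dcterms:spatial",
    "publisher": "dcterms:publisher",
}


def to_dcat(item_type, properties):
    """Translate a schema.org item into RDFa-style terms, reporting what has no mapping."""
    mapped_type = TERM_MAP.get(item_type)
    mapped, unmapped = {}, []
    for name, value in properties.items():
        if name in TERM_MAP:
            mapped[TERM_MAP[name]] = value
        else:
            unmapped.append(name)
    return mapped_type, mapped, unmapped


mapped_type, mapped, unmapped = to_dcat(
    "http://schema.org/Dataset",
    {
        "name": "Seismic Hazard Zones",
        "url": "http://www.datasf.org/story.php?title=seismic-hazard-zones-",
        "spatial": "http://dbpedia.org/resource/United_States",
    },
)
print(mapped_type, mapped, unmapped)
```

Returning the unmapped names separately keeps the partial coverage visible: in this sketch `url` has no counterpart in the table, so it is reported rather than dropped.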
  33. Google extracts this data
      Item
        Type: http://schema.org/dataset
        name = Seismic Hazard Zones
        url = http://www.datasf.org/story.php?title=seismic-hazard-zones-
        description = The dataset represents the Liquefaction and Landslide Zones [...]
        spatial = Item( 1 )
        publisher = Item( 2 )
      Item 1
        Type: http://schema.org/country
        name = United States
      Item 2
        Type: http://schema.org/organization
        name = Department of Technology
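The extracted output lists nested items separately and refers to them by number. A minimal sketch of that flattening step follows, assuming items are already available as nested dictionaries. It imitates the layout of the output shown on the slide and is not Google's implementation; `flatten` and `render` are hypothetical names.

```python
def flatten(item):
    """Return [(type, rows)], with nested items replaced by numbered references."""
    items = []

    def visit(node):
        index = len(items)
        rows = []
        items.append((node["type"], rows))
        for name, value in node["properties"]:
            if isinstance(value, dict):
                # A nested item gets its own entry and is referenced by number.
                rows.append((name, f"Item( {visit(value)} )"))
            else:
                rows.append((name, value))
        return index

    visit(item)
    return items


def render(items):
    """Format flattened items in the style of the extracted output."""
    lines = []
    for i, (type_url, rows) in enumerate(items):
        lines.append("Item" if i == 0 else f"Item {i}")
        lines.append(f"  Type: {type_url}")
        lines.extend(f"  {name} = {value}" for name, value in rows)
    return "\n".join(lines)


# The Dataset record, transcribed by hand and shortened.
DATASET = {
    "type": "http://schema.org/Dataset",
    "properties": [
        ("name", "Seismic Hazard Zones"),
        ("spatial", {"type": "http://schema.org/Country",
                     "properties": [("name", "United States")]}),
        ("publisher", {"type": "http://schema.org/Organization",
                       "properties": [("name", "Department of Technology")]}),
    ],
}

items = flatten(DATASET)
print(render(items))
```

Numbering items in the order they are first encountered keeps references stable for a given page, and the top-level item is left unnumbered as in the output on the slide.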
  34. Resources
      • HTML microdata
        - http://www.w3.org/TR/microdata
      • Schema.RDFS.org
        - http://schema.rdfs.org
      • W3C Web Schemas group (public-vocabs@w3.org)
        - http://lists.w3.org/Archives/Public/public-vocabs
      • The Dataset proposal
        - http://www.w3.org/wiki/WebSchemas/Datasets
      • Rich Snippets Testing Tool
        - http://google.com/webmasters/tools/richsnippets
  35. Credits
      • word clouds by
        - http://wordle.net
      • deployment statistics discovered using Sindice and Sindice4j
        - http://sindice.com
        - http://sindice4j.googlecode.com
  36. Thanks!
      • Tetherless World Constellation
        • http://tw.rpi.edu
      • Contact:
        • josh@fortytwo.net, @joshsh