'Schema.org and One Hundred Years of Search'

                            Libraries, Media and The Semantic Web

                           BBC Academy, March 28th 2012, London


                                 Dan Brickley <danbri@danbri.org>



Friday, March 30, 2012
In 20 minutes

                     • Introduce you to the schema.org initiative
                     • Revisit 'the Web before the Web' of 1912
                     • Use this to describe what's new with
                         schema.org, ... and the practical choices we
                         face when scaling to billions of users and
                         pages



Intro: Dan Brickley
                   • Ex-W3C, helped start Semantic Web project
                   • Worked on RDF/S, FOAF, SKOS & other
                         standards around W3C
                   • Currently danbri@google.com working on
                         <http://schema.org/> project
                   • See also <http://danbri.org/>, @danbri

Back to 1912

                   ■ The Republic of China is proclaimed.
                   ■ Albert Berry makes the first parachute jump from a moving airplane.
                   ■ Prague Party Conference: Vladimir Lenin and the Bolshevik Party
                     break away from the rest of the Russian Social Democratic Labour
                     Party.
                   ■ France establishes a protectorate over Morocco.
                   ■ RMS Titanic strikes an iceberg in the northern Atlantic Ocean.
                   ■ Paramount Pictures, the oldest American motion picture studio still in
                     operation, is founded.
                   ■ Albania declares independence from the Ottoman Empire.
                   ■ The First Balkan War begins.
                   ■ Alan Turing, British mathematician, is born.

                   ■ Semantic search over structured data goes mainstream, in Belgium.


                           source: http://en.wikipedia.org/wiki/1912
Credit and thanks: W. Boyd Rayward
Sample queries from 1912

Diesel engine. Philosophy of mathematics. Fisheries in
Morocco and on the coast of Spain. Bulgarian finances.
Gyroscope. Fire worship. Motorised cultivation (garden).
Evolution of the human tooth. Italian emigration. Civil register.
Baghdad railway. The planet Mars. Universal suffrage.
Traumatic neurosis. Eugenics. Salmon; salmon missed and
caught again. Boomerang. Manufacture of cyanamide.
Jewish emigration. Tobacco poisoning. Quantity of olive oil
imported into Belgium. Case law on insurance companies in
England, Holland and Denmark...

Search before search
                     •   Paul Otlet, "the man who dreamed the Internet",
                         http://www.youtube.com/watch?v=fmsOI5SdLkE

                     •   "The International Centre organises collections of world-wide
                         importance. These collections are the International Museum, the
                         International Library, the International Bibliographic Catalogue and
                         the Universal Documentary Archives. These collections are
                         conceived as parts of one universal body of documentation, as an
                         encyclopedic survey of human knowledge, as an enormous
                         intellectual warehouse of books, documents, catalogues and
                         scientific objects."

                     •   Start at http://en.wikipedia.org/wiki/Mundaneum for the whole story




Libraries, media & ...?
                     • Universal Decimal Classification (UDC)
                         used in many 1000s of libraries today
                     • In BBC archive for 40 years, as 'Lonclass'
                     • Shows the challenge and promise of
                         structured description
                     • So what's in Lonclass? What's not in Lonclass!

Lonclass by example
                     •   R672:32.007(47)YELTSIN:342.518.1THATCHER
                         “TWO SHOTS OF MARGARET THATCHER AND BORIS YELTSIN”

                     •   [BRITISH AEROSPACE].007.11PEARCE:656.881:342.518.1THATCHER
                         “LETTER TO MRS THATCHER FROM SIR AUSTIN PEARCE”

                     •   656.881:301.162.721:32.007THATCHER:654.192.731TV-AM
                         “MARGARET THATCHER'S LETTER OF APOLOGY TO TV AM”


Compositional Semantics


                     •   656.881:301.162.721 “LETTERS OF APOLOGY”

                     •   656.881 “LETTERS (POSTAL SERVICES)”

                     •   656.881:06.022.6 “RESIGNATION LETTERS”

                     •   654.192.731TV-AM “TV AM (TELEVISION AM)”


       (this work pre-dated modern linguistics, never mind computing...)
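
The way these codes compose can be sketched in a few lines of Python. This is only an illustration, not how the BBC archive processes Lonclass: the glossary holds just the four labels shown on this slide, the names `GLOSSARY`, `split_facets` and `describe` are invented for the example, and it assumes that a colon is the only joining operator and that a code wrapped across two slide lines is one continuous string.

```python
# Labels copied from the slide; real Lonclass has thousands of terms.
GLOSSARY = {
    "656.881": "LETTERS (POSTAL SERVICES)",
    "656.881:301.162.721": "LETTERS OF APOLOGY",
    "656.881:06.022.6": "RESIGNATION LETTERS",
    "654.192.731TV-AM": "TV AM (TELEVISION AM)",
}


def split_facets(code):
    """Split a composite code into its colon-separated facets."""
    return [facet.strip() for facet in code.split(":") if facet.strip()]


def describe(code, glossary=GLOSSARY):
    """Greedily match the longest known run of facets, left to right.

    Returns (fragment, label) pairs; label is None when the fragment
    is not in the glossary.
    """
    facets = split_facets(code)
    result = []
    i = 0
    while i < len(facets):
        for j in range(len(facets), i, -1):
            fragment = ":".join(facets[i:j])
            if fragment in glossary:
                result.append((fragment, glossary[fragment]))
                i = j
                break
        else:
            # No known run starts here: keep the single facet, unlabelled.
            result.append((facets[i], None))
            i += 1
    return result


# Third example from the "Lonclass by example" slide, written as a
# single string (assumption: the slide only wrapped the line).
APOLOGY = "656.881:301.162.721:32.007THATCHER:654.192.731TV-AM"

for fragment, label in describe(APOLOGY):
    print(fragment, "->", label or "(not in this toy glossary)")
```

The longest-match rule is what lets "656.881:301.162.721" read as "letters of apology" rather than as a postal-services code followed by an unknown facet.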

Archives and classification

                     • Lonclass tells a story of the world; of this
                         country at least; and a lot about the rest
                     • It is huge: thousands of terms, composite
                         sentence-like codes, and rather sparse
                     • It began with UDC in the 1890s, and remains
                         key to the BBC's media archives even today



And now for something new.



Schema.org
                     • Search engine collaboration:
                       • Google, Bing, Yahoo! & Yandex
                     • Simple factual data for better search
                     • Launched June 2011, schema.org schema
                       • 300 classes, 261 properties & growing
                       • discussions: W3C WebSchemas group
Example: Google Rich Snippets




                         From: http://www.google.com/webmasters/tools/richsnippets
                         See also Yandex's http://webmaster.yandex.ru/microtest.xml
On IMDB:
     <div id="content-2-wide" itemscope itemtype="http://schema.org/CreativeWork">


    <div class="txt-block">
     <h4 class="inline">Stars:</h4>
     <a onclick="(new Image()).src='/rg/title-overview/star-1/images/b.gif?link=%2Fname%2Fnm0010930%2F';"
        href="/name/nm0010930/" itemprop="actors">Douglas Adams</a>,
     <a onclick="(new Image()).src='/rg/title-overview/star-2/images/b.gif?link=%2Fname%2Fnm0048982%2F';"
        href="/name/nm0048982/" itemprop="actors">Tom Baker</a> and
     <a onclick="(new Image()).src='/rg/title-overview/star-3/images/b.gif?link=%2Fname%2Fnm3035100%2F';"
        href="/name/nm3035100/" itemprop="actors">Hans Peter Brondmo</a>
    </div>

   <div class="star-box" itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">



    Linked Data: see http://www.imdb.com/name/nm0010930/ for schema.org markup describing Douglas
    Adams as a http://schema.org/Person (jobTitle, birthDate, description, performerIn, ...).


What’s in the schema?

                     • Classes (types) e.g. LocalBusiness, Person,
                          Organization, VideoObject, TVSeries...
                     • Properties (attributes) e.g. openingHours,
                          transcript, productionCompany, streetAddress
                     • That’s all - a dictionary of terms, used for
                          annotating data within normal Web pages



[Diagram of schema.org types: CreativeWork, Event, UserInteraction, LocalBusiness, Intangible, Place, Organization, CivicStructure, Landform]
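
One way to read the types named in this diagram is as a small hierarchy in which a type can have more than one parent. The sketch below is an illustration only: the parent links reflect my understanding of the schema.org hierarchy around 2012 and should be treated as assumptions, and `PARENTS`, `ancestors` and `is_a` are names invented for the example.

```python
# Parent links for the types named on the slide (assumed, see above).
# "Thing" is the root; LocalBusiness has two parents.
PARENTS = {
    "Thing": [],
    "CreativeWork": ["Thing"],
    "Event": ["Thing"],
    "Intangible": ["Thing"],
    "Organization": ["Thing"],
    "Place": ["Thing"],
    "UserInteraction": ["Event"],
    "LocalBusiness": ["Organization", "Place"],
    "CivicStructure": ["Place"],
    "Landform": ["Place"],
}


def ancestors(type_name, parents=PARENTS):
    """Return every supertype of type_name, nearest first, without duplicates."""
    seen = []
    queue = list(parents[type_name])
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(parents[current])
    return seen


def is_a(type_name, candidate, parents=PARENTS):
    """True if type_name is candidate or one of its subtypes."""
    return type_name == candidate or candidate in ancestors(type_name, parents)


print(ancestors("LocalBusiness"))
print(is_a("LocalBusiness", "Place"), is_a("Landform", "Organization"))
```

Under these assumptions a LocalBusiness can carry both Organization properties and Place properties, which is why types overlap rather than forming a strict tree.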



Another example:




<div itemscope itemtype="http://schema.org/Restaurant">

   <span itemprop="name">GreatFood</span>

   <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
     <span itemprop="streetAddress">1901 Lemur Ave</span>
     <span itemprop="addressLocality">Sunnyvale</span>,
     <span itemprop="addressRegion">CA</span>
     <span itemprop="postalCode">94086</span>
   </div>

   <span itemprop="telephone">(408) 714-1489</span>
   <a itemprop="url" href="http://www.dishdash.com">www.greatfood.com</a>

   Hours:
   <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am-2:30pm
   <meta itemprop="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm-9:30pm
   <meta itemprop="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm-10:00pm

   Categories:
   <span itemprop="servesCuisine">Middle Eastern</span>,
   <span itemprop="servesCuisine">Mediterranean</span>

</div>
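
To show what a consumer might recover from markup like this, here is a minimal sketch using only Python's standard `html.parser`. It is not how any search engine reads pages, and it covers only part of the microdata rules: `meta` values come from `content`, `a` and `link` values from `href`, and everything else from element text. `MicrodataParser`, `extract_items` and the abridged `SAMPLE` are names made up for the example.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they are not tracked as open.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Collect itemscope/itemprop annotations into nested dicts."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items found in the page
        self._scopes = []  # items whose itemscope element is still open
        self._open = []    # open elements: [tag, opened_scope, prop, target, text]

    @staticmethod
    def _add(target, prop, value):
        target["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        parent = self._scopes[-1] if self._scopes else None
        opened_scope, capture = False, None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and parent is not None:
                self._add(parent, prop, item)   # nested item, e.g. address
            else:
                self.items.append(item)
            self._scopes.append(item)
            opened_scope = True
        elif prop and parent is not None:
            if tag == "meta":
                self._add(parent, prop, attrs.get("content", ""))
            elif tag in ("a", "link") and "href" in attrs:
                self._add(parent, prop, attrs["href"])
            else:
                capture = prop                  # value is the element's text
        if tag in VOID_TAGS:
            if opened_scope:
                self._scopes.pop()              # a void element holds no properties
        else:
            self._open.append([tag, opened_scope, capture, parent, []])

    def handle_data(self, data):
        for entry in self._open:
            if entry[2] is not None:
                entry[4].append(data)

    def handle_endtag(self, tag):
        # Find the nearest open element with this tag; ignore stray end tags.
        for index in range(len(self._open) - 1, -1, -1):
            if self._open[index][0] == tag:
                break
        else:
            return
        # Close it, together with any unclosed elements nested inside it.
        while len(self._open) > index:
            _, opened_scope, prop, target, text = self._open.pop()
            if prop is not None:
                self._add(target, prop, " ".join("".join(text).split()))
            if opened_scope:
                self._scopes.pop()


def extract_items(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Abridged copy of the restaurant markup from the slide.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">GreatFood</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Sunnyvale</span>,
    <span itemprop="addressRegion">CA</span>
  </div>
  <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am-2:30pm
  <span itemprop="servesCuisine">Middle Eastern</span>,
  <span itemprop="servesCuisine">Mediterranean</span>
</div>
"""

restaurant = extract_items(SAMPLE)[0]
print(restaurant["type"])
for prop, values in restaurant["properties"].items():
    print(prop, "=", values)
```

Every property maps to a list because a property such as `servesCuisine` can legitimately appear more than once on the same item.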




Schema.org scope
             • In-page structured data for search
             • Not asking an unconstrained “so, how do we
                     describe cars?”, but “how can we improve
                     markup on existing pages that describe
                     cars?” (or Comics, SoftwareApps, Sports, ...)
             • Simplify publisher/webmaster experience
             • Record agreements between search engines
             • Central use case: augmented search results
Schema.org and UDC
                     • In many ways the opposite of UDC
                     • Small (by contrast), pragmatic, Web-based
                     • Yet by Semantic Web standards and
                         culture, it is a big 'centralised' schema
                     • The art is finding ways to decentralise
                         without creating chaos
                     • We don't want to re-invent UDC, or
                         Wikipedia; but integrate such things into
                         simple descriptive templates for search
Lots missing! e.g. sports
                  • Current vocabulary emphasizes 'points of
                         interest' on a map and sporting activities
                         rather than sports content 'as entertainment'
                  • We also have terms to describe videos, TV
                         shows etc., ...but no sports-specifics yet
                  • How deep to go? How to integrate with
                         existing vocabulary? How to identify players,
                         teams, kinds of 'football'? Video clips for that
                         'hand of God' goal?
http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals




   Job postings (done), rNews (done), Comics, Learning,
   ScholarlyArticle, Software, Events, Genealogy, Real Estate,
   eCommerce, Health, Sports, Transport, Vehicles, Comments,
   Datasets, Bio, ... (+bugfixes, integration, ...)
Everything overlaps
                                                                 *




                     • We added JobPosting; what if the job was
                         sports-related?
                     • We're adding educational markup; does it
                         help describe sports education, training?
                     • Is there a sports perspective on the health/
                         medical vocabulary we're working on?
                     • Can't coordinate everything! Pragmatism...
                                                                     * 'intertwingularity'

Practicalities
                     • Delegation to external sources for
                         enumerations and detail
                         • e.g. country codes from UN FAO or
                           Wikipedia/DBpedia/Wikidata
                     • We don’t want to create big enumerations
                      • all the countries? sports? things that go on maps?
                     • Decentralised subclassing & property values
Process
                     • Search partners retain ultimate oversight
                     • W3C hosts community group, discussion,
                         wiki and proposal tracking
                     • Web Schemas group - planning monthly
                         telecons at W3C, based around proposals
                     • Evolving, pragmatic, collaborative

Compositional Semantics revisited
                     • If we have SportsCentre and Karate, can
                         we describe a Karate Club?
                     • If we have recipes vocab, and medical
                         vocab, and restaurants, can we describe
                         allergy-free food?
                     • If UN have country codes, Wikipedia list
                         religions, ... then we just re-use those


And libraries
                     • If the library world share their controlled
                         vocabularies as open SKOS linked data
                     • ...can we plug them directly into
                         schema.org descriptions?
                         • of videos? news? scholarly articles? (yes)
                     • Why re-invent when you can collaborate?
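
A rough sketch of what "plugging in" a library vocabulary could look like: descriptions point at concept URIs, and the concept scheme's `broader` links let a search for a general topic find items tagged with a narrower one. Everything here is hypothetical, including the `example.org` concept URIs, the item names and the functions `broader_closure` and `search`. The `about` key is borrowed from the schema.org property of that name, and `prefLabel` and `broader` from SKOS.

```python
# A tiny SKOS-like scheme: concept URI -> preferred label and broader concepts.
SPORT = "http://example.org/concepts/sport"
MARTIAL_ARTS = "http://example.org/concepts/martial-arts"
KARATE = "http://example.org/concepts/karate"

CONCEPTS = {
    SPORT: {"prefLabel": "Sport", "broader": []},
    MARTIAL_ARTS: {"prefLabel": "Martial arts", "broader": [SPORT]},
    KARATE: {"prefLabel": "Karate", "broader": [MARTIAL_ARTS]},
}

# Descriptions in the style of schema.org markup, pointing at external concepts.
VIDEOS = [
    {"type": "http://schema.org/VideoObject",
     "name": "Hypothetical karate lesson",
     "about": [KARATE]},
    {"type": "http://schema.org/VideoObject",
     "name": "Hypothetical election report",
     "about": ["http://example.org/concepts/politics"]},
]


def broader_closure(uri, concepts=CONCEPTS):
    """Return uri plus every concept reachable through broader links."""
    found, todo = [], [uri]
    while todo:
        current = todo.pop()
        if current not in found:
            found.append(current)
            todo.extend(concepts[current]["broader"])
    return found


def search(items, concept_uri, concepts=CONCEPTS):
    """Find items tagged with concept_uri or with any concept narrower than it."""
    return [
        item for item in items
        if any(concept_uri in broader_closure(tag, concepts)
               for tag in item["about"] if tag in concepts)
    ]


for hit in search(VIDEOS, SPORT):
    print(hit["name"])
```

The central schema only needs the `about` slot; the depth of the subject hierarchy stays with whoever maintains the vocabulary.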

WebSchemas public-vocabs list

                     • Schema.org process
                      • Looking for rough consensus and
                            incremental improvements
                          • Realistic examples, simplicity for
                            publishers, and re-use of existing
                            vocabulary are important
                          • <http://www.w3.org/wiki/WebSchemas/>

Schema.org and One Hundred Years of Search

  • 1. 'Schema.org and One Hundred Years of Search' Libraries, Media and The Semantic Web BBC Academy, March 28th 2012, London Dan Brickley <danbri@danbri.org> Friday, March 30, 2012
  • 2. In 20 minutes • Introduce you to the schema.org initiative • Revisit 'the Web before the Web' of 1912 • Use this to describe what's new with schema.org, ... and the practical choices we face when scaling to billions of users and pages Friday, March 30, 2012
  • 3. Intro: Dan Brickley • Ex-W3C, helped start Semantic Web project • Worked on RDF/S, FOAF, SKOS & other standards around W3C • Currently danbri@google.com working on <http://schema.org/> project • See also <http://danbri.org/>, @danbri Friday, March 30, 2012
  • 4. Back to 1912 Friday, March 30, 2012
  • 6. ■ The Republic of China is proclaimed. ■ Albert Berry makes the first parachute jump from a moving airplane. ■ Prague Party Conference: Vladimir Lenin and the Bolshevik Party break away from the rest of the Russian Social Democratic Labour Party. ■ France establishes a protectorate over Morocco. ■ RMS Titanic strikes an iceberg in the northern Atlantic Ocean. ■ Paramount Pictures, the oldest American motion picture studio still in operation, is founded ■ Albania declares independence from the Ottoman Empire. ■ First Balkan War ■ Alan Turing, British mathematician is born ■ Semantic search over structured data goes mainstream, in Belgium. source: http://en.wikipedia.org/wiki/1912 Friday, March 30, 2012
  • 7. Credit and thanks: W. Boyd Rayward Friday, March 30, 2012
  • 8. Sample queries from 1912 Moteur Diesel. Philosophie des mathematiques. Les pecheries au Maroc et sur la cote d'Espagne. Finances Bulgares. Gyroscope. Culte de feu. Motocolture (garden). Evolution de la dent humaine. Emigration italienne. Casier civil. Chemin de fer de bagdad (railroad...). Planete Mars. Suffrage universel. Nevrose traumatique. Eugenism. Le saumon; Saumons manques et repeches. Boomerang. Fabrication del la cyanamide. Emigration des Juifs. Intoxications par le tabac. Quantite d'huile d'olive importee en Belgique. Jurisprudence des compagnies d'assurances en Angleterre, Hollande et Danemark... Friday, March 30, 2012
  • 10. Search before search • Paul Otlet, "the man who dreamed the Internet", http:// www.youtube.com/watch?v=fmsOI5SdLkE • "The International Centre organises collections of world-wide importance. These collections are the International Museum, the International Library, the International Bibliographic Catalogue and the Universal Documentary Archives. These collections are conceived as parts of one universal body of documentation, as an encyclopedic survey of human knowledge, as an enormous intellectual warehouse of books, documents, catalogues and scientific objects." • Start at http://en.wikipedia.org/wiki/Mundaneum for full whole story Friday, March 30, 2012
  • 11. Libraries, media & ...? • Universal Decimal Classification (UDC) used in many 1000s of libraries today • In BBC archive for 40 years, as 'Lonclass' • Shows the challenge and promise of structured description • So what's in Lonclass? What's not in Lonclass! Friday, March 30, 2012
  • 21. Lonclass by example • R672:32.007(47)YELTSIN:342.518.1THATCHER “TWO SHOTS OF MARGARET THATCHER AND BORIS YELTSIN” • [BRITISH AEROSPACE].007.11PEARCE: 656.881:342.518.1THATCHER “LETTER TO MRS THATCHER FROM SIR AUSTIN PEARCE” • 656.881:301.162.721:32.007THATCHER: 654.192.731TV-AM “MARGARET THATCHER'S LETTER OF APOLOGY TO TV AM” Friday, March 30, 2012
  • 22. Compositional Semantics • 656.881:301.162.721 “LETTERS OF APOLOGY” • 656.881 “LETTERS (POSTAL SERVICES)” • 656.881:06.022.6 “RESIGNATION LETTERS” • 654.192.731TV-AM “TV AM (TELEVISION AM)” (this work pre-dated modern linguistics, never mind computing...) Friday, March 30, 2012
  • 23. Archives and classification • Lonclass tells a story of the world; of this country at least; and a lot about the rest • It is huge - 1000s of terms, composite sentence-like codes, and rather sparse • It began with UDC in 1890s, and remains key to BBC's media archives even today Friday, March 30, 2012
  • 25. And now for something new. Friday, March 30, 2012
  • 26. Schema.org • Search engine collaboration: • Google, Bing,Yahoo! & Yandex • Simple factual data for better search • Launched June 2011, schema.org schema • 300 classes, 261 properties & growing • discussions: W3C WebSchemas group Friday, March 30, 2012
  • 27. Example: Google Rich Snippets From: http://www.google.com/webmasters/tools/richsnippets See also Yandex's http://webmaster.yandex.ru/microtest.xml Friday, March 30, 2012
  • 28. On IMDB: <div id="content-2-wide" itemscope itemtype="http://schema.org/CreativeWork"> <div class="txt-block"> <h4 class="inline">Stars:</h4> <a onclick="(new Image()).src='/rg/title-overview/star-1/images/b.gif?link=%2Fname %2Fnm0010930%2F';" href="/name/nm0010930/" itemprop="actors">Douglas Adams</a>, <a onclick="(new Image()).src='/rg/title-overview/star-2/images/b.gif?link=%2Fname %2Fnm0048982%2F';" href="/name/nm0048982/" itemprop="actors">Tom Baker</a> and <a onclick="(new Image()).src='/rg/title-overview/star-3/images/b.gif?link=%2Fname %2Fnm3035100%2F';" href="/name/nm3035100/" itemprop="actors">Hans Peter Brondmo</ a> </div> <div class="star-box" itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating"> Linked Data: see http://www.imdb.com/name/nm0010930/ for schema.org markup describing Douglas Adams as a http://schema.org/Person (jobTitle, birthDate, description, performerIn, ...). Friday, March 30, 2012
  • 29. What’s in the schema? • Classes (types) e.g. LocalBusiness, Person, Organization, VideoObject, TVSeries... • Properties (attributes) e.g. openingHours, transcript, productionCompany, streetAddress • That’s all - a dictionary of terms, used for annotating data within normal Web pages
  • 30. [Diagram of schema.org types: CreativeWork, Event, UserInteraction, LocalBusiness, Intangible, Place, Organization, CivicStructure, Landform]
  • 32. <div itemscope itemtype="http://schema.org/Restaurant"> <span itemprop="name">GreatFood</span> <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress"> <span itemprop="streetAddress">1901 Lemur Ave</span> <span itemprop="addressLocality">Sunnyvale</span>, <span itemprop="addressRegion">CA</span> <span itemprop="postalCode">94086</span> </div> <span itemprop="telephone">(408) 714-1489</span> <a itemprop="url" href="http://www.dishdash.com">www.greatfood.com</a> Hours: <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am-2:30pm <meta itemprop="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm-9:30pm <meta itemprop="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm-10:00pm Categories: <span itemprop="servesCuisine">Middle Eastern</span>, <span itemprop="servesCuisine">Mediterranean</span> </div>
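To make concrete what a consumer might read out of markup like the Restaurant example, here is a rough sketch using only Python's standard `html.parser`. It is not how any search engine processes microdata; the `FlatMicrodataParser` class and the shortened `SAMPLE` string are hypothetical, and the sketch deliberately flattens nested items (it records the nested item's `itemtype` as the value of `address` instead of building a tree).

```python
from collections import defaultdict
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Collect itemprop values into a flat dict of lists (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.props = defaultdict(list)
        self._capture = None  # (property name, tag name, text pieces)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("itemprop")
        if prop is None:
            return
        if "itemscope" in attr:
            # Nested item: record its type rather than descending into it.
            self.props[prop].append(attr.get("itemtype"))
        elif "content" in attr:
            # e.g. <meta itemprop="openingHours" content="...">
            self.props[prop].append(attr["content"])
        elif tag == "a" and "href" in attr:
            self.props[prop].append(attr["href"])
        else:
            # Plain element: the value is its text content.
            self._capture = (prop, tag, [])

    def handle_data(self, data):
        if self._capture:
            self._capture[2].append(data)

    def handle_endtag(self, tag):
        if self._capture and self._capture[1] == tag:
            prop, _, pieces = self._capture
            self.props[prop].append("".join(pieces).strip())
            self._capture = None


# A shortened version of the slide's example, reflowed for readability.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">GreatFood</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Sunnyvale</span>
  </div>
  <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">
  <span itemprop="servesCuisine">Middle Eastern</span>,
  <span itemprop="servesCuisine">Mediterranean</span>
</div>
"""

parser = FlatMicrodataParser()
parser.feed(SAMPLE)
props = dict(parser.props)
```

Repeated properties such as `servesCuisine` end up as multiple values under one key, and machine-readable values such as `openingHours` come from the `content` attribute rather than from the human-readable text beside it.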
  • 33. Schema.org scope • In-page structured data for search • Not asking an unconstrained “so, how do we describe cars?”, but “how can we improve markup on existing pages that describe cars?” (or Comics, SoftwareApps, Sports, ...) • Simplify publisher/webmaster experience • Record agreements between search engines • Central use case: augmented search results
  • 35. Schema.org and UDC • In many ways the opposite of UDC • Small (by contrast), pragmatic, Web-based • Yet by Semantic Web standards and culture, it is a big 'centralised' schema • The art is finding ways to decentralise without creating chaos • We don't want to re-invent UDC, or Wikipedia; but integrate such things into simple descriptive templates for search
  • 36. Lots missing! e.g. sports • Current vocabulary emphasizes 'points of interest' on a map and sporting activities rather than sports content 'as entertainment' • We also have terms to describe videos, TV shows etc., ...but no sports-specifics yet • How deep to go? How to integrate with existing vocabulary? How to identify players, teams, kinds of 'football'? Video clips for that 'hand of God' goal?
  • 37. http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals Job postings (done), rNews (done), Comics, Learning, ScholarlyArticle, Software, Events, Genealogy, Real Estate, eCommerce, Health, Sports, Transport, Vehicles, Comments, Datasets, Bio, ... (+bugfixes, integration, ...)
  • 38. Everything overlaps * • We added JobPosting; what if the job was sports-related? • We're adding educational markup; does it help describe sports education, training? • Is there a sports perspective on the health/medical vocabulary we're working on? • Can't coordinate everything! Pragmatism... * 'intertwingularity'
  • 39. Practicalities • Delegation to external sources for enumerations and detail • e.g. country codes from UN FAO or Wikipedia/DBpedia/Wikidata • We don’t want to create big enumerations • all the countries? sports? things that go on maps? • Decentralised subclassing & property values
  • 40. Process • Search partners retain ultimate oversight • W3C hosts community group, discussion, wiki and proposal tracking • Web Schemas group - planning monthly telecons at W3C, based around proposals • Evolving, pragmatic, collaborative
  • 41. Compositional Semantics revisited • If we have SportsCentre and Karate, can we describe a Karate Club? • If we have recipes vocab, and medical vocab, and restaurants, can we describe allergy-free food? • If the UN has country codes, Wikipedia lists religions, ... then we just re-use those
  • 42. And libraries • If the library world shares its controlled vocabularies as open SKOS linked data • ...can we plug them directly into schema.org descriptions? • of videos? news? scholarly articles? (yes) • Why re-invent when you can collaborate?
  • 43. WebSchemas public-vocabs list • Schema.org process • Looking for rough consensus and incremental improvements • Realistic examples, simplicity for publishers, and re-use of existing vocabulary are important • <http://www.w3.org/wiki/WebSchemas/>