Semantic Web and Schema.org
What a long, strange trip it’s been 
R.V.Guha 
Google 
schema.org
Outline of talk 
• The context 
– How did we end up where we are 
• Schema.org 
– What it is, status of adoption 
– Schema.org principles, how does it work 
• Looking ahead 
– Next Generation Applications 
About 18 years ago, … 
• People started thinking about structured data on the web 
– A few people from Netscape, Microsoft and W3C got together at MIT
• Trying to make sense of a flurry of activity/proposals 
– XML, MCF, CDF, Sitemaps, … 
• There were a number of problems 
– PICS, metadata, sitemaps, …
• But one unifying idea
Context: The Web for humans
[Diagram: Structured Data, Web server, HTML]
Goal: Web for Machines & Humans
[Diagram: Structured Data, Web server, Apps]
What does that mean?
[Graph: Chuck Norris: type → Actor; birthplace → Ryan, Oklahoma; birthdate → March 10th 1940]
• Notable points
– Graph Data Model
– Common Vocabulary
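A minimal sketch of the graph data model on this slide, assuming a plain Python representation in which each edge is a (subject, property, value) triple. The property names follow the slide's labels rather than exact schema.org terms, and `Triple` and `describe` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    value: str


# Edges of the example graph from the slide.
GRAPH = [
    Triple("Chuck Norris", "type", "Actor"),
    Triple("Chuck Norris", "birthplace", "Ryan, Oklahoma"),
    Triple("Chuck Norris", "birthdate", "March 10th 1940"),
]


def describe(graph, subject):
    """Collect every property/value pair attached to one node."""
    return {t.prop: t.value for t in graph if t.subject == subject}


if __name__ == "__main__":
    print(describe(GRAPH, "Chuck Norris"))
```

The shared vocabulary is what lets two sites that both say `birthplace` be read the same way by a consumer; the graph shape itself carries no meaning without it.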
How do we get there? 
• How does the author give us the graph?
– Data Model: Graph vs tree vs … 
– Syntax 
– Vocabulary 
– Identifiers for objects 
• Why should the author give us the graph? 
Going depth first 
• Many heated battles 
– Lots of proposals, standards, companies, …
• Data model
– Trees vs DLGs (directed labeled graphs) vs vertical-specific vs who needs one?
• Syntax
– XML vs RDF vs JSON vs …
• Model theory, anyone?
– We need one vs who cares vs what’s that?
Timeline of ‘standards’ 
• ’96: Meta Content Framework (MCF) (Apple)
• ’97: MCF using XML (Netscape) → RDF, CDF
• ’99 -- : RDF, RDFS
• ’01 -- : DAML, OWL, OWL EL, OWL QL, OWL RL
• ’03: Microformats
• And many, many, many more … SPARQL, Turtle, N3, GRDDL, R2RML, FOAF, SIOC, SKOS, …
• Lots of bells & whistles: model theory, inference, type systems, …
But something was missing … 
• Fewer than 1000 sites were using these standards 
• Something was clearly missing, and it wasn’t more language features
• We had forgotten the ‘Why’ part of the problem
• The RSS story
’07 onward: Rise of the consumers
• Yahoo! SearchMonkey, Google Rich Snippets, Facebook Open Graph
• Offer webmasters a simple value proposition
• Search engines to webmasters:
– You give us data … we make your results nicer
• Usage begins to take off
– 1000x increase in marked-up pages in 3 years
Yahoo! SearchMonkey
• Give websites control over snippet presentation
• Moderate adoption
– Targeted at high-end developers
– Too many choices
Google Rich Snippets: Reviews 
Google Rich Snippets: Events 
Google Rich Snippets 
• Multi-syntax 
• Ad hoc vocabulary for each vertical
• Very clear carrot
• Lots of experimentation on UI
• Moderately successful: tens of thousands of sites
• Scaling issues with vocabulary
Situation in 2010 
• Too many choices/decisions for webmasters 
– Divergence in vocabularies 
• Too much fragmentation 
• N versions of person, address, … 
• A lot of bad/wrong markup 
– ~25% for microformats, ~40% with RDFa
– Some spam, mostly unintended mistakes
• Absolute adoption numbers still rather low
– Fewer than 100k sites
Schema.org 
• Work started in August 2010 
– Google, Yahoo!, Microsoft & then Yandex 
• Goals: 
– One vocabulary understood by all the search engines 
– Make it very easy for the webmaster 
• It is A vocabulary. Not The vocabulary. 
– Webmasters can use it together with other vocabs
– We might not understand the other vocabs. Others might.
Schema.org: Major sites 
• News: nytimes, guardian.com, bbc.co.uk
• Movies: imdb, rottentomatoes, movies.com
• Jobs / careers: careerjet.com, monster.com, indeed.com
• People: linkedin.com
• Products: ebay.com, alibaba.com, sears.com, cafepress.com, sulit.com, fotolia.com
• Videos: youtube, dailymotion, frequency.com, vinebox.com
• Medical: cvs.com, drugs.com
• Local: yelp.com, allmenus.com, urbanspoon.com
• Events: wherevent.com, meetup.com, zillow.com, eventful
• Music: last.fm, myspace.com, soundcloud.com
Schema.org principles: Simplicity 
• Simple things should be simple 
– For webmasters, not necessarily for consumers of markup 
– Webmasters shouldn’t have to deal with N namespaces 
• Complex things should be possible 
– Advanced webmasters should be able to mix and match vocabularies
• Syntax
– Microdata, usability studies
– RDFa, JSON-LD, …
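To make the syntax bullet concrete, here is a hedged sketch of one of the listed syntaxes, JSON-LD, generated from Python. It assumes the schema.org terms `Person`, `name`, `birthDate`, `birthPlace` and `Place`; the helper name `to_jsonld_script` is hypothetical, and the same data could equally be written as Microdata or RDFa attributes in the page's HTML.

```python
import json


def to_jsonld_script(name, birth_date, birth_place):
    """Render a schema.org Person description as an embeddable JSON-LD script tag."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "birthDate": birth_date,  # ISO 8601 date string
        "birthPlace": {"@type": "Place", "name": birth_place},
    }
    body = json.dumps(doc, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(to_jsonld_script("Chuck Norris", "1940-03-10", "Ryan, Oklahoma"))
```

A single `@context` covers every term used, which is one way to read the point that webmasters should not have to deal with N namespaces.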
Schema.org principles: Simplicity 
• Can’t expect webmasters to understand Knowledge Representation, Semantic Web Query Languages, etc.
• It has to fit in with existing workflows 
– A posteriori ‘markup tools’ don’t work 
• Avoid KR system driven artifacts 
– Multiple domain / range for attributes 
– No classes like ‘Agent’ 
– Categories and attributes should be concrete 
Schema.org principles: Simplicity 
• Copy and edit as the default mode for authors 
– It is not a linear spec, but a tree of examples 
• Vocabularies 
– Authors only need to have a local view
– But schema.org tries to have a single, globally coherent vocabulary
Schema.org principles: Incremental 
• Started simple 
– ~ 100 categories at launch 
• Applies to every area 
– Add complexity after adoption 
– now ~1200 vocab items 
– Go back and fill in the blanks 
• Move fast, accept mistakes, iterate fast 
Schema.org Principles: URIs 
• ~1000s of terms like Actor, birthdate
– ~10s for most sites
– Common across sites
• ~10ks of terms like USA
– External enumerations
• ~1b-100b terms like Chuck Norris and Ryan, Oklahoma
– Cannot expect agreement on these
– Reference by description
– Consumers can reconcile entity references
[Graph: Chuck Norris: type → Actor; birthplace → Ryan, Oklahoma; birthdate → March 10th 1940; citizenOf → USA]
[Diagram: reconciling entity references by description]
Description 1: An Actor named Chuck Norris; birthdate → March 10th 1940; citizenOf → USA; birthplace → a city named Ryan in the state OK
+
Description 2: An Actor named Chuck Norris; birthdate → March 10th 1940; spouse → a Person named Geena O’Kelley
=
Merged graph: Chuck Norris: type → Actor; birthplace → Ryan, Oklahoma; birthdate → March 10th 1940; citizenOf → USA; spouse → Geena O’Kelley
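The reconciliation step can be sketched as follows, under simplifying assumptions: each site's description is a flat dictionary, and two descriptions refer to the same entity when they agree on a small set of identifying properties. The functions `same_entity` and `merge`, and the choice of identifying keys, are hypothetical; a real consumer would need fuzzier matching.

```python
def same_entity(a, b, keys=("type", "name", "birthdate")):
    """Treat two descriptions as the same entity when they agree on all identifying keys."""
    return all(k in a and k in b and a[k] == b[k] for k in keys)


def merge(a, b):
    """Union the properties of two descriptions; raise on conflicting values."""
    merged = dict(a)
    for prop, value in b.items():
        if prop in merged and merged[prop] != value:
            raise ValueError(f"conflicting values for {prop!r}")
        merged[prop] = value
    return merged


# Two descriptions of the same person, as two different sites might publish them.
site_a = {
    "type": "Actor",
    "name": "Chuck Norris",
    "birthdate": "March 10th 1940",
    "citizenOf": "USA",
    "birthplace": "Ryan, OK",
}
site_b = {
    "type": "Actor",
    "name": "Chuck Norris",
    "birthdate": "March 10th 1940",
    "spouse": "Geena O’Kelley",
}

if __name__ == "__main__":
    if same_entity(site_a, site_b):
        print(merge(site_a, site_b))
```

Neither site needs to agree on an identifier for Chuck Norris; the consumer does the matching from the descriptions alone.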
Schema.org Principles: Collaborations 
• Most discussions on public W3C lists 
• Work closely with interest communities 
• Work with others to incorporate their vocabularies 
– We give them attribution on schema.org 
– Webmasters should not have to worry about where each piece of the vocabulary came from
– Webmasters can mix and match vocabs
Schema.org Principles: Collaborations 
• IPTC / NYTimes / Getty with rNews
• Martin Hepp with GoodRelations
• US Veterans, White House, Indeed.com with Job Posting
• Creative Commons with LRMI
• NIH National Library of Medicine for Medical vocab.
• Bibextend, Highwire Press for Bibliographic vocabulary
• Benetech for Accessibility
• BBC, European Broadcasting Union for TV & Radio schema
• Stack Exchange, SKOS group for message board
• Lots and lots and lots of individuals
Schema.org Principles: Partners 
• Partner with Authoring platforms 
– Drupal, WordPress, Blogger, YouTube
• Drupal 8
– Schema.org markup for many types (news articles, comments, users, events, …)
– More schema.org types can be created by the site author
– Markup in HTML5 & RDFa Lite
– Will come out early 2015
Recent Additions 
• From Nouns to Verbs: Actions
– Object → potential actions
– Constraints on actions
– E.g., ThorMovie → Stream, Buy, …
• Introducing time: Roles
– E.g., Joe Montana played for the SF 49ers from 1979 to 1992 in the position Quarterback
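A sketch of how the Roles example might be expressed, assuming the schema.org pattern in which a role node sits between the team and the person and carries the dates and position. The terms `SportsTeam`, `OrganizationRole`, `member`, `roleName`, `startDate` and `endDate` are used here from memory and should be checked against the current schema.org documentation; `team_membership` is a hypothetical helper.

```python
import json


def team_membership(team, person, role_name, start, end):
    """Build a JSON-LD description of a time-bounded membership via a Role node."""
    return {
        "@context": "https://schema.org",
        "@type": "SportsTeam",
        "name": team,
        "member": {
            # The intermediate Role node carries what a plain edge cannot:
            # when the relationship held and in what capacity.
            "@type": "OrganizationRole",
            "member": {"@type": "Person", "name": person},
            "roleName": role_name,
            "startDate": start,
            "endDate": end,
        },
    }


if __name__ == "__main__":
    doc = team_membership("San Francisco 49ers", "Joe Montana", "Quarterback", "1979", "1992")
    print(json.dumps(doc, indent=2))
```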
Recent Additions 
• Scholarly work, Comics, Serials, … 
• Communications: TV, Radio, Q&A, … 
• Accessibility 
• Commerce: Reservations 
• Sports 
• Buyer/Seller, etc. 
• BibTeX
• The ontology is growing … 
– ~800 properties 
– ~600 classes 
Looking forward 
• Schema.org is doing better than we expected 
– Thanks to millions of webmasters! 
• But this is not the final goal 
– Just the means to the next generation of applications 
• First generation of applications 
– Rich presentation of search results 
• Many new applications 
– Related to search and beyond 
Newer Applications: Knowledge Graph 
Newer Applications: Knowledge Graph 
Non-search applications: Google Now
[Diagram: User profile (google.com/now/topics) + structured data feeds]
Pinterest: Schema.org for Rich Pins 
Reservations → Personal Assistant
• OpenTable website → confirmation email → Android reminder
Vertical Search 
• Structured data in search 
– Web search: annotate search results
OR
– Filtering based on structured data (only in specialized corpora: e-commerce, real estate, etc.)
• How about filtering based on structured data across the web?
Google Rich Snippets: Recipe View 
Web scale vertical search 
• Searching for Veteran friendly jobs 
Web Scale custom vertical search 
• Build your own custom vertical search engine 
– Google does the heavy lifting: crawling, indexing, etc. 
– You specify the schema.org restricts
– APIs to help build your own UI
• Searches over all pages on the web with a certain schema.org markup
• Demo
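The idea of restricting a search to pages carrying certain schema.org markup can be illustrated with a toy in-memory corpus. This is only a sketch of the filtering concept, not the API of any real search service; `Page`, `restrict_by_type` and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    # schema.org types found in the page's markup by some earlier extraction step
    types: set = field(default_factory=set)


def restrict_by_type(pages, required_type):
    """Return the URLs of pages whose markup includes the required schema.org type."""
    return [p.url for p in pages if required_type in p.types]


CORPUS = [
    Page("https://example.com/jobs/1", {"JobPosting"}),
    Page("https://example.com/recipes/pie", {"Recipe"}),
    Page("https://example.org/careers/42", {"JobPosting", "Organization"}),
]

if __name__ == "__main__":
    print(restrict_by_type(CORPUS, "JobPosting"))
```

In the setup the slide describes, crawling and indexing are done by the search engine, so the builder of a vertical search engine would supply only the restriction and the UI.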
Scientific Data Publishing 
• US Govt alone spends over $60B/yr on scientific research
• Primary output of most of this research is data
– Most of the data is thrown away
– All that is published are papers
• We would like the data published in an easily reusable form
Case study: Clinical Trials 
• Clinical trials 
• 4000+ clinical trials at any time in the US alone 
• Almost all the data ‘thrown away’ 
• All that gets published is a textual ‘abstract’ 
• Many of the trials are redundant 
• Earlier trials have the data 
• Assumptions, etc. cannot be re-examined 
• Longitudinal studies extremely hard, but super important 
• Having all the clinical trial data on the web, in a common schema, will make this much easier!
Case study: SkyServer 
• Huge amount of astronomy data 
• Jim Gray, NASA and others brought it all together, 
normalized it and made it available on the web 
• Has changed the way astronomy research takes place 
• Students in Africa getting PhDs without leaving Africa! 
• Radio/Ultra-violet/Visible light data easily brought together 
• Caveats 
• SQL biased, not distributed, not scalable 
• All normalization done by hand, once 
• Small number of data sources 
• But shows that it can be done … 
First steps for scientific data publication 
• OSTP directive for data from federally funded research to be freely available
• Formation of new ‘Data Science’ institute inside NIH
• Seeing traction in scientific data on the web
• A lot of interest in creating schemas
• Public repositories for scientific data starting
Concluding 
• Structured data on the web is now ‘web scale’ 
• Schema.org has got traction and is evolving 
• The most interesting applications are yet to come 
Questions? 

Semantic Web and Schema.org

  • 1. What a long, strange trip it’s been R.V.Guha Google schema.org
  • 2. Outline of talk • The context – How did we end up where we are • Schema.org – What it is, status of adoption – Schema.org principles, how does it work • Looking ahead – Next Generation Applications schema.org
  • 3. About 18 years ago, … • People started thinking about structured data on the web – A few people from Netscape, Microsoft and W3C got together @MIT • Trying to make sense of a flurry of activity/proposals – XML, MCF, CDF, Sitemaps, … • There were a number of problems – PICS, Meta data, sitemaps, … • But one unifying idea schema.org
  • 4. Context: The Web for humans Structured Data Web server HTML schema.org
  • 5. Goal: Web for Machines & Humans Structured Data Web server Apps schema.org
  • 6. What does that mean? birthplace Chuck Norris Ryan, Oklahama birthdate March 10th 1940 Actor type - Notable points - Graph Data Model - Common Vocabulary schema.org
  • 7. How do we get there? • How does the author give us the graph – Data Model: Graph vs tree vs … – Syntax – Vocabulary – Identifiers for objects • Why should the author give us the graph? schema.org
  • 8. Going depth first • Many heated battles – Lot of proposals, standards, companies, … • Data model – Trees vs DLGs vs Vertical specific vs who needs one? • Syntax – XML vs RDF vs json vs … • Model theory anyone – We need one vs who cares vs what’s that? schema.org
  • 9. Timeline of ‘standards’ • ‘96: Meta Content Framework (MCF) (Apple) • ’97: MCF using XML (Netscape)  RDF, CDF • ’99 -- : RDF, RDFS • ’01 -- : DAML, OWL, OWL EL, OWL QL, OWL RL • ’03: Microformats • And many many many more … SPARQL, Turtle, N3, GRDDL, R2RML, FOAF, SIOC, SKOS, … • Lots of bells & whistles: model theory, inference, type systems, … schema.org
  • 10. But something was missing … • Fewer than 1000 sites were using these standards • Something was clearly missing and it wasn’t more language features • We had forgotten the ‘Why’ part of the problem • The RSS story schema.org
  • 11. ’07 - :Rise of the consumers • Yahoo! Search Monkey, Google Rich Snippets, Facebook Open Graph • Offer webmasters a simple value proposition • Search engines to webmasters: – You give us data … we make your results nicer • Usage begins to take off – 1000x increase in markup’ed up pages in 3 years schema.org
  • 12. Yahoo Search Monkey • Give websites control over snippet presentation • Moderate adoption – Targeted at high end developers – Too many choices schema.org
  • 13. Google Rich Snippets: Reviews schema.org
  • 14. Google Rich Snippets: Events schema.org
  • 15. Google Rich Snippets • Multi-syntax • Adhoc vocabulary for each vertical • Very clear carrot • Lots of experimentation on UI • Moderately successful: 10ks of sites • Scaling issues with vocabulary schema.org
  • 16. Situation in 2010 • Too many choices/decisions for webmasters – Divergence in vocabularies • Too much fragmentation • N versions of person, address, … • A lot of bad/wrong markup – ~25% for micro-formats, ~40% with RDFA – Some spam, mostly unintended mistakes • Absolute adoption numbers still rather low – Less than 100k sites schema.org
  • 17. Schema.org • Work started in August 2010 – Google, Yahoo!, Microsoft & then Yandex • Goals: – One vocabulary understood by all the search engines – Make it very easy for the webmaster • It is A vocabulary. Not The vocabulary. – Webmasters can use it together other vocabs – We might not understand the other vocabs. Others might schema.org
  • 18. Schema.org: Major sites • News: Nytimes, guardian.com, bbc.co.uk, • Movies: imdb, rottentomatoes, movies.com • Jobs / careers: careerjet.com, monster.com, indeed.com • People: linkedin.com, • Products: ebay.com, alibaba.com, sears.com, cafepress.com, sulit.com, fotolia.com • Videos: youtube, dailymotion, frequency.com, vinebox.com • Medical: cvs.com, drugs.com • Local: yelp.com, allmenus.com, urbanspoon.com • Events: wherevent.com, meetup.com, zillow.com, eventful • Music: last.fm, myspace.com, soundcloud.com schema.org
  • 19. Schema.org principles: Simplicity • Simple things should be simple – For webmasters, not necessarily for consumers of markup – Webmasters shouldn’t have to deal with N namespaces • Complex things should be possible – Advanced webmasters should be able to mix and match vocabularies • Syntax – Microdata, usability studies – RDFa, json-ld, … schema.org
  • 20. Schema.org principles: Simplicity • Can’t expect webmasters to understand Knowledge Representation, Semantic Web Query Languages, etc. • It has to fit in with existing workflows – A posteriori ‘markup tools’ don’t work • Avoid KR system driven artifacts – Multiple domain / range for attributes – No classes like ‘Agent’ – Categories and attributes should be concrete schema.org
  • 21. Schema.org principles: Simplicity • Copy and edit as the default mode for authors – It is not a linear spec, but a tree of examples • Vocabularies – Authors only need to have local view – But schema.org tries to have a single global coherent vocabulary schema.org
  • 22. Schema.org principles: Incremental • Started simple – ~ 100 categories at launch • Applies to every area – Add complexity after adoption – now ~1200 vocab items – Go back and fill in the blanks • Move fast, accept mistakes, iterate fast schema.org
  • 23. Schema.org Principles: URIs • ~1000s of terms like Actor, birthdate – ~10s for most sites – Common across sites • ~10ks of terms like USA – External enumerations Chuck Norris birthplace • ~1b-100b terms like Chuck Norris and Ryan, Oklahama – Cannot expect agreement on these – Reference by description – Consumers can reconcile entity references Ryan, Oklahama March 10th 1940 Actor type citizenOf USA birthdate schema.org
  • 24. [Figure: reconciling two descriptions] • Description 1: an Actor named Chuck Norris, birthdate March 10th 1940, citizenOf USA, birthplace a city named Ryan in the state OK • + Description 2: an Actor named Chuck Norris, birthdate March 10th 1940, spouse a Person named Gena O’Kelley • = One node: Chuck Norris, type Actor, birthdate March 10th 1940, birthplace Ryan, Oklahoma, citizenOf USA, spouse Gena O’Kelley
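The "reference by description" idea can be sketched as a consumer-side merge: two sites describe the same entity without sharing an identifier, and the consumer joins them on a description key. The rule used here (same type, name and birthdate means same entity) and the function names are assumptions for illustration only; a real consumer would need a far more careful matching and conflict-resolution strategy.

```python
def description_key(node):
    # Hypothetical identity rule: same type, name and birthdate => same entity.
    return (node.get("type"), node.get("name"), node.get("birthdate"))


def reconcile(descriptions):
    """Merge descriptions that share a key into one node per entity."""
    merged = {}
    for node in descriptions:
        entity = merged.setdefault(description_key(node), {})
        for prop, value in node.items():
            # Keep the first value seen; conflicting values are ignored here.
            entity.setdefault(prop, value)
    return list(merged.values())


site_a = {
    "type": "Actor",
    "name": "Chuck Norris",
    "birthdate": "1940-03-10",
    "citizenOf": "USA",
    "birthplace": {"type": "City", "name": "Ryan", "state": "OK"},
}
site_b = {
    "type": "Actor",
    "name": "Chuck Norris",
    "birthdate": "1940-03-10",
    "spouse": {"type": "Person", "name": "Gena O'Kelley"},
}

entities = reconcile([site_a, site_b])
print(entities)
```

Neither site had to agree on a URI for Chuck Norris; the shared vocabulary terms (type, name, birthdate) carry enough information for the consumer to do the join.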
  • 25. Schema.org Principles: Collaborations • Most discussions on public W3C lists • Work closely with interest communities • Work with others to incorporate their vocabularies – We give them attribution on schema.org – Webmasters should not have to worry about where each piece of the vocabulary came from – Webmasters can mix and match vocabs
  • 26. Schema.org Principles: Collaborations • IPTC / NYTimes / Getty with rNews • Martin Hepp with GoodRelations • US Veterans, White House, Indeed.com with Job Posting • Creative Commons with LRMI • NIH National Library of Medicine for Medical vocab. • BibExtend, HighWire Press for Bibliographic vocabulary • Benetech for Accessibility • BBC, European Broadcasting Union for TV & Radio schema • Stack Exchange, SKOS group for message board • Lots and lots and lots of individuals
  • 27. Schema.org Principles: Partners • Partner with Authoring platforms – Drupal, WordPress, Blogger, YouTube • Drupal 8 – Schema.org markup for many types • News articles, comments, users, events, … – More schema.org types can be created by site author – Markup in HTML5 & RDFa Lite – Will come out early 2015
  • 28. Recent Additions • From Nouns to Verbs: Actions – Object → potential actions – Constraints on actions – E.g., ThorMovie → Stream, Buy, … • Introducing time: Roles – E.g., Joe Montana played for the SF 49ers from 1979 to 1992 in the position Quarterback
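The two additions on this slide can be sketched in JSON-LD, built here as Python dicts. The sketch assumes the schema.org Role pattern in which the Role node sits between subject and value and repeats the property name, and it maps "Stream" and "Buy" onto the `WatchAction` and `BuyAction` types. The helper `with_role` and the example.com target URLs are hypothetical.

```python
import json


def with_role(prop, value, role_type="Role", **qualifiers):
    """Wrap a property value in a Role node that carries time/position qualifiers.

    The Role repeats the property name so it still points at the real value.
    """
    role = {"@type": role_type, prop: value}
    role.update(qualifiers)
    return role


# Roles: Joe Montana played for the SF 49ers from 1979 to 1992 as quarterback.
montana = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Joe Montana",
    "memberOf": with_role(
        "memberOf",
        {"@type": "SportsTeam", "name": "San Francisco 49ers"},
        role_type="OrganizationRole",
        startDate="1979",
        endDate="1992",
        roleName="Quarterback",
    ),
}

# Actions: an object advertises what can be done with it.
movie = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "Thor",
    "potentialAction": [
        {"@type": "WatchAction", "target": "https://example.com/thor/stream"},
        {"@type": "BuyAction", "target": "https://example.com/thor/buy"},
    ],
}

print(json.dumps(montana, indent=2))
print(json.dumps(movie, indent=2))
```

Without the Role node, `memberOf` could only say that Montana belongs to the team; the intermediate node is what gives the relationship a start date, an end date and a position.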
  • 29. Recent Additions • Scholarly work, Comics, Serials, … • Communications: TV, Radio, Q&A, … • Accessibility • Commerce: Reservations • Sports • Buyer/Seller, etc. • BibTeX • The ontology is growing … – ~800 properties – ~600 classes
  • 30. Looking forward • Schema.org is doing better than we expected – Thanks to millions of webmasters! • But this is not the final goal – Just the means to the next generation of applications • First generation of applications – Rich presentation of search results • Many new applications – Related to search and beyond
  • 31. Newer Applications: Knowledge Graph
  • 32. Newer Applications: Knowledge Graph
  • 33. Non search applications: Google Now User profile (google.com/now/topics) + structured data feeds
  • 34. Pinterest: Schema.org for Rich Pins
  • 35. Reservations → Personal Assistant • Open Table website → confirmation email → Android Reminder
  • 36. Vertical Search • Structured data in search – Web search: annotate search results OR – Filtering based on structured data • Only in specialized corpus • Ecommerce, real estate, etc. • How about filtering based on structured data across the web?
  • 37. Google Rich Snippets: Recipe View
  • 38. Web scale vertical search • Searching for Veteran friendly jobs
  • 39. Web Scale custom vertical search • Build your own custom vertical search engine – Google does the heavy lifting: crawling, indexing, etc. – You specify the schema.org restricts – APIs to help build your own UI • Searches over all pages on the web with a certain schema.org markup • Demo
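The idea of a "schema.org restrict" can be sketched with a toy in-memory index: a keyword query is combined with a filter on the type and properties of the structured data found on each page. This is not Google's API; `Page` and `vertical_search` are hypothetical names and the pages are made-up examples. The veteran-friendly filter assumes the `specialCommitments` property of `JobPosting` with the value `VeteranCommit`.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text: str
    items: list = field(default_factory=list)  # extracted schema.org items (dicts)


def vertical_search(pages, query, required_type, **required_props):
    """Return URLs of pages matching the query text and the schema.org restrict."""
    results = []
    for page in pages:
        if query.lower() not in page.text.lower():
            continue
        for item in page.items:
            if item.get("@type") != required_type:
                continue
            if all(item.get(k) == v for k, v in required_props.items()):
                results.append(page.url)
                break  # one matching item is enough for this page
    return results


pages = [
    Page(
        "https://example.com/jobs/1",
        "Software engineer, veterans encouraged to apply",
        [{"@type": "JobPosting", "specialCommitments": "VeteranCommit"}],
    ),
    Page(
        "https://example.com/jobs/2",
        "Software engineer at a startup",
        [{"@type": "JobPosting"}],
    ),
    Page(
        "https://example.com/recipes/3",
        "Engineer's guide to chili",
        [{"@type": "Recipe"}],
    ),
]

hits = vertical_search(
    pages, "engineer", "JobPosting", specialCommitments="VeteranCommit"
)
print(hits)
```

The restrict is what turns a general web index into a vertical: the same query over the same pages returns job postings only, and only those carrying the veteran commitment markup.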
  • 40. Scientific Data Publishing • US Govt alone spends over $60B/yr on scientific research • Primary output of most of this research is data – Most of the data is thrown away – All that is published are papers • We would like the data published in an easily reusable form
  • 41. Case study: Clinical Trials • Clinical trials • 4000+ clinical trials at any time in the US alone • Almost all the data ‘thrown away’ • All that gets published is a textual ‘abstract’ • Many of the trials are redundant • Earlier trials have the data • Assumptions, etc. cannot be re-examined • Longitudinal studies extremely hard, but super important • Having all the clinical trial data on the web, in a common schema, will make this much easier!
  • 42. Case study: SkyServer • Huge amount of astronomy data • Jim Gray, NASA and others brought it all together, normalized it and made it available on the web • Has changed the way astronomy research takes place • Students in Africa getting PhDs without leaving Africa! • Radio/ultraviolet/visible light data easily brought together • Caveats • SQL biased, not distributed, not scalable • All normalization done by hand, once • Small number of data sources • But shows that it can be done …
  • 43. First steps for scientific data publication • OSTP directive for data from federally funded research to be freely available • Formation of new ‘Data Science’ institute inside NIH • Seeing traction in scientific data on the web • Lot of interest in creating schemas • Public repositories for scientific data starting
  • 44. Concluding • Structured data on the web is now ‘web scale’ • Schema.org has gained traction and is evolving • The most interesting applications are yet to come