A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time
Robert Meusel, Christian Bizer, and Heiko Paulheim
2
Motivation - LOD Cloud with 1,000 data providers
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time - WIMS 2015
3
Motivation - schema.org Microdata with 700k data providers
4
Microdata in a Nutshell
 Adding structured information to web pages
• By marking up contents and entities
 Arbitrary vocabularies are possible
• Practically, only schema.org is deployed on a large scale
• Plus its historical predecessor: data-vocabulary.org
 Similar to RDFa
<div itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Data and Web Science Group</span>
<span itemprop="addressLocality">Mannheim</span>,
<span itemprop="postalCode">68131</span>
<span itemprop="addressCountry">Germany</span>
</div>
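To illustrate how a consumer might read the markup above, here is a minimal sketch using only Python's standard library. It handles flat items only (no nested `itemscope`), and the names `FlatMicrodataParser` and `extract_item` are hypothetical helpers introduced for this example, not part of any Microdata toolkit.

```python
from html.parser import HTMLParser

SNIPPET = """
<div itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Data and Web Science Group</span>
<span itemprop="addressLocality">Mannheim</span>,
<span itemprop="postalCode">68131</span>
<span itemprop="addressCountry">Germany</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects the itemtype and the text of each itemprop element.

    Assumption: a single, non-nested item; real Microdata can nest items.
    """

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemscope" in attr_map:
            self.itemtype = attr_map.get("itemtype")
        if "itemprop" in attr_map:
            self._current_prop = attr_map["itemprop"]

    def handle_data(self, data):
        # Only text inside an element carrying itemprop is recorded.
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()

    def handle_endtag(self, tag):
        self._current_prop = None


def extract_item(html):
    """Return (itemtype, {property: text}) for a flat Microdata item."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.itemtype, parser.properties


if __name__ == "__main__":
    item_type, props = extract_item(SNIPPET)
    print(item_type)
    for name, value in props.items():
        print(f"  {name}: {value}")
```

Production extractors also resolve nested items, `itemref`, and `content` attributes, which this sketch deliberately leaves out.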
5
Schema.org in a Nutshell
 Vocabulary for marking up entities on web pages
• 675 classes and 965 properties (as of May 2015, release 2.0)
 Promoted and consumed by major search engine companies
• Google, Bing, Yahoo!, and Yandex
• Google Rich Snippets
 Community-driven evolution and development
6
Schema.org in a Nutshell – Coverage
 Schema.org has incorporated some popular vocabularies, like:
• GoodRelations (2012)
• W3C BibExtend (2014)
• MusicBrainz vocabulary (2015)
• Automotive Ontology (2015)
7
Microdata with Schema.org in HTML Pages
<html>
…
<body>
…
<div id="main-section" class="performance left" data-sku="M17242_580">
<h1> Predator Instinct FG Fußballschuh </h1>
<div>
<meta content="EUR">
<span data-sale-price="219.95">219,95</span>
…
</body>
</html>
HTML pages directly embed markup languages to annotate items using different vocabularies
<html>
…
<body>
…
<div id="main-section" class="performance left" data-sku="M17242_580" itemscope itemtype="http://schema.org/Product">
<h1 itemprop="name"> Predator Instinct FG Fußballschuh </h1>
<div itemscope itemtype="http://schema.org/Offer" itemprop="offers">
<meta itemprop="priceCurrency" content="EUR">
<span itemprop="price" data-sale-price="219.95">219,95</span>
…
</body>
</html>
_:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> .
_:node1 <http://schema.org/Product/name> "Predator Instinct FG Fußballschuh"@de .
_:node1 <http://schema.org/Product/offers> _:node2 .
_:node2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Offer> .
_:node2 <http://schema.org/Offer/price> "219,95"@de .
_:node2 <http://schema.org/Offer/priceCurrency> "EUR" .
…
8
Wrap-Up
 Semantic annotations are used by more and more websites
 Entities on websites become machine-readable and machine-understandable
 schema.org together with Microdata is a success story
• Promoted by search engine companies
• Deployed by over 17% of all websites [1] (over 700k data providers)
 Usage complies with the schema more closely than, e.g., LOD [2]
[1] http://webdatacommons.org/structureddata/2014-12/stats/stats.html
[2] Meusel and Paulheim, ESWC 2015
9
Digging for Reasons
 So Microdata is deployed more often and is often more schema-compliant, although there are millions of uncontrolled providers with different skill sets
 But why? Some hypotheses…
• Availability of documentation
• Tool support
• Business incentive
• Schema flexibility
 Can we confirm/reject those from looking at the data?
10
A Diachronic Perspective
 Versions of schema.org are archived over time
• Plus: there are several crawl releases per year
• i.e., we can look at change over time
 If we look at both schema and deployed data, we may observe
• Adoption rates of schema changes
• Data-first changes to the schema
• Convergence or divergence of deployed data
11
A Diachronic Perspective
 Three releases of WDC Microdata corpus [1]
• 2012, 2013, and 2014
 Versions of schema.org that were valid
• At the beginning of the crawl
• At the end of the crawl
[1] http://webdatacommons.org/structureddata
12
Top-down Adoption
 How fast are changes in the schema adopted?
• New classes/properties
• Deprecations
• Domain/range changes
 Measuring adoption: challenges
• Different crawls
• Overall growth of deployed schema.org
 Measure: normalized usage increase (nui) from i to j:
• nui(s)>1.05: usage of schema element s has increased significantly
• nui(s)<0.95: usage of schema element s has decreased significantly
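The nui formula itself did not survive in this transcript. As a sketch only, and assuming nui compares the share of data providers (pay-level domains, PLDs) using an element s in crawl j against its share in crawl i, the thresholds above could be applied as follows. The function names and the example counts are hypothetical.

```python
def normalized_usage_increase(plds_with_s_i, total_plds_i, plds_with_s_j, total_plds_j):
    """Assumed reading of nui: ratio of usage shares between crawl i and crawl j.

    Dividing by the total number of PLDs per crawl compensates for crawls of
    different size and for the overall growth of deployed schema.org.
    """
    if total_plds_i <= 0 or total_plds_j <= 0:
        raise ValueError("crawl totals must be positive")
    if plds_with_s_i <= 0:
        raise ValueError("element must be used in crawl i for the ratio to be defined")
    share_i = plds_with_s_i / total_plds_i
    share_j = plds_with_s_j / total_plds_j
    return share_j / share_i


def classify(nui, lower=0.95, upper=1.05):
    """Apply the thresholds from the slide."""
    if nui > upper:
        return "increased"
    if nui < lower:
        return "decreased"
    return "stable"


if __name__ == "__main__":
    # Hypothetical counts: the element grows from 1,000 to 1,500 PLDs,
    # while the corpus doubles from 100,000 to 200,000 PLDs.
    nui = normalized_usage_increase(1_000, 100_000, 1_500, 200_000)
    print(round(nui, 3), classify(nui))
```

Under this reading, an element can gain data providers in absolute terms and still show a nui below 0.95 when the corpus as a whole grows faster.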
13
Top-down Adoption
 Adoption of new classes and properties
• Almost half of all introduced classes are never used!
• Similar for new properties
 Reasons
• Bulk-addition of vocabularies
  - not every term is equally needed
  - e.g., medical vocabulary
• Blind spot of our approach
  - some terms are mainly for e-mail markup
  - e.g., Actions
SURPRISE!
14
Top-down Adoption
 Main domains of positive adoption
• Metadata for web content (schema.org/WebSite has the highest nui)
• Broadcasting (e.g., TV Episodes)
• Questions & Answers
• Postal addresses
 Classes featured in Google Rich Snippets
• Still growing at a high level (tens of thousands of data providers)
• But nui(s)<0.95
[Callouts: Yellow Pages; Search Engine Listings; Collaboration with BBC and EBU; Influence of CMS adoption; Q&A pages, such as Stack Overflow]
15
Top-down Adoption
 Adoption of domain/range changes
• Again: rather low overall adoption
 Adopted well for
• Products (height, width, itemCondition, …)
• Broadcasting domain (episode, actor, ...)
[Callouts: Search Engine Listings; Collaboration with BBC and EBU]
16
Top-down Adoption
 Adoption of deprecations
• Works well (29 out of 32 have a significantly low nui)
 Exceptions
• s:map (← s:hasMap)
• s:maps (← s:hasMap)
[Callout: For Google Maps (lots of outdated tutorials)]
17
Bottom-up Evolution
 Martin Luther
• Started the Protestant church
• A success story, too (like schema.org)
• (i.e., 800 million adopters worldwide)
 Famous quote:
• “Man muss […] dem gemeinen Mann aufs Maul schauen”
• (roughly: “You have to listen to the way the common man really speaks.”)
[Image: Martin Luther, 1483-1546]
Disclaimer: I do not speak for the Protestant church.
18
Bottom-up Evolution
 Are new features in the schema first used “unofficially”?
• New classes/properties
• Domain/range changes
 Instrument for measurement: ROC curves
• True positives mapped against false positives
• tp: elements used before
• fp: elements not used before
• Ranking by #PLDs
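The ROC construction above can be sketched as follows, under one reading of the slide: newly introduced schema elements are ranked by the number of PLDs using them, and each is labelled positive if it was already in use before its official introduction. The helper names and the sample ranking are hypothetical, and ties in the ranking are ignored.

```python
def roc_points(ranked_labels):
    """Return ROC points (false positive rate, true positive rate).

    ranked_labels: booleans ordered from the highest-ranked element to the
    lowest; True marks an element that was used before (a positive).
    """
    positives = sum(1 for label in ranked_labels if label)
    negatives = len(ranked_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative element")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for label in ranked_labels:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve; 0.5 corresponds to a random ranking."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Hypothetical elements as (number of PLDs, used before introduction?).
    elements = [(900, True), (450, True), (300, False), (120, True), (40, False)]
    ranked = [used for _, used in sorted(elements, key=lambda e: e[0], reverse=True)]
    curve = roc_points(ranked)
    print(curve)
    print(area_under_curve(curve))
```

An area clearly above 0.5 would suggest that elements used before their introduction tend to rank higher, which is the kind of influence the analysis looks for.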
19
Bottom-up Evolution
 Some mild influence is observable
• Stronger for domain/range changes, especially range changes
• Weaker for new classes/properties
[Figure: panels for 2012→2013, 2013→2014, and 2012→2014; series: classes, properties, domains, ranges]
20
Bottom-up Evolution
 Extension mechanism
• Allows for user-defined classes/properties
• Those become subclasses implicitly
 Analysis over time
• No measurable impact on standard evolution
• “Unofficial” use is likelier than use of the extension mechanism
s:Product/ElectronicProduct
s:price/reducedPrice
21
Overall Convergence
 Measuring convergence
• i.e., homogeneity of descriptions of classes
• Example: two instances of s:LocalBusiness
[Figure: two s:LocalBusiness instances (_:1), each linked to an s:PostalAddress node (_:2); the first with “Birmingham” and “Main Street 24”, the second with “Liverpool” and “Church Street 1”]
22
Overall Convergence
 Recap
• RDF from Microdata is a set of trees
• i.e., we can enumerate all paths to leaf nodes
(omitting literals)
 Example:
[Figure: s:LocalBusiness instance (_:1) linked to an s:PostalAddress node (_:2) with “Liverpool” and “Church Street 1”]
rdf:type-s:LocalBusiness,
s:address-rdf:type-s:PostalAddress,
s:address-s:addressLocality,
s:address-s:streetAddress
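The path enumeration can be sketched as a small recursive walk. The nested-dict representation of an item and the function name `enumerate_paths` are assumptions made for this example; the property assignment of the two literals is inferred from the paths listed above.

```python
def enumerate_paths(item, prefix=()):
    """List all paths from the root of a Microdata item tree to its leaves.

    item: {"type": <class>, "props": [(property, value), ...]}, where value is
    either a literal (str) or a nested item (dict). Literal values are
    omitted from the path, as on the slide.
    """
    paths = []
    if "type" in item:
        paths.append("-".join(prefix + ("rdf:type", item["type"])))
    for prop, value in item.get("props", []):
        if isinstance(value, dict):
            # Nested item: extend the prefix and recurse.
            paths.extend(enumerate_paths(value, prefix + (prop,)))
        else:
            # Literal leaf: the path ends at the property.
            paths.append("-".join(prefix + (prop,)))
    return paths


LIVERPOOL_BUSINESS = {
    "type": "s:LocalBusiness",
    "props": [
        ("s:address", {
            "type": "s:PostalAddress",
            "props": [
                ("s:addressLocality", "Liverpool"),
                ("s:streetAddress", "Church Street 1"),
            ],
        }),
    ],
}

if __name__ == "__main__":
    for path in enumerate_paths(LIVERPOOL_BUSINESS):
        print(path)
```

Run on the Liverpool example, this prints the same four paths as listed above.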
23
Overall Convergence
 Using all paths, we can compute the entropy for each class
 Low entropy indicates high homogeneity
 We normalize by both the maximum entropy and the total number of paths
• i.e., we use normalized entropy rate as a measure for homogeneity
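The entropy formula is not preserved in this transcript. The sketch below is therefore not the authors' definition. It is one formulation that is consistent with the statements above: for each distinct path of a class, take the binary entropy of whether an instance contains that path (maximum 1 bit), then average over all distinct paths. The function names are hypothetical.

```python
import math


def binary_entropy(q):
    """Entropy in bits of a yes/no event with probability q (at most 1 bit)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def mean_path_entropy(instances):
    """Assumed homogeneity measure for one class.

    instances: list of sets of paths, one set per entity of the class.
    Returns a value in [0, 1]; 0 means every entity uses exactly the same paths.
    """
    if not instances:
        raise ValueError("need at least one instance")
    all_paths = set().union(*instances)
    if not all_paths:
        return 0.0
    n = len(instances)
    total = 0.0
    for path in all_paths:
        share = sum(1 for paths in instances if path in paths) / n
        total += binary_entropy(share)
    # Dividing by the number of distinct paths (each contributing at most
    # 1 bit) normalizes by both the maximum entropy and the path count.
    return total / len(all_paths)


if __name__ == "__main__":
    common = {
        "rdf:type-s:LocalBusiness",
        "s:address-rdf:type-s:PostalAddress",
        "s:address-s:addressLocality",
    }
    with_street = common | {"s:address-s:streetAddress"}
    print(mean_path_entropy([with_street, with_street]))  # identical descriptions
    print(mean_path_entropy([with_street, common]))       # one path used by half
```

Whatever the exact definition, the property the slides rely on holds here: identical descriptions yield zero, and more variation between instances yields a higher value.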
24
Overall Convergence
 Observations
• Overall entropy decreases over time
 Classes with high convergence rates
• WebSite, Blog, …
• Hotel, Restaurant, …
• Product, Offer, …
• Rating, Review
[Callouts, in slide order: Influence of CMS adoption; Yellow Pages; Google Rich Snippets; ...all of the above]
25
Key Adoption Drivers
 Search Engine Optimization
• Website providers want to rank high in Google
• Direct business incentive!
 Tool adoption
• Major CMSs use schema.org
 Standard Agility
• schema.org: 25 revisions in the last three years
• cf. FOAF: six revisions in the last eight years
26
Summary
 Both top-down and bottom-up adoption can be observed
 Homogeneity of the deployed schema increases over time
 The empirical, data-driven study described here gives valuable insights into how and why schema.org is a success story
 The observed key drivers and obstacles can also help to understand and analyze the adoption of other standards, e.g., LOD
 More fine-grained insights might be revealed by extending the analysis corpus to the mailing list archive and issue tracker
27
Thank you! Questions? Feedback?
Raw data can be found on the website of WebDataCommons:
http://webdatacommons.org/structureddata/
More interesting datasets and analysis:
http://webdatacommons.org/index.html
Acknowledgement
The extraction and analysis of the datasets were supported by an AWS in Education Grant.

Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 

A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time

  • 1. A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time. Robert Meusel, Christian Bizer, and Heiko Paulheim
  • 2. Motivation - LOD Cloud with 1,000 data providers (slide footer: A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time - WIMS 2015)
  • 3. Motivation - schema.org Microdata (MD) with 700k data providers
  • 4. Microdata in a Nutshell
      - Adding structured information to web pages
          • By marking up contents and entities
      - Arbitrary vocabularies are possible
          • In practice, only schema.org is deployed on a large scale
          • Plus its historical predecessor: data-vocabulary.org
      - Similar to RDFa
      <div itemscope itemtype="http://schema.org/PostalAddress">
        <span itemprop="name">Data and Web Science Group</span>
        <span itemprop="addressLocality">Mannheim</span>,
        <span itemprop="postalCode">68131</span>
        <span itemprop="addressCountry">Germany</span>
      </div>
  • 5. Schema.org in a Nutshell
      - Vocabulary for marking up entities on web pages
          • 675 classes and 965 properties (as of May 2015, release 2.0)
      - Promoted and consumed by major search engine companies
          • Google, Bing, Yahoo!, and Yandex
          • Google Rich Snippets
      - Community-driven evolution and development
  • 6. Schema.org in a Nutshell - Coverage
      - Schema.org has incorporated some popular vocabularies, such as:
          • GoodRelations (2012)
          • W3C BibExtend (2014)
          • MusicBrainz vocabulary (2015)
          • Automotive Ontology (2015)
  • 7. Microdata with Schema.org in HTML Pages
      <html>
      …
      <body>
      …
      <div id="main-section" class="performance left" data-sku="M17242_580">
      <h1> Predator Instinct FG Fußballschuh </h1>
      <div>
      <meta content="EUR">
      <span data-sale-price="219.95">219,95</span>
      …
      </body>
      </html>
      HTML pages directly embed markup languages to annotate items using different vocabularies:
      <html>
      …
      <body>
      …
      <div id="main-section" class="performance left" data-sku="M17242_580" itemscope itemtype="http://schema.org/Product">
      <h1 itemprop="name"> Predator Instinct FG Fußballschuh </h1>
      <div itemscope itemtype="http://schema.org/Offer" itemprop="offers">
      <meta itemprop="priceCurrency" content="EUR">
      <span itemprop="price" data-sale-price="219.95">219,95</span>
      …
      </body>
      </html>
      _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> .
      _:node1 <http://schema.org/Product/name> "Predator Instinct FG Fußballschuh"@de .
      _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Offer> .
      _:node1 <http://schema.org/Offer/price> "219,95"@de .
      _:node1 <http://schema.org/Offer/priceCurrency> "EUR" .
      …
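The step from annotated HTML to triples on slide 7 can be sketched in a few lines of Python. This is a minimal illustration built on the standard library's `html.parser`, not the extractor used for the study. The class name `MicrodataSketch`, the helper `extract_triples`, and the blank-node naming are hypothetical. It assumes well-formed nesting, builds predicates as `itemtype/itemprop` as in the slide's output, ignores language tags, and gives each nested item its own blank node.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}  # tags without an end tag


class MicrodataSketch(HTMLParser):
    """Collects (subject, predicate, object) triples from Microdata attributes."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.items = []      # open items as (node_id, itemtype)
        self.open_tags = []  # per open tag: (opened_item, text_capture)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        opened_item, capture = False, None
        parent = self.items[-1] if self.items else None
        if "itemscope" in a:
            # A new item: emit its type and, if nested, the link from its parent.
            self.count += 1
            node = f"_:node{self.count}"
            itemtype = a.get("itemtype") or ""
            self.triples.append((node, RDF_TYPE, itemtype))
            if parent and a.get("itemprop"):
                self.triples.append((parent[0], f"{parent[1]}/{a['itemprop']}", node))
            self.items.append((node, itemtype))
            opened_item = True
        elif parent and a.get("itemprop"):
            predicate = f"{parent[1]}/{a['itemprop']}"
            if "content" in a:
                self.triples.append((parent[0], predicate, a["content"]))
            else:
                # Value is the element's text; collect it until the end tag.
                capture = (parent[0], predicate, [])
        if tag in VOID_TAGS:
            if opened_item:
                self.items.pop()
        else:
            self.open_tags.append((opened_item, capture))

    def handle_data(self, data):
        for _, capture in self.open_tags:
            if capture is not None:
                capture[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags:
            return
        opened_item, capture = self.open_tags.pop()
        if capture is not None:
            self.triples.append((capture[0], capture[1], "".join(capture[2]).strip()))
        if opened_item:
            self.items.pop()


def extract_triples(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.triples
```

Feeding the annotated product snippet (with its elided parts closed) to `extract_triples` yields type triples for the Product and the Offer, the name, price, and currency literals, and one linking triple for `offers`.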
  • 8. Wrap-Up
      - Semantic annotations are used by more and more websites
      - Entities on websites become machine-readable and machine-understandable
      - schema.org together with Microdata is a success story
          • Promoted by search engine companies
          • Deployed by over 17% of all websites [1] (over 700k data providers)
      - Usage is more compliant with the schema than, e.g., LOD [2]
      [1] http://webdatacommons.org/structureddata/2014-12/stats/stats.html
      [2] Meusel and Paulheim, ESWC 2015
  • 9. Digging for Reasons
      - So, Microdata is deployed more often and is often more schema-compliant, although there are millions of uncontrolled providers with different skill sets
      - But why? Some hypotheses:
          • Availability of documentation
          • Tool support
          • Business incentive
          • Schema flexibility
      - Can we confirm or reject those by looking at the data?
  • 10. A Diachronic Perspective
      - Versions of schema.org are archived over time
          • Plus: there are several crawl releases per year
          • i.e., we can look at change over time
      - If we look at both schema and deployed data, we may observe
          • Adoption rates of schema changes
          • Data-first changes to the schema
          • Convergence or divergence of deployed data
  • 11. A Diachronic Perspective
      - Three releases of the WDC Microdata corpus [1]
          • 2012, 2013, and 2014
      - Versions of schema.org that were valid
          • At the beginning of the crawl
          • At the end of the crawl
      [1] http://webdatacommons.org/structureddata
  • 12. Top-down Adoption
      - How fast are changes in the schema adopted?
          • New classes/properties
          • Deprecations
          • Domain/range changes
      - Measuring adoption: challenges
          • Different crawls
          • Overall growth of deployed schema.org
      - Measure: normalized usage increase (nui) from i to j [formula not preserved]:
          • nui(s) > 1.05: usage of schema element s has increased significantly
          • nui(s) < 0.95: usage of schema element s has decreased significantly
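The slide's formula for nui is not preserved in this transcript. One plausible reading, assumed here and not confirmed by the source, is the ratio of an element's share of data providers in crawl j to its share in crawl i, which would correct for overall growth between crawls. The function and parameter names below are hypothetical; only the thresholds 1.05 and 0.95 come from the slide.

```python
def normalized_usage_increase(users_i, total_i, users_j, total_j):
    """Assumed definition: share of providers using s in crawl j over the share in crawl i."""
    if total_i <= 0 or total_j <= 0:
        raise ValueError("crawl totals must be positive")
    if users_i == 0:
        raise ValueError("element was unused in crawl i; the ratio is undefined")
    share_i = users_i / total_i
    share_j = users_j / total_j
    return share_j / share_i


def classify(nui, upper=1.05, lower=0.95):
    """Apply the slide's thresholds to a nui value."""
    if nui > upper:
        return "increased"
    if nui < lower:
        return "decreased"
    return "stable"
```

Under this reading, an element used by 100 of 1,000 providers in one crawl and 150 of 2,000 in the next has grown in absolute terms, yet its nui is 0.75, so it counts as decreased. That is the same pattern slide 14 reports for classes featured in Google Rich Snippets.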
  • 13. Top-down Adoption
      - Adoption of new classes and properties
          • Almost half of all introduced classes are never used! (slide callout: SURPRISE!)
          • Similar for new properties
      - Reasons
          • Bulk addition of vocabularies: not every term is equally needed (e.g., medical vocabulary)
          • Blind spot of our approach: some terms are mainly for e-mail markup (e.g., Actions)
  • 14. Top-down Adoption
      - Main domains of positive adoption
          • Metadata for web content (schema.org/Website has the highest nui)
          • Broadcasting (e.g., TV episodes)
          • Questions & Answers
          • Postal addresses
      - Classes featured in Google Rich Snippets
          • Still growing at a high level (tens of thousands of data providers)
          • But nui(s) < 0.95
      Slide annotations: Yellow Pages; Search Engine Listings; Collaboration with BBC and EBU; Influence of CMS adoption; Q&A pages, such as Stack Overflow
  • 15. Top-down Adoption
      - Adoption of domain/range changes
          • Again: rather low overall adoption
      - Adopted well for
          • Products (height, width, itemCondition, …)
          • Broadcasting domain (episode, actor, ...)
      Slide annotations: Search Engine Listings; Collaboration with BBC and EBU
  • 16. Top-down Adoption
      - Adoption of deprecations
          • Works well (29 out of 32 have a significantly low nui)
      - Exceptions
          • s:map (← s:hasMap)
          • s:maps (← s:hasMap)
      Slide annotation: For Google Maps (lots of outdated tutorials)
  • 17. Bottom-up Evolution
      - Martin Luther (1483-1546)
          • Started the Protestant church
          • A success story, too (like schema.org)
          • (i.e., 800 million adopters worldwide)
      - Famous quote:
          • “Man muss […] dem gemeinen Mann aufs Maul schauen”
          • (roughly: “You have to listen to the way the common man really speaks.”)
      Disclaimer: I do not speak for the Protestant church.
  • 18. Bottom-up Evolution
      - Are new features in the schema first used “unofficially”?
          • New classes/properties
          • Domain/range changes
      - Instrument for measurement: ROC curves
          • True positives plotted against false positives
          • tp: elements used before
          • fp: elements not used before
          • Ranking by #PLDs
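How a ROC curve follows from a ranking by #PLDs can be sketched as below. This is a generic construction, not the study's code. The function names and the candidate tuples are hypothetical, and which elements count as positive is an assumption the caller supplies. Ties in the PLD count are not treated specially.

```python
def rank_by_plds(candidates):
    """candidates: (name, pld_count, is_positive) tuples; returns labels in rank order."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [is_positive for _, _, is_positive in ordered]


def roc_points(ranked_labels):
    """Walk down the ranking and record (false positive rate, true positive rate)."""
    positives = sum(1 for label in ranked_labels if label)
    negatives = len(ranked_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative element")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for label in ranked_labels:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve; 0.5 corresponds to a random ranking."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area clearly above 0.5 would indicate that widely deployed elements are more likely to be positives, which is the kind of "mild influence" the next slide describes.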
  • 19. Bottom-up Evolution
      - Some mild influences are observable
          • Stronger for domain/range changes, especially range changes
          • Weaker for new classes/properties
      [Figure: ROC curves for classes, properties, domains, and ranges across 2012→2013, 2013→2014, and 2012→2014]
  • 20. Bottom-up Evolution
      - Extension mechanism
          • Allows for user-defined classes/properties
          • Those become subclasses implicitly
          • Examples: s:Product/ElectronicProduct, s:price/reducedPrice
      - Analysis over time
          • No measurable impact on standard evolution
          • “Unofficial” use is likelier than use of the extension mechanism
  • 21. Overall Convergence
      - Measuring convergence
          • i.e., homogeneity of descriptions of classes
          • Example: two instances of s:LocalBusiness
      [Figure: two s:LocalBusiness nodes (_:1), each linked to an s:PostalAddress node (_:2); one with “Birmingham” and “Main Street 24”, the other with “Liverpool” and “Church Street 1”]
  • 22. Overall Convergence
      - Recap
          • RDF from Microdata is a set of trees
          • i.e., we can enumerate all paths to leaf nodes (omitting literals)
      - Example:
      [Figure: s:LocalBusiness node (_:1) linked to an s:PostalAddress node (_:2) with “Liverpool” and “Church Street 1”]
          rdf:type-s:LocalBusiness,
          s:address-rdf:type-s:PostalAddress,
          s:address-s:addressLocality,
          s:address-s:streetAddress
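The path enumeration on slide 22 can be sketched as a small recursive function. The nested-dictionary representation of an item (`type` and `props` keys) and the function name are assumptions made for illustration. Literal values are omitted from the paths, as the slide states.

```python
def enumerate_paths(item, prefix=()):
    """Return all property paths from the item root to its leaves, without literals."""
    paths = ["-".join(prefix + ("rdf:type", item["type"]))]
    for prop, value in item.get("props", {}).items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # Nested item: descend and extend the path with this property.
                paths.extend(enumerate_paths(v, prefix + (prop,)))
            else:
                # Literal leaf: keep the property path, drop the literal itself.
                paths.append("-".join(prefix + (prop,)))
    return paths


liverpool = {
    "type": "s:LocalBusiness",
    "props": {
        "s:address": {
            "type": "s:PostalAddress",
            "props": {
                "s:addressLocality": "Liverpool",
                "s:streetAddress": "Church Street 1",
            },
        }
    },
}
```

Calling `enumerate_paths(liverpool)` returns the four paths listed on the slide, in the same order.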
  • 23. Overall Convergence
      - Using all paths, we can compute the entropy for each class [formula not preserved]
      - Low entropy indicates high homogeneity
      - We normalize by both the maximum entropy and the total number of paths
          • i.e., we use the normalized entropy rate as a measure of homogeneity
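Since the slide's entropy formula is not preserved, the following is only a sketch of the general idea under stated assumptions: each instance of a class is summarized by its set of paths, Shannon entropy is computed over how often each distinct path set occurs, and the result is divided by the maximum entropy for that number of distinct sets. The further normalization by the total number of paths mentioned on the slide is left out because its exact form is not given. The function name is hypothetical.

```python
from collections import Counter
from math import log2


def normalized_entropy(signatures):
    """signatures: one hashable path set (e.g. a frozenset of paths) per instance."""
    n = len(signatures)
    if n == 0:
        raise ValueError("need at least one instance")
    counts = Counter(signatures)
    if len(counts) == 1:
        return 0.0  # all instances are described identically
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return entropy / log2(len(counts))  # divide by the maximum possible entropy
```

With this sketch, a class whose instances all use the same paths scores 0, and a class whose instances are spread evenly over different path sets scores 1.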
  • 24. Overall Convergence
      - Observations
          • Overall entropy decreases over time
      - Classes with high convergence rates
          • WebSite, Blog, …
          • Hotel, Restaurant, …
          • Product, Offer, …
          • Rating, Review
      Slide annotations (in slide order): Influence of CMS adoption; Yellow pages; Google Rich Snippets; ...all of the above
  • 25. Key Adoption Drivers
      - Search Engine Optimization
          • Website providers want to rank high in Google
          • Direct business incentive!
      - Tool adoption
          • Major CMSs use schema.org
      - Standard agility
          • schema.org: 25 revisions in the last three years
          • cf. FOAF: six revisions in the last eight years
  • 26. Summary
      - Adoption can be observed in both directions, top-down and bottom-up
      - Homogeneity of the deployed schema increases over time
      - The described empirical, data-driven study reveals valuable insights into how and why schema.org is a success story
      - The observed key drivers and obstacles can also help to understand and analyze the adoption of other standards, e.g., LOD
      - More fine-grained insights might be revealed by extending the analysis corpus to the mailing list archive and issue tracker
  • 27. Thank you! Questions? Feedback?
      - Raw data can be found on the WebDataCommons website: http://webdatacommons.org/structureddata/
      - More interesting datasets and analyses: http://webdatacommons.org/index.html
      - Acknowledgement: The extraction and analysis of the datasets were supported by an AWS in Education Grant.