Validator and preview 
for the JobPosting data model 
of Schema.org 
Jindřich Mynarz 
Department of Information and Knowledge 
Engineering, 
University of Economics, Prague 
EC-WEB 2014, September 2, 2014
Motivation 
● Improve the usability of vocabularies
● Provide feedback on the use of vocabularies
● Make vocabulary specifications executable
● Help ensure a basic level of data quality
● Capture application-specific requirements for data in validation rules
DámePráci.eu project 
“Matching jobs with unemployed 
through semantic data” 
Data model using Schema.org with 
an extension for the job market. 
Application for searching through job postings 
aggregated from distinct sources: 
www.damepraci.cz (in Czech)
Validation method 
● Rule-based, schema-aware validation
● Operates in the RDF data model
● Focuses on semantic errors beyond well-formed markup
● Partial open world assumption
● Implemented as SPARQL 1.1 CONSTRUCT queries
● Errors reported using the SPIN RDF vocabulary
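The slides state that each rule is a SPARQL 1.1 CONSTRUCT query whose output is described with SPIN. As a rough, non-authoritative illustration of that shape, the Python sketch below treats a rule as a function from a graph to a list of violation records; the record keys mirror the SPIN terms spin:ConstraintViolation, spin:violationRoot and spin:violationPath. The triple representation, the `Literal` class and the function names are assumptions made for this sketch, not the validator's code.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """An RDF literal; IRIs and blank nodes are plain strings in this sketch."""
    value: str
    datatype: str = "xsd:string"


def empty_literal_rule(graph):
    """Stand-in for one CONSTRUCT query: each matching triple yields one
    violation whose keys mirror spin:violationRoot and spin:violationPath."""
    return [
        {
            "@type": "spin:ConstraintViolation",
            "spin:violationRoot": s,
            "spin:violationPath": p,
            "rdfs:label": "Empty literal value",
        }
        for s, p, o in graph
        if isinstance(o, Literal) and not o.value.strip()
    ]


def validate(graph, rules):
    """Run every rule against the same graph and collect all violations,
    analogous to merging the outputs of several CONSTRUCT queries."""
    violations = []
    for rule in rules:
        violations.extend(rule(graph))
    return violations


graph = [
    ("_:job", "rdf:type", "schema:JobPosting"),
    ("_:job", "schema:title", Literal("SEO guru")),
    ("_:addr", "schema:addressRegion", Literal("  ")),
]
for v in validate(graph, [empty_literal_rule]):
    print(v["spin:violationRoot"], v["spin:violationPath"], v["rdfs:label"])
# prints: _:addr schema:addressRegion Empty literal value
```

Keeping each rule independent mirrors the one-query-per-rule design: rules can be added, removed or tested in isolation.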
Background knowledge 
Schema.org
+ job market extension (RDFS)
+ external enumerations:
● ISO 4217 currency codes (SKOS)
● ISO 639-1 language codes (SKOS)
Each is loaded into a separate named graph that the validation rules can reference.
Validation rules 
● Data completeness 
● Distinction between datatype and object 
properties 
● Conflicting data 
● Datatype violations 
● Invalid codes
Data completeness 
● At least one instance of schema:JobPosting is required
● Other type information (class membership, datatypes) is optional
● Empty literals
● Conditionally required data (e.g., compensation requires a currency)
Distinction between datatype 
and object properties 
● Object properties with literal objects instead 
of URIs or blank nodes (and vice versa for 
datatype properties) 
● Datatype properties have simpler syntax
○ They avoid nested objects and the difficulty of finding an object's URI
● May be a symptom of incorrectly nested HTML elements
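The distinction above reduces to checking the kind of each object node against the kind of its property. In this sketch the two property sets are small hand-picked excerpts (an assumption); a full validator would derive them from the Schema.org RDFS, for example from each property's schema:rangeIncludes.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    value: str
    datatype: str = "xsd:string"


# Illustrative excerpts; a full validator would derive these sets from the
# Schema.org RDFS, e.g. from each property's schema:rangeIncludes.
DATATYPE_PROPERTIES = {"schema:title", "schema:addressLocality", "schema:addressRegion"}
OBJECT_PROPERTIES = {"schema:jobLocation", "schema:address", "schema:hiringOrganization"}


def check_property_kinds(graph):
    """Flag literals on object properties and IRIs/blank nodes on datatype properties."""
    errors = []
    for s, p, o in graph:
        is_literal = isinstance(o, Literal)
        if p in OBJECT_PROPERTIES and is_literal:
            errors.append((s, p, "Object property used as datatype property"))
        elif p in DATATYPE_PROPERTIES and not is_literal:
            errors.append((s, p, "Datatype property used as object property"))
    return errors


graph = [
    ("_:job", "schema:jobLocation", Literal("Munich")),           # should point to a place
    ("_:job", "schema:title", "http://example.org/jobs#title"),   # should be a literal
    ("_:job", "schema:hiringOrganization", "_:org"),              # correct usage
]
for error in check_property_kinds(graph):
    print(error)
```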
Conflicting data 
● Mutually exclusive properties
○ schema:jobLocation + schema:isRemoteWork true
● Cardinality violations: functional properties with more than one object
○ schema:startDate, schema:currency, schema:availableVacancies
● Incompatible class memberships inferred via
○ schema:domainIncludes, schema:rangeIncludes
○ Class memberships are incompatible when a resource is an instance of two or more distinct classes that are not related by rdfs:subClassOf.
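A sketch of the three conflict checks, under stated assumptions: schema:isRemoteWork carries an `xsd:boolean` literal, class memberships are taken from explicit rdf:type triples (types inferred from schema:domainIncludes or schema:rangeIncludes would have to be added to the graph first), and the rdfs:subClassOf hierarchy is a small excerpt passed in as a dictionary.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Literal(NamedTuple):
    value: str
    datatype: str = "xsd:string"


FUNCTIONAL = {"schema:startDate", "schema:currency", "schema:availableVacancies"}
REMOTE_TRUE = Literal("true", "xsd:boolean")  # assumed encoding of "isRemoteWork true"


def superclasses(cls, subclass_of):
    """Reflexive-transitive closure of rdfs:subClassOf starting from cls."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(subclass_of.get(c, ()))
    return seen


def check_conflicts(graph, subclass_of):
    errors = []
    index = defaultdict(lambda: defaultdict(set))  # subject -> property -> objects
    for s, p, o in graph:
        index[s][p].add(o)
    for s, props in index.items():
        # Mutually exclusive: a physical job location and remote work.
        if "schema:jobLocation" in props and REMOTE_TRUE in props.get("schema:isRemoteWork", set()):
            errors.append((s, "schema:jobLocation together with schema:isRemoteWork true"))
        # Functional properties may have at most one distinct object.
        for p in sorted(FUNCTIONAL.intersection(props)):
            if len(props[p]) > 1:
                errors.append((s, f"{p} has {len(props[p])} values"))
        # Two classes are compatible only if one is a (transitive) subclass of the other.
        for a, b in combinations(sorted(props.get("rdf:type", set())), 2):
            if a not in superclasses(b, subclass_of) and b not in superclasses(a, subclass_of):
                errors.append((s, f"incompatible classes {a} and {b}"))
    return errors


SUBCLASS_OF = {  # excerpt of the Schema.org hierarchy
    "schema:JobPosting": {"schema:Intangible"},
    "schema:Intangible": {"schema:Thing"},
    "schema:Place": {"schema:Thing"},
}
graph = [
    ("_:job", "rdf:type", "schema:JobPosting"),
    ("_:job", "rdf:type", "schema:Place"),
    ("_:job", "schema:jobLocation", "_:place"),
    ("_:job", "schema:isRemoteWork", REMOTE_TRUE),
    ("_:job", "schema:startDate", Literal("2014-09-01", "xsd:date")),
    ("_:job", "schema:startDate", Literal("2014-10-01", "xsd:date")),
]
for error in check_conflicts(graph, SUBCLASS_OF):
    print(error)
```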
Datatype violations 
● Detected via regular expressions and casting errors of XPath datatype constructor functions
● Date and time formats (xsd:date, xsd:duration)
○ Not matching the lexical form's regular expression
○ Non-existent dates
○ Dates in the future
● Value ranges
○ schema:availableVacancies must be a positive integer
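In SPARQL, a failed cast such as `xsd:date(?value)` signals an invalid value. The Python sketch below approximates the same checks with a simplified xsd:date regular expression (four-digit years only, time zone accepted but ignored) and `datetime.date`, which rejects non-existent dates. Which properties get the "no future dates" check is not stated on the slide, so it is a parameter here; a posting date is one plausible candidate.

```python
import re
from datetime import date

# Simplified xsd:date lexical form: four-digit year, optional time zone.
XSD_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")


def check_date(lexical, today, allow_future=True):
    """Return an error message for an invalid xsd:date value, or None."""
    m = XSD_DATE.fullmatch(lexical)
    if not m:
        return "does not match the xsd:date lexical form"
    try:
        value = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:  # e.g. 2014-02-30 or month 13
        return "non-existent date"
    if not allow_future and value > today:
        return "date in the future"
    return None


def check_positive_integer(lexical):
    """xsd:positiveInteger: optional '+', digits, value of at least 1."""
    if not re.fullmatch(r"\+?\d+", lexical) or int(lexical) < 1:
        return "not a positive integer"
    return None


today = date(2014, 9, 2)
print(check_date("2014-02-30", today))                      # non-existent date
print(check_date("2/9/2014", today))                        # does not match the xsd:date lexical form
print(check_date("2015-01-01", today, allow_future=False))  # date in the future
print(check_positive_integer("0"))                          # not a positive integer
```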
Invalid codes 
● Based on lookups in code lists that enumerate every valid code
● Covers language codes (ISO 639-1) and currency codes (ISO 4217)
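The code lists live in separate named graphs, so a code check amounts to a membership test in the right graph. This sketch models the dataset as a dictionary from graph name to a set of codes. The graph IRIs, the property-to-code-list mapping (including schema:inLanguage) and the short code excerpts are all hypothetical; the real lists are complete SKOS concept schemes.

```python
# Hypothetical named graphs holding code lists (short excerpts, not complete lists).
DATASET = {
    "http://example.org/graph/iso4217": {"CZK", "EUR", "USD"},
    "http://example.org/graph/iso639-1": {"cs", "de", "en"},
}
# Assumed mapping from property to the code list it is validated against.
CODE_LIST_FOR = {
    "schema:currency": "http://example.org/graph/iso4217",
    "schema:inLanguage": "http://example.org/graph/iso639-1",
}


def check_codes(graph, dataset=DATASET):
    """Report objects of code-valued properties that are missing from their code list."""
    errors = []
    for s, p, o in graph:
        graph_name = CODE_LIST_FOR.get(p)
        if graph_name is not None and o not in dataset[graph_name]:
            errors.append((s, p, f"invalid code {o!r}"))
    return errors


graph = [
    ("_:job", "schema:currency", "CZK"),
    ("_:job", "schema:currency", "Kč"),    # currency symbol, not an ISO 4217 code
    ("_:job", "schema:inLanguage", "cz"),  # country-style code; ISO 639-1 Czech is "cs"
]
for error in check_codes(graph):
    print(error)
```

The lookup is exact and case-sensitive; whether codes such as "czk" should be normalized first is a design choice this sketch leaves out.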
Implementation 
Ruby on Rails web application backed by a Jena Fuseki SPARQL 1.1 endpoint.
● Validates both RDFa and HTML5 Microdata 
● Czech and English localization 
● Validation results in HTML or JSON-LD 
● RSpec tests for each validation rule 
● Open source: https://github.com/OPLZZ/job-posting-validator
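Validation results are available in JSON-LD. The exact output format is not shown on the slides; the sketch below is one plausible shape, assuming violations are serialized as spin:ConstraintViolation nodes with a language-tagged rdfs:label (relevant for the Czech and English localization). The input record keys (`root`, `path`, `label`) are hypothetical.

```python
import json

CONTEXT = {
    "spin": "http://spinrdf.org/spin#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "http://schema.org/",
    # Interpret these values as node identifiers rather than strings.
    "spin:violationRoot": {"@type": "@id"},
    "spin:violationPath": {"@type": "@id"},
}


def to_jsonld(violations, lang="en"):
    """Serialize violation records as a JSON-LD document with an @graph."""
    return json.dumps(
        {
            "@context": CONTEXT,
            "@graph": [
                {
                    "@type": "spin:ConstraintViolation",
                    "spin:violationRoot": v["root"],
                    "spin:violationPath": v["path"],
                    "rdfs:label": {"@value": v["label"], "@language": lang},
                }
                for v in violations
            ],
        },
        ensure_ascii=False,
        indent=2,
    )


print(to_jsonld([{"root": "_:addr", "path": "schema:addressRegion", "label": "Empty literal value"}]))
```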
Demo: bit.ly/broken-job-posting
Preview
Experimental validation 
of a JobPosting corpus 
● 1,332 seed URLs from 752 distinct pay-level domains (PLDs), obtained via a Google Custom Search Engine restricted to schema:JobPosting
● Sample of 42,872 web pages obtained by crawling the seed URLs
● Each page was validated; the validation results in JSON-LD were loaded into Elasticsearch for exploration
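The following slides report the most common error paths and how many PLDs an error spans. The original exploration used Elasticsearch; as a hedged stand-in, this sketch aggregates per-page results in plain Python. The record shape is hypothetical, and `naive_pld` only approximates a pay-level domain by the last two host labels, which is wrong for suffixes such as .co.uk (a real analysis would use the Public Suffix List). The sample data is made up for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def naive_pld(url):
    """Rough stand-in for a pay-level domain: the last two host labels."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def summarize(results):
    """results: iterable of (page_url, violations); each violation has 'label' and 'path'."""
    path_counts = Counter()
    plds_per_label = defaultdict(set)
    for url, violations in results:
        for v in violations:
            path_counts[(v["label"], v["path"])] += 1
            plds_per_label[v["label"]].add(naive_pld(url))
    return path_counts, {label: len(plds) for label, plds in plds_per_label.items()}


results = [  # toy input, not data from the study
    ("http://jobs.example.com/1", [{"label": "Empty literal value", "path": "schema:addressRegion"}]),
    ("http://www.example.org/a", [
        {"label": "Empty literal value", "path": "schema:addressRegion"},
        {"label": "Datatype property used as object property", "path": "schema:title"},
    ]),
]
counts, pld_spread = summarize(results)
print(counts.most_common(1))  # [(('Empty literal value', 'schema:addressRegion'), 2)]
print(pld_spread["Empty literal value"])  # 2
```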
Most common errors
Datatype property used 
as object property 
Most common path to error: schema:title
Possible cause: misunderstanding of RDFa markup precedence rules:
<a property="title" href="#title">SEO guru</a>
produces
[] schema:title <#title> .
instead of the intended
[] schema:title "SEO guru" .
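The precedence at play is RDFa 1.1's rule for the value of @property: @content wins; otherwise, if there is no @datatype (and no @rel or @rev), an @resource, @href or @src attribute makes the object an IRI; only then does the element's text become a literal. The function below is a simplified model of just that step. It ignores @typeof, XMLLiteral handling and datatype typing, and the base URL is a made-up example.

```python
from urllib.parse import urljoin


def rdfa_property_object(attrs, text, base="http://example.org/jobs"):
    """Simplified RDFa 1.1 object selection for an element with @property
    and without @rel, @rev or @typeof (a sketch, not a full processor)."""
    if "content" in attrs:
        return ("literal", attrs["content"])
    if "datatype" not in attrs:
        # Without @datatype, an IRI-valued attribute takes precedence over the text.
        for name in ("resource", "href", "src"):
            if name in attrs:
                return ("iri", urljoin(base, attrs[name]))
    return ("literal", text)


print(rdfa_property_object({"property": "title", "href": "#title"}, "SEO guru"))
# ('iri', 'http://example.org/jobs#title')
print(rdfa_property_object({"property": "title"}, "SEO guru"))
# ('literal', 'SEO guru')
```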
Empty literal value 
Most common path to error: schema:addressRegion
Possible cause: incomplete data used to 
generate HTML from fixed templates 
Less common in manually marked-up HTML
Incorrect character case:
schema:Postaladdress instead of schema:PostalAddress
Both RDFa and HTML5 Microdata are case-sensitive.
Spread across 116 distinct PLDs.
“The default mode of authoring [Schema.org 
markup] is copy and edit.” — R.V. Guha
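Because term matching is case-sensitive, a validator can flag unknown terms that match a known term when case is ignored, and suggest the correct spelling. The term set below is a small hand-picked excerpt (an assumption). Note that Schema.org has classes and properties that differ only in case (for example schema:Review and schema:review), so the sketch returns all candidates rather than a single guess.

```python
# Small excerpt of Schema.org term names; a real check would load the full vocabulary.
SCHEMA_TERMS = {
    "PostalAddress", "JobPosting", "Place", "addressLocality",
    "addressRegion", "jobLocation", "Review", "review",
}


def suggest_case_fixes(term, known=SCHEMA_TERMS):
    """Return known terms equal to `term` ignoring case; empty if term is valid or unknown."""
    if term in known:
        return []
    return sorted(t for t in known if t.casefold() == term.casefold())


print(suggest_case_fixes("Postaladdress"))  # ['PostalAddress']
print(suggest_case_fixes("REVIEW"))         # ['Review', 'review']
```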
Object property used 
as datatype property 
Most common path to error: schema:jobLocation 
Common cause: simpler markup without intermediate 
resources 
Incorrect:
<p property="jobLocation">
  Munich
</p>
Correct:
<div rel="jobLocation">
  <div rel="address">
    <p property="addressLocality">
      Munich
    </p>
  </div>
</div>
Unsuccessful experiments 
Web Data Commons 
● Errors smoothed over during extraction to RDF
● Not suitable as a source of seed URLs: job postings disappear quickly
Veterans Job Bank
● Data from only a few PLDs; lacks variety
● Severe restrictions on automated downloads through its API
Questions? 
Acknowledgements: 
The presented research was partially supported by the project of the Operational Programme Human Resources and Employment no. CZ.1.04/5.1.01/77.00440.
Image credits:
Check List designed by Arthur Shlain from thenounproject.com
Puzzle designed by John from thenounproject.com