Andreas Blumauer
CEO & Managing Partner
Semantic Web Company &
PoolParty Semantic Suite
TAXONOMY QUALITY
ASSESSMENT:
TOOLS & TECHNIQUES
Taxonomy
Boot Camp 2016
Washington, DC
1
INTRODUCTION
2
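[Diagram: knowledge graph introducing the speaker and company. Andreas Blumauer is founder & CEO of Semantic Web Company, which was founded in 2004, is located in Vienna, serves >200 customers, and is developer and vendor of a product whose current version is 5.5 (the product node is an image; the title slide names PoolParty Semantic Suite). Further nodes: Taxonomy, Ontology, Knowledge Graph. Further edge labels: "active at", "based on", "part of", "is a", "standard for", "manages".]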
Aspects of
Taxonomy Quality
Types of taxonomy quality metrics,
and for which scenarios they are relevant
3
Why is taxonomy
quality important?
Some examples for
quality issues and
their possible
consequences
4
▸ Missing labels
▹ AGROVOC (FAO) defines concepts in 25 different languages. While most concepts have
English labels attached, only 38% have German labels.
▹ This can be a problem for multilingual applications that rely on label translations.
▸ Orphan concepts
▹ An orphan concept is a concept that has no semantic relation with any other concept.
Although it might have attached lexical labels, it lacks valuable context information.
▹ This can be crucial for retrieval tasks such as search query expansion.
▸ Mismatch between content and taxonomy
▹ There are only minor overlaps between the scope of the documents (or data) to be
indexed and the scope of the controlled vocabulary in use.
▹ This leads to a sparse enrichment of the document index by semantic information.
See also: Finding quality issues in SKOS vocabularies
(Christian Mader, Bernhard Haslhofer, Antoine Isaac)
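The first two issues above, missing labels and orphan concepts, can be measured with very little code. The following is a minimal sketch, not the qSKOS or PoolParty implementation: the `VOCAB` dictionary, its concept ids and the function names are hypothetical, and a real vocabulary would be loaded from SKOS/RDF instead.

```python
# Hypothetical toy vocabulary: concept id -> labels per language and
# ids of semantically related concepts (broader, narrower, related).
VOCAB = {
    "c1": {"labels": {"en": "maize", "de": "Mais"}, "links": {"c2"}},
    "c2": {"labels": {"en": "cereals"}, "links": {"c1"}},
    "c3": {"labels": {"en": "irrigation"}, "links": set()},
}


def label_coverage(vocab, lang):
    """Share of concepts that carry a label in the given language."""
    if not vocab:
        return 0.0
    labelled = sum(1 for c in vocab.values() if c["labels"].get(lang))
    return labelled / len(vocab)


def orphan_concepts(vocab):
    """Concepts with neither outgoing nor incoming semantic relations."""
    linked = set()
    for cid, concept in vocab.items():
        for target in concept["links"]:
            linked.add(cid)
            linked.add(target)
    return sorted(cid for cid in vocab if cid not in linked)


print(label_coverage(VOCAB, "de"))  # share of concepts with a German label
print(orphan_concepts(VOCAB))       # concepts lacking any context
```

A low coverage value for a language flags the translation gap described for AGROVOC, and every id returned by `orphan_concepts` is a candidate for either linking or removal.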
Taxonomy quality
issues are more
frequently
observed than
some might expect
5
See also: Finding quality issues in SKOS vocabularies
Taxonomy quality
criteria and issues
at different levels
6
1. Formal integrity conditions based on SKOS
▹ Construction of well-formed and consistent data to promote interoperability
▹ Example: No two concepts may be connected by both skos:related and skos:broaderTransitive
▹ Read more: SKOS: A Guide for Information Professionals (Jane Frazier)
2. Labeling and documentation issues
▹ Construction of taxonomies that support complex retrieval tasks
▹ Example: No two concepts of a concept scheme may have the same preferred label
▹ Read more: SKOS Primer (Antoine Isaac / Ed Summers)
3. Structural issues
▹ Logic-based processing of taxonomies
▹ Example: Avoidance of hierarchical cycles
▹ Read more: Key choices in the design of SKOS (Thomas Baker et al)
4. Content coverage
▹ Development of taxonomies that closely reflect the scope of the represented content
▹ Example: Avoid maintaining subtrees that only have limited occurrences in a representative
document corpus
▹ Read more: Corpus management with PoolParty
5. Network topological issues (experimental)
▹ (Co-)occurrences of concepts in a corpus should be reflected in the network topology of a
knowledge graph
▹ Example: Nodes/concepts with high betweenness centrality should occur correspondingly
in a reference document corpus
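The examples given for levels 1 to 3 (related vs. broader transitive, duplicate preferred labels, hierarchical cycles) are all mechanical checks. The sketch below illustrates them on hypothetical toy data; the data structures and function names are assumptions made for illustration, and treating labels as case-insensitive is a simplification rather than something SKOS prescribes.

```python
from collections import defaultdict

# Hypothetical toy data: (narrower, broader) pairs, related pairs, prefLabels.
BROADER = [("apple", "fruit"), ("fruit", "food"), ("food", "apple")]
RELATED = [("apple", "food"), ("apple", "cider")]
PREF_LABELS = {"apple": "Apple", "fruit": "Fruit", "food": "Food", "cider": "Apple"}


def broader_closure(broader):
    """Map each concept to all concepts reachable via broader links."""
    graph = defaultdict(set)
    for narrower, wider in broader:
        graph[narrower].add(wider)
    closure = {}
    for start in list(graph):
        seen, stack = set(), list(graph[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        closure[start] = seen
    return closure


def hierarchical_cycles(broader):
    """Concepts that end up being their own transitive broader concept."""
    closure = broader_closure(broader)
    return sorted(c for c, ancestors in closure.items() if c in ancestors)


def related_broader_clashes(broader, related):
    """Related pairs that are also connected hierarchically (either direction)."""
    closure = broader_closure(broader)
    return sorted(
        (a, b) for a, b in related
        if b in closure.get(a, set()) or a in closure.get(b, set())
    )


def duplicate_pref_labels(pref_labels):
    """Preferred labels used by more than one concept (case-insensitive)."""
    by_label = defaultdict(list)
    for concept, label in pref_labels.items():
        by_label[label.lower()].append(concept)
    return {label: sorted(cs) for label, cs in by_label.items() if len(cs) > 1}


print(hierarchical_cycles(BROADER))
print(related_broader_clashes(BROADER, RELATED))
print(duplicate_pref_labels(PREF_LABELS))
```

Computing the transitive closure once and reusing it keeps the cycle check and the disjointness check consistent with each other; for large vocabularies a dedicated graph library or a SPARQL property path would be the more practical choice.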
Why are
standards-based
technologies and
tools so important
when it comes to
taxonomy quality
management?
7
Spreadsheet editors are still the most common type of software application
being used for taxonomy management. They cannot measure quality automatically.
‘Good’ quality
depends on the
usage scenario
8
Example: Google Product Taxonomy has no synonyms at all, only hierarchical relations
How to pick the
most relevant
quality criteria for a
taxonomy project
9
PoolParty supports various application scenarios. Quality checks can be enforced,
reported, or ignored.
How to pick the
most relevant
quality criteria for a
taxonomy project
10
▸ General purpose thesaurus vs. custom enterprise taxonomy
▹ Custom enterprise taxonomies can be developed specifically on top of reference corpora
▹ General purpose thesauri are frequently used in the context of linked data environments
→ Linked data specific issues become more important
■ Missing In-Links
■ Missing Out-Links
■ Broken Links
■ Undefined SKOS Resources
■ HTTP URI Scheme Violation
See also: PoolParty SKOS Quality Checker based on qSKOS
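Two of the linked data issues listed above can be approximated offline. The sketch below is an illustration under assumptions, not the qSKOS definition of these checks: `CONCEPTS` maps hypothetical concept URIs to their outgoing link targets, and an "out-link" is taken to mean a link to a different host. Broken links and missing in-links need network access or external datasets and are left out.

```python
from urllib.parse import urlparse

# Hypothetical concept URIs with the targets of their outgoing links.
CONCEPTS = {
    "http://example.org/thesaurus/oil": ["http://other.example.com/id/petroleum"],
    "urn:example:gas": [],
    "http://example.org/thesaurus/coal": ["http://example.org/thesaurus/oil"],
}


def http_uri_violations(concepts):
    """Concept URIs that do not use the http or https scheme."""
    return sorted(
        uri for uri in concepts
        if urlparse(uri).scheme not in ("http", "https")
    )


def missing_out_links(concepts):
    """Concepts without any link pointing to a resource on another host."""
    missing = []
    for uri, targets in concepts.items():
        host = urlparse(uri).netloc
        has_external = any(
            urlparse(target).netloc not in ("", host) for target in targets
        )
        if not has_external:
            missing.append(uri)
    return sorted(missing)


print(http_uri_violations(CONCEPTS))
print(missing_out_links(CONCEPTS))
```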
Taxonomy
Quality Metrics
How quality issues can be unveiled
and how insights can be used for further improvements
11
Repair label issues
12
Repair structural
issues
13
Unveil mismatch
between taxonomy
and document
corpus
14
[Diagram: PoolParty components and user roles. Nodes: Content Manager, Integrator, Taxonomist/Ontologist, Thesaurus Server, Extractor, PowerTagging, Index, Corpus Learning/Semantic Analysis, CMS. Edge labels: "uses API", "is user of", "is basis of", "annotates", "enriches", "extends", "analyzes".]
Unveil mismatch
between taxonomy
and document
corpus
15
PoolParty extracts concepts that are not used in a reference corpus at all and suggests
how those concepts could be reworked or extended to become relevant.
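The underlying idea, finding concepts that never occur in a reference corpus, can be sketched with plain label matching. This is a simplified illustration and not PoolParty's corpus analysis: `TAXONOMY`, `CORPUS` and the function names are hypothetical, and matching whole-word labels case-insensitively is an assumption.

```python
import re

# Hypothetical taxonomy: concept -> all of its labels (preferred and alternative).
TAXONOMY = {
    "crude oil": ["crude oil", "petroleum"],
    "natural gas": ["natural gas"],
    "wind power": ["wind power", "wind energy"],
}
# Hypothetical reference corpus.
CORPUS = [
    "Petroleum prices fell as crude oil inventories rose.",
    "Natural gas demand followed the crude oil market.",
]


def document_frequency(taxonomy, corpus):
    """Number of documents in which at least one label of a concept occurs."""
    freq = {}
    for concept, labels in taxonomy.items():
        patterns = [
            re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
            for label in labels
        ]
        freq[concept] = sum(
            1 for doc in corpus if any(p.search(doc) for p in patterns)
        )
    return freq


def unused_concepts(taxonomy, corpus):
    """Concepts whose labels occur in no document of the corpus."""
    freq = document_frequency(taxonomy, corpus)
    return sorted(concept for concept, n in freq.items() if n == 0)


print(document_frequency(TAXONOMY, CORPUS))
print(unused_concepts(TAXONOMY, CORPUS))
```

Concepts returned by `unused_concepts` are the ones to review: they may need additional labels, a narrower scope, or removal if the corpus is representative.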
Unveil mismatch
between taxonomy
and document
corpus
16
PoolParty extracts relevant candidate concepts based on a deep corpus analysis.
Unveil mismatch
between taxonomy
and document
corpus
17
PoolParty suggests possible ‘right places’ for the candidate concepts within the approved
taxonomy.
Unveil network
topological issues
18
Example: STW Thesaurus for Economics
Unveil network
topological issues
19
Example: STW Thesaurus for Economics - Top 10 thesaurus concepts (betweenness)
Combined analysis
over network
topology and
reference corpus
20
Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
Combined analysis
over network
topology and
reference corpus
21
Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
Combined analysis
over network
topology and
reference corpus:
Correlation
Betweenness &
Document
Frequency
22
Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
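The combined analysis compares a concept's betweenness centrality in the thesaurus graph with its document frequency in the reference corpus. The sketch below shows one way to compute both sides with the standard library only: Brandes' algorithm for betweenness on an unweighted, undirected graph, followed by a Pearson correlation. The `GRAPH` and `DOC_FREQ` values are invented for illustration and are not taken from the STW Thesaurus or the slides, and the choice of Pearson over a rank correlation is an assumption.

```python
from collections import deque

# Hypothetical concept graph (adjacency sets) and document frequencies.
GRAPH = {
    "energy": {"oil", "gas", "coal"},
    "oil": {"energy", "opec"},
    "gas": {"energy"},
    "coal": {"energy"},
    "opec": {"oil"},
}
DOC_FREQ = {"energy": 40, "oil": 55, "gas": 12, "coal": 9, "opec": 30}


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected path was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


centrality = betweenness(GRAPH)
concepts = sorted(GRAPH)
r = pearson([centrality[c] for c in concepts], [DOC_FREQ[c] for c in concepts])
print(centrality)
print(round(r, 3))
```

A coefficient near 1 would indicate that structurally central concepts are also the frequently occurring ones; concepts that are central but rare, or frequent but peripheral, are the ones worth a closer look.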
Techniques and Tools
How they help to assess
Taxonomy Quality
23
BARTOC.org
Basel Register of
Thesauri,
Ontologies &
Classifications
▸ Unveil Taxonomy Quality by the Wisdom of the Crowd
24
qSKOS
▸ qSKOS is a tool for finding quality issues in SKOS vocabularies
▸ Available as free online service at http://qskos.poolparty.biz/
▸ A SKOS taxonomy is analyzed with regard to 24 quality issues
25
PoolParty Import
Validator
26
▸ RDF Validation to go beyond SKOS
▸ Checks are defined in RDF, repair strategies also defined as RDF
▸ 15 checks have been integrated
Shapes Constraint
Language (SHACL)
▸ “Do for RDF what XML Schema does for XML”
▸ Language for validating RDF graphs against a set of conditions
▸ SHACL shape graphs are used to validate that data graphs satisfy a set of
conditions
▸ Current status: W3C Working Draft (14 August 2016)
See also: Towards maintainable constraint validation and repair for taxonomies:
The PoolParty approach (Christian Mader and Monika Solanki)
27
GET YOUR
TEST ACCOUNT
GET CERTIFIED
28
Get your test account at
www.poolparty.biz/demo
Get certified at
www.poolparty.biz/academy/
CONNECT
Andreas Blumauer
CEO, Semantic Web Company
▸ a.blumauer@semantic-web.at
▸ http://at.linkedin.com/in/andreasblumauer
▸ https://twitter.com/semwebcompany
▸ https://www.poolparty.biz
▸ https://www.semantic-web.at
29
© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/

A healthy diet for your Java application Devoxx France.pdf
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 

Taxonomy Quality Assessment

  • 1. Andreas Blumauer, CEO & Managing Partner, Semantic Web Company & PoolParty Semantic Suite. TAXONOMY QUALITY ASSESSMENT: TOOLS & TECHNIQUES. Taxonomy Boot Camp 2016, Washington, DC
  • 2. INTRODUCTION [Knowledge graph diagram. Recoverable labels: Andreas Blumauer, founder & CEO of Semantic Web Company; founded 2004; located in Vienna; current version 5.5; serves >200 customers; nodes: Taxonomy, Knowledge Graph, Ontology; edge labels: developer and vendor of, active at, based on, standard for, manages, part of, is a]
  • 3. Aspects of Taxonomy Quality: types of taxonomy quality metrics, and the scenarios for which they are relevant
  • 4. Why is taxonomy quality important? Some examples of quality issues and their possible consequences:
      ▸ Missing labels
        ▹ AGROVOC (FAO) defines concepts in 25 different languages. While most concepts have English labels attached, only 38% have German labels.
        ▹ This can be a problem for multilingual applications that rely on label translations.
      ▸ Orphan concepts
        ▹ An orphan concept is a concept that has no semantic relation with any other concept. Although it might have lexical labels attached, it lacks valuable context information.
        ▹ This matters for retrieval tasks such as search query expansion, which rely on relations between concepts.
      ▸ Mismatch between content and taxonomy
        ▹ There are only minor overlaps between the scope of the documents (or data) to be indexed and the scope of the controlled vocabulary in use.
        ▹ This leads to a sparse enrichment of the document index with semantic information.
      See also: Finding quality issues in SKOS vocabularies (Christian Mader, Bernhard Haslhofer, Antoine Isaac)
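The first two issues on this slide (missing labels and orphan concepts) can be measured with very little code. The following is a minimal sketch, not the method used by any particular tool: the toy vocabulary, the `labels` and `relations` structures, and both function names are assumptions made for illustration.

```python
def label_coverage(labels, languages):
    """Share of concepts that have a label in each of the given languages."""
    if not labels:
        return {lang: 0.0 for lang in languages}
    total = len(labels)
    return {
        lang: sum(1 for by_lang in labels.values() if by_lang.get(lang)) / total
        for lang in languages
    }


def orphan_concepts(labels, relations):
    """Concepts that take part in no semantic relation at all."""
    linked = set()
    for subject, _predicate, obj in relations:
        linked.update((subject, obj))
    return sorted(concept for concept in labels if concept not in linked)


# Hypothetical toy vocabulary: concept id -> {language: label}
labels = {
    "maize": {"en": "maize", "de": "Mais"},
    "cereals": {"en": "cereals"},
    "sorghum": {"en": "sorghum"},
    "tractor": {"en": "tractor", "de": "Traktor"},
}
# Hypothetical semantic relations as (subject, predicate, object) triples
relations = [
    ("maize", "broader", "cereals"),
    ("sorghum", "broader", "cereals"),
]

coverage = label_coverage(labels, ["en", "de"])
orphans = orphan_concepts(labels, relations)
print(coverage)  # share of labelled concepts per language
print(orphans)   # concepts without any relation
```

In this toy data every concept has an English label, half have a German label, and `tractor` is the only orphan.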
  • 5. Taxonomy quality issues are more frequently observed than some might expect. See also: Finding quality issues in SKOS vocabularies
  • 6. Taxonomy quality criteria and issues at different levels
      1. Formal integrity conditions based on SKOS
        ▹ Construction of well-formed and consistent data to promote interoperability
        ▹ Example: No two concepts may be connected by both a related and a broader-transitive relation (skos:related is disjoint with skos:broaderTransitive)
        ▹ Read more: SKOS: A Guide for Information Professionals (Jane Frazier)
      2. Labeling and documentation issues
        ▹ Construction of taxonomies that support complex retrieval tasks
        ▹ Example: No two concepts of a concept scheme may have the same preferred label
        ▹ Read more: SKOS Primer (Antoine Isaac / Ed Summers)
      3. Structural issues
        ▹ Logic-based processing of taxonomies
        ▹ Example: Avoidance of hierarchical cycles
        ▹ Read more: Key choices in the design of SKOS (Thomas Baker et al.)
      4. Content coverage
        ▹ Development of taxonomies that reflect the scope of the represented content well
        ▹ Example: Avoid maintaining subtrees that only have limited occurrences in a representative document corpus
        ▹ Read more: Corpus management with PoolParty
      5. Network topological issues (experimental)
        ▹ (Co-)occurrences of concepts in a corpus should be reflected in the network topology of a knowledge graph
        ▹ Example: Nodes/concepts with high betweenness centrality should occur correspondingly in a reference document corpus
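The examples given for levels 1 to 3 lend themselves to automated checks. The sketch below is one possible way to express them over plain Python dictionaries; the data, the function names, and the choice to compare preferred labels case-insensitively (and without regard to language) are assumptions for illustration, not the behavior of a specific tool.

```python
from collections import defaultdict


def transitive_broader(broader):
    """Map each concept to every concept reachable via broader links."""
    closure = {}
    for start in broader:
        seen, stack = set(), list(broader.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(broader.get(node, ()))
        closure[start] = seen
    return closure


def hierarchical_cycles(broader):
    """Concepts that are their own (transitive) broader concept."""
    closure = transitive_broader(broader)
    return sorted(c for c, ancestors in closure.items() if c in ancestors)


def related_hierarchy_clashes(broader, related):
    """Pairs linked by 'related' that are also in a broader-transitive relation."""
    closure = transitive_broader(broader)
    clashes = set()
    for a, b in related:
        if b in closure.get(a, ()) or a in closure.get(b, ()):
            clashes.add(tuple(sorted((a, b))))
    return sorted(clashes)


def duplicate_pref_labels(pref_labels):
    """Preferred labels shared by more than one concept (case-insensitive)."""
    by_label = defaultdict(list)
    for concept, label in pref_labels.items():
        by_label[label.strip().lower()].append(concept)
    return {label: sorted(cs) for label, cs in by_label.items() if len(cs) > 1}


# Hypothetical data: concept -> list of directly broader concepts
broader = {
    "espresso": ["coffee"],
    "coffee": ["beverage"],
    "beverage": [],
    "a": ["b"],
    "b": ["a"],  # deliberate cycle
}
related = [("espresso", "beverage"), ("coffee", "tea")]
pref_labels = {"coffee": "Coffee", "java": "coffee", "tea": "Tea"}

cycles = hierarchical_cycles(broader)
clashes = related_hierarchy_clashes(broader, related)
duplicates = duplicate_pref_labels(pref_labels)
print(cycles, clashes, duplicates)
```

Whether a finding is an error or merely a warning depends on the usage scenario, so the functions only report and leave the decision to the caller.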
  • 7. Why are standards-based technologies and tools so important when it comes to taxonomy quality management? Spreadsheet editors are still the most common type of software application used for taxonomy management. They cannot measure quality automatically.
  • 8. ‘Good’ quality depends on the usage scenario. Example: Google Product Taxonomy has no synonyms at all, only hierarchical relations.
  • 9. How to pick the most relevant quality criteria for a taxonomy project: PoolParty supports various application scenarios. Quality checks can be enforced, reported, or ignored.
  • 10. How to pick the most relevant quality criteria for a taxonomy project
      ▸ General purpose thesaurus vs. custom enterprise taxonomy
        ▹ Custom enterprise taxonomies can be developed specifically on top of reference corpora
        ▹ General purpose thesauri are frequently used in the context of linked data environments → Linked data specific issues become more important
          ■ Missing In-Links
          ■ Missing Out-Links
          ■ Broken Links
          ■ Undefined SKOS Resources
          ■ HTTP URI Scheme Violation
      See also: PoolParty SKOS Quality Checker based on qSKOS
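Two of the linked data issues listed above can be approximated offline. This is a simplified reading of "HTTP URI Scheme Violation" and "Missing Out-Links", not the definitions implemented by qSKOS; the example URIs, the `mappings` structure, and the function names are hypothetical. Detecting broken links or missing in-links would require network access or external datasets and is left out.

```python
from urllib.parse import urlparse


def uri_scheme_violations(uris):
    """URIs that are not absolute http(s) URIs."""
    violations = []
    for uri in uris:
        parsed = urlparse(uri)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            violations.append(uri)
    return violations


def concepts_without_out_links(mappings, own_host):
    """Concepts with no link to a resource hosted somewhere else."""
    missing = []
    for concept, targets in mappings.items():
        external = [t for t in targets if urlparse(t).netloc not in ("", own_host)]
        if not external:
            missing.append(concept)
    return missing


# Hypothetical concept URIs
uris = [
    "http://example.org/c/1",
    "urn:uuid:1234",
    "https://example.org/c/2",
    "example.org/c/3",
]
# Hypothetical mapping links: concept URI -> linked resource URIs
mappings = {
    "http://example.org/c/1": ["http://dbpedia.org/resource/Coffee"],
    "https://example.org/c/2": ["http://example.org/c/1"],
}

violations = uri_scheme_violations(uris)
no_out_links = concepts_without_out_links(mappings, "example.org")
print(violations)
print(no_out_links)
```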
  • 11. Taxonomy Quality Metrics: how quality issues can be unveiled and how insights can be used for further improvements
  • 14. Unveil mismatch between taxonomy and document corpus [Architecture diagram. Recoverable node labels: Content Manager, Integrator, Taxonomist/Ontologist, Thesaurus Server, Extractor, PowerTagging, Index, Corpus Learning/Semantic Analysis, CMS; edge labels: uses API, is user of, is basis of, annotates, enriches, extends, analyzes]
  • 15. Unveil mismatch between taxonomy and document corpus: PoolParty identifies concepts that are not used in a reference corpus at all and suggests how those concepts could be reworked or extended to become relevant.
  • 16. Unveil mismatch between taxonomy and document corpus: PoolParty extracts relevant candidate concepts based on a deep corpus analysis.
  • 17. Unveil mismatch between taxonomy and document corpus: PoolParty suggests possible ‘right places’ for the candidate concepts within the approved taxonomy.
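The idea behind slides 15 to 17 (finding concepts that never occur in a reference corpus) can be illustrated with a deliberately naive sketch. It uses case-insensitive whole-word matching of labels against document text, which is an assumption made here for brevity and should not be mistaken for PoolParty's corpus analysis; the labels, documents, and function names are invented.

```python
import re


def document_frequency(labels, documents):
    """Per concept, count the documents that mention any of its labels."""
    freq = {}
    for concept, names in labels.items():
        alternatives = "|".join(re.escape(name) for name in names)
        pattern = re.compile(r"\b(?:%s)\b" % alternatives, re.IGNORECASE)
        freq[concept] = sum(1 for doc in documents if pattern.search(doc))
    return freq


def unused_concepts(labels, documents):
    """Concepts whose labels occur in no document of the corpus."""
    freq = document_frequency(labels, documents)
    return sorted(concept for concept, n in freq.items() if n == 0)


# Hypothetical taxonomy labels (preferred and alternative) per concept
labels = {
    "crude_oil": ["crude oil", "petroleum"],
    "opec": ["OPEC"],
    "wind_power": ["wind power", "wind energy"],
}
# Hypothetical reference corpus
documents = [
    "OPEC agreed to cut crude oil output.",
    "Petroleum prices rose after the OPEC meeting.",
    "Refineries reported lower margins.",
]

freq = document_frequency(labels, documents)
unused = unused_concepts(labels, documents)
print(freq)
print(unused)
```

Concepts reported as unused are candidates for review, not automatic removal: a missing synonym can make a relevant concept look unused.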
  • 18. Unveil network topological issues. Example: STW Thesaurus for Economics
  • 19. Unveil network topological issues. Example: STW Thesaurus for Economics, top 10 thesaurus concepts (betweenness)
  • 20. Combined analysis over network topology and reference corpus. Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
  • 21. Combined analysis over network topology and reference corpus. Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
  • 22. Combined analysis over network topology and reference corpus: correlation between betweenness and document frequency. Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
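The combined analysis on slides 18 to 22 relates a concept's betweenness centrality to how often it occurs in the corpus. The slides do not say which correlation measure was used, so the sketch below assumes a plain Pearson coefficient; the small concept graph and the document frequencies are invented and have nothing to do with the STW data. Betweenness is computed with Brandes' algorithm for an unweighted, undirected graph.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm; graph maps each node to its list of neighbours."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths
        dist = dict.fromkeys(graph, -1)   # -1 marks "not visited yet"
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair is seen from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical concept graph (hierarchical and associative links, undirected)
graph = {
    "economy": ["energy"],
    "energy": ["economy", "oil", "gas"],
    "oil": ["energy", "crude_oil"],
    "gas": ["energy"],
    "crude_oil": ["oil"],
}
# Hypothetical document frequencies from a reference corpus
doc_freq = {"economy": 2, "energy": 9, "oil": 7, "gas": 1, "crude_oil": 3}

centrality = betweenness(graph)
concepts = sorted(graph)
r = pearson([centrality[c] for c in concepts], [doc_freq[c] for c in concepts])
print(centrality)
print(round(r, 3))
```

A low or negative coefficient on real data would hint that the concepts structurally central to the taxonomy are not the ones the corpus actually talks about, which is the mismatch this analysis is meant to reveal.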
  • 23. Techniques and Tools: how they help to assess taxonomy quality
  • 24. BARTOC.org: Basel Register of Thesauri, Ontologies & Classifications ▸ Unveil taxonomy quality by the wisdom of the crowd
  • 25. qSKOS ▸ qSKOS is a tool for finding quality issues in SKOS vocabularies ▸ Available as a free online service at http://qskos.poolparty.biz/ ▸ A SKOS taxonomy is analyzed with regard to 24 issues
  • 26. PoolParty Import Validator ▸ RDF validation that goes beyond SKOS ▸ Checks are defined in RDF; repair strategies are also defined in RDF ▸ 15 checks have been integrated
  • 27. Shapes Constraint Language (SHACL) ▸ “Do for RDF what XML Schema does for XML” ▸ Language for validating RDF graphs against a set of conditions ▸ SHACL shape graphs are used to validate that data graphs satisfy a set of conditions ▸ Current status: W3C Working Draft (14 August 2016) See also: Towards maintainable constraint validation and repair for taxonomies: The PoolParty approach (Christian Mader and Monika Solanki)
  • 28. GET YOUR TEST ACCOUNT, GET CERTIFIED: Get your test account at www.poolparty.biz/demo. Get certified at www.poolparty.biz/academy/
  • 29. CONNECT: Andreas Blumauer, CEO, Semantic Web Company ▸ a.blumauer@semantic-web.at ▸ http://at.linkedin.com/in/andreasblumauer ▸ https://twitter.com/semwebcompany ▸ https://www.poolparty.biz ▸ https://www.semantic-web.at © Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/