Andreas Blumauer
CEO & Managing Partner
Semantic Web Company &
PoolParty Semantic Suite
TAXONOMY QUALITY
ASSESSMENT:
TOOLS & TECHNIQUES
Taxonomy
Boot Camp 2016
Washington, DC
1
INTRODUCTION
2
[Diagram: knowledge graph introducing the speaker. Node labels: Andreas Blumauer, Semantic Web Company, Vienna, 2004, 5.5, >200 customers, Taxonomy, Ontology, Knowledge Graph. Edge labels: founder & CEO of, developer and vendor of, founded, current version, located, serves, active at, based on, manages, standard for, part of, is a.]
Aspects of
Taxonomy Quality
Types of taxonomy quality metrics,
and for which scenarios they are relevant
3
Why is taxonomy
quality important?
Some examples for
quality issues and
their possible
consequences
4
▸ Missing labels
▹ AGROVOC (FAO) defines concepts in 25 different languages. While most concepts have
English labels attached, only 38% have German labels.
▹ This can be a problem for multilingual applications that rely on label translations.
▸ Orphan concepts
▹ An orphan concept is a concept that has no semantic relation with any other concept.
Although it might have attached lexical labels, it lacks valuable context information.
▹ This can be a serious limitation for retrieval tasks such as search query expansion.
▸ Mismatch between content and taxonomy
▹ There are only minor overlaps between the scope of the documents (or data) to be
indexed and the scope of the controlled vocabulary in use.
▹ This leads to only sparse enrichment of the document index with semantic information.
See also: Finding quality issues in SKOS vocabularies
(Christian Mader, Bernhard Haslhofer, Antoine Isaac)
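The first two issues can be measured with very little code. The following is a minimal sketch, not the method used by AGROVOC or by any particular tool: it assumes the vocabulary is available as plain (subject, predicate, object) triples, and all `ex:` concept names and labels are invented for illustration.

```python
# Hypothetical toy vocabulary as (subject, predicate, object) triples.
# Label objects are (text, language) pairs; the "ex:" names are made up.
TRIPLES = [
    ("ex:maize", "skos:prefLabel", ("maize", "en")),
    ("ex:maize", "skos:prefLabel", ("Mais", "de")),
    ("ex:cereals", "skos:prefLabel", ("cereals", "en")),
    ("ex:maize", "skos:broader", "ex:cereals"),
    ("ex:soil", "skos:prefLabel", ("soil", "en")),  # no relations: orphan
]

SEMANTIC_RELATIONS = {"skos:broader", "skos:narrower", "skos:related"}


def concepts(triples):
    """Assumption: every subject in the triple list is a concept."""
    return {s for s, _, _ in triples}


def label_coverage(triples, lang):
    """Share of concepts with at least one preferred label in `lang`."""
    labelled = {
        s for s, p, o in triples
        if p == "skos:prefLabel" and o[1] == lang
    }
    return len(labelled) / len(concepts(triples))


def orphan_concepts(triples):
    """Concepts that take part in no semantic relation, as subject or object."""
    connected = set()
    for s, p, o in triples:
        if p in SEMANTIC_RELATIONS:
            connected.update((s, o))
    return concepts(triples) - connected


print(label_coverage(TRIPLES, "en"))
print(label_coverage(TRIPLES, "de"))
print(sorted(orphan_concepts(TRIPLES)))
```

A real vocabulary would be loaded from an RDF file instead of a literal list, but the two measures stay the same: a ratio per language and a set difference.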
Taxonomy quality
issues are more
frequently
observed than
some might expect
5
See also: Finding quality issues in SKOS vocabularies
Taxonomy quality
criteria and issues
at different levels
6
1. Formal integrity conditions based on SKOS
▹ Construction of well-formed and consistent data to promote interoperability
▹ Example: No two concepts may be connected by both skos:related and skos:broaderTransitive
▹ Read more: SKOS: A Guide for Information Professionals (Jane Frazier)
2. Labeling and documentation issues
▹ Construction of taxonomies that support complex retrieval tasks
▹ Example: No two concepts of a concept scheme may have the same preferred label
▹ Read more: SKOS Primer (Antoine Isaac / Ed Summers)
3. Structural issues
▹ Logic-based processing of taxonomies
▹ Example: Avoidance of hierarchical cycles
▹ Read more: Key choices in the design of SKOS (Thomas Baker et al.)
4. Content coverage
▹ Development of taxonomies that accurately reflect the scope of the represented content
▹ Example: Avoid maintaining subtrees that occur only rarely in a representative
document corpus
▹ Read more: Corpus management with PoolParty
5. Network topological issues (experimental)
▹ (Co-)occurrences of concepts in a corpus should be reflected in the network topology of a
knowledge graph
▹ Example: Nodes/concepts with high betweenness centrality should occur correspondingly
often in a reference document corpus
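The examples given for levels 1 to 3 can be sketched as three small checks. This is an illustration under stated assumptions, not the implementation of qSKOS or PoolParty: the hierarchy is a plain dictionary, all concept names and labels are hypothetical, and comparing preferred labels case-insensitively is a choice made here.

```python
# Hypothetical toy data; concept names are invented for illustration.
BROADER = {            # concept -> set of directly broader concepts
    "ex:a": {"ex:b"},
    "ex:b": {"ex:c"},
    "ex:c": {"ex:a"},  # closes a hierarchical cycle a -> b -> c -> a
    "ex:d": {"ex:e"},
}
RELATED = [("ex:d", "ex:e"), ("ex:a", "ex:d")]
PREF_LABELS = {"ex:a": "Oil", "ex:d": "Oil", "ex:e": "Gas"}


def broader_transitive(concept, broader):
    """All concepts reachable from `concept` via one or more broader links."""
    seen, stack = set(), list(broader.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(broader.get(node, ()))
    return seen


def concepts_in_cycles(broader):
    """Structural check: a concept that is its own transitive ancestor."""
    return {c for c in broader if c in broader_transitive(c, broader)}


def related_hierarchy_clashes(related, broader):
    """Integrity check: related pairs that are also hierarchically linked."""
    return [
        (x, y) for x, y in related
        if y in broader_transitive(x, broader)
        or x in broader_transitive(y, broader)
    ]


def duplicate_pref_labels(pref_labels):
    """Labeling check: preferred labels shared by more than one concept."""
    by_label = {}
    for concept, label in pref_labels.items():
        by_label.setdefault(label.lower(), []).append(concept)
    return {l: sorted(cs) for l, cs in by_label.items() if len(cs) > 1}


print(sorted(concepts_in_cycles(BROADER)))
print(related_hierarchy_clashes(RELATED, BROADER))
print(duplicate_pref_labels(PREF_LABELS))
```

All three checks reuse the same transitive closure, which is why a standards-based representation of the hierarchy pays off: once broader links are explicit, each rule is only a few lines.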
Why are
standards-based
technologies and
tools so important
when it comes to
taxonomy quality
management?
7
Spreadsheet editors are still the most common type of software application
being used for taxonomy management. They cannot measure quality automatically.
‘Good’ quality
depends on the
usage scenario
8
Example: Google Product Taxonomy has no synonyms at all, only hierarchical relations
How to pick the
most relevant
quality criteria for a
taxonomy project
9
PoolParty supports various application scenarios. Quality checks can be enforced,
reported, or ignored.
How to pick the
most relevant
quality criteria for a
taxonomy project
10
▸ General purpose thesaurus vs. custom enterprise taxonomy
▹ Custom enterprise taxonomies can be developed specifically on top of reference corpora
▹ General purpose thesauri are frequently used in the context of linked data environments
→ Linked data specific issues become more important
■ Missing In-Links
■ Missing Out-Links
■ Broken Links
■ Undefined SKOS Resources
■ HTTP URI Scheme Violation
See also: PoolParty SKOS Quality Checker based on qSKOS
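Two of the linked data issues listed above can be checked offline. The sketch below is a rough approximation and does not reproduce the qSKOS checks: the namespace, URIs, and links are hypothetical, and an out-link is assumed here to mean a link whose target lies outside the vocabulary's own namespace. Broken links and missing in-links need network access or external data and are left out.

```python
from urllib.parse import urlparse

# Hypothetical data: the vocabulary's own namespace and a few links.
OWN_NAMESPACE = "http://example.org/thesaurus/"
CONCEPT_URIS = [
    "http://example.org/thesaurus/oil",
    "urn:example:gas",                      # not an HTTP URI
    "http://example.org/thesaurus/coal",
]
LINKS = [  # (source concept, target resource), e.g. mapping relations
    ("http://example.org/thesaurus/oil", "http://other.example.com/id/petroleum"),
    ("http://example.org/thesaurus/oil", "http://example.org/thesaurus/coal"),
]


def http_uri_violations(uris):
    """URIs that do not use the http or https scheme with a host part."""
    result = []
    for uri in uris:
        parts = urlparse(uri)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            result.append(uri)
    return result


def concepts_without_out_links(uris, links, own_namespace):
    """Concepts with no link to a resource outside the own namespace."""
    linked_out = {s for s, t in links if not t.startswith(own_namespace)}
    return [u for u in uris if u not in linked_out]


print(http_uri_violations(CONCEPT_URIS))
print(concepts_without_out_links(CONCEPT_URIS, LINKS, OWN_NAMESPACE))
```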
Taxonomy
Quality Metrics
How quality issues can be unveiled
and how insights can be used for further improvements
11
Repair label issues
12
Repair structural
issues
13
Unveil mismatch
between taxonomy
and document
corpus
14
[Diagram: roles and components. Nodes: Content Manager, Integrator, Taxonomist/Ontologist, Thesaurus Server, Extractor, PowerTagging, Index, Corpus Learning/Semantic Analysis, CMS. Edge labels: uses API, is user of, is basis of, annotates, enriches, extends, analyzes.]
Unveil mismatch
between taxonomy
and document
corpus
15
PoolParty identifies concepts that are not used in the reference corpus at all and
suggests how those concepts could be reworked or extended to become relevant.
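The idea of finding concepts that never occur in a reference corpus can be approximated with plain label matching. The following is a deliberately naive sketch, not PoolParty's corpus analysis: the taxonomy, labels, and documents are hypothetical, and matching is a case-insensitive whole-word search without stemming or disambiguation.

```python
import re

# Hypothetical toy taxonomy (concept -> labels) and reference corpus.
TAXONOMY = {
    "ex:crude_oil": ["crude oil", "petroleum"],
    "ex:opec": ["OPEC"],
    "ex:wind_power": ["wind power", "wind energy"],
}
CORPUS = [
    "OPEC agreed to cut crude oil output.",
    "Petroleum prices rose after the OPEC meeting.",
]


def document_frequency(taxonomy, corpus):
    """Number of documents in which at least one label of a concept occurs."""
    freq = {}
    for concept, labels in taxonomy.items():
        patterns = [
            re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
            for label in labels
        ]
        freq[concept] = sum(
            1 for doc in corpus if any(p.search(doc) for p in patterns)
        )
    return freq


def unused_concepts(taxonomy, corpus):
    """Concepts whose labels occur in no document of the corpus."""
    freq = document_frequency(taxonomy, corpus)
    return sorted(c for c, n in freq.items() if n == 0)


print(document_frequency(TAXONOMY, CORPUS))
print(unused_concepts(TAXONOMY, CORPUS))
```

Concepts reported as unused are candidates for review, not for automatic deletion: a missing synonym can make a relevant concept look unused.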
Unveil mismatch
between taxonomy
and document
corpus
16
PoolParty extracts relevant candidate concepts based on a deep corpus analysis.
Unveil mismatch
between taxonomy
and document
corpus
17
PoolParty suggests possible ‘right places’ for the candidate concepts within the approved
taxonomy.
Unveil network
topological issues
18
Example: STW Thesaurus for Economics
Unveil network
topological issues
19
Example: STW Thesaurus for Economics - Top 10 thesaurus concepts (betweenness)
Combined analysis
over network
topology and
reference corpus
20
Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
Combined analysis
over network
topology and
reference corpus
21
Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
Combined analysis
over network
topology and
reference corpus:
Correlation
Betweenness &
Document
Frequency
22
Example: STW Thesaurus for Economics and reference corpus about ‘Crude Oil Market’
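The combined analysis can be sketched as two steps: compute betweenness centrality on the concept graph, then correlate it with document frequency. The sketch below is an assumption-laden illustration: the graph and frequencies are invented and are not STW data, the graph is treated as undirected and unweighted, and Pearson correlation is chosen here because the slides do not name a coefficient.

```python
from collections import deque
from math import sqrt

# Hypothetical concept graph (node -> neighbours) and document frequencies.
GRAPH = {
    "energy": {"oil", "gas", "coal"},
    "oil": {"energy", "opec"},
    "gas": {"energy"},
    "coal": {"energy"},
    "opec": {"oil"},
}
DOC_FREQ = {"energy": 40, "oil": 35, "gas": 5, "coal": 4, "opec": 30}


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


names = sorted(GRAPH)
bc = betweenness(GRAPH)
r = pearson([bc[c] for c in names], [DOC_FREQ[c] for c in names])
print(bc)
print(round(r, 3))
```

Concepts that pull the correlation down are the interesting ones: a node with high betweenness but few occurrences may be over-connected, and a frequently used node with low betweenness may be poorly integrated into the graph.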
Techniques and Tools
How they help to assess
Taxonomy Quality
23
BARTOC.org
Basel Register of
Thesauri,
Ontologies &
Classifications
▸ Unveil Taxonomy Quality by the Wisdom of the Crowd
24
qSKOS
▸ qSKOS is a tool for finding quality issues in SKOS vocabularies
▸ Available as free online service at http://qskos.poolparty.biz/
▸ A SKOS taxonomy is analyzed with regard to 24 quality issues
25
PoolParty Import
Validator
26
▸ RDF Validation to go beyond SKOS
▸ Checks are defined in RDF; repair strategies are also defined in RDF
▸ 15 checks have been integrated
Shapes Constraint
Language (SHACL)
▸ “Do for RDF what XML Schema does for XML”
▸ Language for validating RDF graphs against a set of conditions
▸ SHACL shape graphs are used to validate that data graphs satisfy a set of
conditions
▸ Current status: W3C Working Draft (14 August 2016)
See also: Towards maintainable constraint validation and repair for taxonomies:
The PoolParty approach (Christian Mader and Monika Solanki)
27
GET YOUR
TEST ACCOUNT
GET CERTIFIED
28
Get your test account at
www.poolparty.biz/demo
Get certified at
www.poolparty.biz/academy/
CONNECT
Andreas Blumauer
CEO, Semantic Web Company
▸ a.blumauer@semantic-web.at
▸ http://at.linkedin.com/in/andreasblumauer
▸ https://twitter.com/semwebcompany
▸ https://www.poolparty.biz
▸ https://www.semantic-web.at
29
© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/
