The role of thesauri in
data modeling
Danny Greefhorst
dgreefhorst@archixl.nl
Topics in this presentation
• What is a thesaurus and why is it valuable?
• What does a thesaurus look like?
• How does a thesaurus relate to a data model?
• SKOS as a language for describing thesauri
• Guidelines for good definitions (based on ISO 704)
• Quality control for thesauri
• A thesaurus is a type of controlled vocabulary for content retrieval.
• A controlled vocabulary is a defined list of explicitly allowed terms used to
index, categorize, tag, sort, and retrieve content through browsing and
searching.
• A thesaurus provides information about each term and its relationship to
other terms.
• Relationships are either hierarchical, associative or equivalent.
• Thesauri can be used to:
• organize unstructured content
• uncover relationships between content from different media
• improve website navigation
• optimize search
Thesaurus in the Data Management Body of
Knowledge
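The definition above can be made concrete with a small sketch: terms that carry hierarchical, associative and equivalence relationships, plus a search expansion that uses them. The class names, method names and example terms are assumptions made for illustration; they are not taken from DMBOK or any thesaurus tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry with the three relationship types."""
    label: str
    broader: set[str] = field(default_factory=set)    # hierarchical
    related: set[str] = field(default_factory=set)    # associative
    synonyms: set[str] = field(default_factory=set)   # equivalent


class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self._synonym_index: dict[str, str] = {}

    def add(self, term: Term) -> None:
        self.terms[term.label] = term
        for synonym in term.synonyms:
            self._synonym_index[synonym.lower()] = term.label

    def preferred(self, word: str) -> str | None:
        """Resolve a word to its preferred term (equivalence relationship)."""
        for label in self.terms:
            if label.lower() == word.lower():
                return label
        return self._synonym_index.get(word.lower())

    def narrower(self, label: str) -> set[str]:
        """Invert the hierarchy: terms that name `label` as their broader term."""
        return {t.label for t in self.terms.values() if label in t.broader}

    def expand_query(self, word: str) -> set[str]:
        """Search expansion: preferred term, its synonyms and its narrower terms."""
        label = self.preferred(word)
        if label is None:
            return {word}
        return {label} | self.terms[label].synonyms | self.narrower(label)


# Hypothetical example content
thesaurus = Thesaurus()
thesaurus.add(Term("Vehicle"))
thesaurus.add(Term("Car", broader={"Vehicle"}, related={"Road"}, synonyms={"Automobile"}))
thesaurus.add(Term("Station wagon", broader={"Car"}))
thesaurus.add(Term("Road"))

print(thesaurus.expand_query("automobile"))
```

A search for "automobile" is resolved to the preferred term "Car" and widened with its synonym and narrower term, which is one way a thesaurus can optimize search.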
• A business glossary is a means of sharing this vocabulary within the organization.
• A data steward is generally responsible for business glossary content.
• They enhance enterprise knowledge by associating data assets with glossary
terms.
• Business glossaries have the following objectives:
• enable common understanding of the core business concepts and
terminology
• reduce the risk that data will be misused due to inconsistent
understanding of the business concepts
• improve the alignment between technology assets (with their
technical naming conventions) and the
business organization
• maximize search capability and enable access to
documented institutional knowledge
Business glossary in Data Management Body of
Knowledge
Link concepts to other objects
Concept
Document/ web content
Application
Business rule
API specification
Database definition
Data model
Dataset
Dashboard/report
A controlled vocabulary is needed to make data FAIR
https://www.go-fair.org/fair-principles/
Concepts and data lineage - what does the data mean?
Regulations such as PERDARR/BCBS239 ask explicitly for a catalogue of
concepts:
• As a precondition, a bank should have a “dictionary” of the concepts used, such that data
is defined consistently across an organization
• A bank should develop an inventory and classification of risk data items which includes a
reference to the concepts used to elaborate the reports.
[Figure: horizontal data lineage (Data, Data, Data, Report) and vertical data lineage (Concepts to Data)]
Practical template for concepts
Name                 Description
Term                 A preferred linguistic reference to a concept.
URI                  A unique identifier to the concept.
Domain               Domain in which the concept exists.
Definition           The formal definition of the concept.
Source               A reference to the source of the definition.
Informal definition  A simple definition of the concept that is understandable for a broad audience.
Explanation          A further clarification of the concept and the way it is used in the specific context.
Editorial notes      Remarks that are related to decisions made during the description of the concept.
Examples             A short summary or description of example instances of the concept.
Synonyms             Terms that denote almost the same concept.
Exact match          Concepts in another thesaurus that denote the same concept.
Related              Concepts that are related to the concept in another (non-hierarchical) manner.
Broader              Concepts that have a broader meaning than the concept.
Broader partitive    Concepts that represent a whole that the concept is a part of.
Levels of modeling
Level       Type of model        Description
Conceptual  Thesaurus            A collection of concepts and their relationships
Conceptual  Information model    A formal description of a universe of discourse
Logical     Logical data model   A design of a data structure
Physical    Physical data model  A technology-specific representation of data
Semiotic triangle
[Figure: triangle with corners Thought or reference, Referent and Symbol; edge label: Stands for]
Source: Ogden and Richards (1923)
A concept model is a model that develops the meaning of
core concepts for a problem domain, defines their collective
structure, and specifies the appropriate vocabulary needed
to communicate about it consistently.
Data models can usually be rather easily derived from
concept models
Strengths of a concept model:
• Provides a business-friendly way to communicate with
stakeholders about precise meanings and subtle
distinctions.
• Is independent of data design biases and the often
limited business vocabulary coverage of data models.
• Proves highly useful for white-collar, knowledge-rich,
decision-laden business processes.
• Helps ensure that large numbers of business rules and
complex decision tables are free of ambiguity and fit
together cohesively.
A thesaurus is close to a concept model
Source: Ron Ross: “Business Rule Concepts” and https://www.brcommunity.com/articles.php?id=b779
Linking concepts to a data model – MIM standard
All model elements have a property “Concept”:
Reference to a concept, from a model element, indicating on
which concept, or concepts, the information model element
is based. The reference is in the form of a term or a URI.
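A minimal sketch of what such a reference enables: if every model element carries a list of concept references (terms or URIs), a simple check can report elements that point to unknown concepts or to none at all. The `ModelElement` class, the function names and the example.org URIs are hypothetical and are not part of the MIM standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelElement:
    name: str
    kind: str                                           # e.g. "entity" or "attribute"
    concepts: list[str] = field(default_factory=list)   # terms or URIs


def is_uri(reference: str) -> bool:
    return reference.startswith(("http://", "https://"))


def unresolved_references(elements, known_terms, known_uris):
    """Return (element name, reference) pairs that do not resolve to a concept.

    An element without any concept reference is reported with reference None.
    """
    problems = []
    for element in elements:
        if not element.concepts:
            problems.append((element.name, None))
        for reference in element.concepts:
            known = known_uris if is_uri(reference) else known_terms
            if reference not in known:
                problems.append((element.name, reference))
    return problems


# Hypothetical thesaurus content and model elements
known_terms = {"Car", "Driver"}
known_uris = {"https://example.org/concepts/car"}
elements = [
    ModelElement("Car", "entity", ["https://example.org/concepts/car"]),
    ModelElement("Driver", "entity", ["Driver"]),
    ModelElement("licence_plate", "attribute", ["Licence plate"]),
    ModelElement("colour", "attribute"),
]

problems = unresolved_references(elements, known_terms, known_uris)
print(problems)
```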
SKOS - Simple Knowledge Organization System
• Open standard of the W3C – defined in 2009
• Part of and based on Linked Data standards such as RDF
• Makes every concept findable on the web with a URI
• Offers a model for describing knowledge organisation systems such as thesauri
• Based on general theory and standards about thesauri
• Specifically aimed at publication of concepts on the web
• Simplified model for describing concepts compared to other systems
• Uses RDF and accompanying standards (XML, TTL, JSON-LD)
• Supported by various commercial and open source tools
• Can be combined with the SKOS-THES standard to include part-of and instance-of
relationships
• Can be combined with the Dublin Core metadata standard
• More information: https://www.w3.org/TR/skos-primer/
Practical template mapped to SKOS
Name                 SKOS representation
Term                 skos:prefLabel
URI                  skos:Concept
Domain               skos:member
Definition           skos:definition
Source               dc:source
Informal definition  rdfs:comment
Explanation          skos:scopeNote
Editorial notes      skos:editorialNote
Examples             skos:example
Synonyms             skos:altLabel
Exact match          skos:exactMatch
Related              skos:related
Broader              skos:broader
Broader partitive    isothes:broaderPartitive
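As an illustration, the mapping can be applied mechanically to turn one filled-in template into Turtle. This is a sketch under several assumptions: the field names, the example.org URIs and the synonym "Estate car" are hypothetical, the source is written as a plain literal, and the domain is modelled as a skos:Collection that has the concept as a skos:member, which is one possible reading of the Domain row.

```python
from __future__ import annotations

PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "isothes": "http://purl.org/iso25964/skos-thes#",
}

# Template fields written as language-tagged literals
LITERAL_FIELDS = {
    "term": "skos:prefLabel",
    "definition": "skos:definition",
    "source": "dc:source",
    "informal_definition": "rdfs:comment",
    "explanation": "skos:scopeNote",
    "editorial_notes": "skos:editorialNote",
    "examples": "skos:example",
    "synonyms": "skos:altLabel",
}

# Template fields that point to other concepts by URI
RESOURCE_FIELDS = {
    "exact_match": "skos:exactMatch",
    "related": "skos:related",
    "broader": "skos:broader",
    "broader_partitive": "isothes:broaderPartitive",
}


def _as_list(value):
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def _literal(text: str, lang: str) -> str:
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def to_turtle(record: dict, lang: str = "en") -> str:
    """Serialize one concept template to Turtle using the mapping above."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    pairs = []
    for name, prop in LITERAL_FIELDS.items():
        for value in _as_list(record.get(name)):
            pairs.append(f"{prop} {_literal(value, lang)}")
    for name, prop in RESOURCE_FIELDS.items():
        for value in _as_list(record.get(name)):
            pairs.append(f"{prop} <{value}>")
    subject = f"<{record['uri']}> a skos:Concept"
    lines.append(" ;\n    ".join([subject] + pairs) + " .")
    if record.get("domain"):
        # Assumption: a domain is a skos:Collection whose members are concepts
        lines.append(
            f"<{record['domain']}> a skos:Collection ;\n"
            f"    skos:member <{record['uri']}> ."
        )
    return "\n".join(lines)


# Hypothetical example record
example = {
    "uri": "https://example.org/concepts/station-wagon",
    "domain": "https://example.org/domains/vehicles",
    "term": "Station wagon",
    "definition": "a car with a large cargo area",
    "synonyms": ["Estate car"],
    "broader": ["https://example.org/concepts/car"],
}

turtle_text = to_turtle(example)
print(turtle_text)
```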
A SKOS concept
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dc="http://purl.org/dc/terms/"
    xmlns:ns0="http://www.eionet.europa.eu/gemet/2004/06/gemet-schema.rdf#">
  <!-- Assumption: the enclosing skos:Concept element is missing from the slide;
       its rdf:about URI is not given there and is therefore left out. -->
  <skos:Concept>
    <skos:narrower rdf:resource="http://www.eionet.europa.eu/gemet/concept/15031"/>
    <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:modified>
    <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
    <dc:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:created>
    <skos:definition xml:lang="en">The degree to which air is polluted; the type and
    maximum concentration of man-produced pollutants that should be permitted in the
    atmosphere.</skos:definition>
  </skos:Concept>
</rdf:RDF>
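A concept published this way can be read back with very little code. The sketch below uses only the Python standard library and a shortened copy of the example; the rdf:about URI on example.org is a placeholder, not the real GEMET identifier. It only handles this one RDF/XML shape, so a full RDF parser is the safer choice for real data.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened example; the concept URI is a hypothetical placeholder
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="https://example.org/concepts/air-quality">
    <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
    <skos:narrower rdf:resource="http://www.eionet.europa.eu/gemet/concept/15031"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(rdf_xml: str) -> dict:
    """Collect preferred labels per language and narrower links for each concept."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for node in root.findall(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        labels = {
            label.get(XML_LANG): label.text
            for label in node.findall(f"{{{SKOS}}}prefLabel")
        }
        narrower = [
            link.get(f"{{{RDF}}}resource")
            for link in node.findall(f"{{{SKOS}}}narrower")
        ]
        concepts[uri] = {"prefLabel": labels, "narrower": narrower}
    return concepts


concepts = read_concepts(RDF_XML)
print(concepts)
```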
Guidelines for formulating definitions of concepts (1)
• Connect with general language use and the language in the organization
in terms and definitions
“Car” instead of “Automobile”
• Define terms with a short name (term), in singular and starting with a
capital letter
“Car” instead of “Cars”
• Keep definitions as short as possible; include only distinguishing features
“a motorized vehicle with 4 wheels” instead of “a motorized vehicle with 4 wheels that
can be used for both private and business transport”
• Use intensional definitions where possible; name the distinguishing
features of a concept
“a 4-wheel motorized vehicle” instead of “sedan or station wagon”
Guidelines for formulating definitions of concepts (2)
• Start by defining more general and broader words
Define car first, then define station wagon
• Define a term with a narrower meaning in the form “A <broader notion> that…”
A station wagon is “a car with a large cargo area”
• Do not include features of a broader concept in the definition of a concept
Not: a station wagon is “a car with 4 wheels and a large loading space”
• Adopt definitions from official sources where possible and consistent with
proprietary terminology
Do not adopt a definition from a commercial source (such as a supplier)
Guidelines for formulating definitions of concepts (3)
• Define separate terms for all non-common words in definitions
A “wheel” is a common word and needs no definition
• Define not only concepts that lead 1-1 to data elements, but also the relevant
context
Define road and driver in addition to car
• Avoid circular definitions; do not express a concept in terms of itself or its
conjugations and do not allow definitions of concepts to refer to each other
Driving is “moving around with a car” instead of “driving a car”
• Avoid definitions that contain negations; define what something is and not what
something isn't (unless you define opposing concepts)
Not: a car is “a vehicle that is not a truck”
Guidelines for formulating definitions of concepts (4)
• Avoid the term “data” and anything directly related to data in definitions of terms
Not: a car is “four-wheel vehicle data”
• Support the definition with an explanation that indicates how the term is used
within the organization
Explanation for cars: “Cars are only relevant to our organization from the perspective of
parking.”
• Avoid using homonyms whenever possible
Don't: define the term “Car” in two ways
• Only name synonyms that are frequently used and acceptable
“Automobile” as a synonym for “Car”, but not “Motor car”
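Some of these guidelines lend themselves to a rough automated check. The sketch below covers only four of them (capital letter, circularity, negation, the word "data") plus a length limit; the function name and the 20-word threshold are assumptions, and simple pattern matching like this will miss cases and raise false alarms, so it can only support a human review.

```python
import re

MAX_WORDS = 20  # assumed threshold for "as short as possible"


def lint_definition(term: str, definition: str) -> list[str]:
    """Return the guideline violations that simple pattern matching can detect."""
    issues = []
    if not term[:1].isupper():
        issues.append("term should start with a capital letter")
    if re.search(rf"\b{re.escape(term)}\b", definition, flags=re.IGNORECASE):
        issues.append("definition is circular: it uses the term itself")
    if re.search(r"\b(?:not|no|never)\b|n't\b", definition, flags=re.IGNORECASE):
        issues.append("definition contains a negation")
    if re.search(r"\bdata\b", definition, flags=re.IGNORECASE):
        issues.append('definition uses the word "data"')
    if len(definition.split()) > MAX_WORDS:
        issues.append(f"definition is longer than {MAX_WORDS} words")
    return issues


# The examples from the guidelines
print(lint_definition("Car", "a motorized vehicle with 4 wheels"))
print(lint_definition("Car", "a vehicle that is not a truck"))
print(lint_definition("Car", "four-wheel vehicle data"))
print(lint_definition("Driving", "driving a car"))
```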
Quality rules for SKOS thesauri
https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
Example quality rules in more detail
• Omitted or Invalid Language Tags: Literals should be tagged consistently with a
language tag.
• Undocumented Concepts: Concepts should include the set of “documentation
properties” as defined in the SKOS Reference.
• Overlapping Labels: No two concepts should have the same preferred lexical label
in a given language when they belong to the same concept scheme.
• Disjoint Labels Violation: skos:prefLabel, skos:altLabel and skos:hiddenLabel
should be pairwise disjoint properties.
• Extra Whitespace in Labels: Labels should not have any leading or trailing
whitespace.
• Orphan Concepts: Concepts should have associative or hierarchical relationships
with other concepts.
https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
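The six rules above can be sketched as checks over an in-memory thesaurus. The data layout (a dict per concept, labels as (text, language) pairs), the `ex:` identifiers and the rule names in the output are assumptions for illustration; this is not the implementation from the cited paper.

```python
from __future__ import annotations

from collections import defaultdict

LABEL_PROPS = ("prefLabel", "altLabel", "hiddenLabel")
DOCUMENTATION_PROPS = (
    "note", "changeNote", "definition", "editorialNote",
    "example", "historyNote", "scopeNote",
)
RELATION_PROPS = ("broader", "narrower", "related")


def check_quality(concepts: dict) -> list[tuple[str, str]]:
    """Return (concept id, violated rule) pairs."""
    pref_index = defaultdict(set)   # (scheme, language, text) -> concept ids
    referenced = set()              # concepts that other concepts point to
    for uri, concept in concepts.items():
        for prop in RELATION_PROPS:
            referenced.update(concept.get(prop, []))
        for text, lang in concept.get("prefLabel", []):
            pref_index[(concept.get("inScheme"), lang, text)].add(uri)

    findings = []
    for uri, concept in concepts.items():
        label_sets = [set(concept.get(prop, [])) for prop in LABEL_PROPS]
        all_labels = [label for labels in label_sets for label in labels]
        if any(not lang for _, lang in all_labels):
            findings.append((uri, "omitted language tag"))
        if any(text != text.strip() for text, _ in all_labels):
            findings.append((uri, "extra whitespace in label"))
        if any(a & b for i, a in enumerate(label_sets) for b in label_sets[i + 1:]):
            findings.append((uri, "disjoint labels violation"))
        if not any(concept.get(prop) for prop in DOCUMENTATION_PROPS):
            findings.append((uri, "undocumented concept"))
        if any(
            len(pref_index[(concept.get("inScheme"), lang, text)]) > 1
            for text, lang in concept.get("prefLabel", [])
        ):
            findings.append((uri, "overlapping label"))
        has_relation = any(concept.get(prop) for prop in RELATION_PROPS)
        if not has_relation and uri not in referenced:
            findings.append((uri, "orphan concept"))
    return findings


# Hypothetical thesaurus with deliberate defects
concepts = {
    "ex:car": {
        "inScheme": "ex:scheme",
        "prefLabel": [("Car", "en")],
        "altLabel": [("Automobile", "en")],
        "definition": [("a motorized vehicle with 4 wheels", "en")],
        "narrower": ["ex:station-wagon"],
    },
    "ex:station-wagon": {
        "inScheme": "ex:scheme",
        "prefLabel": [("Station wagon ", "en")],   # trailing space
        "definition": [("a car with a large cargo area", "en")],
        "broader": ["ex:car"],
    },
    "ex:auto": {
        "inScheme": "ex:scheme",
        "prefLabel": [("Car", "en")],              # same label as ex:car
        "altLabel": [("Car", "en")],               # also used as prefLabel
    },
    "ex:road": {
        "inScheme": "ex:scheme",
        "prefLabel": [("Road", "")],               # no language tag
        "definition": [("a paved route for vehicles", "en")],
    },
}

findings = check_quality(concepts)
for uri, rule in findings:
    print(uri, rule)
```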
Summary
• A thesaurus gives meaning to words
• Concepts can be linked to all sorts of artefacts, enabling findability
• Data modelling should start at a thesaurus level
• Open and FAIR data requires a controlled vocabulary such as a thesaurus
• SKOS is the de facto standard for thesauri
More information?
ArchiXL thesaurus:
https://begrippen.archixl.nl/archixl/nl/
BegrippenXL thesaurusplatform:
https://www.begrippenxl.nl/en/?clang=nl

More Related Content

What's hot

Topic map for Topic Maps case examples
Topic map for Topic Maps case examplesTopic map for Topic Maps case examples
Topic map for Topic Maps case examplestmra
 
Taxonomies and Folksonomies
Taxonomies and FolksonomiesTaxonomies and Folksonomies
Taxonomies and FolksonomiesHeather Hedden
 
Subject analysis, subject heading principles
Subject analysis, subject heading principlesSubject analysis, subject heading principles
Subject analysis, subject heading principlesRichard.Sapon-White
 
LIS 653, Session 9: Subject Analysis
LIS 653, Session 9: Subject Analysis LIS 653, Session 9: Subject Analysis
LIS 653, Session 9: Subject Analysis Dr. Starr Hoffman
 
Subject analysis: What's it all about, Alfie?
Subject analysis:  What's it all about, Alfie?Subject analysis:  What's it all about, Alfie?
Subject analysis: What's it all about, Alfie?Johan Koren
 
Subject analysis, an introduction
Subject analysis, an introductionSubject analysis, an introduction
Subject analysis, an introductionRichard.Sapon-White
 
4. Publication Strategy - Iustin Dornescu (UoW)
4. Publication Strategy - Iustin Dornescu (UoW)4. Publication Strategy - Iustin Dornescu (UoW)
4. Publication Strategy - Iustin Dornescu (UoW)RIILP
 
14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for TranslationRIILP
 
Last But Not Least - Managing The Indexing Process
Last But Not Least  - Managing The Indexing ProcessLast But Not Least  - Managing The Indexing Process
Last But Not Least - Managing The Indexing ProcessFred Leise
 
LIS 703 Subject Analysis by Malgorzata Kot
LIS 703 Subject Analysis by Malgorzata KotLIS 703 Subject Analysis by Malgorzata Kot
LIS 703 Subject Analysis by Malgorzata KotMalgorzataKot
 
Using the library for research
Using the library for researchUsing the library for research
Using the library for researchRoddy MacLeod
 
Logistics Management 354 - Reading and Referencing
Logistics Management 354 - Reading and ReferencingLogistics Management 354 - Reading and Referencing
Logistics Management 354 - Reading and Referencingpvhead123
 
4 Literature Search Techniques 2 Strategic Searching
4 Literature Search Techniques 2 Strategic Searching4 Literature Search Techniques 2 Strategic Searching
4 Literature Search Techniques 2 Strategic Searchingrichard kemp
 
Literature search and review
Literature search and reviewLiterature search and review
Literature search and reviewGraça Gabriel
 
Subject analysis, process of subject analysis
Subject analysis, process of subject analysisSubject analysis, process of subject analysis
Subject analysis, process of subject analysisRichard.Sapon-White
 

What's hot (20)

Taxonomy Fundamentals Workshop 2013
Taxonomy Fundamentals Workshop 2013Taxonomy Fundamentals Workshop 2013
Taxonomy Fundamentals Workshop 2013
 
Topic map for Topic Maps case examples
Topic map for Topic Maps case examplesTopic map for Topic Maps case examples
Topic map for Topic Maps case examples
 
Taxonomies and Folksonomies
Taxonomies and FolksonomiesTaxonomies and Folksonomies
Taxonomies and Folksonomies
 
Machine Aided Indexer
Machine Aided IndexerMachine Aided Indexer
Machine Aided Indexer
 
Subject analysis, subject heading principles
Subject analysis, subject heading principlesSubject analysis, subject heading principles
Subject analysis, subject heading principles
 
LIS 653, Session 9: Subject Analysis
LIS 653, Session 9: Subject Analysis LIS 653, Session 9: Subject Analysis
LIS 653, Session 9: Subject Analysis
 
Subject analysis: What's it all about, Alfie?
Subject analysis:  What's it all about, Alfie?Subject analysis:  What's it all about, Alfie?
Subject analysis: What's it all about, Alfie?
 
Subject analysis, an introduction
Subject analysis, an introductionSubject analysis, an introduction
Subject analysis, an introduction
 
4. Publication Strategy - Iustin Dornescu (UoW)
4. Publication Strategy - Iustin Dornescu (UoW)4. Publication Strategy - Iustin Dornescu (UoW)
4. Publication Strategy - Iustin Dornescu (UoW)
 
Taxonomy made easy
Taxonomy made easyTaxonomy made easy
Taxonomy made easy
 
14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation
 
Business research lec5
Business research lec5Business research lec5
Business research lec5
 
Last But Not Least - Managing The Indexing Process
Last But Not Least  - Managing The Indexing ProcessLast But Not Least  - Managing The Indexing Process
Last But Not Least - Managing The Indexing Process
 
LIS 703 Subject Analysis by Malgorzata Kot
LIS 703 Subject Analysis by Malgorzata KotLIS 703 Subject Analysis by Malgorzata Kot
LIS 703 Subject Analysis by Malgorzata Kot
 
Using the library for research
Using the library for researchUsing the library for research
Using the library for research
 
Literature Review
Literature ReviewLiterature Review
Literature Review
 
Logistics Management 354 - Reading and Referencing
Logistics Management 354 - Reading and ReferencingLogistics Management 354 - Reading and Referencing
Logistics Management 354 - Reading and Referencing
 
4 Literature Search Techniques 2 Strategic Searching
4 Literature Search Techniques 2 Strategic Searching4 Literature Search Techniques 2 Strategic Searching
4 Literature Search Techniques 2 Strategic Searching
 
Literature search and review
Literature search and reviewLiterature search and review
Literature search and review
 
Subject analysis, process of subject analysis
Subject analysis, process of subject analysisSubject analysis, process of subject analysis
Subject analysis, process of subject analysis
 

Similar to The Role of Thesauri in Data Modeling

An introduction to Metadata Application Profiles
An introduction to Metadata Application ProfilesAn introduction to Metadata Application Profiles
An introduction to Metadata Application Profileskcoylenet
 
Taxonomy design best practices
Taxonomy design best practices Taxonomy design best practices
Taxonomy design best practices voginip
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Semantic Web (Web 3.0)
Semantic Web (Web 3.0)Semantic Web (Web 3.0)
Semantic Web (Web 3.0)John Dougherty
 
The Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal RegulationsThe Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal Regulationstbruce
 
Writing technical definitions
Writing technical definitionsWriting technical definitions
Writing technical definitionsAriadne Rooney
 
Expressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLExpressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLCredential Engine
 
Argumentative Research EssayAssignment DescriptionIn upper lev.docx
Argumentative Research EssayAssignment DescriptionIn upper lev.docxArgumentative Research EssayAssignment DescriptionIn upper lev.docx
Argumentative Research EssayAssignment DescriptionIn upper lev.docxjewisonantone
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information ArchitectureScott Abel
 
SKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYCSKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYCjonphipps
 
DCMI Abstract Model: issues and proposed changes
DCMI Abstract Model: issues and proposed changesDCMI Abstract Model: issues and proposed changes
DCMI Abstract Model: issues and proposed changesEduserv Foundation
 
Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Bradley Allen
 
Referencing an Article - Its styles and type.pptx
Referencing an Article - Its styles and type.pptxReferencing an Article - Its styles and type.pptx
Referencing an Article - Its styles and type.pptxPhD Assistance
 
Mastering the Art of SharePoint DMS
Mastering the Art of SharePoint DMSMastering the Art of SharePoint DMS
Mastering the Art of SharePoint DMSOliver Wirkus
 
Post conference workshop (xml and structure)
Post conference workshop (xml and structure)Post conference workshop (xml and structure)
Post conference workshop (xml and structure)Scriptorium Publishing
 

Similar to The Role of Thesauri in Data Modeling (20)

An introduction to Metadata Application Profiles
An introduction to Metadata Application ProfilesAn introduction to Metadata Application Profiles
An introduction to Metadata Application Profiles
 
Taxonomy design best practices
Taxonomy design best practices Taxonomy design best practices
Taxonomy design best practices
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Semantic Web (Web 3.0)
Semantic Web (Web 3.0)Semantic Web (Web 3.0)
Semantic Web (Web 3.0)
 
The Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal RegulationsThe Semantic Web meets the Code of Federal Regulations
The Semantic Web meets the Code of Federal Regulations
 
Writing technical definitions
Writing technical definitionsWriting technical definitions
Writing technical definitions
 
Expressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLExpressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDL
 
Linked Data
Linked DataLinked Data
Linked Data
 
Taxonomy Quality Assessment
Taxonomy Quality AssessmentTaxonomy Quality Assessment
Taxonomy Quality Assessment
 
Argumentative Research EssayAssignment DescriptionIn upper lev.docx
Argumentative Research EssayAssignment DescriptionIn upper lev.docxArgumentative Research EssayAssignment DescriptionIn upper lev.docx
Argumentative Research EssayAssignment DescriptionIn upper lev.docx
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
 
SKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYCSKOS - 2007 Open Forum on Metadata Registries - NYC
SKOS - 2007 Open Forum on Metadata Registries - NYC
 
DCMI Abstract Model: issues and proposed changes
DCMI Abstract Model: issues and proposed changesDCMI Abstract Model: issues and proposed changes
DCMI Abstract Model: issues and proposed changes
 
Case Study: JSTOR: A Year Later
Case Study: JSTOR: A Year LaterCase Study: JSTOR: A Year Later
Case Study: JSTOR: A Year Later
 
Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)
 
Basics of scientific research writing
Basics of scientific  research writingBasics of scientific  research writing
Basics of scientific research writing
 
Mind the Semantic Gap
Mind the Semantic GapMind the Semantic Gap
Mind the Semantic Gap
 
Referencing an Article - Its styles and type.pptx
Referencing an Article - Its styles and type.pptxReferencing an Article - Its styles and type.pptx
Referencing an Article - Its styles and type.pptx
 
Mastering the Art of SharePoint DMS
Mastering the Art of SharePoint DMSMastering the Art of SharePoint DMS
Mastering the Art of SharePoint DMS
 
Post conference workshop (xml and structure)
Post conference workshop (xml and structure)Post conference workshop (xml and structure)
Post conference workshop (xml and structure)
 

More from Danny Greefhorst

Architecture as Linked Data
Architecture as Linked DataArchitecture as Linked Data
Architecture as Linked DataDanny Greefhorst
 
De rol van thesauri in datamanagement
De rol van thesauri in datamanagementDe rol van thesauri in datamanagement
De rol van thesauri in datamanagementDanny Greefhorst
 
Gegevenskwaliteit – een raamwerk vanuit NORA
Gegevenskwaliteit – een raamwerk vanuit NORAGegevenskwaliteit – een raamwerk vanuit NORA
Gegevenskwaliteit – een raamwerk vanuit NORADanny Greefhorst
 
Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"
Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"
Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"Danny Greefhorst
 
Presentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GAB
Presentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GABPresentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GAB
Presentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GABDanny Greefhorst
 
Inzicht in kwaliteit van gegevens
Inzicht in kwaliteit van gegevensInzicht in kwaliteit van gegevens
Inzicht in kwaliteit van gegevensDanny Greefhorst
 
Data trends en ontwikkelingen
Data trends en ontwikkelingenData trends en ontwikkelingen
Data trends en ontwikkelingenDanny Greefhorst
 
Enterprise Architectuur - de essentie
Enterprise Architectuur - de essentieEnterprise Architectuur - de essentie
Enterprise Architectuur - de essentieDanny Greefhorst
 
The role of enterprise architecture in digital transformation
The role of enterprise architecture in digital transformationThe role of enterprise architecture in digital transformation
The role of enterprise architecture in digital transformationDanny Greefhorst
 
Presentatie Gegevenskwaliteit voor Nationaal Archief
Presentatie Gegevenskwaliteit voor Nationaal ArchiefPresentatie Gegevenskwaliteit voor Nationaal Archief
Presentatie Gegevenskwaliteit voor Nationaal ArchiefDanny Greefhorst
 
Enterprise Architectuur - terug naar de essentie
Enterprise Architectuur - terug naar de essentieEnterprise Architectuur - terug naar de essentie
Enterprise Architectuur - terug naar de essentieDanny Greefhorst
 
Creatief en kritisch denken
Creatief en kritisch denkenCreatief en kritisch denken
Creatief en kritisch denkenDanny Greefhorst
 
Gegevenskwaliteit in de omgevingswet
Gegevenskwaliteit in de omgevingswetGegevenskwaliteit in de omgevingswet
Gegevenskwaliteit in de omgevingswetDanny Greefhorst
 
Gegevenskwaliteit in de omgevingswet 1.0
Gegevenskwaliteit in de omgevingswet 1.0Gegevenskwaliteit in de omgevingswet 1.0
Gegevenskwaliteit in de omgevingswet 1.0Danny Greefhorst
 
Handreiking bij gegevenskwaliteit in de omgevingswet
Handreiking bij gegevenskwaliteit in de omgevingswetHandreiking bij gegevenskwaliteit in de omgevingswet
Handreiking bij gegevenskwaliteit in de omgevingswetDanny Greefhorst
 
Presentatie Enterprise Architectuur - Agile en Essentie
Presentatie Enterprise Architectuur - Agile en EssentiePresentatie Enterprise Architectuur - Agile en Essentie
Presentatie Enterprise Architectuur - Agile en EssentieDanny Greefhorst
 
Presentatie Kritisch Denken van Informatie voor NAF ALV
Presentatie Kritisch Denken van Informatie voor NAF ALVPresentatie Kritisch Denken van Informatie voor NAF ALV
Presentatie Kritisch Denken van Informatie voor NAF ALVDanny Greefhorst
 

More from Danny Greefhorst (20)

Architecture as Linked Data
Architecture as Linked DataArchitecture as Linked Data
Architecture as Linked Data
 
Design for sustainability
Design for sustainabilityDesign for sustainability
Design for sustainability
 
De rol van thesauri in datamanagement
De rol van thesauri in datamanagementDe rol van thesauri in datamanagement
De rol van thesauri in datamanagement
 
Gegevenskwaliteit – een raamwerk vanuit NORA
Gegevenskwaliteit – een raamwerk vanuit NORAGegevenskwaliteit – een raamwerk vanuit NORA
Gegevenskwaliteit – een raamwerk vanuit NORA
 
Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"
Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"
Presentatie bij Boeklancering "Testautomatisering wendbaar organiseren"
 
Presentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GAB
Presentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GABPresentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GAB
Presentatie Gegevenskwaliteit in de Omgevingswet voor Werkgroep GAB
 
Routes naar datakwaliteit
Routes naar datakwaliteitRoutes naar datakwaliteit
Routes naar datakwaliteit
 
Inzicht in kwaliteit van gegevens
Inzicht in kwaliteit van gegevensInzicht in kwaliteit van gegevens
Inzicht in kwaliteit van gegevens
 
Data trends en ontwikkelingen
Data trends en ontwikkelingenData trends en ontwikkelingen
Data trends en ontwikkelingen
 
TOGAF 9.2 - the update
TOGAF 9.2 - the updateTOGAF 9.2 - the update
TOGAF 9.2 - the update
 
Enterprise Architectuur - de essentie
Enterprise Architectuur - de essentieEnterprise Architectuur - de essentie
Enterprise Architectuur - de essentie
 
The role of enterprise architecture in digital transformation
The role of enterprise architecture in digital transformationThe role of enterprise architecture in digital transformation
The role of enterprise architecture in digital transformation
 
Presentatie Gegevenskwaliteit voor Nationaal Archief
Presentatie Gegevenskwaliteit voor Nationaal ArchiefPresentatie Gegevenskwaliteit voor Nationaal Archief
Presentatie Gegevenskwaliteit voor Nationaal Archief
 
Enterprise Architectuur - terug naar de essentie
Enterprise Architectuur - terug naar de essentieEnterprise Architectuur - terug naar de essentie
Enterprise Architectuur - terug naar de essentie
 
Creatief en kritisch denken
Creatief en kritisch denkenCreatief en kritisch denken
Creatief en kritisch denken
 
Gegevenskwaliteit in de omgevingswet
Gegevenskwaliteit in de omgevingswetGegevenskwaliteit in de omgevingswet
Gegevenskwaliteit in de omgevingswet
 
Gegevenskwaliteit in de omgevingswet 1.0
Gegevenskwaliteit in de omgevingswet 1.0Gegevenskwaliteit in de omgevingswet 1.0
Gegevenskwaliteit in de omgevingswet 1.0
 
Handreiking bij gegevenskwaliteit in de omgevingswet
Handreiking bij gegevenskwaliteit in de omgevingswetHandreiking bij gegevenskwaliteit in de omgevingswet
Handreiking bij gegevenskwaliteit in de omgevingswet
 
Presentatie Enterprise Architectuur - Agile en Essentie
Presentatie Enterprise Architectuur - Agile en EssentiePresentatie Enterprise Architectuur - Agile en Essentie
Presentatie Enterprise Architectuur - Agile en Essentie
 
Presentatie Kritisch Denken van Informatie voor NAF ALV
Presentatie Kritisch Denken van Informatie voor NAF ALVPresentatie Kritisch Denken van Informatie voor NAF ALV
Presentatie Kritisch Denken van Informatie voor NAF ALV
 


The Role of Thesauri in Data Modeling

  • 1. The role of thesauri in data modeling Danny Greefhorst dgreefhorst@archixl.nl
  • 2. Topics in this presentation • What is a thesaurus and why is it valuable? • What does a thesaurus look like? • How does a thesaurus relate to a data model? • SKOS as a language for describing thesauri • Guidelines for good definitions (based on ISO 704) • Quality control for thesauri
  • 3. Thesaurus in the Data Management Body of Knowledge
      • A thesaurus is a type of controlled vocabulary for content retrieval.
      • A controlled vocabulary is a defined list of explicitly allowed terms used to index, categorize, tag, sort, and retrieve content through browsing and searching.
      • A thesaurus provides information about each term and its relationship to other terms.
      • Relationships are either hierarchical, associative or equivalent.
      • Thesauri can be used to:
          • organize unstructured content
          • uncover relationships between content from different media
          • improve website navigation
          • optimize search
  • 4. Business glossary in the Data Management Body of Knowledge
      • A business glossary is a means of sharing this vocabulary within the organization.
      • A data steward is generally responsible for business glossary content.
      • Data stewards enhance enterprise knowledge by associating data assets with glossary terms.
      • Business glossaries have the following objectives:
          • enable common understanding of the core business concepts and terminology
          • reduce the risk that data will be misused due to inconsistent understanding of the business concepts
          • improve the alignment between technology assets (with their technical naming conventions) and the business organization
          • maximize search capability and enable access to documented institutional knowledge
  • 5. Link concepts to other objects
      A concept can be linked to: document/web content, application, business rule, API specification, database definition, data model, dataset, dashboard/report.
  • 6. A controlled vocabulary is needed to make data FAIR https://www.go-fair.org/fair-principles/
  • 7. Concepts and data lineage - what does the data mean?
      Regulations such as PERDARR/BCBS239 ask explicitly for a catalogue of concepts:
      • As a precondition, a bank should have a “dictionary” of the concepts used, such that data is defined consistently across an organization.
      • A bank should develop an inventory and classification of risk data items which includes a reference to the concepts used to elaborate the reports.
      [Diagram labels: Data, Concepts, Report; horizontal data lineage; vertical data lineage]
  • 8.
  • 9. Practical template for concepts (name: description)
      • Term: A preferred linguistic reference to a concept.
      • URI: A unique identifier to the concept.
      • Domain: Domain in which the concept exists.
      • Definition: The formal definition of the concept.
      • Source: A reference to the source of the definition.
      • Informal definition: A simple definition of the concept that is understandable for a broad audience.
      • Explanation: A further clarification of the concept and the way it is used in the specific context.
      • Editorial notes: Remarks that are related to decisions made during the description of the concept.
      • Examples: A short summary or description of example instances of the concept.
      • Synonyms: Terms that denote almost the same concept.
      • Exact match: Concepts in another thesaurus that denote the same concept.
      • Related: Concepts that are related to the concept in another (non-hierarchical) manner.
      • Broader: Concepts that have a broader meaning than the concept.
      • Broader partitive: Concepts that represent a whole that the concept is a part of.
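The template above could be captured as a small record type. The following is only a sketch: the field names, the choice of which fields count as mandatory, and the `example.org` URIs are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One thesaurus entry with the fields of the practical template."""
    term: str                      # preferred linguistic reference
    uri: str                       # unique identifier of the concept
    domain: str                    # domain in which the concept exists
    definition: str                # formal definition
    source: Optional[str] = None   # reference to the source of the definition
    informal_definition: Optional[str] = None
    explanation: Optional[str] = None
    editorial_notes: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    exact_match: List[str] = field(default_factory=list)        # URIs in other thesauri
    related: List[str] = field(default_factory=list)            # URIs, non-hierarchical
    broader: List[str] = field(default_factory=list)            # URIs of broader concepts
    broader_partitive: List[str] = field(default_factory=list)  # URIs of wholes


def missing_required(concept: Concept) -> List[str]:
    """Return the fields treated as mandatory here (an assumption) that are empty."""
    required = ("term", "uri", "domain", "definition")
    return [name for name in required if not getattr(concept, name).strip()]


# Hypothetical example entries.
car = Concept(
    term="Car",
    uri="https://example.org/concept/car",
    domain="Transport",
    definition="A motorized vehicle with 4 wheels.",
)
station_wagon = Concept(
    term="Station wagon",
    uri="https://example.org/concept/station-wagon",
    domain="Transport",
    definition="A car with a large cargo area.",
    broader=[car.uri],
)
```

Keeping relationships as lists of URIs rather than object references makes it straightforward to point at concepts in other thesauri.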
  • 10. Levels of modeling (level: type of model: description)
      • Conceptual: Thesaurus: A collection of concepts and their relationships.
      • Conceptual: Information model: A formal description of a universe of discourse.
      • Logical: Logical data model: A design of a data structure.
      • Physical: Physical data model: A technology-specific representation of data.
  • 11. Semiotic triangle Thought or reference Referent Symbol Stands for Source: Ogden and Richards (1923)
  • 12. A thesaurus is close to a concept model
      A concept model is a model that develops the meaning of core concepts for a problem domain, defines their collective structure, and specifies the appropriate vocabulary needed to communicate about it consistently. Data models can usually be derived rather easily from concept models.
      Strengths of a concept model:
      • Provides a business-friendly way to communicate with stakeholders about precise meanings and subtle distinctions.
      • Is independent of data design biases and the often limited business vocabulary coverage of data models.
      • Proves highly useful for white-collar, knowledge-rich, decision-laden business processes.
      • Helps ensure that large numbers of business rules and complex decision tables are free of ambiguity and fit together cohesively.
      Source: Ron Ross: “Business Rule Concepts” and https://www.brcommunity.com/articles.php?id=b779
  • 13. Linking concepts to a data model – MIM standard All model elements have a property “Concept”: Reference to a concept, from a model element, indicating on which concept, or concepts, the information model element is based. The reference is in the form of a term or a URI.
  • 14. SKOS - Simple knowledge organisation system
      • Open standard of the W3C, defined in 2009
      • Part of and based on Linked Data standards such as RDF
      • Makes every concept findable on the web with a URI
      • Offers a model for describing knowledge organisation systems such as thesauri
      • Based on general theory and standards about thesauri
      • Specifically aimed at publication of concepts on the web
      • Simplified model for describing concepts compared to other systems
      • Uses RDF and accompanying standards (XML, TTL, JSON-LD)
      • Supported by various commercial and open source tools
      • Can be combined with the SKOS-THES standard to include part-of and instance-of relationships
      • Can be combined with the Dublin Core metadata standard
      • More information: https://www.w3.org/TR/skos-primer/
  • 15. Practical template mapped to SKOS (name: SKOS representation)
      • Term: skos:prefLabel
      • URI: skos:Concept
      • Domain: skos:member
      • Definition: skos:definition
      • Source: dc:source
      • Informal definition: rdfs:comment
      • Explanation: skos:scopeNote
      • Editorial notes: skos:editorialNote
      • Examples: skos:example
      • Synonyms: skos:altLabel
      • Exact match: skos:exactMatch
      • Related: skos:related
      • Broader: skos:broader
      • Broader partitive: isothes:broaderPartitive
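As an illustration of the mapping, the sketch below turns one template entry into Turtle using only the Python standard library. Several things are assumptions rather than part of the slides: the dictionary keys, the `example.org` URIs, treating the source as a plain literal, modelling the domain as a `skos:Collection` that lists the concept through `skos:member`, and the `isothes` namespace URI (taken to be the ISO 25964 SKOS-THES namespace). The `dc` prefix is bound as in the deck's own RDF/XML sample.

```python
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "isothes": "http://purl.org/iso25964/skos-thes#",  # assumed namespace URI
}

# Template field (hypothetical key name) -> property, for literal values.
LITERAL_FIELDS = [
    ("term", "skos:prefLabel"),
    ("definition", "skos:definition"),
    ("source", "dc:source"),
    ("informal_definition", "rdfs:comment"),
    ("explanation", "skos:scopeNote"),
    ("editorial_notes", "skos:editorialNote"),
    ("examples", "skos:example"),
    ("synonyms", "skos:altLabel"),
]

# Template field -> property, for values that are URIs of other concepts.
URI_FIELDS = [
    ("exact_match", "skos:exactMatch"),
    ("related", "skos:related"),
    ("broader", "skos:broader"),
    ("broader_partitive", "isothes:broaderPartitive"),
]


def as_list(value):
    """Accept a missing value, a single string, or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def literal(text, lang):
    """Format a language-tagged Turtle literal with basic escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def concept_to_turtle(concept, lang="en"):
    """Serialize one concept dictionary to a Turtle document."""
    header = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    statements = ["a skos:Concept"]
    for key, prop in LITERAL_FIELDS:
        statements += [f"{prop} {literal(v, lang)}" for v in as_list(concept.get(key))]
    for key, prop in URI_FIELDS:
        statements += [f"{prop} <{v}>" for v in as_list(concept.get(key))]
    blocks = ["\n".join(header),
              f"<{concept['uri']}> " + " ;\n    ".join(statements) + " ."]
    if concept.get("domain"):
        # Assumption: the domain is a collection that has the concept as a member.
        blocks.append(f"<{concept['domain']}> a skos:Collection ;\n"
                      f"    skos:member <{concept['uri']}> .")
    return "\n\n".join(blocks) + "\n"


station_wagon = {
    "uri": "https://example.org/concept/station-wagon",
    "term": "Station wagon",
    "domain": "https://example.org/domain/transport",
    "definition": "A car with a large cargo area.",
    "synonyms": ["Estate car"],
    "broader": ["https://example.org/concept/car"],
}
turtle = concept_to_turtle(station_wagon)
```

Printing `turtle` gives a document that a Turtle-aware tool should be able to load; a production setup would more likely use an RDF library than string formatting.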
  • 16. Practical template for concepts (name: description)
      • Term: A preferred linguistic reference to a concept.
      • URI: A unique identifier to the concept.
      • Domain: Domain in which the concept exists.
      • Definition: The formal definition of the concept.
      • Source: A reference to the source of the definition.
      • Informal definition: A simple definition of the concept that is understandable for a broad audience.
      • Explanation: A further clarification of the concept and the way it is used in the specific context.
      • Editorial notes: Remarks that are related to decisions made during the description of the concept.
      • Examples: A short summary or description of example instances of the concept.
      • Synonyms: Terms that denote almost the same concept.
      • Exact match: Concepts in another thesaurus that denote the same concept.
      • Related: Concepts that are related to the concept in another (non-hierarchical) manner.
      • Broader: Concepts that have a broader meaning than the concept.
      • Broader partitive: Concepts that represent a whole that the concept is a part of.
  • 17. A SKOS concept
      <?xml version="1.0" encoding="utf-8" ?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:skos="http://www.w3.org/2004/02/skos/core#"
               xmlns:dc="http://purl.org/dc/terms/"
               xmlns:ns0="http://www.eionet.europa.eu/gemet/2004/06/gemet-schema.rdf#">
        <skos:narrower rdf:resource="http://www.eionet.europa.eu/gemet/concept/15031"/>
        <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:modified>
        <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
        <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
        <dc:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:created>
        <skos:definition xml:lang="en">The degree to which air is polluted; the type and maximum concentration of man-produced pollutants that should be permitted in the atmosphere.</skos:definition>
      </rdf:RDF>
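To show how such a record might be read, here is a minimal sketch that pulls the preferred labels per language out of RDF/XML with the standard library. The fragment on the slide does not show the element that names the concept, so the sample below wraps the properties in an `rdf:Description` with a hypothetical `example.org` URI. The function only handles this simple flat layout and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Sample data; the rdf:about URI is a placeholder, not the real GEMET identifier.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="https://example.org/concept/air-quality">
    <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def pref_labels(rdf_xml):
    """Map each described resource URI to its {language: preferred label}."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    for node in root:
        about = node.get(f"{{{RDF}}}about")
        if about is None:
            continue  # skip nodes without an explicit URI
        for label in node.findall(f"{{{SKOS}}}prefLabel"):
            lang = label.get(XML_LANG, "")  # empty string if the tag is missing
            labels.setdefault(about, {})[lang] = (label.text or "").strip()
    return labels


labels = pref_labels(SAMPLE)
```

For real thesauri an RDF library is the safer choice, because RDF/XML allows many equivalent ways to write the same statements.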
  • 18. Guidelines for formulating definitions of concepts (1)
      • Connect with general language use and the language in the organization in terms and definitions. Example: “Car” instead of “Automobile”.
      • Define terms with a short name (term), in singular and starting with a capital letter. Example: “Car” instead of “Cars”.
      • Keep definitions as short as possible; include only distinguishing features. Example: “a motorized vehicle with 4 wheels” instead of “a motorized vehicle with 4 wheels that can be used for both private and business transport”.
      • Use intensional definitions where possible; name the distinguishing features of a concept. Example: “a 4-wheel motorized vehicle” instead of “sedan or station wagon”.
  • 19. Guidelines for formulating definitions of concepts (2)
      • Start by defining more general and broader words. Example: define car first, then define station wagon.
      • Define a term with a narrower meaning in the form “A <broader notion> that…”. Example: a station wagon is “a car with a large cargo area”.
      • Do not include features of a broader concept in the definition of a concept. Not: a station wagon is “a car with 4 wheels and a large loading space”.
      • Adopt definitions from official sources where possible and consistent with proprietary terminology. Do not adopt a definition from a commercial source (such as a supplier).
  • 20. Guidelines for formulating definitions of concepts (3)
      • Define separate terms for all non-common words in definitions. Example: a “wheel” is a common word and needs no definition.
      • Define not only concepts that lead 1-1 to data elements, but also the relevant context. Example: define road and driver in addition to car.
      • Avoid circular definitions; do not express a concept in terms of itself or its conjugations and do not allow definitions of concepts to refer to each other. Example: driving is “moving around with a car” instead of “driving a car”.
      • Avoid definitions that contain negations; define what something is and not what something isn't (unless you define opposing concepts). Not: a car is “a vehicle that is not a truck”.
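The guideline on circular definitions lends itself to an automated check. The sketch below assumes that someone has already recorded, for each term, which other defined terms its definition uses; how those references are extracted from the definition text is left out. The function name and the sample data are hypothetical.

```python
def find_definition_cycles(references):
    """Find circular definitions.

    references maps each term to the terms used in its definition.
    Returns a list of cycles, each given as a path that starts and ends
    with the same term.
    """
    cycles = []
    state = {}   # term -> "visiting" while on the current path, "done" afterwards
    path = []    # terms on the current depth-first path

    def visit(term):
        state[term] = "visiting"
        path.append(term)
        for used in references.get(term, ()):
            if state.get(used) == "visiting":
                # The definition chain leads back to a term on the current path.
                cycles.append(path[path.index(used):] + [used])
            elif used not in state:
                visit(used)
        path.pop()
        state[term] = "done"

    for term in references:
        if term not in state:
            visit(term)
    return cycles


# Hypothetical thesaurus: "Driving" is defined with "Car", and "Car" with "Driving".
references = {
    "Driving": ["Car"],
    "Car": ["Vehicle", "Driving"],
    "Vehicle": [],
    "Station wagon": ["Car"],
}
cycles = find_definition_cycles(references)
```

A term that uses itself in its own definition shows up as a two-element cycle such as `["Car", "Car"]`.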
  • 21. Guidelines for formulating definitions of concepts (4)
      • Avoid the term “data” and anything directly related to data in definitions of terms. Not: a car is “four-wheel vehicle data”.
      • Support the definition with an explanation that indicates how the term is used within the organization. Example explanation for cars: “Cars are only relevant to our organization from the perspective of parking.”
      • Avoid using homonyms whenever possible. Don't: define the term “Car” in two ways.
      • Only name synonyms that are frequently used and acceptable. Example: “Automotive” as a synonym for “Car”, but not “Motor car”.
  • 22. Quality rules for SKOS thesauri https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
  • 23. Example quality rules in more detail
      • Omitted or Invalid Language Tags: Literals should be tagged consistently with a language tag.
      • Undocumented Concepts: Concepts should include the set of “documentation properties” as defined in the SKOS Reference.
      • Overlapping Labels: No two concepts should have the same preferred lexical label in a given language when they belong to the same concept scheme.
      • Disjoint Labels Violation: skos:prefLabel, skos:altLabel and skos:hiddenLabel should be pairwise disjoint properties.
      • Extra Whitespace in Labels: Labels should not have any leading or trailing whitespace.
      • Orphan Concepts: Concepts should have associative or hierarchical relationships with other concepts.
      Source: https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
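The six rules above can be approximated with a few lines of code over an in-memory thesaurus. This is a simplified sketch, not the checker described in the cited paper: the dictionary layout, the key names, the rule identifiers, and the `ex:` identifiers are all assumptions, all concepts are taken to belong to one concept scheme, labels are compared by exact match, and only three documentation properties are considered.

```python
from collections import defaultdict

LABEL_KEYS = ("pref", "alt", "hidden")              # prefLabel, altLabel, hiddenLabel
DOC_KEYS = ("definition", "scope_note", "example")  # subset of documentation properties
LINK_KEYS = ("broader", "narrower", "related")


def check_thesaurus(concepts):
    """Return {rule name: sorted concept URIs that violate the rule}.

    Each concept is a dict with a "uri"; labels are (text, language) pairs,
    where an empty language string means the tag is missing.
    """
    issues = defaultdict(set)
    first_with_pref = {}   # (text, lang) -> first concept using it as prefLabel
    linked = set()         # concepts that take part in at least one relationship

    for concept in concepts:
        uri = concept["uri"]
        kinds_per_label = defaultdict(set)
        for key in LABEL_KEYS:
            for text, lang in concept.get(key, []):
                if not lang:
                    issues["missing_language_tag"].add(uri)
                if text != text.strip():
                    issues["extra_whitespace"].add(uri)
                kinds_per_label[(text, lang)].add(key)
                if key == "pref":
                    other = first_with_pref.setdefault((text, lang), uri)
                    if other != uri:
                        issues["overlapping_labels"].update({uri, other})
        if any(len(kinds) > 1 for kinds in kinds_per_label.values()):
            issues["disjoint_labels_violation"].add(uri)
        if not any(concept.get(key) for key in DOC_KEYS):
            issues["undocumented"].add(uri)
        targets = [t for key in LINK_KEYS for t in concept.get(key, [])]
        if targets:
            linked.add(uri)
            linked.update(targets)

    issues["orphan"] = {c["uri"] for c in concepts} - linked
    return {rule: sorted(uris) for rule, uris in issues.items() if uris}


# Hypothetical sample thesaurus with one violation of each rule.
concepts = [
    {"uri": "ex:car", "pref": [("Car", "en")], "alt": [("Car", "en")],
     "definition": "A motorized vehicle with 4 wheels."},
    {"uri": "ex:wagon", "pref": [("Station wagon ", "en")], "broader": ["ex:car"],
     "definition": "A car with a large cargo area."},
    {"uri": "ex:road", "pref": [("Road", "")]},
    {"uri": "ex:auto", "pref": [("Car", "en")], "related": ["ex:car"],
     "definition": "A passenger vehicle."},
]
report = check_thesaurus(concepts)
```

Reporting concept identifiers per rule, rather than stopping at the first problem, keeps the output usable as a work list for the data steward.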
  • 24. Summary • A thesaurus gives meaning to words • Concepts can be linked to all sorts of artefacts, enabling findability • Data modelling should start at a thesaurus level • Open and FAIR data requires a controlled vocabulary such as a thesaurus • SKOS is the de facto standard for thesauri
  • 25. More information? ArchiXL thesaurus: https://begrippen.archixl.nl/archixl/nl/ BegrippenXL thesaurusplatform: https://www.begrippenxl.nl/en/?clang=nl