Presentation for Data Modeling Zone Europe 2021. The foundation for any data model is an understanding of reality, which is typically supported by constructing conceptual models. Understanding can and should, however, start in an earlier phase, and it should not require formal models, since these create a gap between modelers and subject matter experts. Instead, it should start at the level of language, which everyone understands. Thesauri are good instruments to support understanding at that level: they sit in a sweet spot between a glossary of terms and a formal conceptual model. Danny Greefhorst shows what a thesaurus is and how you can use it to model a universe of discourse, and provides practical guidelines.
The Role of Thesauri in Data Modeling
1. The role of thesauri in data modeling
Danny Greefhorst
dgreefhorst@archixl.nl
2. Topics in this presentation
• What is a thesaurus and why is it valuable?
• What does a thesaurus look like?
• How does a thesaurus relate to a data model?
• SKOS as a language for describing thesauri
• Guidelines for good definitions (based on ISO 704)
• Quality control for thesauri
3. Thesaurus in the Data Management Body of Knowledge
• A thesaurus is a type of controlled vocabulary for content retrieval.
• A controlled vocabulary is a defined list of explicitly allowed terms used to index, categorize, tag, sort, and retrieve content through browsing and searching.
• A thesaurus provides information about each term and its relationship to other terms.
• Relationships are either hierarchical, associative or equivalence relationships.
• Thesauri can be used to:
  • organize unstructured content
  • uncover relationships between content from different media
  • improve website navigation
  • optimize search
4. Business glossary in the Data Management Body of Knowledge
• A business glossary is a means of sharing a common business vocabulary within the organization.
• A data steward is generally responsible for business glossary content.
• Data stewards enhance enterprise knowledge by associating data assets with glossary terms.
• Business glossaries have the following objectives:
  • enable common understanding of the core business concepts and terminology
  • reduce the risk that data will be misused due to inconsistent understanding of the business concepts
  • improve the alignment between technology assets (with their technical naming conventions) and the business organization
  • maximize search capability and enable access to documented institutional knowledge
5. Link concepts to other objects
[Diagram: a concept linked to document/web content, applications, business rules, API specifications, database definitions, data models, datasets and dashboards/reports]
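A minimal Python sketch of such links, assuming each link is stored as a (concept URI, object type, object name) triple; the example.org URIs and object names are hypothetical. Indexing the links in both directions gives the findability the slide aims at: from a concept to everything that uses it, and from an object back to the concepts it is based on.

```python
from collections import defaultdict

CAR = "https://example.org/id/concept/car"        # hypothetical concept URIs
DRIVER = "https://example.org/id/concept/driver"

# Hypothetical links between concepts and other objects
LINKS = [
    (CAR, "Data model", "Vehicle registration model"),
    (CAR, "Dataset", "Parking permits 2021"),
    (CAR, "Dashboard/report", "Parking occupancy"),
    (DRIVER, "Business rule", "Driver must hold a valid licence"),
    (DRIVER, "Dataset", "Parking permits 2021"),
]

def index_links(links):
    """Build lookups in both directions: concept -> objects and object -> concepts."""
    by_concept = defaultdict(list)
    by_object = defaultdict(set)
    for concept, obj_type, obj_name in links:
        by_concept[concept].append((obj_type, obj_name))
        by_object[(obj_type, obj_name)].add(concept)
    return by_concept, by_object

by_concept, by_object = index_links(LINKS)
```

Keeping links as plain triples is a deliberate choice: the same shape maps directly onto RDF statements if the links are later published alongside the thesaurus.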
7. Concepts and data lineage - what does the data mean?
Regulations such as PERDARR/BCBS 239 explicitly ask for a catalogue of concepts:
• As a precondition, a bank should have a “dictionary” of the concepts used, such that data is defined consistently across an organization
• A bank should develop an inventory and classification of risk data items which includes a reference to the concepts used to elaborate the reports.
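[Diagram: several data sets, each linked to its concepts, feeding a report; "Horizontal data lineage" runs across the data sets toward the report, "Vertical data lineage" runs from data to concepts]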
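The requirement that risk data items reference the concepts used in reports can be sketched as a traversal. This minimal Python sketch assumes horizontal lineage is recorded as "derived from" links between data items and vertical lineage as a link from each data item to one concept; all identifiers are hypothetical.

```python
# Horizontal lineage: each data item lists the data items it is derived from
DERIVED_FROM = {
    "report.total_exposure": ["mart.exposure_by_counterparty"],
    "mart.exposure_by_counterparty": ["src.loan_amount", "src.counterparty_id"],
    "src.loan_amount": [],
    "src.counterparty_id": [],
}

# Vertical lineage: each data item refers to the concept it represents
CONCEPT_OF = {
    "report.total_exposure": "Exposure",
    "mart.exposure_by_counterparty": "Exposure",
    "src.loan_amount": "Loan",
    "src.counterparty_id": "Counterparty",
}

def upstream(item: str) -> set[str]:
    """All data items an item is (transitively) derived from, including itself."""
    seen = {item}
    stack = [item]
    while stack:
        for source in DERIVED_FROM.get(stack.pop(), []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen

def concepts_behind(item: str) -> set[str]:
    """Concepts referenced anywhere in the lineage of a report item."""
    return {CONCEPT_OF[i] for i in upstream(item) if i in CONCEPT_OF}
```

Answering "what does this report figure mean?" then combines both directions: follow horizontal lineage to the sources, and vertical lineage from each source to its concept.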
9. Practical template for concepts
Name | Description
Term | A preferred linguistic reference to a concept.
URI | A unique identifier for the concept.
Domain | Domain in which the concept exists.
Definition | The formal definition of the concept.
Source | A reference to the source of the definition.
Informal definition | A simple definition of the concept that is understandable for a broad audience.
Explanation | A further clarification of the concept and the way it is used in the specific context.
Editorial notes | Remarks related to decisions made while describing the concept.
Examples | A short summary or description of example instances of the concept.
Synonyms | Terms that denote almost the same concept.
Exact match | Concepts in another thesaurus that denote the same concept.
Related | Concepts that are related to the concept in another (non-hierarchical) manner.
Broader | Concepts that have a broader meaning than the concept.
Broader partitive | Concepts that represent a whole that the concept is a part of.
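The template can also be captured as a record type so that incomplete descriptions are easy to spot. A minimal Python sketch, assuming that term, URI, domain and definition are mandatory and the other fields optional (the slide does not say which fields are required); `ConceptDescription`, `missing_fields` and the example values are hypothetical.

```python
from dataclasses import dataclass, field, fields

@dataclass
class ConceptDescription:
    """One concept described with the fields of the practical template."""
    term: str
    uri: str
    domain: str
    definition: str
    source: str = ""
    informal_definition: str = ""
    explanation: str = ""
    editorial_notes: str = ""
    examples: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)
    exact_match: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)
    broader: list[str] = field(default_factory=list)
    broader_partitive: list[str] = field(default_factory=list)

def missing_fields(concept: ConceptDescription) -> list[str]:
    """Names of template fields that are still empty."""
    return [f.name for f in fields(concept) if not getattr(concept, f.name)]

car = ConceptDescription(
    term="Car",
    uri="https://example.org/id/concept/car",  # hypothetical URI
    domain="Parking",
    definition="a 4-wheel motorized vehicle",
    broader=["Vehicle"],
)
```

Here `missing_fields(car)` lists the nine template fields that an editor still has to consider, such as `source` and `synonyms`.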
10. Levels of modeling
Level | Type of model | Description
Conceptual | Thesaurus | A collection of concepts and their relationships
Conceptual | Information model | A formal description of a universe of discourse
Logical | Logical data model | A design of a data structure
Physical | Physical data model | A technology-specific representation of data
12. A concept model is a model that develops the meaning of
core concepts for a problem domain, defines their collective
structure, and specifies the appropriate vocabulary needed
to communicate about it consistently.
Data models can usually be derived rather easily from concept models.
Strengths of a concept model:
• Provides a business-friendly way to communicate with
stakeholders about precise meanings and subtle
distinctions.
• Is independent of data design biases and the often
limited business vocabulary coverage of data models.
• Proves highly useful for white-collar, knowledge-rich,
decision-laden business processes.
• Helps ensure that large numbers of business rules and
complex decision tables are free of ambiguity and fit
together cohesively.
A thesaurus is close to a concept model
Source: Ron Ross: “Business Rule Concepts” and https://www.brcommunity.com/articles.php?id=b779
13. Linking concepts to a data model – MIM standard
All model elements have a property “Concept”:
Reference to a concept, from a model element, indicating on
which concept, or concepts, the information model element
is based. The reference is in the form of a term or a URI.
14. SKOS - Simple Knowledge Organization System
• Open standard of the W3C – defined in 2009
• Part of and based on Linked Data standards such as RDF
• Makes every concept findable on the web with a URI
• Offers a model for describing knowledge organisation systems such as thesauri
• Based on general theory and standards about thesauri
• Specifically aimed at publication of concepts on the web
• Simplified model for describing concepts compared to other systems
• Uses RDF and its accompanying serialization formats (RDF/XML, Turtle, JSON-LD)
• Supported by various commercial and open source tools
• Can be combined with the SKOS-THES standard to include part-of and instance-of relationships
• Can be combined with the Dublin Core metadata standard
• More information: https://www.w3.org/TR/skos-primer/
15. Practical template mapped to SKOS
Name | SKOS representation
Term | skos:prefLabel
URI | skos:Concept
Domain | skos:member
Definition | skos:definition
Source | dc:source
Informal definition | rdfs:comment
Explanation | skos:scopeNote
Editorial notes | skos:editorialNote
Examples | skos:example
Synonyms | skos:altLabel
Exact match | skos:exactMatch
Related | skos:related
Broader | skos:broader
Broader partitive | isothes:broaderPartitive
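The mapping can be applied mechanically. Below is a minimal Python sketch that turns a filled-in template (a plain dict with hypothetical example.org values) into Turtle, one triple per line, using the properties from the table. Assumptions: the Domain is modeled as a skos:Collection, whose skos:member property points from the collection to the concept; Source and the relationship fields hold URIs; text fields become language-tagged literals; `to_turtle` and `literal` are illustrative helpers, not part of any SKOS tool.

```python
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "isothes": "http://purl.org/iso25964/skos-thes#",
}

# Template field -> (property, kind); "text" = language-tagged literal, "ref" = URI
MAPPING = {
    "term": ("skos:prefLabel", "text"),
    "definition": ("skos:definition", "text"),
    "source": ("dc:source", "ref"),
    "informal_definition": ("rdfs:comment", "text"),
    "explanation": ("skos:scopeNote", "text"),
    "editorial_notes": ("skos:editorialNote", "text"),
    "examples": ("skos:example", "text"),
    "synonyms": ("skos:altLabel", "text"),
    "exact_match": ("skos:exactMatch", "ref"),
    "related": ("skos:related", "ref"),
    "broader": ("skos:broader", "ref"),
    "broader_partitive": ("isothes:broaderPartitive", "ref"),
}

def literal(text: str, lang: str) -> str:
    """A Turtle string literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def to_turtle(concept: dict, lang: str = "en") -> str:
    """Serialize one template-filled concept as Turtle."""
    subject = f"<{concept['uri']}>"
    out = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    out.append(f"{subject} a skos:Concept .")
    if concept.get("domain"):
        # Membership is stated from the collection side in SKOS
        out.append(f"<{concept['domain']}> a skos:Collection ; skos:member {subject} .")
    for name, (prop, kind) in MAPPING.items():
        value = concept.get(name)
        if not value:
            continue
        for v in value if isinstance(value, list) else [value]:
            obj = f"<{v}>" if kind == "ref" else literal(v, lang)
            out.append(f"{subject} {prop} {obj} .")
    return "\n".join(out)

car = {
    "uri": "https://example.org/id/concept/car",
    "domain": "https://example.org/id/collection/parking",
    "term": "Car",
    "definition": "a 4-wheel motorized vehicle",
    "synonyms": ["Automobile"],
    "broader": ["https://example.org/id/concept/vehicle"],
}
```

In practice an RDF library would handle escaping and serialization more robustly; the string-based version only keeps the sketch dependency-free.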
17. A SKOS concept
```xml
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:ns0="http://www.eionet.europa.eu/gemet/2004/06/gemet-schema.rdf#">
  <!-- Property elements must sit inside a resource element. The concept's own
       URI (rdf:about) is not shown on the slide, so it is left out here. -->
  <skos:Concept>
    <skos:narrower rdf:resource="http://www.eionet.europa.eu/gemet/concept/15031"/>
    <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:modified>
    <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
    <dc:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:created>
    <skos:definition xml:lang="en">The degree to which air is polluted; the type and maximum concentration of man-produced pollutants that should be permitted in the atmosphere.</skos:definition>
  </skos:Concept>
</rdf:RDF>
```
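To read such a file programmatically, Python's standard `xml.etree.ElementTree` is enough for a quick check, although it is a plain XML parser and not an RDF library; a full RDF toolkit is the better choice for real processing. This minimal sketch assumes a trimmed copy of the slide's concept, wrapped in a `skos:Concept` element, and collects the preferred labels per language; `pref_labels` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# ElementTree expands the reserved xml: prefix to this namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept>
    <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
    <skos:definition xml:lang="en">The degree to which air is polluted; the type and maximum concentration of man-produced pollutants that should be permitted in the atmosphere.</skos:definition>
  </skos:Concept>
</rdf:RDF>"""

def pref_labels(concept: ET.Element) -> dict[str, str]:
    """Map each language tag to the concept's skos:prefLabel in that language."""
    return {
        label.get(XML_LANG): label.text
        for label in concept.findall("skos:prefLabel", NS)
    }

root = ET.fromstring(DOC)
concept = root.find("skos:Concept", NS)
```

Multilingual labels like these are what make a thesaurus usable across language boundaries: the Dutch and English labels refer to the same concept.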
18. Guidelines for formulating definitions of concepts (1)
• In terms and definitions, align with general language use and the language used in the organization
“Car” instead of “Automobile”
• Name each concept with a short term, in the singular and starting with a capital letter
“Car” instead of “Cars”
• Keep definitions as short as possible; include only distinguishing features
“a motorized vehicle with 4 wheels” instead of “a motorized vehicle with 4 wheels that can be used for both private and business transport”
• Use intensional definitions where possible; name the distinguishing features of a concept
“a 4-wheel motorized vehicle” instead of “sedan or station wagon”
19. Guidelines for formulating definitions of concepts (2)
• Start by defining more general and broader terms
Define car first, then define station wagon
• Define a term with a narrower meaning in the form “A <broader concept> that…”
A station wagon is “a car with a large cargo area”
• Do not include features of a broader concept in the definition of a concept
Not: a station wagon is “a car with 4 wheels and a large cargo area”
• Adopt definitions from official sources where possible, provided they are consistent with the organization's own terminology
Do not adopt a definition from a commercial source (such as a supplier)
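Part of these guidelines can be checked automatically. A minimal Python heuristic, assuming each concept records its broader term and a definition string (the data and `genus_violations` are hypothetical): it flags narrower concepts whose definition does not open with "a/an <broader term>". It only tests the wording pattern; whether features of the broader concept are repeated still needs a human reviewer.

```python
import re

# Hypothetical concepts: term -> (broader term or None, definition)
CONCEPTS = {
    "Car": (None, "a 4-wheel motorized vehicle"),
    "Station wagon": ("Car", "a car with a large cargo area"),
    "Convertible": ("Car", "a vehicle with a folding roof"),
}

def genus_violations(concepts: dict) -> list[str]:
    """Terms whose definition does not start with 'a/an <broader term>'."""
    bad = []
    for term, (broader, definition) in concepts.items():
        if broader is None:
            continue  # top-level concepts have no broader term to start with
        pattern = rf"^an?\s+{re.escape(broader.lower())}\b"
        if not re.match(pattern, definition.lower()):
            bad.append(term)
    return bad
```

Here "Convertible" is flagged because its definition starts from "vehicle" instead of its broader concept "car".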
20. Guidelines for formulating definitions of concepts (3)
• Define separate terms for all uncommon words used in definitions
A “wheel” is a common word and needs no definition
• Define not only concepts that map one-to-one to data elements, but also the relevant context
Define road and driver in addition to car
• Avoid circular definitions; do not express a concept in terms of itself or its inflected forms, and do not let definitions of concepts refer to each other
Driving is “moving around with a car” instead of “driving a car”
• Avoid definitions that contain negations; define what something is and not what it isn't (unless you define opposing concepts)
Not: a car is “a vehicle that is not a truck”
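Circularity can be detected mechanically by treating "the definition of A mentions term B" as a directed edge and looking for paths that lead back to the starting term. A minimal Python sketch with a hypothetical glossary; whole-word matching is a simplification that misses inflections such as "cars" for "Car", so it supports rather than replaces review. The same glossary is also checked for definitions containing "not".

```python
import re

# Hypothetical glossary: term -> definition
GLOSSARY = {
    "Car": "a 4-wheel motorized vehicle",
    "Driver": "a person who is driving a car",
    "Driving": "driving a car",  # deliberately circular, as in the slide
    "Parking space": "a space where a parked vehicle may stand",
    "Parked vehicle": "a vehicle standing in a parking space",
    "Motorcycle": "a vehicle that is not a car",
}

def mentions(definition: str, terms) -> set[str]:
    """Defined terms that occur as whole words in a definition."""
    text = definition.lower()
    return {t for t in terms if re.search(rf"\b{re.escape(t.lower())}\b", text)}

def circular_terms(glossary: dict[str, str]) -> set[str]:
    """Terms whose definition, directly or via other definitions, leads back to the term."""
    edges = {t: mentions(d, glossary) for t, d in glossary.items()}
    circular = set()
    for start in edges:
        seen, stack = set(), list(edges[start])
        while stack:
            node = stack.pop()
            if node == start:
                circular.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
    return circular

def negative_definitions(glossary: dict[str, str]) -> list[str]:
    """Terms whose definition contains the word 'not'."""
    return [t for t, d in glossary.items() if re.search(r"\bnot\b", d.lower())]
```

"Driving" refers to itself, while "Parking space" and "Parked vehicle" refer to each other, so all three are reported; "Motorcycle" is reported by the negation check.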
21. Guidelines for formulating definitions of concepts (4)
• Avoid the term “data” and anything directly related to data in definitions of terms
Not: a car is “four-wheel vehicle data”
• Support the definition with an explanation that indicates how the term is used within the organization
Explanation for cars: “Cars are only relevant to our organization from the perspective of parking.”
• Avoid using homonyms whenever possible
Don't: define the term “Car” in two ways
• Only name synonyms that are frequently used and acceptable
“Automobile” as a synonym for “Car”, but not “Motor car”
22. Quality rules for SKOS thesauri
https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
23. Example quality rules in more detail
• Omitted or Invalid Language Tags: Literals should be tagged consistently with a language tag.
• Undocumented Concepts: Concepts should include the set of “documentation properties” as defined in the SKOS Reference.
• Overlapping Labels: No two concepts should have the same preferred lexical label in a given language when they belong to the same concept scheme.
• Disjoint Labels Violation: skos:prefLabel, skos:altLabel and skos:hiddenLabel should be pairwise disjoint properties.
• Extra Whitespace in Labels: Labels should not have any leading or trailing whitespace.
• Orphan Concepts: Concepts should have associative or hierarchical relationships with other concepts.
https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
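Several of these rules are simple enough to check without special tooling. A minimal Python sketch covering a subset of them (omitted language tags, extra whitespace, overlapping labels, disjoint labels and orphans), assuming a hypothetical concept scheme held in plain dicts: labels are (text, language tag) pairs, all concepts belong to one concept scheme, and `links` lists each concept's hierarchical or associative relationships in either direction. It does not check invalid tags or undocumented concepts; a dedicated SKOS quality checker is the better choice in practice.

```python
from collections import Counter

# Hypothetical concept scheme; labels are (text, language tag) pairs
CONCEPTS = {
    "ex:car": {"pref": [("Car", "en")], "alt": [("Automobile", "en")], "links": ["ex:vehicle"]},
    "ex:vehicle": {"pref": [("Vehicle", "en")], "alt": [], "links": ["ex:car"]},
    "ex:auto": {"pref": [("Car", "en")], "alt": [("Car", "en")], "links": []},
    "ex:road": {"pref": [("Road ", None)], "alt": [], "links": []},
}

def quality_issues(concepts: dict) -> list[tuple[str, str]]:
    """Return (concept, issue) pairs for a subset of the SKOS quality rules."""
    issues = []
    # How often each (label, language) pair is used as a preferred label
    pref_counts = Counter(label for c in concepts.values() for label in c["pref"])
    for uri, c in concepts.items():
        labels = c["pref"] + c["alt"]
        if any(lang is None for _, lang in labels):
            issues.append((uri, "omitted language tag"))
        if any(text != text.strip() for text, _ in labels):
            issues.append((uri, "extra whitespace in label"))
        if any(pref_counts[label] > 1 for label in c["pref"]):
            issues.append((uri, "overlapping labels"))
        if set(c["pref"]) & set(c["alt"]):
            issues.append((uri, "disjoint labels violation"))
        if not c["links"]:
            issues.append((uri, "orphan concept"))
    return issues

issues = quality_issues(CONCEPTS)
```

In this example "ex:vehicle" passes all checks, "ex:car" shares its preferred label with "ex:auto", and "ex:road" lacks a language tag, has a trailing space and has no relationships.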
24. Summary
• A thesaurus gives meaning to words
• Concepts can be linked to all sorts of artefacts, enabling findability
• Data modeling should start at a thesaurus level
• Open and FAIR data requires a controlled vocabulary such as a thesaurus
• SKOS is the de facto standard for thesauri