"Classification schemes, thesauri and other Knowledge Organization Systems - a Linked Data perspective".
Presentation at the Pelagios Linked Pasts event, July 20-21, 2015.
http://pelagios-project.blogspot.co.uk/2015/03/linked-pasts.html
1. Classification schemes, thesauri and other
Knowledge Organization Systems
- a Linked Data perspective
Antoine Isaac
Pelagios: Linked Pasts
London, July 20-21, 2015
2. Classification schemes?
Scope: knowledge organization systems (KOS) such as
classification systems, thesauri, gazetteers, subject
heading lists…
(last-minute addition: also time periods, cf. PeriodO)
6. Simple Knowledge Organization System
SKOS is for exchanging KOSs as Linked Data (in RDF)
• Better than semi-structured data (CSV)
• Still relatively simple
7. A SKOS graph
animals
cats
UF domestic cats
RT wildcats
BT animals
SN used only for domestic cats
domestic cats
USE cats
wildcats
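The thesaurus entries on this slide can be turned into SKOS statements fairly mechanically. The following is a minimal sketch, not the presenter's tooling: the `http://example.org/kos/` namespace, the helper names, and the mapping table from thesaurus codes to SKOS properties are assumptions made for illustration (UF as skos:altLabel, RT as skos:related, BT as skos:broader, SN as skos:scopeNote). The "domestic cats USE cats" entry is treated as the reciprocal of the UF line, so it yields no separate concept.

```python
# Hypothetical namespace for the example thesaurus (not from the slides).
BASE = "http://example.org/kos/"

# Thesaurus entries from the slide, keyed by preferred term.
ENTRIES = {
    "animals": {},
    "cats": {
        "UF": ["domestic cats"],
        "RT": ["wildcats"],
        "BT": ["animals"],
        "SN": ["used only for domestic cats"],
    },
    "wildcats": {},
}

# Assumed mapping: thesaurus code -> (SKOS property, value is a literal?)
CODE_TO_SKOS = {
    "UF": ("skos:altLabel", True),
    "RT": ("skos:related", False),
    "BT": ("skos:broader", False),
    "SN": ("skos:scopeNote", True),
}


def uri(term):
    """Build a URI reference for a preferred term."""
    return "<" + BASE + term.replace(" ", "_") + ">"


def literal(text):
    """Build an English-tagged literal."""
    return '"' + text + '"@en'


def to_triples(entries):
    """Return (subject, predicate, object) tuples for every entry."""
    triples = []
    for term, fields in entries.items():
        subject = uri(term)
        triples.append((subject, "rdf:type", "skos:Concept"))
        triples.append((subject, "skos:prefLabel", literal(term)))
        for code, values in fields.items():
            prop, is_literal = CODE_TO_SKOS[code]
            for value in values:
                obj = literal(value) if is_literal else uri(value)
                triples.append((subject, prop, obj))
    return triples


triples = to_triples(ENTRIES)
for s, p, o in triples:
    # Turtle-like lines; the rdf: and skos: prefixes are assumed to be declared.
    print(s, p, o, ".")
```

Each printed line is one edge of the graph: three concepts, their preferred labels, and the four statements attached to "cats".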
8. Representing semantics
The formal way: OWL Semantic Web ontology language
Used for ontologies that enable machine reasoning
Mother is a class
Parent is the class of entities of type Person that are related to
at least one other resource of type Person using the child
property
…
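One possible reading of the Parent definition above in description-logic notation, assuming a class Person and a property child as named on the slide:

```latex
\mathit{Parent} \equiv \mathit{Person} \sqcap \exists\, \mathit{child}.\mathit{Person}
```

This sketch does not capture the word "other": an existential restriction alone does not require the related Person to be a different individual.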
9. Do we want to represent every vocabulary
as a formal ontology?
It is possible, but not easy
KOS are large
KOS have softer “semantics”
Parent RelatedTerm Child
KOS have a focus on terminological information
Child UsedFor Offspring
Softer semantics can be useful for many applications!
10. Europeana and knowledge organisation systems
Create a “semantic layer” on top of cultural heritage objects
From: Stefan Gradmann
21. Enrichment types and vocabularies
Enrichment type   Target vocabulary   Source metadata fields                              Number of enriched objects
Places            GeoNames            dcterms:spatial, dc:coverage                        7M
Concepts          GEMET, DBpedia      dc:subject, dc:type                                 9.2M
Agents            DBpedia             dc:creator, dc:contributor                          144K
Time              Semium Time         dc:date, dc:coverage, dcterms:temporal, edm:year    10.2M
22. Work in progress
Entity-based search and browsing
Annotation
Pundit @ DM2E project http://dm2e.eu
Europeana Channels
Semantic auto-completion
23. Not only end-user facing functions
Data must be accessible
(Unified) APIs, Linked Data
Data re-users should be able to provide enhanced services
to their audience easily, especially in digital humanities
Specific collection and application needs cannot rely on a handful of
generic vocabularies
25. Vocabulary management and publication
Europeana developed its own WWI vocabulary based on a
subset of LCSH
Terms translated into 10 languages and linked to id.loc.gov
27. Representing finer-grained semantics
More precise relationships and formal semantics
For query expansion or data validation
E.g. ISO 25964 and Getty SKOS extensions
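As an illustration of query expansion, the sketch below widens a search term with its narrower concepts and their alternative labels. It reuses the cats/animals example from slide 7; the dictionaries and function names are assumptions for illustration, and it relies only on a plain broader relation rather than the more precise ISO 25964 or Getty relationship types.

```python
# Broader links (child -> parents) and alternative labels from the slide-7 example.
BROADER = {"cats": ["animals"]}
ALT_LABELS = {"cats": ["domestic cats"]}


def narrower_closure(concept, broader):
    """Return the concept plus everything transitively below it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in broader.items():
            if child not in found and any(p in found for p in parents):
                found.add(child)
                changed = True
    return found


def expand_query(term, broader, alt_labels):
    """Expand a search term with narrower concepts and their alternative labels."""
    terms = set()
    for concept in narrower_closure(term, broader):
        terms.add(concept)
        terms.update(alt_labels.get(concept, []))
    return sorted(terms)


expanded = expand_query("animals", BROADER, ALT_LABELS)
print(expanded)  # ['animals', 'cats', 'domestic cats']
```

A search for "animals" then also retrieves objects indexed under "cats" or "domestic cats", while a search for "cats" is not widened upwards.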
30. The need for alignment / co-reference /
reconciliation
KOS 1:
animals
cats
wildcats
KOS 2:
animal
human
object
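A first pass at aligning the two example KOSs can be sketched as simple label matching. This is a deliberately naive illustration, not the method used by any of the projects named in this talk: the normalization rule (lowercase, strip one trailing "s") is an assumption, and the output is only a list of candidate skos:closeMatch links that a person would still need to review.

```python
# Labels of the two example KOSs from the slide.
KOS1 = ["animals", "cats", "wildcats"]
KOS2 = ["animal", "human", "object"]


def normalize(label):
    """Lowercase and apply a naive singularization (assumption: English plurals in -s)."""
    label = label.strip().lower()
    if label.endswith("s") and len(label) > 3:
        label = label[:-1]
    return label


def candidate_matches(source, target):
    """Pair labels from both KOSs that share the same normalized form."""
    index = {}
    for label in target:
        index.setdefault(normalize(label), []).append(label)
    pairs = []
    for label in source:
        for match in index.get(normalize(label), []):
            pairs.append((label, "skos:closeMatch", match))
    return pairs


matches = candidate_matches(KOS1, KOS2)
print(matches)  # [('animals', 'skos:closeMatch', 'animal')]
```

Only "animals" and "animal" are paired; "cats" and "wildcats" have no counterpart in KOS 2, which is exactly the kind of gap that manual or structure-aware alignment has to handle.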
31. A lot of work (being) done
A long line of work in the KOS community: DESIRE,
CARMEN, Renardus, LIMBER, HILT, MSAC, MACS,
Crisscross, KoMoHe, FAO…
Continued in the Linked Data context: Pleiades, Wikidata…
MACS: 120K links between Library of Congress Subject Headings (LCSH),
RAMEAU, Schlagwortnormdatei (SWD)
34. Finding and re-using vocabularies
Well-known or new vocabularies
Wikidata, VIAF, GeoNames, Pleiades, DBpedia, LCSH…
Data repositories and inventories
The Data Hub
35. Vocabulary selection criteria
Available in a technically appropriate way
Well-maintained
Documented (including metadata)
Well-connected, e.g. equivalent elements in other
vocabularies are indicated
Multilingual
Open
• license stacking hampers re-use
Quality assessment?
Cf. Data on the Web Best Practices
http://www.w3.org/TR/2015/WD-dwbp-20150625/#dataVocabularies
36. Take-home messages
Efforts across the whole ecosystem
Publishers of vocabularies, Providers of object data, Application
developers, Researchers…
Requires getting very different steps right
Implementing standards for data exchange
Designing consuming applications
Not only technical: encouraging open data!
View the object at: http://www.europeana.eu/portal/record/09102/_CM_0161930.html
The labels are also stored in our database for better search.
-> Search 'uurglazen' in Italy
http://europeana.eu/portal/search.html?query=uurglazen&rows=96&qf=COUNTRY%3Aitaly
-> back to http://europeana.eu/portal/record/02301/urn_imss_instrument_401058.html
Show that the Dutch label for Hourglasses was not in the original data. But it's in the auto-generated tags below.
Two categories:
Global
Produced by projects
See list on the wiki
People may ask why we've not just re-used the LCSH URIs and added translation data to them.
Response will be "so as to obey the principle of not re-defining others' data"
Note: the last line is redundant with previous slides
Alignment: 2 vocabularies describing the same concept can be aligned via the concept...
In the linked environment, enrichment often refers to adding new information at the semantic level to the data about certain resources. It is the creation of new links between the enriched resources and another data resource, such as controlled vocabularies and authority files. The goal is contextualization of metadata and embedding the resources in context outside the scope of the platform.
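The enrichment step described in this note can be sketched as a label lookup that adds links from a record to vocabulary concepts and stores the concept's multilingual labels for search. This is a minimal sketch under stated assumptions, not Europeana's actual pipeline: the concept URI, the sample record, and the `enrichment:concepts` and `enrichment:labels` field names are hypothetical, and only the "Hourglasses" and "uurglazen" labels come from the notes above.

```python
# Hypothetical vocabulary: concept URI -> labels by language.
VOCAB = {
    "http://example.org/concept/hourglass": {"en": "Hourglasses", "nl": "uurglazen"},
}


def build_label_index(vocab):
    """Map each case-folded label to its concept URI."""
    index = {}
    for concept_uri, labels in vocab.items():
        for label in labels.values():
            index[label.casefold()] = concept_uri
    return index


def enrich(record, vocab, fields=("dc:subject", "dc:type")):
    """Return a copy of the record with concept links and searchable labels added."""
    index = build_label_index(vocab)
    links = []
    for field in fields:
        for value in record.get(field, []):
            concept = index.get(value.strip().casefold())
            if concept and concept not in links:
                links.append(concept)
    enriched = dict(record)
    # New links from the object to the vocabulary (hypothetical field name).
    enriched["enrichment:concepts"] = links
    # Labels in all languages, stored alongside the record for better search.
    enriched["enrichment:labels"] = sorted(
        label for concept in links for label in vocab[concept].values()
    )
    return enriched


# Hypothetical source record whose original metadata has no Dutch label.
record = {"dc:title": ["Sand clock"], "dc:subject": ["hourglasses"]}
enriched = enrich(record, VOCAB)
print(enriched["enrichment:labels"])  # ['Hourglasses', 'uurglazen']
```

After enrichment, a search for 'uurglazen' can find the record even though the Dutch label was not in the original data.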