Thesaurus-Based Indexing of Research Data in the Social Sciences: Opportunities and Difficulties of Internationalization Efforts
Contents:
- Current Trends and Demands in Describing and Cataloguing Research Data
- Subject Indexing of Research Data in the Social Sciences (Present Situation in Europe)
- Thesauri in Subject Indexing
- Recommended Indexing Model
- Retrieval Model
- Practical Aspects
Baum, Kempf: Thesaurus based indexing
1. Thesaurus-Based Indexing of Research
Data in the Social Sciences
Opportunities and Difficulties
of Internationalization Efforts
Katrin Baum, Dipl.-Bibl.
Dr. Andreas Oskar Kempf, M.A. (LIS)
GESIS – Leibniz-Institute for the Social Sciences
Cologne, May 28 – 31 May │ Baum, Kempf │ IASSIST 2013 │ Thesaurus-Based Indexing of Research Data
2. Contents
1. Current Trends and Demands in Describing and Cataloguing Research
Data
2. Subject Indexing of Research Data in the Social Sciences – Present
Situation in Europe
3. Thesauri in Subject Indexing
4. Recommended Indexing Model
5. Retrieval Model
6. Practical Aspects
3. 1. Current Trends and Demands
in Describing and Cataloguing Research Data
Increasing internationalization and standardization efforts:
to enable and facilitate data exchange
to enable and facilitate integrated retrieval across distributed
information systems
In the social sciences:
DDI (e.g. metadata specification, controlled vocabularies)
Commonly used systems for subject indexing (e.g. ELSST,
CESSDA Topic Classification)
…
4. 2. Subject Indexing of Research Data in the Social
Sciences – Present Situation in Europe (1/5)
CESSDA (Council of European Social Science Data Archives):
Members = data archives and other organisations all across
Europe which archive and provide social science data for
secondary use
Provides access to 25,000 data collections, with 1,000 data
collections added every year
Development and maintenance of European Language Social
Science Thesaurus (ELSST) and CESSDA Topic Classification
CESSDA catalogue: allows search in data collections of
member organisations, e.g. search by topic or search by
keyword
5. 2. Subject Indexing of Research Data in the Social
Sciences in Europe – Present Situation (2/5)
6. 2. Subject Indexing of Research Data in the Social
Sciences in Europe – Present Situation (3/5)
European Language Social Science Thesaurus (ELSST):
Multilingual thesaurus for the social sciences (translated into English,
Danish, Finnish, French, German, Greek, Norwegian, Spanish and
Swedish)
Based on the HASSET Thesaurus of UKDA
Further developed by CESSDA members
Planned: annual release of new version (latest version: 3/2013)
Contains about 3,300 internationally applicable concepts extracted
from HASSET
Allows for local extensions of concepts
Used for subject indexing of research data by CESSDA members
7. 2. Subject Indexing of Research Data in the Social
Sciences in Europe – Present Situation (4/5)
8. 2. Subject Indexing of Research Data in the Social Sciences
in Europe – Present Situation (5/5)
But:
No coherent indexing practice throughout the participating
archives due to a lack of a binding indexing policy
Limited representation of fine-grained national / local issues
(e.g. historical, juridical, religious and political aspects, forms
of national organizations, educational system, collection-
specific aspects …)
Retrieval limited to internationally applicable concepts
9. 3. Thesauri in Subject Indexing (1/3)
Some general findings on thesauri:
Scope and content of each thesaurus is tightly
connected to a specific collection => scope and content
of thesauri of the same domain can differ
Different levels of abstraction / specificity
Different perspectives / classification aspects can lead to
different semantic relations
10. 3.1 Thesauri in Subject Indexing - Internationally
usable Thesauri (2/3)
Internationally usable thesaurus has to:
represent concepts that exist in any language
display these concepts in a hierarchical / semantic structure
that fits all languages
be free of any bias
be multilingual
But:
Fine-grained local issues cannot be displayed
Retrieval limited to internationally applicable concepts
11. 3.2 Thesauri in Subject Indexing - Local Thesauri (3/3)
Exclusive use of a local indexing system:
Represents scope of local collection
Respects local aspects
Allows for more precise indexing
Easier to maintain
Monolingual or multilingual access to local collection
But:
No access to dispersed collections that are indexed with
different terminological resources
12. 4. Recommended Indexing Model (1/3)
= Aggregate of local thesauri with common, internationally
applicable core concepts
Core:
Contains concepts that exist in any language
Hierarchical structure fits all languages
Free of bias
Concepts that are already part of the local systems
can be mapped to concepts of core system
Concepts that are still missing in local systems
can be added
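The two alignment rules on this slide (map local concepts that already exist, add core concepts that are still missing) could be sketched roughly as follows. This is only an illustration: the class `LocalThesaurus`, the function `align_with_core` and the toy concept lists are hypothetical and do not reflect the actual content or tooling of ELSST, TheSoz or HASSET.

```python
from dataclasses import dataclass, field


@dataclass
class LocalThesaurus:
    """Hypothetical, minimal stand-in for a local indexing system."""
    name: str
    concepts: set[str] = field(default_factory=set)
    # local concept -> core concept it is mapped to
    core_mapping: dict[str, str] = field(default_factory=dict)


def align_with_core(local, core, candidate_mappings):
    """Map existing local concepts to the core; add core concepts missing locally.

    candidate_mappings maps a core concept to the local concept that an
    editor has judged equivalent (assumed to be supplied by hand).
    """
    for core_concept in sorted(core):
        local_concept = candidate_mappings.get(core_concept)
        if local_concept in local.concepts:
            # Rule 1: the concept already exists locally, so only map it.
            local.core_mapping[local_concept] = core_concept
        else:
            # Rule 2: the concept is missing locally, so add it and map it.
            local.concepts.add(core_concept)
            local.core_mapping[core_concept] = core_concept
    return local


# Toy data using terms from the slides (illustrative only).
core = {"SCHOOLS", "SECONDARY SCHOOLS"}
thesoz = LocalThesaurus("TheSoz", {"WEITERFÜHRENDE SCHULE", "GYMNASIUM"})
align_with_core(thesoz, core, {"SECONDARY SCHOOLS": "WEITERFÜHRENDE SCHULE"})
```

In this sketch, local specificities such as GYMNASIUM stay in the local system without a core mapping, which matches the idea that the core only covers internationally applicable concepts.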
13. 4. Recommended Indexing Model (2/3)
Thesauri shown in the diagram: ELSST (CESSDA Catalogue), TheSoz (GESIS), HASSET (UKDA)
Universal Core Indexing System:
contains central concepts which exist in any language
(e.g. SECONDARY SCHOOLS)
contains central concepts which already exist in local indexing systems
(e.g. WEITERFÜHRENDE SCHULEN)
Local Indexing System:
contains local specificities
(e.g. GYMNASIUM)
contains collection-specific concepts
(e.g. NORDRHEIN-WESTFALEN)
14. 4. Recommended Indexing Model (3/3)
Thesaurus Cross-Concordances
ELSST (D, DK, E, FIN, F, GB, GR, N, S) │ Relation │ TheSoz (D) │ TheSoz (GB) │ TheSoz (F)
SECONDARY SCHOOLS │ = │ WEITERFÜHRENDE SCHULE │ SECONDARY SCHOOL │ ÉCOLE SECONDAIRE
SECONDARY SCHOOLS │ > │ GYMNASIUM │ SECONDARY SCHOOL (Gymnasium) │ GYMNASE
SECONDARY SCHOOLS │ > │ REALSCHULE │ INTERMEDIATE SCHOOL │ ÉCOLE SECONDAIRE PRATIQUE
SECONDARY SCHOOLS │ > │ HAUPTSCHULE │ SECONDARY MODERN SCHOOL │ ÉCOLE SECONDAIRE OBLIGATOIRE
Linkage between International Core and Local Indexing System
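As a rough illustration, the cross-concordances on this slide can be held as simple records of core concept, relation and local concept. The names `Concordance`, `CONCORDANCES` and `local_targets` are hypothetical, and reading ">" as "the core concept is broader than the local concept" is an assumption based on the slide's examples.

```python
from typing import NamedTuple


class Concordance(NamedTuple):
    source: str    # concept in the international core (here: ELSST)
    relation: str  # "=" equivalent; ">" core concept broader than local one (assumed)
    target: str    # concept in the local system (here: TheSoz, German label)


# The four relations shown on the slide.
CONCORDANCES = [
    Concordance("SECONDARY SCHOOLS", "=", "WEITERFÜHRENDE SCHULE"),
    Concordance("SECONDARY SCHOOLS", ">", "GYMNASIUM"),
    Concordance("SECONDARY SCHOOLS", ">", "REALSCHULE"),
    Concordance("SECONDARY SCHOOLS", ">", "HAUPTSCHULE"),
]


def local_targets(core_concept, relations=("=", ">")):
    """Return the local concepts linked to a core concept by the given relations."""
    return [
        c.target
        for c in CONCORDANCES
        if c.source == core_concept and c.relation in relations
    ]


equivalents = local_targets("SECONDARY SCHOOLS", relations=("=",))
all_linked = local_targets("SECONDARY SCHOOLS")
```

Keeping the relation type on each record lets a catalogue choose between strict matching (equivalents only) and broader matching that also follows the ">" links.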
15. 5. Retrieval Model
Queries: „schools“, „Schulen“, „écoles“, „colegios“, „koulut“, „skole“, „ΣΧΟΛΕΙΑ“, „skola“, „skoler“
Integrated Retrieval System (e.g. CESSDA Catalogue)
International Indexing System: ELSST
Preferred Term: SCHOOLS
Narrower Terms:
- SECONDARY SCHOOLS
- WEITERFÜHRENDE SCHULE
- …
=
Local Indexing System: TheSoz
- SECONDARY SCHOOLS
- WEITERFÜHRENDE SCHULE
Narrower Terms:
> SECONDARY SCHOOL (GYMNASIUM)
- GYMNASIUM
> INTERMEDIATE SCHOOL
- REALSCHULE
> SECONDARY MODERN SCHOOL
- HAUPTSCHULE
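The retrieval flow in this diagram might be approximated as a query expansion: a search word in any language is resolved to the core preferred term, and the search then follows narrower terms and the link into the local system. The sketch below is a simplification under stated assumptions: the dictionaries hold only the examples from the slides, only a few of the query languages are included, and `expand_query` is a hypothetical helper, not part of the CESSDA Catalogue.

```python
# Entry vocabulary: lower-cased query word -> core preferred term (subset of the slide).
ENTRY_TERMS = {
    "schools": "SCHOOLS",
    "schulen": "SCHOOLS",
    "écoles": "SCHOOLS",
    "colegios": "SCHOOLS",
    "koulut": "SCHOOLS",
}
# Hierarchy inside the international core system.
CORE_NARROWER = {"SCHOOLS": ["SECONDARY SCHOOLS"]}
# Equivalence link ("=") from a core concept to a local concept.
CORE_TO_LOCAL = {"SECONDARY SCHOOLS": "WEITERFÜHRENDE SCHULE"}
# Hierarchy inside the local system.
LOCAL_NARROWER = {"WEITERFÜHRENDE SCHULE": ["GYMNASIUM", "REALSCHULE", "HAUPTSCHULE"]}


def expand_query(query):
    """Return all index terms to search for, in breadth-first order."""
    preferred = ENTRY_TERMS.get(query.strip().lower())
    if preferred is None:
        return []
    expanded = []
    queue = [preferred]
    while queue:
        concept = queue.pop(0)
        if concept in expanded:
            continue  # already visited
        expanded.append(concept)
        queue.extend(CORE_NARROWER.get(concept, []))
        local = CORE_TO_LOCAL.get(concept)
        if local is not None:
            queue.append(local)
        queue.extend(LOCAL_NARROWER.get(concept, []))
    return expanded


terms = expand_query("Schulen")
```

A user who only knows the core vocabulary can in this way still reach records that were indexed with local terms such as GYMNASIUM.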
16. 6. Practical Aspects
Need for binding indexing guidelines for core terms
Data already indexed with local system remain useful
User only needs to know one thesaurus
Local system represents local collection
Indexing with local system guarantees a more precise
indexing and respects local aspects
Local systems are easier to maintain
17. Thank you
for your attention.
Contact
Katrin Baum
GESIS – Leibniz-Institute for the Social Sciences
katrin.baum@gesis.org
Dr. Andreas Oskar Kempf
GESIS – Leibniz-Institute for the Social Sciences
andreas.kempf@gesis.org
www.gesis.org