Knowledge Organisation Systems in Digital Libraries: A Comparative Study
1. Bhojaraju Gunjal
Lead - Knowledge Management
GCI Solutions,
Bangalore.
email: Bhojaraju.G@gmail.com
Prof. Shalini R Urs
Professor
Department of Library and
Information Science,
University of Mysore, Mysore.
email: shalini@vidyanidhi.org.in
Knowledge Organisation Systems (KOS)
in Digital Libraries: A Comparative Study
Gunjal, Bhojaraju & Urs, Shalini R (2006). Knowledge Organisation System in Digital Libraries: A Comparative
Study. In: Proceedings of National Seminar on Knowledge Representation and Information Retrieval, Paper:
Documentation Research & Training Centre, ISI, Bangalore, March 22-24, 2006.
2. February 1, 2015 2
Outline
1. Introduction to KOS and DL
2. Current Status of KOS
3. Types of KOS
4. Study of KOS in Digital Libraries: A
Comparative Study
5. Conclusion
6. Appendix
Abstract
This article presents the preliminary results of a study of the
Knowledge Organisation Systems (KOS) deployed in major
digital libraries (DLs) of the world.
While traditional (physical) libraries evolved established
norms and systems, the kinds of KOS deployed in DLs vary.
The dynamic nature of DLs, the retrieval capability of
digital environments and the ‘black box’ nature of the systems
make KOS in DLs difficult to perceive and comprehend.
Hence there is a need to study in depth the knowledge
organization (KO) tools, such as library classifications,
thesauri and subject heading systems, and the knowledge
representation schemes (ontologies) that are being deployed in
digital libraries.
The article closes with a comparative study of DLs of
the world and summarises the present state of KOS in
these DLs.
1. Introduction to KOS, DL and ETD
Libraries (physical libraries) have evolved
and adopted different systems of Knowledge
Organisation (KO) for arranging their
collections.
Examples of such KOS include classification
systems such as CC, DDC and UDC.
Tools such as classification schemes,
cataloguing and indexing have played a
prominent role in organising documents in a
collection.
The new electronic environments pose both a
threat and a challenge to the theory and
practice of KO.
1.1 Knowledge Organization System (KOS)
The term knowledge organization system
was coined by the Networked Knowledge
Organization Systems (NKOS) Working
Group at its initial meeting at the ACM
Digital Libraries-98 Conference in
Pittsburgh, Pennsylvania.
Purpose: to organize content to support
retrieval of relevant items from a digital
library collection.
A KOS helps in:
systematic organisation of documents;
easy retrieval of information.
2. Current Scenario of KOS
2.1 Simple Knowledge Organisation System
(SKOS)
an area of work developing specifications and standards to support the
use of KOS within the framework of the semantic web.
2.2 Networked Knowledge Organization
Systems /Services (NKOS)
devoted to the discussion of the functional and data model for enabling
KOS as networked interactive information services to support the
description and retrieval of diverse information resources through the
Internet.
2.3 International Society for Knowledge
Organization (ISKO)
the premier international scholarly society devoted to the theory and
practice of KO
3. Types of KOS
KO systems are grouped into three general categories
3.1 Term lists
which emphasize lists of terms often with definitions
Authority files
Glossaries
Dictionaries
Gazetteers
3.2 Classification and Categories
which emphasize the creation of subject sets.
Subject headings
Classification schemes – DDC, CC, LCC
3.3 Relationship lists
which emphasize the connections between terms and concepts
Thesauri
Taxonomies
Semantic Networks, AI, Ontologies
3.1 Term Lists
Authority Files
are lists of terms that are used to control the variant names for an entity or the
domain value for a particular field.
Ex: Library of Congress Name Authority File and the Getty Geographic
Authority File
Glossaries
a list of terms, usually with definitions
Ex: Environmental Protection Agency (EPA) Terms of the Environment.
Dictionaries
alphabetical lists of words and their definitions, more general in scope
provide information about the origin of a word, variants
multiple meanings across disciplines
Ex : Oxford, Webster
Gazetteers
a list of place names
Each entry may also be identified by feature type, such as river, city, or school.
Ex: U.S. Code of Geographic Names
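The way an authority file controls variant names can be sketched in a few lines of Python. This is only an illustration: the entries in `AUTHORITY` and the helper names `build_lookup` and `preferred_form` are hypothetical and are not taken from any real authority file or library system.

```python
# Hypothetical authority file: each preferred heading lists its variant names.
AUTHORITY = {
    "Twain, Mark": ["Mark Twain", "Clemens, Samuel Langhorne", "Samuel Clemens"],
    "Ranganathan, S. R.": ["S. R. Ranganathan", "Ranganathan, Shiyali Ramamrita"],
}


def build_lookup(authority):
    """Map every preferred heading and variant (case-insensitively) to the preferred heading."""
    lookup = {}
    for preferred, variants in authority.items():
        lookup[preferred.casefold()] = preferred
        for variant in variants:
            lookup[variant.casefold()] = preferred
    return lookup


LOOKUP = build_lookup(AUTHORITY)


def preferred_form(name):
    """Return the controlled heading for a name, or None if the name is not in the file."""
    return LOOKUP.get(name.strip().casefold())
```

A catalogue that indexes every record under `preferred_form(name)` collocates all works by one entity regardless of the variant used in the source document.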
3.2 Classifications and Categories
Subject Headings
provide a set of controlled terms to represent the subjects of
items in a collection.
are extensive and cover a broad range of subjects.
Ex: Medical Subject Headings (MeSH) and the Library of
Congress Subject Headings (LCSH).
Classification Schemes
Classification Schemes, Taxonomies, and Categorization Schemes
- are often used interchangeably
classification schemes – DDC, CC, LCC
Subject categories - used to group thesaurus terms in broad topic
sets that lie outside the hierarchical scheme of the thesaurus.
Taxonomies- used in object-oriented design and knowledge
management systems.
Classification Schemes
DDC
It was initially enumerative but is moving towards a synthetic scheme.
Universe of knowledge divided into 10 main classes from 000 to 900.
Each class is divided into 10, forming 100 divisions, and each division is
divided into 10, forming 1000 sections.
Based on Bacon’s theory of knowledge, which divides knowledge according to
memory, imagination and reason.
CC
It is faceted/analytico-synthetic scheme.
Divides subjects into mutually exclusive orthogonal facets.
The disciplines are sequenced according to the “principle of increasing
concreteness” and the “principle of decreasing naturalness and
increasing artificiality of the content”.
LCC
It is an enumerative classification scheme.
Discipline-based, built on the literary warrant of the Library of Congress
collection.
Organization of classes and subclasses follows the general pattern of
Martel’s seven points: general form divisions, theory/philosophy,
history, treatises or general works, law/regulation/state relations, study
and teaching, and special subjects and subdivisions of the subject.
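The decimal structure of DDC described above (10 main classes, 100 divisions, 1000 third-level subdivisions, called sections in DDC) can be illustrated with a small Python sketch. The function name `ddc_levels` is hypothetical, and the sketch only handles the three-digit base number, ignoring anything after the decimal point.

```python
def ddc_levels(number):
    """Split a DDC number into its main class, division and section.

    Only the three digits before the decimal point are used, so "025.4"
    is treated as 025.
    """
    digits = str(number).split(".")[0].zfill(3)
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"not a three-digit DDC base number: {number!r}")
    main_class = digits[0] + "00"   # one of 10 main classes, 000 to 900
    division = digits[:2] + "0"     # one of 100 divisions
    section = digits                # one of 1000 sections
    return main_class, division, section


# Each level multiplies the number of available slots by ten.
MAIN_CLASSES = 10
DIVISIONS = MAIN_CLASSES * 10
SECTIONS = DIVISIONS * 10
```

For example, `ddc_levels("025.4")` places the number in main class 000, division 020 and section 025.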
Taxonomies
Sometimes called natural classification schemes.
Basis for the classification schemes and indexing
systems.
A taxonomy is a classification system. As the Greek
root "taxis" implies, it is about putting things in order.
The aim of a taxonomy is to group things according to
similarities in some respect such as similarities in
structure, role, behavior, etc.
Taxonomies play a profound role in biology.
Represented by a tree. It is a set of nodes and set of
connections between the nodes such that for any pair
of nodes there is a unique path. Any path from root to
leaf is called a branch.
Construction involves splitting a set into subsets, and
repeating the process on subsets. Criteria for splitting
depend on the application.
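The construction just described (split a set into subsets, then repeat on each subset) and the unique root-to-leaf path can be sketched as follows. This is a minimal illustration under stated assumptions: the sample data `ANIMALS` and the function names `build_taxonomy` and `branch_to` are hypothetical, and the splitting criteria are chosen only for the example.

```python
def build_taxonomy(items, criteria):
    """Recursively split `items` by each criterion in turn.

    Returns nested dicts keyed by group label, with sorted lists of items as leaves.
    """
    if not criteria:
        return sorted(items)
    first, rest = criteria[0], criteria[1:]
    groups = {}
    for item in items:
        groups.setdefault(first(item), []).append(item)
    return {label: build_taxonomy(group, rest) for label, group in groups.items()}


def branch_to(tree, target, path=()):
    """Return the path of labels from the root to `target`, or None if absent."""
    if isinstance(tree, list):
        return path + (target,) if target in tree else None
    for label, subtree in tree.items():
        found = branch_to(subtree, target, path + (label,))
        if found is not None:
            return found
    return None


# Hypothetical sample data: item -> (first-level group, second-level group).
ANIMALS = {
    "sparrow": ("vertebrate", "bird"),
    "salmon": ("vertebrate", "fish"),
    "eagle": ("vertebrate", "bird"),
    "ant": ("invertebrate", "insect"),
}

tree = build_taxonomy(
    ANIMALS,
    [lambda name: ANIMALS[name][0], lambda name: ANIMALS[name][1]],
)
```

Because every item falls into exactly one group at each split, `branch_to(tree, "eagle")` finds a single branch: vertebrate, bird, eagle.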
3.3 Relationship Lists
Thesauri
based on concepts and they show relationships among terms
relationships are generally represented by the notation BT (broader term), NT (narrower
term), SY (synonym), and RT (associative or related term).
Ex: UMLS from the NLM, Roget's Thesaurus, FAO’s Aquatic Sciences and Fisheries
Thesaurus, NASA Thesaurus
Semantic Networks
the advent of natural language processing led to developments in semantic networks
these KOSs structure concepts and terms not as hierarchies but as a network or a web.
Ex: Princeton University's WordNet - used in a variety of search engines
Ontologies
form of knowledge representation
study of relationships that give rise to meaning of expressions.
provide a shared and common understanding of a domain that can be communicated across
people and application systems
play a major role in supporting information exchange processes
natural successors of thesauri, particularly for information retrieval and knowledge
management.
Two main types:
General ontologies. Ex - SENSUS, Cyc, WordNet, etc.
Domain-specific ontologies Ex. - GALEN – Generalized Architecture for Languages, Encyclopedias,
and Nomenclatures in medicine; UMLS - Unified Medical Language System.
Ontologies can be built using XML and RDF.
Many specific ontology development languages, specifically Web ontology languages,
have been developed. Ex. - OIL, DAML, OWL etc.
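The thesaurus relationships listed above (BT, NT, SY, RT) can be modelled with a small data structure. The sketch below is illustrative only: the class `Thesaurus`, its method names and the sample terms are hypothetical, and query expansion over narrower terms is just one possible use of such relationships.

```python
from collections import defaultdict


class Thesaurus:
    """Stores BT/NT/RT/SY relationships between terms."""

    def __init__(self):
        # term -> relationship code -> set of terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def add_broader(self, term, broader):
        """Record BT and its reciprocal NT in one step."""
        self.relations[term]["BT"].add(broader)
        self.relations[broader]["NT"].add(term)

    def add_related(self, a, b):
        """RT is symmetric, so store it in both directions."""
        self.relations[a]["RT"].add(b)
        self.relations[b]["RT"].add(a)

    def add_synonym(self, preferred, synonym):
        self.relations[preferred]["SY"].add(synonym)

    def expand(self, term):
        """Return the term with its synonyms and all narrower terms, transitively."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            seen.update(self.relations[current]["SY"])
            stack.extend(self.relations[current]["NT"])
        return seen


thesaurus = Thesaurus()
thesaurus.add_broader("salmon", "fish")
thesaurus.add_broader("trout", "fish")
thesaurus.add_broader("fish", "aquatic animals")
thesaurus.add_synonym("fish", "fishes")
thesaurus.add_related("fish", "fisheries")
```

Expanding a search for "fish" with `thesaurus.expand("fish")` picks up the synonym and the narrower terms, but not the broader or related terms.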
3.4 Concept Mapping
Developed by Prof. Joseph D. Novak at
Cornell University in the 1970s. This work
was based on the theories of David Ausubel.
A technique for representing knowledge in
graphs
Knowledge graphs are networks of concepts
Concept mapping can be done for several
purposes:
to generate ideas (brain storming, etc.);
to design a complex structure (long texts, hypermedia, large web
sites, etc.);
to communicate complex ideas;
to aid learning by explicitly integrating new and old knowledge;
to assess understanding or diagnose misunderstanding.
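A concept map can be stored as a list of propositions, each linking two concepts with a labelled relationship. The sketch below assumes this simple representation; the sample propositions and the function names `neighbours` and `to_sentences` are hypothetical.

```python
# Each proposition is (source concept, linking phrase, target concept).
PROPOSITIONS = [
    ("concept map", "represents", "knowledge"),
    ("concept map", "is made of", "concepts"),
    ("concepts", "are joined by", "linking phrases"),
    ("knowledge", "is organised by", "KOS"),
]


def neighbours(propositions, concept):
    """Return the (linking phrase, target) pairs that leave a concept."""
    return [(label, target) for source, label, target in propositions if source == concept]


def to_sentences(propositions):
    """Read each proposition back as a short sentence."""
    return [f"{source} {label} {target}" for source, label, target in propositions]
```

Reading the map back with `to_sentences(PROPOSITIONS)` is one quick way to check that each link states something meaningful.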
3.5 Search Engines
The present problems with search
engines are:
Lack of intelligence: they can only find pages
that contain the chosen keyword (search word)
in the text.
Lack of refinement while retrieving documents
Features of the ideal search engine:
Speed
Currency
Recall
Precision
Ranking
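Recall and precision, two of the features listed above, have standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch, with a hypothetical function name and made-up document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four documents returned, three relevant ones exist.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.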
Some semantic search engines
3.5.1 Teoma
It applies three proprietary techniques: Refine, Results and
Resources.
Refine - organizes sites into naturally occurring communities that
are about the subject of each search query. This tool allows a user
to further focus his or her specific search.
Results - next it employs a technique called Subject-Specific
Popularity. It analyzes the relationship of sites within a
community, ranking a site based on the number of same-subject
pages that reference it, among hundreds of other criteria.
Resources - finally, by dividing the Web into local subject
communities, Teoma is able to find and identify expert resources
about a particular subject. These sites feature lists of other
authoritative sites and links relating to the search topic.
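The core idea behind Subject-Specific Popularity, ranking a page by how many same-subject pages reference it, can be sketched as a link count restricted to one community. This is not Teoma's proprietary algorithm, which uses many other criteria; the function name, the link data and the community are all hypothetical.

```python
from collections import Counter


def subject_specific_popularity(links, community):
    """Rank pages in `community` by the number of other community pages linking to them.

    `links` is an iterable of (source, target) pairs; links from pages outside
    the community are ignored.
    """
    members = set(community)
    scores = Counter({page: 0 for page in members})
    for source, target in set(links):
        if source in members and target in members and source != target:
            scores[target] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up link graph: page "x" is outside the subject community.
LINKS = [("a", "b"), ("c", "b"), ("x", "b"), ("b", "a"), ("x", "a")]
ranking = subject_specific_popularity(LINKS, community=["a", "b", "c"])
```

Links from `x` do not count because `x` is not about the same subject, which is what separates this from a global link count.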
Snapshot of search result in Teoma: search results shown under Results, Refine and Resources.
3.5.2 Vivisimo
It employs clustering techniques for retrieval. The
Vivisimo algorithm puts documents together
(clusters them) based on textual similarity that is
augmented with heuristics.
It does not use a pre-defined taxonomy or controlled
vocabulary, so the name of each cluster is generated
from the search results within it.
It also does not force each document into a single
place in the cluster hierarchy: since a document
can cover multiple themes, each document is placed
wherever it seems to fit.
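Two properties described above, cluster names generated from the results themselves and documents allowed in more than one cluster, can be shown with a deliberately simple sketch. It is not Vivisimo's algorithm: it groups result snippets by shared words only, and the function name, stopword list and sample snippets are hypothetical.

```python
import re
from collections import defaultdict

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def cluster_results(snippets, query, min_size=2):
    """Group result snippets by shared words.

    Each cluster is labelled with a word found in the results, and a document
    appears in every cluster whose label it contains.
    """
    query_terms = set(re.findall(r"[a-z]+", query.lower()))
    by_term = defaultdict(set)
    for doc_id, text in snippets.items():
        for term in set(re.findall(r"[a-z]+", text.lower())):
            if term not in STOPWORDS and term not in query_terms:
                by_term[term].add(doc_id)
    return {term: sorted(docs) for term, docs in by_term.items() if len(docs) >= min_size}


# Made-up search results for the query "jaguar".
SNIPPETS = {
    1: "Jaguar car dealers and prices",
    2: "Jaguar habitat in the rainforest",
    3: "Used Jaguar car prices",
    4: "Rainforest habitat of the jaguar cat",
}
clusters = cluster_results(SNIPPETS, query="jaguar")
```

Documents 1 and 3 land in both the "car" and the "prices" clusters, showing that no document is forced into a single place.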
Snapshot of search result in Vivisimo: search results are grouped into clusters.
3.6 Visual tools for KO & Retrieval
Concept maps are effective visualization tools for
representing knowledge of any domain. In addition,
concept maps are a learning tool and access point for
the domain of knowledge represented.
In simplistic form, they provide a graphic interface into the
structure and relationships of knowledge.
At the next level of sophistication, the software used to
make concept maps becomes the knowledgebase storage
system.
Research findings support the effectiveness of concept
maps as a thinking and visualization tool that empowers
the user (learner) to more effectively use knowledge.
3.7 KO Tools/ Techniques:
KO tools usually draw on classification,
ontologies, semantic nets, semantic search
engines, directories and groupware
technologies.
Software such as FreeMind, Thinkmap and
Visual Thesauri is used to
graphically represent knowledge.
Some of the visual tools are:
FreeMind
Thinkmap
Visual Thesauri
4. Study of KOS in Digital Libraries
4.1 Digital Library: An Overview
emergence of digital libraries - 1990s
Digital Library Federation (2002) defines DL as:
... organizations that provide the resources, including the specialised
staff, to select, structure, offer intellectual access to, interpret,
distribute, preserve the integrity of, and ensure the persistence over
time of collections of digital works so that they are readily available
for use by a defined community or set of communities.
This definition involves three key components, which constitute
the theoretical framework underlying digital libraries, namely:
people;
information resources; and
technology.
An investigation into research areas that have recently been
explored or found challenging throws up issues in all three
areas.
4.2 Motivation for the work
Currently the kinds of KOS being deployed
vary across DLs.
Hence there is a need to understand in depth
the knowledge organization (KO) tools,
such as library classifications, thesauri and
subject heading systems, and the knowledge
representation schemes (ontologies) that are being
deployed in digital libraries.
This study focuses on the need for knowledge
organization (KO) tools, such as library
classifications, thesauri and subject heading
systems, to be fully disclosed and available in
the open network environment.
4.3 Research Gap
KOS refers to a range of tools used for
organisation, classification and retrieval of
knowledge in a general sense. Digital library
researchers operating in different contexts have
investigated the potential of these tools for
different purposes. Some of the applications are:
use of thesauri and classification systems for cross-browsing
and cross-searching across various digital collections;
creation of ontologies using existing thesauri;
classification systems and specialised controlled vocabularies
to provide a general knowledge-representation facility for
digital collections with a diverse range of materials; and
use of taxonomies to provide unified and organised access to
different digital repositories through describing different
layers of the digital collections.
The study conducted and the data analysed
show that most of the DLs are metadata driven.
Each DL uses a different set of standards depending
on its collection; KOS components such as classification,
categorisation and search concepts differ
from one DL to another.
A few of them use customised classification
schemes and very few use DDC, BSO, UDC,
etc.
Similarly, some use thesaurus tools for
thesaurus construction.
Search facilities are provided by deploying search
engines.
Categorisation has also been deployed in some DLs.
The proposed study will try to address this gap.
4.4 Need of the study
DLs are about a decade old now.
KOS development in DLs differs in the
components used (term lists, classification
and categories, and relationship lists)
and in their deployment.
There are also no norms for the use of KOS.
Hence a need is felt to fill the aforesaid gap.
There is a need to carry out a systematic study of
the KOS adopted in different DLs in the world.
Digital libraries normally use some set of
standards, schemes and search strategies in their KOS. These
differ from one system to another. Hence there
is a need to study these concepts.
4.5 Objectives of the study
1. To study the various Knowledge Organisation
Systems (KOS) and methods of KO that are
developed and deployed for organising Digital
Libraries (DLs)
Using Thesaurus
Concept Maps
Visualisation, etc
2. To study major DLs and the KOS used in those DLs
Ex: IEEE, ACM, California DL, Alexandria DL, National Science
DL, etc. (approx. 20-25 DLs)
KOS in these DLs
Overview of DL in general:
KO in Organisation
How search happens
Keyword/Boolean
3. Conceptualisation/Visualisation/ThinkMaps, etc
All these observations are given in the attached Appendix.
4.6 Scope of the Study
This study limits itself to the major DLs of the
world.
The data collected will be based on the web
sites, persons contacted and other related
resources.
The scope of the study is limited to the
application of KOS deployed in major DLs of
the World.
4.7 Methodology of the study
For this study, the literature search was carried out
primarily by visiting digital library websites and other
related sources. The respective department heads and
other concerned persons in the field of
digital libraries were also contacted by e-mail and in person
to collect information about KOS and DLs in the
world. The information collected from these case studies
was studied and analysed.
Primary sources such as journals,
reports and conference proceedings related to the
research topic still need to be consulted.
The case study method is used as the research
tool for this study.
Case study method as a research tool:
“... A case study is an examination of a specific phenomenon
such as a program, an event, a person, a process, an
institution or social group. The bounded system, or case,
might be selected because it is an instance of some concern,
issue or hypothesis.” (Merriam, 1988).
Selection Criteria:
Major digital libraries in the world will be selected by category
(general, consortia, universities, national, etc.) and by the
type of their collections.
Data Collection
Based on the above criteria:
The selected DLs will be studied, and the data will be collected, analysed
and presented in the form of a report.
Personal/e-mail interaction with the heads concerned for data collection.
To sum up, the following methodology/strategies will be
adopted:
Case study method
Interaction with the heads/persons concerned
Observation method
Analytical and comparative study.
Presently, 32 DLs have been studied and their details
are given in the attached Appendix.
4.8 Review of Literature
While collecting the data, the following questions were
discussed with the respective heads; the answers received
are given in the attached Appendix:
Major DLs in the world
Classification adopted: old schemes or customised
Type of KOS used in their DLs
How does search work? Which search engine is being used?
How does categorisation happen? Are any categorisation tools used?
How have KOS been applied in those DLs?
The following actions were taken to collect information on the
above points:
Sent e-mails to DL and KOS experts, universities and persons
concerned.
Participated in DL and KOS mailing lists, discussed the
topic, sought the latest information on DLs and KOS
worldwide, and collected feedback.
Discussed with commercial database owners by e-mail and phone, and
received good feedback.
The feedback and outcome of the discussions have been analysed and
presented in the attached Appendix.
So far, 32 DLs have been selected for the study. The collected
data have been analysed and summarised in the table below to
show how KOS has been deployed in those DLs.
The study conducted and the data analysed show that most of the
DLs are metadata driven.
Each DL uses a different set of standards depending on its collection; KOS components
such as classification, categorisation and search concepts differ from
one DL to another.
A few of them use customised classification schemes and very few use
DDC, BSO, UDC, etc.
Similarly, some use thesaurus tools for thesaurus construction.
Search facilities are provided by deploying search engines.
Categorisation has also been deployed in some DLs.
4.9 Summary of the Study:
Subject Categorisation: Broad Subject - 5; Alphabetical - 1
Classification: UDC/DDC - 1; Types of Materials - 5
Visualisation/Conceptualisation: 3
Thesaurus/Gazetteer: 3
Search Interface: Search - 13; Google - 1; Swish-Indexing - 1; Distributed Search - 1
Auto-Categorization: 1
For more details, please refer to the attached Appendix.
5. Conclusion
Knowledge organisation systems refer to a range of traditional
and nontraditional systems for the organization of knowledge.
The systems have been developed in numerous environments
outside the traditional library environment, including those of
A&I services, publishers and professional organizations, and
corporations. Examples exist in many disciplines and for
many target audiences.
KOSs can enhance the digital library in a number of ways.
They can be used to connect a digital library resource to a
related resource.
A KOS can make digital library materials accessible to
disparate communities.
Organization of knowledge on the web is still a challenge, and there
is a long way to go in this area. The main challenge is to
incorporate artificial-intelligence-based systems into searching and
retrieval. Search engines would be more efficient if pragmatic
analysis were built in, but this will take time.
6. Appendix
Presently, 32 DLs have been studied, and
more DLs are planned for study during
the research process. The details
are given in the attached
Appendix.
DLs are categorised as general,
universities, national, consortia, etc., and
also by the type of their
collections.
Please refer to the attached comparative study for
more details.