Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Joseph T. Tennis
University of Washington
Evolution and Variation of Classification Systems
KnoweScape Workshop March 4-5, 2015 Amsterdam
The question before us
What is the nature of the evolution* and
variation among knowledge organization
systems (KOS)?

Corollary questions
Is this a simple space or a complex space?
How often does it change?
Can we engender a common vocabulary to
describe this space?
*NB: some consider evolution a loaded term, since it could be interpreted as implying fitness for survival, and that is not what is intended
here. I often use change in lieu of evolution to clarify this.
The question before us
There are very practical reasons why we
want to ask this question.

Interoperability (sometimes called alignment*)
With widespread, yet still hopeful, collaboration
across cultural heritage sectors (those with rich
KOS), and with further development across a range
of sectors, we must understand how KOS
interoperate, clarify the pressing issues, and
perhaps even incorporate this into formal
education.
 *Alignment, to my mind, suggests more
similarities than differences, and this seems
presumptuous.
The question before us

Digital Preservation
Digital preservation is not simply the storage of
material on hard disk; it is also the system of
policies and practices that guarantee digital
material a usable future. In service of that goal we
need to understand how our KOS change.
The question before us

Application Variations (repurposing)
By examining evolution and variety we can also
better evaluate particular applications of KOS. It
is one thing to study the standard, the ideal type, of
a KOS, but it is another to see how different
institutions, sectors, and projects implement and
perhaps alter that ideal type.
The question before us
Elsewhere, I have called the examination of
these phenomena (how KOS change over time
and how we repurpose them) second-order
problems [0]. The same is true of designing
for KOS interoperability.

This is because, to my mind, the first-order problem is how to
design a KOS ex nihilo. And in many ways we
understand this problem of KOS design.
The question before us
So we are left to examine this universe of
KOS, how it changes, and the aspects of its
variety.

Now we can frame the question, and
establish what, from my perspective, we
know at this point.

We can then outline ways forward both in
research and development.
Outline
KOS as the product of problem-solving
Design of Metadata and Indexing Languages
Metadata in the Wild
Time and Variety
Population Perspective and a Metadata Observatory
KOS as the Product of Problem-Solving
Ben Good has claimed that we are now in a
Cambrian Age of KOS [1].

In this context many different folks are trying to
solve the information organization problem.

Each of them has approached it from their
perspective, disciplinary biases, and using tools they
are familiar with (e.g., library classification, Protégé,
web browser bookmarks).
This is the first reason we might take a
population perspective in the study of
metadata. Namely, we expect variety.

For example, there was some debate in the
late 1990s on whether or not ontologies were
the reinvention of classification. Vickery
took this up in 1997 and Soergel in 1999,
with Gilchrist taking a bird’s eye view in
2003 [2, 3, 4].
KOS as the Product of Problem-Solving
As we read these accounts it becomes clear
that there are differences that make a
difference. And we are still discussing these
concepts, from various perspectives, in the
literature (cf. Barcellos Almeida, 2013 [5]).

And it is true that many think that such
variety is nothing but reinvention: recasting
old concepts and practices in new language.
Michael Gorman is one of these folks [6, 7].

KOS as the Product of Problem-Solving
And yet, there are contexts where there is no
difference made.
KOS as the Product of Problem-Solving
Are there differences made in this LOV service?
I have done some work on this problem. I
will introduce it a bit later. I have called it
framework analysis [8, 9, 10].

Suffice it to say here that I believe it is useful
and generous to consider the program of
creating KOS as problem solving done by
many folks in many contexts.

KOS as the Product of Problem-Solving
Design of Metadata and Indexing Languages: First-Order KOS Work
Design of Metadata and Indexing Languages
Metadata
Machine- and human-readable assertions about resources.
Indexing Languages
A set of representations, systematically ordered, that provides access to the conceptual content, and indicates or establishes relationships between terms used to denote concepts, and between natural language and the terms used to denote concepts.
Design of Metadata and Indexing Languages
Indexing languages are, to my mind, the superclass
under which thesauri, classification schemes,
ontologies, and taxonomies of various sorts hang.

Having said that, indexing languages can be, and are,
used for things other than indexing, but we will not
take that up in this talk.* Soergel [3] offers a good
starting list of functions.
*These other uses may, however, be of interest when
studying a wide variety of metadata, in order to
articulate their purposes fully.
Design of Metadata and Indexing Languages
Metadata, confusingly, sometimes refers to
one subset of KOS and sometimes to the whole
universe of KOS.

This requires that we further clarify the form and
function that we assume we find in the universe of
KOS.

NB: KOS, to my mind, comprises both metadata AND
indexing languages.
Design of Metadata and Indexing Languages
For me, metadata is human- and machine-readable
assertions about resources, where resources are,
following the W3C definition, anything with an identity.

Your definition may differ, and that is perhaps part
of our building a common vocabulary. So let’s
discuss.

However, I do not find it important to retrofit
non-machine-readable description into the definition of
metadata. It has its own names (e.g., cataloguing).
Design of Metadata and Indexing Languages
It has been helpful in the context of Dublin Core
Metadata work to distinguish between schemes and
schemas. These are naïve distinctions, if you will,
made for convenience, and more thorough research
may revise them; but in this context it is helpful, I
think, to distinguish between the attributes of a
resource and the values you might use to describe
that resource.
Design of Metadata and Indexing Languages
Attribute: Value
Author: Joseph T. Tennis
Subject: Evolution of KOS
Drawn from a schema: Drawn from a scheme (or not)

We may find these don’t work well in some
contexts, but let’s try them out for now.
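To make the attribute/value split concrete, here is a minimal Python sketch, assuming a schema can be modelled as the set of permitted attributes and a scheme as an optional controlled list of values for one attribute. The names `SCHEMA`, `SCHEMES`, and `check_statement`, and the tiny value list, are hypothetical illustrations; they are not Dublin Core's actual element set or any real vocabulary.

```python
from typing import Dict, Optional, Set

# Hypothetical schema: the attributes a description is allowed to use.
SCHEMA: Set[str] = {"Author", "Subject"}

# Hypothetical schemes: controlled value lists for some attributes.
# Attributes without an entry here accept free-text values ("or not").
SCHEMES: Dict[str, Set[str]] = {
    "Subject": {"Evolution of KOS", "Eugenics", "Anatomy"},
}


def check_statement(attribute: str, value: str) -> Optional[str]:
    """Return a description of the problem, or None if the statement is acceptable."""
    if attribute not in SCHEMA:
        return f"attribute {attribute!r} is not in the schema"
    scheme = SCHEMES.get(attribute)
    if scheme is not None and value not in scheme:
        return f"value {value!r} is not drawn from the scheme for {attribute!r}"
    return None


description = [
    ("Author", "Joseph T. Tennis"),   # schema attribute, free-text value
    ("Subject", "Evolution of KOS"),  # schema attribute, value from a scheme
    ("Shelf", "Top"),                 # attribute the schema does not define
]

for attribute, value in description:
    print(f"{attribute}: {value} -> {check_statement(attribute, value) or 'ok'}")
```

Keeping the schema check and the scheme check separate mirrors the slide: an attribute must come from a schema, while its value may or may not come from a scheme.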
Design of Metadata and Indexing Languages
Review
Metadata
Indexing Languages
KOS
Schemas
Schemes
Design of Metadata and Indexing Languages
There is a large literature on the right way to
design metadata and indexing languages.
There is good reason for this, and it is a
useful body of literature. For one thing it is
not as straightforward as one might assume
to construct an indexing language.
Design of Metadata and Indexing Languages
Whether one consults the literature or not,
attempts to solve problems in information
organization result in some form of KOS.

And they are out there, multiplying and
evolving.
Metadata in the Wild
If we set aside the research on the design of
KOS, we are left with the literature that
describes how KOS are implemented, maintained,
and evaluated.

We are also left with literature that reads KOS
in particular ways.
Metadata in the Wild
In both of these cases we are talking about
metadata in the wild.

In 2005 we saw a declaration, in the form of
a call for papers by Jack Andersen of the
then Royal School, calling for what I now
term a descriptive turn in knowledge
organization research.
Metadata in the Wild
He said,
“Much classification research, and
knowledge organization research in general,
has tended to be concerned with rules,
principles, standards or techniques; that is,
with prescriptive issues. This workshop will
focus on descriptive issues,” [11].
Metadata in the Wild
Of course, we had seen work well before this
time that could be described as descriptive
rather than prescriptive. We could
cite Richardson’s bibliography from 1901 or
earlier works that inventoried extant schemes
[12, 13].
Metadata in the Wild
And Bowker and Star have famously read
classification decisions critically as
infrastructure, where professional work
on changing what was there, or on
faithfully representing controversial topics, is
seen as compromise and therefore fruitful for
investigation. For example, representing the
full range of nurses’ work, from medical
procedures to counseling, is not
straightforward [14].
Metadata in the Wild
And finally, Melanie Feinberg’s work
and Melissa Adler’s work, while quite
different, give us ways to read KOS,
respectively, as authored rhetorical arguments
and as institutions of dominance and power,
instruments that promulgate a particular
worldview, if not prejudice [15, 16].
NB: Both at the Local/Global Knowledge
Organization Workshop in Copenhagen
in August
Metadata in the Wild
And it is in this context that we again ask the
question and its corollaries.

What is the nature of the evolution and variation
among knowledge organization systems (KOS)?

Corollary questions
Is this a simple space or a complex space?
How often does it change?
Can we engender a common vocabulary to
describe this space?
Metadata in the Wild
And it is here that we can begin to discuss what has
been done and how we might go forward.
Time and Variety
Time
I think it is safe to assume that we all know that
KOS change over time. We revise, edit, sunset,
phoenix, and otherwise rework our schemas and
schemes.

I have been curious about this since 2002.
Time and Variety
In an ISKO paper I looked at the EUGENICS entry in
the Relative Index of the DDC at two points
in time, in edition 16 and edition 20. This simple
case study was enough to demonstrate that there is
sometimes dramatic change in long-lived, large
indexing languages.

I wanted to learn more.
Time and Variety
For those who do not know, EUGENICS is the body
of knowledge and the practice of attempting to create better
human beings through selective breeding and
sterilization measures. The DDC once treated it
as a biological science. It is now
widely discredited, but the term persists in
many different contexts (even legitimate scientific
ones).
Time and Variety
If I was curious to see how indexing languages
(schemes) change over time, it made sense that I
could use this example and a couple of other
subjects to see how things change.

To that end I began data collection. This took a
village, but it was fun and worth the effort.
Time and Variety
We reviewed all editions of the DDC for Eugenics and
Anatomy*

We identified where in the classification we could
find these subjects from 1876 to 2010. These were
often in different places (because of the nature of
the DDC: a variety cue!), but this showed us where
cataloguers might put books on these subjects.
*Among others, such as Gypsies, Algebra,
Woman, and Civil Disobedience.
Time and Variety
DDC 1911 Ed. 7
DDC 1979 Ed. 19
Time and Variety
The second set of data was gathered using the Z39.50
protocol, harvesting MARC records from 572
catalogues. We kept records that both (1) used EUGENICS or
ANATOMY as the first subject heading (in the 650 field
of the MARC record, the subject added entry for
topical terms) [17], and (2) carried a DDC number in the 082 field
of the MARC record. After automatically
removing duplicate records we were left with c. 927
records for EUGENICS and c. 1965 for ANATOMY.
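The selection and de-duplication steps can be sketched as follows. This is a minimal Python sketch, not the pipeline actually used: it assumes each harvested record has already been parsed into a plain dictionary mapping MARC tags to lists of subfield dictionaries (a hypothetical layout; the Z39.50 harvesting itself is not shown), and it treats two records as duplicates when their normalized title, 082 class number, and first 650 heading coincide, which is an assumed rule. The function names and sample records are made up.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical parsed layout: MARC tag -> list of fields, each a dict of subfields.
Record = Dict[str, List[Dict[str, str]]]

TARGET_HEADINGS = {"EUGENICS", "ANATOMY"}


def first_subfield(record: Record, tag: str, code: str) -> str:
    """Return subfield `code` of the first `tag` field, or '' if absent."""
    fields = record.get(tag, [])
    return fields[0].get(code, "") if fields else ""


def select_and_dedupe(records: List[Record]) -> List[Record]:
    """Keep records whose first 650 $a is a target heading and that have an 082 $a."""
    seen: Set[Tuple[str, str, str]] = set()
    kept: List[Record] = []
    for record in records:
        heading = first_subfield(record, "650", "a").rstrip(" .").upper()
        ddc = first_subfield(record, "082", "a").strip()
        if heading not in TARGET_HEADINGS or not ddc:
            continue
        title = first_subfield(record, "245", "a").strip(" /:.").lower()
        key = (title, ddc, heading)  # assumed duplicate rule, not the study's
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


# Made-up sample records with placeholder class numbers (not harvested data).
sample: List[Record] = [
    {"245": [{"a": "Sample title A /"}], "650": [{"a": "Eugenics."}], "082": [{"a": "000.1"}]},
    # Same record returned by another server: treated as a duplicate.
    {"245": [{"a": "Sample title A"}], "650": [{"a": "Eugenics"}], "082": [{"a": "000.1"}]},
    # No 082 field: dropped.
    {"245": [{"a": "Sample title B"}], "650": [{"a": "Anatomy."}]},
    # ANATOMY is not the first 650: dropped.
    {"245": [{"a": "Sample title C"}], "650": [{"a": "Medicine."}, {"a": "Anatomy."}],
     "082": [{"a": "000.2"}]},
]
print(len(select_and_dedupe(sample)))
```

Only the first 650 is inspected, matching the "first subject heading" criterion above; a looser study could scan every 650 instead.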
Time and Variety
Combining this data would give us insight into
where some cataloguers were putting books on
EUGENICS and ANATOMY.
Time and Variety
A note about data, and this data specifically: it
is MESSY, and we do not necessarily trust our
sources. So at best this is an exploratory look at
this phenomenon, and we should improve our
methods of data collection and analysis.
Time and Variety
In this dataset we have 
Date derived from LCCN
DDC class number
Date of publication
Date of publication cleaned (removing c. etc.)
Year differing between LCCN date and pub. date
Title
Server
Abridged notation present or not
Classification edition number if present
Record from Library of Congress?
Total count of identical records
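Two of these fields are derived rather than harvested. Here is a minimal Python sketch of one way to derive them, under our own assumptions rather than the procedure actually used: the cleaned publication date is the first plausible four-digit year in the raw date string (so "c1923." becomes 1923), and the LCCN year is read from an LCCN laid out as an optional alphabetic prefix, a year, and a six-digit serial, with a four-digit year from 2001 onward and a two-digit year before that (read here as 19xx, so 2000 numbers would be misread). The function names are hypothetical.

```python
import re
import string
from typing import Optional

# A four-digit year not embedded in a longer run of digits ("c1923." -> 1923).
YEAR_RE = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def clean_pub_year(raw: str) -> Optional[int]:
    """Cleaned date of publication: the first plausible year in the raw string."""
    match = YEAR_RE.search(raw)
    return int(match.group(1)) if match else None


def lccn_year(raw: str) -> Optional[int]:
    """Year encoded in an LCCN (assumed layout: prefix, year, six-digit serial)."""
    lccn = raw.replace(" ", "").split("/")[0]  # drop revision suffixes such as //r86
    lccn = lccn.lstrip(string.ascii_letters)   # drop an alphabetic prefix
    if "-" in lccn:                            # hyphenated form: pad serial to 6 digits
        year_part, serial = lccn.split("-", 1)
        lccn = year_part + serial.zfill(6)
    if not lccn.isdigit():
        return None
    if len(lccn) == 10:                        # 2001 onward: four-digit year
        return int(lccn[:4])
    if len(lccn) == 8:                         # before 2001: two-digit year, read as 19xx
        return 1900 + int(lccn[:2])
    return None


def year_difference(pub_raw: str, lccn_raw: str) -> Optional[int]:
    """Publication year minus LCCN year, when both can be derived."""
    pub, lccn = clean_pub_year(pub_raw), lccn_year(lccn_raw)
    return pub - lccn if pub is not None and lccn is not None else None


print(clean_pub_year("c1923."), lccn_year("n78-890351"), year_difference("c1980.", "79-12345"))
```

Returning None rather than guessing keeps unparseable dates visible, which suits data we do not fully trust.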
Time and Variety
In addition to the record fields above, this dataset has, drawn from the DDC editions:
DDC edition date
DDC classes possible
Discontinued classes
See alsos
Edition number
Notes
Time and Variety
We can now line these two datasets up and
explore our question about subject change
over time.

That is, we can see its ontogeny.

Ontogeny is the totality of changes of an
individual of a species from conception to
full maturation.
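Lining the two datasets up requires deciding which DDC edition was current for each record. Here is a minimal Python sketch, assuming (hypothetically) that the edition in force is the latest edition dated in or before the record's year, and that each edition's "DDC classes possible" field is available as a set of class numbers. The edition numbers and dates come from the slides above; the class sets are placeholders, and `Edition`, `edition_in_force`, and `align` are illustrative names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Edition:
    number: int
    year: int
    classes_possible: FrozenSet[str]


def edition_in_force(editions: List[Edition], year: int) -> Optional[Edition]:
    """Latest edition dated in or before `year`; `editions` must be sorted by year."""
    index = bisect_right([e.year for e in editions], year) - 1
    return editions[index] if index >= 0 else None


def align(editions: List[Edition], year: int, ddc: str) -> Optional[Tuple[int, bool]]:
    """Return (edition number in force, whether `ddc` is a possible class in it)."""
    edition = edition_in_force(editions, year)
    if edition is None:
        return None
    return edition.number, ddc in edition.classes_possible


# Edition dates are from the slides above; the class sets are placeholders.
editions = [
    Edition(7, 1911, frozenset({"111.1"})),
    Edition(19, 1979, frozenset({"222.2"})),
]
print(align(editions, 1950, "111.1"))
print(align(editions, 1985, "111.1"))
```

A real alignment would probably allow a lag between an edition's publication and its adoption by cataloguers; the sketch assumes immediate adoption.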
Time and Variety
There are many questions that can be asked
of this data and I will be talking more about
this tomorrow. I have some things here in
appendixes if we have time. I can also
provide citations.
Time and Variety
Variety
Now we can talk about variety in this
context. This is a harder problem for me,
because there may be infinitely many ways to
describe variety in KOS.
Time and Variety
Let’s take a (potentially) simple example.

What are the differences and similarities between:
Descriptor Set (Mooers)
Thesauri
Classification Schemes
Schemes for Classification (Ranganathan)
Ontologies
Folksonomies?
Time and Variety
In the past I have looked at this in two ways:
By establishing a hierarchical or nested
method of comparative analysis

Through exploratory naïve linguistic
expression
Time and Variety
I have tried to establish rubrics or
frameworks against which we could lay various
KOS standards. These
frameworks include [18]:

Structure

Work Practices

Discourse
Time and Variety
Elsewhere, Elin Jacob and I say,
“The structure of a social tagging system, a metadata
scheme, or an indexing language must be understood within
the framework in which it occurs. The information
organization framework itself is comprised of three distinct
but interrelated components: the discourse that establishes
the goals, priorities and values of the system; the work
practices involved in the application and maintenance of
the system; and the structure that instantiates both the
discourses underlying the framework and the work practices
that make it visible,” [10].
Time and Variety
Elsewhere, Elin Jacob and I say,
“For example, ontology curation (or engineering) is an
information organization framework, and the Gene
Ontology (GO) is a specific instance of ontology curation.
The discourses revolving around GO reflect the fact that its
work practices are focused on representation of the natural
(or biological) world; and the structure of GO is therefore
informed by this scientific and representationalist focus and
the work practices and discourses that follow from that
focus.” [10].
Time and Variety
In an earlier project, trying to make sense of the then-popular
social tagging work (folksonomies), I tried to
compare that work to cataloguing in a similar way.
Time and Variety
[18]
Time and Variety
The second way I have tried to characterize similarities and
differences has been with naïve linguistic expression.

In this exercise, Ben Good and I were trying to see if there
was a way to quantify a gold standard of indexing languages,
such that through automatic inspection we could assess and
modify those that were not satisfactory.

I must say that I was not convinced this was the right way to
go, but I was curious about what clusters would form, and
why, when we reduced all indexing languages to a bag of
terms and ran analysis over them.
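Here is a minimal Python sketch of a few term-based statistics of the kind named in the excerpts from [1]. It rests on our own reading of those metric names, not on the definitions in [1]. OLPs are taken to be the whitespace-delimited tokens of a term. Term length is counted in characters. A term "contains another" when another term's tokens occur as a shorter contiguous run inside it. OLP flexibility is omitted because we cannot reconstruct it. `term_profile` and `contains` are hypothetical names.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def contains(outer: Tuple[str, ...], inner: Tuple[str, ...]) -> bool:
    """True if `inner` is a shorter, contiguous run of tokens inside `outer`."""
    k = len(inner)
    if k >= len(outer):
        return False
    return any(outer[i:i + k] == inner for i in range(len(outer) - k + 1))


def term_profile(terms: Sequence[str]) -> Dict[str, float]:
    """Term-based statistics for one indexing language, treated as a bag of terms."""
    # Normalize case and whitespace, then keep distinct terms only.
    distinct: List[str] = sorted({" ".join(t.lower().split()) for t in terms if t.split()})
    if not distinct:
        raise ValueError("no terms to profile")
    n = len(distinct)
    tokens = [tuple(t.split()) for t in distinct]  # OLPs, under our assumption
    olps = [len(tok) for tok in tokens]
    lengths = [len(t) for t in distinct]           # term length in characters

    mean_len = statistics.mean(lengths)
    sd_len = statistics.pstdev(lengths)
    # Population skewness: mean cubed deviation over the cubed standard deviation.
    skew = sum((x - mean_len) ** 3 for x in lengths) / n / sd_len ** 3 if sd_len else 0.0

    contains_another = sum(any(contains(t, o) for o in tokens) for t in tokens)
    contained_by_another = sum(any(contains(o, t) for o in tokens) for t in tokens)

    return {
        "pct_olp_uniterms": sum(c == 1 for c in olps) / n,
        "pct_olp_duplets": sum(c == 2 for c in olps) / n,
        "pct_olp_triplets": sum(c == 3 for c in olps) / n,
        "pct_olp_quadplus": sum(c >= 4 for c in olps) / n,
        "pct_contains_another": contains_another / n,
        "pct_contained_by_another": contained_by_another / n,
        "distinct_terms": n,
        "mean_term_length": mean_len,
        "max_term_length": max(lengths),
        "min_term_length": min(lengths),
        "median_term_length": statistics.median(lengths),
        "sd_term_length": sd_len,
        "skewness_term_length": skew,
        "cv_term_length": sd_len / mean_len,
        "max_olps_per_term": max(olps),
        "mean_olps_per_term": statistics.mean(olps),
        "median_olps_per_term": statistics.median(olps),
    }


profile = term_profile(["Anatomy", "Human anatomy", "Comparative human anatomy",
                        "Eugenics", "eugenics ", "History of eugenics and genetics"])
for name in ("distinct_terms", "pct_olp_uniterms", "pct_contains_another", "median_term_length"):
    print(name, profile[name])
```

Profiles like this reduce each indexing language to a vector of numbers, which is what makes clustering across many languages possible.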
Time and Variety
Excerpt from [1]
Time and Variety
Excerpt from [1]
[Figure: term-based metric profile for 21 Connotea, values on a 0 to 1 scale. Metrics shown: % OLP uniterms, % OLP duplets, % OLP triplets, % OLP quadplus, OLP flexibility, % containsAnother, % containedByAnother, number of distinct terms, mean, max, min and median term length, standard deviation, skewness and coefficient of variation of term length, and max, mean and median number of OLP sub-terms per term.]
Time and Variety
Excerpt from [1]
[Figure: term-based metric profile for 16 CHEBI, values on a 0 to 1 scale, over the same set of metrics (% OLP uniterms through median number of OLP sub-terms per term).]
Time and Variety
Excerpt from [1]
[Figure: term-based metric profile for 1 MeSH PrefLabels, over the same set of metrics (% OLP uniterms through median number of OLP sub-terms per term).]
Time and Variety
Excerpt from [1]
[Figure: three term-based metric profiles, values on a 0 to 1 scale, for 20 Bibsonomy, 21 Connotea, and 22 CiteULike, over the same set of metrics (% OLP uniterms through median number of OLP sub-terms per term).]
Time and Variety
I do not know if these mean anything, but I have kept
collecting similar data.

I have about 36 single versions
of this data, including English
dictionaries.

And here is where the two come
together. I need multiple versions
to make sense of this over time.
Population Perspective and a Metadata Observatory
I have tried to demonstrate through my past research
that there is sufficient reason to investigate KOS from
a population perspective. 

We have a wide range of standards, types, and a
potentially even wider range of implementations that
change over time.

In order for us to better understand this universe I
believe we need to work toward a metadata
observatory.
Population Perspective and a Metadata Observatory
Like scanning the night sky for different instances of
blue dwarf stars or gas giant planets, we can look
for various instances of schemes and schemas.

We can then see how they change over time, and how
they are similar to or different from others.

Currently I’m interested in the nature of Wikipedia’s category
system and how it changes. I’m also
interested in building a view of all the DDC numbers
in use. There would be a lot we could see from a
metadata observatory.
Population Perspective and a Metadata Observatory
Possible features of this observatory might be:

Real Time Metadata Feeds

Metadata Viz

Run Analysis on Metadata

Metadata Maps (geographic and conceptual)

Upload Your Metadata

Version Comparisons
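A "Version Comparisons" feature could start from something as simple as set differences between two releases of a scheme's term list. Here is a minimal Python sketch under our own assumptions: each version is reduced to a set of normalized terms (ignoring hierarchy and notation), and overlap is summarized with the Jaccard index. `compare_versions`, `VersionDiff`, and the sample terms are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


def normalize(terms: Iterable[str]) -> Set[str]:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return {" ".join(t.lower().split()) for t in terms if t.strip()}


@dataclass
class VersionDiff:
    added: Set[str]
    removed: Set[str]
    retained: Set[str]

    @property
    def jaccard(self) -> float:
        """Share of all terms (in either version) that both versions have."""
        union = len(self.added) + len(self.removed) + len(self.retained)
        return len(self.retained) / union if union else 1.0


def compare_versions(old: Iterable[str], new: Iterable[str]) -> VersionDiff:
    """Diff two versions of a term list by set membership."""
    old_set, new_set = normalize(old), normalize(new)
    return VersionDiff(
        added=new_set - old_set,
        removed=old_set - new_set,
        retained=old_set & new_set,
    )


diff = compare_versions(["Anatomy", "Eugenics", "Heredity"], ["anatomy", "Eugenics", "Genetics"])
print(sorted(diff.added), sorted(diff.removed), diff.jaccard)
```

Presence and absence is only the first layer. A fuller comparison would also track terms that persist but move to a different place in the structure, as EUGENICS did in the DDC.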
Thank you
jtennis@uw.edu
Joseph T. Tennis
University of Washington
Evolution and Variation of Classification Systems
KnoweScape Workshop March 4-5, 2015 Amsterdam
Appendix A.
Time and Variety
Now that we have these visualizations in our
minds (perhaps), we can talk about 
Semantic Gravity
Collocative Integrity
Appendix A.
Time and Variety
Semantic Gravity
Cataloguer privileges collection over updated
scheme (theory)

Collocative Integrity
Degree to which scheme comports with
cataloguing practice
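One naive way to put a number on collocative integrity, glossed above as the degree to which the scheme comports with cataloguing practice, is the share of records in each year whose class number is valid in the edition then in force. The Python sketch below assumes that operationalization and a pre-computed validity flag per record. It is our simplification, and [19] may define the metric differently. `integrity_by_year` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def integrity_by_year(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map year -> share of records whose class number the scheme supports.

    Each record is (year, class_valid_in_edition_in_force); how validity is
    decided (e.g. by matching against the edition in force) is left to the caller.
    """
    totals: Dict[int, int] = defaultdict(int)
    valid: Dict[int, int] = defaultdict(int)
    for year, is_valid in records:
        totals[year] += 1
        valid[year] += int(is_valid)
    return {year: valid[year] / totals[year] for year in sorted(totals)}


print(integrity_by_year([(1911, True), (1911, False), (1979, True)]))
```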
Time and Variety
Appendix A.
Time and Variety
[Figure: share of records labelled Old, Out, and In (0% to 100%) by year. Panel Anatomy: 1899, 1911, 1913, 1919, 1922, 1927, 1932, 1942, 1951, 1958, 1965, 1971, 1979, 1989, 1991, 2003. Panel Eugenics: 1899, 1911, 1913, 1915, 1919, 1922, 1927, 1932, 1942, 1951, 1958, 1965, 1971, 1979, 1989, 1996, 2003.]
[19]
Appendix A.
Time and Variety
[Figure: share of records labelled Old, Out, and In (0% to 100%), aggregated over 1899 to 2003, for Eugenics and for Anatomy.]
[19]
References
[0] Tennis, J. T. (2010). Form, Intention, and Indexing: The Liminal and Integrated Conceptions of Work in Knowledge Organization. In Advances in Classification Research, Vol. 21. Available: http://journals.lib.washington.edu/index.php/acro/issue/archive
[1] Good, B. M. & Tennis, J. T. (2009). Term based comparison metrics for controlled and uncontrolled indexing languages. In Information Research 14(1). Available: http://www.informationr.net/ir/14-1/paper395.html
[2] Vickery, B. C. (1997). Ontologies. In Journal of Information Science 23(4): 277-286.
[3] Soergel, D. (1999). The rise of ontologies or the reinvention of classification. In JASIST 50(12): 1119-1120.
[4] Gilchrist, A. (2003). Thesauri, taxonomies and ontologies – an etymological note. In Journal of Documentation 59(1): 7-18.
[5] Barcellos Almeida, M. (2013). Revisiting Ontologies: A Necessary Clarification. In JASIST 64(8): 1682-1693.
[6] Gorman, M. (1990). A Bogus and Dismal Science; or, the Eggplant That Ate Library Schools. In American Libraries 21(5): 463-465.
[7] Gorman, M. (1999). Metadata or cataloguing? In Journal of Internet Cataloging 2: 5-22.
[8] Tennis, J. T. (2006). Comparative Functional Analysis of Boundary Infrastructures, Library Classification, and Social Tagging. In Information Science Revisited: Approaches to Innovation. Proceedings of the Annual Meeting of the Canadian Association for Information Science/L'Association canadienne des sciences de l'information. York University, Toronto.
[9] Tennis, J. T. (2006). Function, Purpose, Predication, and Context of Information Organization Frameworks. In Knowledge Organization for a Global Learning Society: Proceedings of the 9th International Conference for Knowledge Organization (Vienna, Austria, July 2006). Advances in Knowledge Organization vol. 10. Ergon: Würzburg: 303-310.
[10] Tennis, J. T. and Jacob, E. K. (2008). Toward a Theory of Structure in Information Organization Frameworks. In Culture and Identity in Knowledge Organization: Proceedings of the 10th International Conference for Knowledge Organization (Montreal, Quebec, August 5-8, 2008). Advances in Knowledge Organization vol. 11. Ergon: Würzburg: 262-268.
[11] Andersen, J. (2005). Call for papers. 16th ASIS&T SIG-CR Classification Research Workshop, 2005, “What knowledge organization does and how it does it: Critical Studies in and of Classification and Indexing.” Available: http://dhhumanist.org/Archives/Virginia/v18/0597.html
[12] Richardson, E. C. (1901). Classification: Theoretical and Practical. Scribner’s Sons.
[13] Horne, T. H. (1825). Outlines for the classification of a library; respectfully submitted to the consideration of the trustees of the British Museum. G. Woodfall.
[14] Bowker, G. and Star, S. L. (2000). Sorting Things Out: Classification and Its Consequences. MIT Press.
[15] Feinberg, M. (2011). How information systems communicate as documents: the concept of authorial voice. In Journal of Documentation 67(6): 1015-1037.
[16] Adler, M. (2015). Broker of Information, the “Nation’s Most Important Commodity”: The Library of Congress in the Neoliberal Era. In Information and Culture 50(1): 24-50.
[17] Library of Congress. (2007). 650 - Subject Added Entry - Topical Term. Available: http://www.loc.gov/marc/bibliographic/bd650.html
[18] Tennis, J. T. (2006). Social tagging and the next steps for indexing. In Advances in Classification Research, Vol. 17: Proceedings of the 17th ASIS&T SIG/CR Classification Research Workshop (Austin, TX, November 4, 2006), ed. Jonathan Furner and Joseph T. Tennis.
[19] Tennis, J. T. (2013). Collocative Integrity and Our Many Varied Subjects: What the Metric of Alignment between Classification Scheme and Indexer Tells Us About Langridge’s Theory of Indexing. In NASKO 4(1). Available: http://journals.lib.washington.edu/index.php/nasko/article/view/14660

Taxonomy Development and Digital Projects
 
Collaborative Planning Model
Collaborative Planning ModelCollaborative Planning Model
Collaborative Planning Model
 
writing literature review
writing literature reviewwriting literature review
writing literature review
 
Examples Of Definition Essay
Examples Of Definition EssayExamples Of Definition Essay
Examples Of Definition Essay
 
A Cognitive Process Theory Of Writing
A Cognitive Process Theory Of WritingA Cognitive Process Theory Of Writing
A Cognitive Process Theory Of Writing
 
Using reading in your writing
Using reading in your writingUsing reading in your writing
Using reading in your writing
 

Plus de COST Action TD1210

Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...
Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...
Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...COST Action TD1210
 
Christophe Gueret: Publish Web data - an interactive session
Christophe Gueret: Publish Web data - an interactive sessionChristophe Gueret: Publish Web data - an interactive session
Christophe Gueret: Publish Web data - an interactive sessionCOST Action TD1210
 
Almila Akdag Salah: Looking at classification systems from the point of view ...
Almila Akdag Salah: Looking at classification systems from the point of view ...Almila Akdag Salah: Looking at classification systems from the point of view ...
Almila Akdag Salah: Looking at classification systems from the point of view ...COST Action TD1210
 
Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...
Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...
Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...COST Action TD1210
 
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example COST Action TD1210
 
Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...
Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...
Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...COST Action TD1210
 
Aida Slavic Managing KOS: Evolution of concepts and their representation
Aida Slavic Managing KOS: Evolution of concepts and their representationAida Slavic Managing KOS: Evolution of concepts and their representation
Aida Slavic Managing KOS: Evolution of concepts and their representationCOST Action TD1210
 
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...COST Action TD1210
 

Plus de COST Action TD1210 (8)

Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...
Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...
Paul Groth: Data Analysis in a Changing Discourse: The Challenges of Scholarl...
 
Christophe Gueret: Publish Web data - an interactive session
Christophe Gueret: Publish Web data - an interactive sessionChristophe Gueret: Publish Web data - an interactive session
Christophe Gueret: Publish Web data - an interactive session
 
Almila Akdag Salah: Looking at classification systems from the point of view ...
Almila Akdag Salah: Looking at classification systems from the point of view ...Almila Akdag Salah: Looking at classification systems from the point of view ...
Almila Akdag Salah: Looking at classification systems from the point of view ...
 
Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...
Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...
Toby Burrows: Vernacular Classification: Knowledge Organization in the Humani...
 
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
 
Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...
Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...
Richard Smiraglia: Empirical methods for knowledge evolution across Knowledge...
 
Aida Slavic Managing KOS: Evolution of concepts and their representation
Aida Slavic Managing KOS: Evolution of concepts and their representationAida Slavic Managing KOS: Evolution of concepts and their representation
Aida Slavic Managing KOS: Evolution of concepts and their representation
 
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organi...
 

Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

  • 1. Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research Joseph T. Tennis University of Washington Evolution and Variation of Classification Systems KnoweScape Workshop March 4-5, 2015 Amsterdam
  • 2. The question before us What is the nature of the evolution* and variation among knowledge organization systems (KOS)? Corollary questions: Is this a simple space or a complex space? How often does it change? Can we engender a common vocabulary to describe this space? *NB: evolution can be considered a loaded term by some; that is, it could be interpreted as implying fitness for survival, and that is not what is intended here. I often use change in lieu of evolution to clarify this.
  • 3. The question before us There are very practical reasons why we want to ask this question. Interoperability (sometimes called alignment*) With widespread, yet still hopeful, collaboration across cultural heritage sectors (those with rich KOS), and with further development across a range of sectors, we must understand how KOS interoperate, clarify the pressing issues, and perhaps even incorporate this into formal education. *Alignment, in my mind, suggests more similarities than differences, and this seems presumptuous.
  • 4. The question before us There are very practical reasons why we want to ask this question. Digital Preservation Digital preservation is not simply the storage of material on hard disk; it is also the system of policies and practices that guarantee digital material a usable future. In service of that goal we need to understand changes in our KOS.
  • 5. The question before us There are very practical reasons why we want to ask this question. Application Variations (repurposing) By examining evolution and variety we can also better evaluate particular applications of KOS. It is one thing to study the standard, the ideal type, of the KOS, but it is another to see how different institutions, sectors, and projects install and perhaps alter that ideal type.
  • 6. The question before us I have elsewhere called the examination of these phenomena (how KOS change over time and how we repurpose them) second-order problems [0]. The same is true for designing for KOS interoperability. This is because, in my mind, the first-order problem is how to design a KOS ex nihilo. And in many ways we understand this problem of KOS design.
  • 7. The question before us So we are left to examine this universe of KOS, how it changes, and the aspects of its variety. Now we can frame the question, and establish what, from my perspective, we know at this point. We can then outline ways forward both in research and development.
  • 8. Outline: KOS as the Product of Problem-Solving; Design of Metadata and Indexing Languages; Metadata in the Wild; Time and Variety; Population Perspective and a Metadata Observatory
  • 9. KOS as the Product of Problem-Solving
  • 10. KOS as the Product of Problem-Solving Ben Good has claimed that we are now in a Cambrian Age of KOS [1]. In this context many different folks are trying to solve the information organization problem. Each of them has approached it from their own perspective and disciplinary biases, using tools they are familiar with (e.g., library classification, Protégé, web browser bookmarks).
  • 11. This is the first reason we might take a population perspective in the study of metadata. Namely, we expect variety. For example, there was some debate in the late 90s on whether or not ontologies were the reinvention of classification. Vickery took this up in 1997 and Soergel in 1999, with Gilchrist taking a bird’s eye view in 2003 [2, 3, 4]. KOS as the Product of Problem-Solving
  • 12. As we read these accounts it becomes clear that there are differences that make a difference. And we are still discussing these concepts, from various perspectives, in the literature (cf., Barcellos Almeida, 2013 [5]). And it is true that many think such variety is nothing but reinvention, recasting old concepts and practices in new language. Michael Gorman is one of these folks [6, 7]. KOS as the Product of Problem-Solving
  • 13. And yet, there are contexts where there is no difference made. KOS as the Product of Problem-Solving Are there differences made in this LOV service?
  • 14. I have done some work on this problem. I will introduce it a bit later. I have called it framework analysis [8, 9, 10]. Suffice it to say here that I believe it is useful and generous to consider the program of creating KOS as problem solving done by many folks in many contexts. KOS as the Product of Problem-Solving
  • 15. Design of Metadata and Indexing Languages – First Order KOS Work
  • 16. Design of Metadata and Indexing Languages Metadata: machine- and human-readable assertions about resources. Indexing languages: a systematically ordered set of representations that provides access to the conceptual content and indicates or establishes relationships between terms used to denote concepts, and between natural language and the terms used to denote concepts.
  • 17. Design of Metadata and Indexing Languages Indexing languages are, in my mind, the superclass under which thesauri, classification schemes, ontologies, and taxonomies of various sorts hang. Having said that, indexing languages can be, and are, used for things other than indexing. But we'll not take that up in this talk.* Soergel [3] offers a good starting list of functions. *But these may be of interest in studying a wide variety of metadata, in fully articulating their purposes.
  • 18. Design of Metadata and Indexing Languages Metadata, confusingly, sometimes simply refers to one subset of KOS or sometimes to the whole universe of KOS. This requires that we further clarify the form and function that we assume we find in the universe of KOS. NB: KOS in my mind is both metadata AND indexing languages.
  • 19. Design of Metadata and Indexing Languages For me, metadata is human- and machine-readable assertions about resources, where resources are, following the W3C definition, anything with an identity. Your definition may differ, and that is perhaps part of our building a common vocabulary. So let's discuss. However, I do not find it important to retrofit non-machine-readable description into the definition of metadata. It has its own names (e.g., cataloguing).
  • 20. Design of Metadata and Indexing Languages It has been helpful in the context of Dublin Core Metadata work to distinguish between schemes and schemas. These are naïve distinctions, if you will, made for convenience, and so may be revised through more thorough research; but in this context it is helpful, I think, to distinguish between the attributes of a resource and the values you might use to describe that resource.
  • 21. Design of Metadata and Indexing Languages Attribute: Value (the attribute is drawn from a schema; the value is drawn from a scheme, or not). For example, Author: Joseph T. Tennis; Subject: Evolution of KOS. We may find these don't work well in some contexts, but let's try it out for now.
  • 22. Design of Metadata and Indexing Languages Review: Metadata; Indexing Languages; KOS; Schemas; Schemes
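To make the schema/scheme distinction concrete, here is a minimal Python sketch, not a Dublin Core implementation. It assumes two Dublin Core element names (`creator`, `subject`) stand in for the schema, with `creator` playing the role of the slide's "Author", and a tiny hypothetical controlled vocabulary `KOS_TOPICS` stands in for a scheme. The `Statement` class and `validate` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional

# Schema: the attributes we allow (two Dublin Core element names, for illustration).
SCHEMA = {"creator", "subject"}

# Scheme: a hypothetical controlled vocabulary for subject values (not a real KOS).
KOS_TOPICS = {"Evolution of KOS", "Interoperability", "Digital preservation"}


@dataclass
class Statement:
    attribute: str                # drawn from the schema
    value: str                    # drawn from a scheme, or free text
    scheme: Optional[str] = None  # name of the scheme the value claims to come from


def validate(stmt: Statement) -> bool:
    """Accept a statement whose attribute is in the schema and whose value,
    if it claims a scheme, is actually a member of that scheme."""
    if stmt.attribute not in SCHEMA:
        return False
    if stmt.scheme is None:
        return True  # uncontrolled value: no scheme claimed
    # Only one scheme is known in this sketch; any other scheme name is rejected.
    return stmt.scheme == "KOS_TOPICS" and stmt.value in KOS_TOPICS


record = [
    Statement("creator", "Joseph T. Tennis"),
    Statement("subject", "Evolution of KOS", scheme="KOS_TOPICS"),
]
print(all(validate(s) for s in record))  # True
```

Keeping the scheme optional mirrors the "(or not)" on the slide: a value may be free text, but once it claims a scheme it can be checked against it.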
  • 23. Design of Metadata and Indexing Languages There is a large literature on the right way to design metadata and indexing languages. There is good reason for this, and it is a useful body of literature. For one thing it is not as straightforward as one might assume to construct an indexing language.
  • 24. Design of Metadata and Indexing Languages Whether one consults the literature or not, trying to solve problems in information organization results in some form of KOS. And they are out there, multiplying and evolving.
  • 26. Metadata in the Wild If we take away the research on the design of KOS, we are left with the literature that describes how it is implemented, maintained, and evaluated. We are also left with literature that reads KOS in particular ways.
  • 27. Metadata in the Wild In both of these cases we are talking about metadata in the wild. In 2005 we saw a declaration, in the form of a call for papers by Jack Andersen of the then Royal School, calling for what I now term a descriptive turn in knowledge organization research.
  • 28. Metadata in the Wild He said, “Much classification research, and knowledge organization research in general, has tended to be concerned with rules, principles, standards or techniques; that is, with prescriptive issues. This workshop will focus on descriptive issues,” [11].
  • 29. Metadata in the Wild Of course we had seen work well before this time that could be described as descriptive rather than prescriptive as well. We could cite Richardson’s bibliography from 1901 or earlier works that inventoried extant schemes [12, 13].
  • 30. Metadata in the Wild And Bowker and Star have famously critiqued classification decisions as infrastructure, where professional work (whether changing what was already there or faithfully representing controversial topics) is seen as compromise and is therefore fruitful for investigation. For example, representing the full range of nurses' work, from medical procedures to counseling, is not straightforward [14].
  • 31. Metadata in the Wild And finally, Melanie Feinberg's work and Melissa Adler's work, while quite different, give us ways to read KOS as, respectively, authored rhetorical arguments, and institutions of dominance and power that promulgate particular worldviews, if not prejudices [15, 16]. NB: Both at the Local/Global Knowledge Organization Workshop in Copenhagen in August.
  • 32. Metadata in the Wild And it is in this context, that we again ask the question and its corollaries. What is the nature of the evolution and variation among knowledge organization systems (KOS)? Corollary questions Is this a simple space or a complex space? How often does it change? Can we engender a common vocabulary to describe this space?
  • 33. Metadata in the Wild And it is here that we can begin to discuss what has been done and how we might go forward.
  • 35. Time and Variety Time I think it is safe to assume that we all know that KOS change over time. We revise, edit, sunset, phoenix, and otherwise rework our schemas and schemes. I have been curious about this since 2002.
  • 36. Time and Variety In an ISKO paper I looked at the entry for EUGENICS in the relative index of the DDC at two points in time, edition 16 and edition 20. This simple case study was enough to demonstrate that there is sometimes dramatic change in long-lived, large indexing languages. I wanted to learn more.
  • 37. Time and Variety For those that do not know, EUGENICS is the body of knowledge and the practice of creating better human beings through selective breeding and sterilization measures. It was once considered by the DDC to be a biological science. It is now a widely debunked science, but the term persists in many different contexts (even legitimate scientific ones).
  • 38. Time and Variety It makes sense that if I was curious to see how indexing languages (schemes) change over time I could use this example and a couple of other subjects to see how things change. To that end I began data collection. This took a village, but it was fun and worth the effort.
  • 39. Time and Variety We reviewed all editions of the DDC for Eugenics and Anatomy.* We identified where in the classification these subjects could be found from 1876 to 2010. They were often in different places (because of the nature of the DDC, a cue for variety!), but this showed us where cataloguers might put books on these subjects. *Among others, like Gypsies, Algebra, Woman, Civil Disobedience, etc.
  • 40. Time and Variety [Figure: DDC 1911, Ed. 7; DDC 1979, Ed. 19]
  • 41. Time and Variety The second set of data was gathered using the Z39.50 protocol, harvesting, from 572 catalogues, MARC records that both (1) used EUGENICS or ANATOMY as the first subject heading (in the 650 field of the MARC record, the subject added entry for topical terms) [17], and (2) carried a DDC number in the 082 field of the MARC record. After automatically removing duplicate records we were left with c. 927 records for EUGENICS and c. 1965 for ANATOMY.
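The two selection criteria and the duplicate removal can be sketched roughly as below. This is not the project's actual harvesting code: it assumes each harvested record has already been parsed into a plain dict mapping MARC tags to lists of subfield $a strings, and the duplicate key (title plus first 082 value), the `Record` layout, and the function names are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical record layout: MARC tag -> list of subfield $a values, e.g.
# {"245": ["Some title"], "650": ["Eugenics."], "082": ["<DDC number>"]}
Record = Dict[str, List[str]]

TARGET_HEADINGS = {"EUGENICS", "ANATOMY"}


def first_subject(record: Record) -> str:
    """Return the first 650 (topical term) heading, upper-cased and without a
    trailing period, or '' if the record has no 650."""
    headings = record.get("650", [])
    return headings[0].strip().rstrip(".").upper() if headings else ""


def select_records(records: List[Record]) -> List[Record]:
    """Keep records whose first 650 is a target heading and that carry an 082
    (DDC) number; drop later copies with the same title and first 082 value."""
    seen = set()
    kept = []
    for rec in records:
        if first_subject(rec) not in TARGET_HEADINGS or not rec.get("082"):
            continue
        key = (tuple(rec.get("245", [])), rec["082"][0])
        if key in seen:
            continue  # treated as a duplicate of a record already kept
        seen.add(key)
        kept.append(rec)
    return kept
```

Exact matching on the first heading means a record whose first 650 is, say, a subdivided or inverted form of ANATOMY is excluded; whether that matches the original selection is an open assumption.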
  • 43. Time and Variety Combining this data would give us insight into where some cataloguers were putting books on EUGENICS and ANATOMY.
  • 44. Time and Variety A note about data, and this data specifically: it is MESSY, and we do not necessarily trust our sources. So at best this is an exploratory look at this phenomenon, and we should improve on our methods of data collection and analysis.
  • 45. Time and Variety In this dataset we have: date derived from LCCN; DDC class number; date of publication; date of publication, cleaned (removing "c." etc.); years differing between LCCN date and publication date; title; server; abridged notation present or not; classification edition number, if present; whether the record is from the Library of Congress; total count of identical records.
  • 46. Time and Variety In this dataset we have: date derived from LCCN; DDC class number; date of publication; date of publication, cleaned (removing "c." etc.); years differing between LCCN date and publication date; title; server; abridged notation present or not; classification edition number, if present; whether the record is from the Library of Congress; total count of identical records; and, for the DDC editions: DDC edition date; DDC classes possible; discontinued classes; see alsos; edition number; notes.
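Two of the derived fields, the cleaned publication date and the difference in years between the LCCN date and the publication date, might be computed along these lines. This is a sketch under stated assumptions, not the project's procedure: publication dates are assumed to arrive as free text such as `c1923.` or `[1911?]`, and the LCCN rule (a `yy-nnnn` prefix read as 19yy, a ten-digit number starting with 20 read as its first four digits) is a hypothetical simplification that ignores alphabetic prefixes and 19th-century numbers.

```python
import re
from typing import Optional

# A four-digit year from 1500 to 2099 not embedded in a longer run of digits.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def clean_pub_date(raw: str) -> Optional[int]:
    """Return the first plausible year in a free-text date such as 'c1923.'
    or '[1911?]', or None if there is none."""
    match = YEAR.search(raw)
    return int(match.group(1)) if match else None


def lccn_year(lccn: str) -> Optional[int]:
    """Hypothetical rule: 'yy-nnnn' is read as 19yy; a ten-digit LCCN starting
    with 20 is read as its first four digits; anything else yields None."""
    lccn = lccn.strip()
    old_style = re.fullmatch(r"(\d{2})-\d+", lccn)
    if old_style:
        return 1900 + int(old_style.group(1))
    new_style = re.fullmatch(r"(20\d{2})\d{6}", lccn)
    if new_style:
        return int(new_style.group(1))
    return None


def year_gap(lccn: str, pub_raw: str) -> Optional[int]:
    """Years by which the LCCN-derived date differs from the cleaned publication date."""
    lccn_y, pub_y = lccn_year(lccn), clean_pub_date(pub_raw)
    if lccn_y is None or pub_y is None:
        return None
    return lccn_y - pub_y


print(clean_pub_date("c1923."), lccn_year("24-1234"), year_gap("24-1234", "c1923."))  # 1923 1924 1
```

Returning `None` rather than guessing keeps unparseable values visible, which suits data we do not fully trust.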
  • 47. Time and Variety We can now line these two datasets up and explore our question about subject change over time. That is, we can see its ontogeny. Ontogeny is the totality of changes of an individual of a species from conception to full maturation.
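One way to line the two datasets up is to file each catalogue record under the DDC edition that was current in the record's year, then count the class numbers cataloguers used under each edition. The sketch below assumes the record's year is the cleaned date (a proxy for when it was catalogued) and that editions are given as (publication year, edition number) pairs sorted by year; the function name is hypothetical, and the class numbers in the usage are placeholders, not real DDC numbers.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def ontogeny(editions: List[Tuple[int, int]],
             records: List[Tuple[int, str]]) -> Dict[int, Counter]:
    """Group (year, DDC class number) records under the edition in force that year.

    `editions` holds (publication year, edition number) pairs sorted by year;
    records dated before the first listed edition are skipped.
    """
    edition_years = [year for year, _ in editions]
    table: Dict[int, Counter] = defaultdict(Counter)
    for year, class_number in records:
        i = bisect_right(edition_years, year)  # editions published on or before `year`
        if i == 0:
            continue
        table[editions[i - 1][1]][class_number] += 1
    return dict(table)


# Edition years and numbers from slide 40; class numbers are placeholders.
table = ontogeny([(1911, 7), (1979, 19)],
                 [(1950, "class-a"), (1960, "class-a"), (1985, "class-b")])
print(table)  # {7: Counter({'class-a': 2}), 19: Counter({'class-b': 1})}
```

Reading the table edition by edition gives a rough view of how the placement of a subject develops over the life of the scheme.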
  • 61. Time and Variety There are many questions that can be asked of this data and I will be talking more about this tomorrow. I have some things here in appendixes if we have time. I can also provide citations.
  • 62. Time and Variety Variety Now we can talk about variety in this context. This is a harder problem for me, because there may be infinitely many ways to describe variety in KOS.
  • 63. Time and Variety Let's take a (potentially) simple example. What are the differences and similarities between: descriptor sets (Mooers); thesauri; classification schemes; schemes for classification (Ranganathan); ontologies; folksonomies?
  • 64. Time and Variety In the past I have looked at this in two ways: (1) by establishing a hierarchical or nested method of comparative analysis, and (2) through exploratory naïve linguistic expression.
  • 65. Time and Variety I have tried to establish rubrics or frameworks against which we could lay various KOS standards. These frameworks include [18]: structure, work practices, and discourse.
  • 66. Time and Variety Elsewhere, Elin Jacob and I say, “The structure of a social tagging system, a metadata scheme, or an indexing language must be understood within the framework in which it occurs. The information organization framework itself is comprised of three distinct but interrelated components: the discourse that establishes the goals, priorities and values of the system; the work practices involved in the application and maintenance of the system; and the structure that instantiates both the discourses underlying the framework and the work practices that make it visible,” [10].
  • 67. Time and Variety Elsewhere, Elin Jacob and I say, “For example, ontology curation (or engineering) is an information organization framework, and the Gene Ontology (GO) is a specific instance of ontology curation. The discourses revolving around GO reflect the fact that its work practices are focused on representation of the natural (or biological) world; and the structure of GO is therefore informed by this scientific and representationalist focus and the work practices and discourses that follow from that focus.” [10].
  • 68. Time and Variety In an earlier project trying to make sense of the then popular social tagging work (folksonomies), I tried to compare that work to cataloguing in a similar way.
  • 70. Time and Variety The second way I have tried to characterize similarities and differences has been with naïve linguistic expression. In this exercise, Ben Good and I were trying to see if there was a way to quantify a gold standard of indexing languages, such that through automatic inspection we could assess and modify those that were not satisfactory. I must say that I was not convinced this was the right way to go, but I was curious about what clusters would form and why when we reduced all indexing languages to a bag of terms and ran analysis over them.
  • 72. Time and Variety Excerpt from [1]: [Figure: term-based metric profile, scaled 0 to 1, for 21 Connotea. Metrics: % OLP uniterms; % OLP duplets; % OLP triplets; % OLP quadplus; OLP flexibility; % containsAnother; % containedByAnother; number of distinct terms; mean, max, min, and median term length; standard deviation, skewness, and coefficient of variation of term length; OLP max, mean, and median number of sub-terms per term]
  • 73. Time and Variety Excerpt from [1]: [Figure: the same metric profile for 16 CHEBI]
  • 74. Time and Variety Excerpt from [1]: [Figure: the same metric profile for 1 MeSH PrefLabels]
  • 75. Time and Variety Excerpt from [1]: [Figure: metric profiles side by side for 20 Bibsonomy, 21 Connotea, and 22 CiteUlike]
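Several of the metrics in these profiles can be computed directly from a bag of terms. The sketch follows our reading of [1] and is not its code: it treats OLP (observed linguistic precoordination) as the share of terms with one, two, three, or four or more words; `containsAnother` / `containedByAnother` as the share of terms that contain, or are contained in, another term of the same language as a contiguous word sequence; and term length as a character count. Lower-casing, whitespace tokenization, and the function names are assumptions; OLP flexibility, skewness, and the sub-term counts are not reproduced.

```python
import statistics
from typing import Dict, List, Tuple


def _contains(outer: Tuple[str, ...], inner: Tuple[str, ...]) -> bool:
    """True if `inner` occurs as a contiguous word sequence inside a longer `outer`."""
    k = len(inner)
    return len(outer) > k and any(outer[i:i + k] == inner for i in range(len(outer) - k + 1))


def term_profile(terms: List[str]) -> Dict[str, float]:
    """Compute some term-based metrics for one indexing language (needs >= 1 term)."""
    distinct = sorted({t.strip().lower() for t in terms if t.strip()})
    words = [tuple(t.split()) for t in distinct]
    n = len(distinct)
    lengths = [len(t) for t in distinct]  # term length in characters
    mean_len = statistics.mean(lengths)
    sd_len = statistics.pstdev(lengths)

    def share(test) -> float:
        return sum(1 for w in words if test(len(w))) / n

    return {
        "pct_uniterms": share(lambda k: k == 1),
        "pct_duplets": share(lambda k: k == 2),
        "pct_triplets": share(lambda k: k == 3),
        "pct_quadplus": share(lambda k: k >= 4),
        # Pairwise checks are quadratic: fine for a sketch, slow for very large vocabularies.
        "pct_containsAnother": sum(any(_contains(a, b) for b in words) for a in words) / n,
        "pct_containedByAnother": sum(any(_contains(a, b) for a in words) for b in words) / n,
        "distinct_terms": n,
        "mean_term_length": mean_len,
        "median_term_length": statistics.median(lengths),
        "max_term_length": max(lengths),
        "min_term_length": min(lengths),
        "sd_term_length": sd_len,
        "cv_term_length": sd_len / mean_len,
    }


profile = term_profile(["anatomy", "human anatomy", "comparative human anatomy", "Eugenics", "eugenics"])
print(profile["distinct_terms"], profile["pct_containsAnother"])  # 4 0.5
```

Because each profile is a flat dictionary of numbers, profiles for many indexing languages (or many versions of one) can be stacked into a table and clustered or compared.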
  • 76. Time and Variety I do not know if this means anything, but I have kept collecting similar data. I have about 36 single versions of this data, including English dictionaries. And here is where the two come together: I need multiple versions to make sense of this over time.
  • 78. Population Perspective and a Metadata Observatory I have tried to demonstrate through my past research that there is sufficient reason to investigate KOS from a population perspective. We have a wide range of standards, types, and a potentially even wider range of implementations that change over time. In order for us to better understand this universe I believe we need to work toward a metadata observatory.
  • 79. Population Perspective and a Metadata Observatory Like scanning the night sky for different instances of blue dwarf stars or gas giant planets, we can look for various instances of schemes and schemas. We can then see how they change over time, and how they are similar to or different from others. Currently I'm interested in Wikipedia's category system and its nature and changes. I'm also interested in building a view of all the DDC numbers in use. There would be a lot we could see from a metadata observatory.
  • 80. The question before us What is the nature of the evolution* and variation among knowledge organization systems (KOS)? Corollary questions: Is this a simple space or a complex space? How often does it change? Can we engender a common vocabulary to describe this space? *NB: evolution can be considered a loaded term by some; that is, it could be interpreted as implying fitness for survival, and that is not what is intended here. I often use change in lieu of evolution to clarify this.
  • 81. Population Perspective and a Metadata Observatory Possible features of this observatory might be: real-time metadata feeds; metadata visualization; running analyses on metadata; metadata maps (geographic and conceptual); uploading your own metadata; version comparisons.
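A version-comparison feature could start from something as simple as a diff between two releases of a scheme. The sketch assumes each version is available as a mapping from notation (or concept identifier) to caption; the `compare_versions` name and the example data are hypothetical.

```python
from typing import Dict, List, Tuple


def compare_versions(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List]:
    """Report notations added, removed, and relabeled between two versions."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    relabeled: List[Tuple[str, str, str]] = [
        (code, old[code], new[code])
        for code in sorted(old.keys() & new.keys())
        if old[code] != new[code]
    ]
    return {"added": added, "removed": removed, "relabeled": relabeled}


old = {"n1": "Term A", "n2": "Term B"}
new = {"n2": "Term B (revised)", "n3": "Term C"}
print(compare_versions(old, new))
# {'added': ['n3'], 'removed': ['n1'], 'relabeled': [('n2', 'Term B', 'Term B (revised)')]}
```

A plain key diff cannot tell a relocated subject from a deleted one plus an unrelated addition; detecting relocations (as with EUGENICS moving between classes) would need captions, index entries, or explicit mappings between versions.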
  • 82. Thank you jtennis@uw.edu Joseph T. Tennis University of Washington Evolution and Variation of Classification Systems KnoweScape Workshop March 4-5, 2015 Amsterdam
  • 83. Appendix A. Time and Variety Now that we have these visualizations in our minds (perhaps), we can talk about semantic gravity and collocative integrity.
  • 84. Appendix A. Time and Variety Semantic gravity: the cataloguer privileges the collection over the updated scheme (theory). Collocative integrity: the degree to which the scheme comports with cataloguing practice.
  • 86. Appendix A. Time and Variety [Figure: share of records (0 to 100%) classed Old, Out, and In, by DDC edition year from 1899 to 2003, shown separately for Anatomy and for Eugenics] [19]
  • 87. Appendix A. Time and Variety [Figure: aggregate shares of Old, Out, and In over 1899 to 2003, shown separately for Eugenics and for Anatomy] [19]
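The Old / Out / In shares charted in Appendix A could be approximated as follows. The category definitions are our assumption, not taken from [19]: "In" means the assigned class number is listed in the edition in force when the record was catalogued, "Old" means it is listed only in an earlier edition (one possible sign of semantic gravity), and "Out" means it is not listed in any edition up to that year. Editions are assumed to be (year, set of class numbers) pairs sorted by year; the names and placeholder class numbers are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Set, Tuple

Edition = Tuple[int, Set[str]]  # (edition year, class numbers possible in that edition)


def categorize(editions: List[Edition], year: int, class_number: str) -> str:
    """Label a record 'In', 'Old', or 'Out' relative to the edition in force in `year`.

    Assumed reading: 'In' = listed in the edition in force; 'Old' = listed only
    in an earlier edition; 'Out' = not listed in any edition up to that year.
    """
    i = bisect_right([y for y, _ in editions], year)
    if i == 0:
        return "Out"  # catalogued before the first edition in the table
    if class_number in editions[i - 1][1]:
        return "In"
    if any(class_number in classes for _, classes in editions[: i - 1]):
        return "Old"
    return "Out"


def integrity_shares(editions: List[Edition], records: List[Tuple[int, str]]) -> Dict[str, float]:
    """Share of (year, class number) records per category (needs >= 1 record)."""
    counts = Counter(categorize(editions, y, c) for y, c in records)
    return {label: counts[label] / len(records) for label in ("Old", "Out", "In")}


editions = [(1911, {"class-a"}), (1979, {"class-b"})]
records = [(1950, "class-a"), (1990, "class-a"), (1990, "class-b"), (1990, "class-z")]
print(integrity_shares(editions, records))  # {'Old': 0.25, 'Out': 0.25, 'In': 0.5}
```

Under this reading, a higher "In" share would indicate closer agreement between scheme and cataloguing practice, though exact string matching on class numbers is a simplification.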
  • 88. References [0] Tennis, J. T. (2010). Form, Intention, and Indexing: The Liminal and Integrated Conceptions of Work in Knowledge Organization. In Advances in Classification Research, Vol. 21. Available: http://journals.lib.washington.edu/index.php/acro/issue/archive [1] Good, B. M. & Tennis, J. T. (2009). Term based comparison metrics for controlled and uncontrolled indexing languages. In Information Research 14(1). Available: http://www.informationr.net/ir/14-1/paper395.html [2] Vickery, B. C. (1997). Ontologies. In Journal of Information Science 23(4): 277-286. [3] Soergel, D. (1999). The rise of ontologies or the reinvention of classification. In JASIST 50(12): 1119-1120. [4] Gilchrist, A. (2003). Thesauri, taxonomies and ontologies – an etymological note. In Journal of Documentation 59(1): 7-18. [5] Barcellos Almeida, M. (2013). Revisiting Ontologies: A Necessary Clarification. In JASIST 64(8): 1682-1693. [6] Gorman, M. (1990). A Bogus and Dismal Science; or, the Eggplant That Ate Library Schools. In American Libraries 21(5): 463-465. [7] Gorman, M. (1999). Metadata or cataloguing? In Journal of Internet Cataloging 2: 5-22.
  • 89. References [8] Tennis, J. T. (2006). Comparative Functional Analysis of Boundary Infrastructures, Library Classification, and Social Tagging. In Information Science Revisited: Approaches to Innovation. Proceedings of the Annual Meeting of the Canadian Association for Information Science/L'Association canadienne des sciences de l'information. York University, Toronto. [9] Tennis, J. T. (2006). Function, Purpose, Predication, and Context of Information Organization Frameworks. In Knowledge Organization for a Global Learning Society: Proceedings of the 9th International Conference for Knowledge Organization. International Society for Knowledge Organization 9th International Conference (Vienna, Austria, July 2006). Advances in Knowledge Organization Vol. 10. Würzburg: Ergon: 303-310. [10] Tennis, J. T. and Jacob, E. K. (2008). Toward a Theory of Structure in Information Organization Frameworks. In Culture and Identity in Knowledge Organization: Proceedings of the 10th International Conference for Knowledge Organization (Montreal, Quebec, August 5-8, 2008). Advances in Knowledge Organization Vol. 11. Würzburg: Ergon: 262-268. [11] Andersen, J. (2005). Call for papers. 16th ASIS&T SIG-CR Classification Research Workshop, 2005, "What knowledge organization does and how it does it: Critical Studies in and of Classification and Indexing." Available: http://dhhumanist.org/Archives/Virginia/v18/0597.html
  • 90. References [12] Richardson, E. C. (1901). Classification: Theoretical and Practical. Scribner's Sons. [13] Horne, T. H. (1825). Outlines for the classification of a library; respectfully submitted to the consideration of the trustees of the British Museum. G. Woodfall. [14] Bowker, G. and Star, S. L. (2000). Sorting Things Out: Classification and Its Consequences. MIT Press. [15] Feinberg, M. (2011). How information systems communicate as documents: the concept of authorial voice. Journal of Documentation 67(6): 1015-1037. [16] Adler, M. (2015). Broker of Information, the "Nation's Most Important Commodity": The Library of Congress in the Neoliberal Era. In Information and Culture 50(1): 24-50. [17] Library of Congress. (2007). 650 - Subject Added Entry - Topical Term. http://www.loc.gov/marc/bibliographic/bd650.html [18] Tennis, J. T. (2006). Social tagging and the next steps for indexing. In Advances in Classification Research, Vol. 17: Proceedings of the 17th ASIS&T SIG/CR Classification Research Workshop (Austin, TX, November 4, 2006), ed. Jonathan Furner and Joseph T. Tennis. [19] Tennis, J. T. (2013). Collocative Integrity and Our Many Varied Subjects: What the Metric of Alignment between Classification Scheme and Indexer Tells Us About Langridge's Theory of Indexing. NASKO, 4(1). Retrieved from http://journals.lib.washington.edu/index.php/nasko/article/view/14660