Major catalogs are usually consulted to copy-catalogue whole records, but they can also supply "atomic" pieces of information, retrieved through unique keys such as the ISBN.
The Library of the Pontificia Università della Santa Croce developed a tool that retrieves Dewey numbers and inserts them into bibliographic records, both in bulk mode and in single record mode, i.e. during cataloguing.
In bulk mode, Dewey classification was added to about 20,000 records, retrieved from up to 7 external sources: OCLC, the Library of Congress and several national libraries.
The single record mode was integrated into the Koha ILS to make it easier to assign Dewey classification during cataloguing.
Catalog enrichment: importing Dewey Decimal Classification from external sources (slides)
1. Stefano Bargioni
Pontificia Università della Santa Croce
Catalogue enrichment: importing Dewey Decimal Classification from external sources
Oct 18, 2013
ADLUG 2013
2. The project
● Improving the Dewey search path
  – with minimal effort
  – while adding BNCF-compliant subject headings to our catalog
● Koha 3 <http://koha-community.org>, an open source ILS
● Can be applied to other ILSs
3. Version 1: The Batch Mode
● Add Dewey notations to the catalog
  – automatically
  – from selected sources
  – ensuring quality and uniformity
4. Atomic copy cataloguing
● Copy cataloguing usually concerns the full record
● We only need to copy field 082 (MARC21) or 676 (UNIMARC)
● The ISBN as a unique identifier
● The policy issue
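The "atomic" copy above can be sketched in a few lines of Python: from a record returned by an external source, keep only the Dewey field and ignore everything else. The sketch assumes the record is MARCXML in the MARC21 slim namespace, and it assumes UNIMARC records use that same namespace. The function name `dewey_numbers` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: records are MARCXML in the MARC21 slim namespace.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def dewey_numbers(marcxml: str, flavour: str = "MARC21") -> list:
    """Return the $a values of the Dewey field: 082 in MARC21, 676 in UNIMARC."""
    tag = "082" if flavour == "MARC21" else "676"
    root = ET.fromstring(marcxml)
    path = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='a']"
    return [sf.text.strip() for sf in root.findall(path, NS) if sf.text]


# Hypothetical record, as it might come back from a source queried by ISBN.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Some title</subfield></datafield>
  <datafield tag="082" ind1="0" ind2="4"><subfield code="a">025.04</subfield><subfield code="2">23</subfield></datafield>
</record>"""

print(dewey_numbers(SAMPLE))  # ['025.04']
```

Only the field value travels from the source record. The local record keeps its own description.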
5. Records to be modified
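● Without a Dewey notation
● With an ISBN
● Limited by the language coded in field 008
  – for example:

    -- sketch for Koha 3.x, where the MARCXML record is stored in biblioitems.marcxml
    SELECT biblionumber, isbn
    FROM biblioitems
    WHERE isbn IS NOT NULL AND isbn <> ''
      AND ExtractValue(marcxml, '//datafield[@tag="082"]/subfield[@code="a"]') = ''
      AND SUBSTRING(ExtractValue(marcxml, '//controlfield[@tag="008"]'), 36, 3) = '...'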
In Koha, the WHERE clause is based on the MySQL ExtractValue function, which works on the marcxml field through XPath expressions.
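The same selection criteria can also be checked outside the database. The following Python sketch applies them to (biblionumber, MARCXML) pairs. A record qualifies if it has an ISBN in 020 $a and no 082 $a, and if its 008/35-37 matches the requested language code. The function names and test data are hypothetical, and only MARC21 tags are handled.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
ISBN_PATH = ".//marc:datafield[@tag='020']/marc:subfield[@code='a']"
DEWEY_PATH = ".//marc:datafield[@tag='082']/marc:subfield[@code='a']"
F008_PATH = ".//marc:controlfield[@tag='008']"


def candidates(rows, language):
    """Yield (biblionumber, isbn) for records with an ISBN, no 082 $a and 008/35-37 == language."""
    for biblionumber, marcxml in rows:
        root = ET.fromstring(marcxml)
        isbn = root.findtext(ISBN_PATH, default="", namespaces=NS).strip()
        has_dewey = root.find(DEWEY_PATH, NS) is not None
        lang = root.findtext(F008_PATH, default="", namespaces=NS)[35:38]
        if isbn and not has_dewey and lang == language:
            yield biblionumber, isbn


def make_record(isbn, dewey, lang):
    """Build a tiny MARCXML record with hypothetical data, for trying the filter."""
    parts = [f'<controlfield tag="008">{" " * 35}{lang} d</controlfield>']
    if isbn:
        parts.append(f'<datafield tag="020" ind1=" " ind2=" "><subfield code="a">{isbn}</subfield></datafield>')
    if dewey:
        parts.append(f'<datafield tag="082" ind1="0" ind2="4"><subfield code="a">{dewey}</subfield></datafield>')
    return '<record xmlns="http://www.loc.gov/MARC21/slim">' + "".join(parts) + "</record>"


rows = [
    (1, make_record("8800000001", None, "ita")),   # qualifies
    (2, make_record("8800000002", "230", "ita")),  # already has Dewey
    (3, make_record("3400000003", None, "ger")),   # other language
    (4, make_record(None, None, "ita")),           # no ISBN
]
print(list(candidates(rows, "ita")))  # [(1, '8800000001')]
```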
6. Dewey Sources (I)
● A choice based on copy cataloguing experience
● OCLC Classify
● Some national libraries
● API, Z39.50 or HTML access
7. Dewey Sources (II): OCLC Classify
● Classify is a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.
● This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Bibliographic records are grouped using the OCLC FRBR Work-Set algorithm to form a work-level summary of the class numbers and subject headings assigned to a work. You can retrieve a summary by ISBN, ISSN, UPC, OCLC number, author/title, or subject heading.
● The Classify database is accessible through a user interface and as a machine-to-machine service. The database provides access to more than 36 million WorldCat records that contain Dewey Decimal Classification (DDC) numbers, [...].
● Retrieved information is in XML format.
● http://www.oclc.org/research/activities/classify.html?urlm=159746
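Because Classify answers in XML, extracting a Dewey number reduces to a path lookup. The sketch below assumes a response shaped like the hypothetical sample: a `response` element carrying a status code, and a `recommendations/ddc/mostPopular` element with an `sfa` attribute. Those names, the namespace and the meaning of the status codes are assumptions to verify against real responses.

```python
import xml.etree.ElementTree as ET

# Assumptions: namespace, element and attribute names of a Classify XML
# response; check them against real responses before relying on this.
NS = {"c": "http://classify.oclc.org"}
SINGLE_WORK_CODES = {"0", "2"}  # assumed; "4" would mean several works share the ISBN


def classify_ddc(xml_text):
    """Return the most popular DDC number of a single-work response, else None."""
    root = ET.fromstring(xml_text)
    response = root.find("c:response", NS)
    if response is None or response.get("code") not in SINGLE_WORK_CODES:
        return None
    popular = root.find("c:recommendations/c:ddc/c:mostPopular", NS)
    return popular.get("sfa") if popular is not None else None


# Hypothetical response, for illustration only.
SAMPLE = """<classify xmlns="http://classify.oclc.org">
  <response code="2"/>
  <recommendations><ddc><mostPopular holdings="120" sfa="025.04"/></ddc></recommendations>
</classify>"""

print(classify_ddc(SAMPLE))  # 025.04
```

Treating a multi-work answer as "no number" is a conservative choice. It avoids picking the class of the wrong work when several works share one ISBN.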
8. Dewey Sources (III): National Libraries
Source  Library                                   Language  Format
LC      Library of Congress                       (any)     MARC
BNF     Bibliothèque nationale de France          (fre)     MARC
DNB     Deutsche Nationalbibliothek               (ger)     HTML
BNCF    Biblioteca Nazionale Centrale di Firenze  (ita)     HTML
BNCR    Biblioteca Nazionale Centrale di Roma     (ita)     HTML
BNB     British National Bibliography             (eng)     MARC
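The language column of the table suggests a simple routing rule: query general sources for every record, and query national libraries only for records in their language. In this minimal sketch, listing Classify as an any-language source and querying in table order are both assumptions, not the project's documented priority.

```python
# Sources and the record languages they are queried for, following the table above.
# None means "any language"; Classify is listed as an any-language source (assumption).
SOURCES = [
    ("Classify", None),
    ("LC", None),
    ("BNF", "fre"),
    ("DNB", "ger"),
    ("BNCF", "ita"),
    ("BNCR", "ita"),
    ("BNB", "eng"),
]


def sources_for(language):
    """Return, in list order, the sources worth querying for a record in `language`."""
    return [name for name, lang in SOURCES if lang is None or lang == language]


print(sources_for("ita"))  # ['Classify', 'LC', 'BNCF', 'BNCR']
```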
9. The logic used in the programs
● Open the connection to the bibliographic database
● Obtain the ISBNs of the records without a Dewey number
● Open the connection to the Dewey source, if Z39.50
● For each ISBN
  – query the data source using the current ISBN
  – if a Dewey number is available in the response
    – if the Dewey number passes quality control
      – update the bibliographic record
  – wait to avoid overloading the source
● Close the connection to the Dewey source, if Z39.50
● Close the connection to the bibliographic database
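The steps above map onto a short loop. The following Python sketch keeps the same structure, but the source query, quality check, record update and sleep function are all passed in as parameters. Connection handling for the database and Z39.50 therefore stays with the caller. All names are hypothetical, and the two-second default delay is an arbitrary choice.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def enrich(records: Iterable[Tuple[int, str]],
           query_source: Callable[[str], Optional[str]],
           passes_quality: Callable[[str], bool],
           update_record: Callable[[int, str], None],
           delay: float = 2.0,
           sleep: Callable[[float], None] = time.sleep) -> dict:
    """Add Dewey numbers to (biblionumber, isbn) records; return outcome counts."""
    stats = {"scanned": 0, "modified": 0, "not_found": 0, "discarded": 0}
    for biblionumber, isbn in records:
        stats["scanned"] += 1
        dewey = query_source(isbn)          # e.g. an HTTP or Z39.50 lookup by ISBN
        if dewey is None:
            stats["not_found"] += 1
        elif passes_quality(dewey):
            update_record(biblionumber, dewey)
            stats["modified"] += 1
        else:
            stats["discarded"] += 1
        sleep(delay)                        # wait to avoid overloading the source
    return stats


# Usage with in-memory stand-ins for the source and the catalogue.
updates = []
stats = enrich([(1, "a"), (2, "b"), (3, "c")],
               query_source={"a": "025.04", "b": "xyz"}.get,
               passes_quality=lambda d: d[:3].isdigit(),
               update_record=lambda b, d: updates.append((b, d)),
               sleep=lambda s: None)
print(stats)  # {'scanned': 3, 'modified': 1, 'not_found': 1, 'discarded': 1}
```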
10. Quality check
● Catalogs contain errors
● DDC has many editions
● Our old Dewey numbers start from edition 19
● Indicators
● Many Dewey numbers discarded...
● ...but we moved from 40,000 to 60,000 records with a Dewey number (+50%)
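One way to turn these points into code is a small acceptance test for each retrieved 082 field. The sketch below is hypothetical and assumes an 082 $a notation that may contain / or ' segmentation marks. It accepts only full-edition numbers (first indicator 0) and rejects editions older than a configurable threshold, set to 19 here only by analogy with the slide.

```python
import re

# Assumed shape of a DDC notation: three digits, optionally a point and more digits.
DDC_RE = re.compile(r"\d{3}(\.\d+)?")


def passes_quality(number: str, ind1: str, edition: str, min_edition: int = 19) -> bool:
    """Hypothetical acceptance rule for an 082 field ($a, first indicator, $2)."""
    cleaned = number.replace("/", "").replace("'", "").strip()  # drop segmentation marks
    if not DDC_RE.fullmatch(cleaned):
        return False
    if ind1 != "0":  # 082 first indicator: 0 = full edition, 1 = abridged edition
        return False
    return edition.strip().isdigit() and int(edition) >= min_edition


print(passes_quality("823/.914", "0", "22"))  # True
```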
11. Delay while searching sources
● Continuous searching can overwhelm remote servers
  – robots.txt
  – policies for crawlers
● Continuous indexing can overload your own server
● Wait a few seconds between searches or groups of searches
  – this will slow down the harvesting process
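Python's standard `urllib.robotparser` can read a site's robots.txt, including a `Crawl-delay` directive, so the wait between searches can respect it. The sketch below combines it with a hypothetical `Throttle`, which pauses after every search and adds a longer pause after each group of searches. The default values and the user-agent string are arbitrary assumptions.

```python
import time
import urllib.robotparser


def load_rules(robots_lines):
    """Parse robots.txt lines (e.g. already downloaded with an HTTP client)."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_lines)
    return rules


def polite_delay(rules, user_agent, default=3.0):
    """Seconds to wait between searches: the default, or Crawl-delay if larger."""
    declared = rules.crawl_delay(user_agent)
    return default if declared is None else max(default, float(declared))


class Throttle:
    """Pause after every search and add a longer pause after each group of searches."""

    def __init__(self, delay, group_size=100, group_pause=60.0, sleep=time.sleep):
        self.delay, self.group_size, self.group_pause = delay, group_size, group_pause
        self.sleep = sleep
        self.count = 0

    def wait(self):
        self.count += 1
        pause = self.delay
        if self.count % self.group_size == 0:
            pause += self.group_pause
        self.sleep(pause)


rules = load_rules(["User-agent: *", "Crawl-delay: 5", "Disallow: /private"])
print(rules.can_fetch("dewey-harvester", "http://example.org/private/x"))  # False
print(polite_delay(rules, "dewey-harvester"))  # 5.0
```

Injecting `sleep` keeps the timing logic testable without actually waiting.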
12. Statistics
Source    Language  Records scanned  Records modified  ISBN not found  Dewey # not found  Dewey # discarded
Classify  all       42387            10267             5321            6607               20059
LC        all       31999            1252              21195           8562               1011
BNF       all       30903            2253              21327           7268               55
DNB       ger       4193             163               3867            163                0
BNCF      ita       12017            4088              3643            3542               744
BNCR      ita       7549             1515              3003            2978               53
BNB       eng       6215             193               5449            55                 518

Total records modified: 19710
Other outcomes: several works with the same ISBN: 8240; incorrect ISBN: 133
13. Browsing Dewey Index
Besides authors, uniform titles and subject headings, our OPAC offers a semantic search path based on the Dewey classification number.
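Dewey notation lends itself to hierarchical browsing because broader classes can often be obtained by truncating the number. The sketch below illustrates that idea only. It is not how the OPAC is implemented, and it assumes a purely notational hierarchy, which DDC does not always follow.

```python
def broader_classes(number: str) -> list:
    """Broader DDC classes of a notation, narrowest first (purely notational)."""
    integer, _, decimals = number.partition(".")
    chain = [f"{integer}.{decimals[:i]}" for i in range(len(decimals) - 1, 0, -1)
             if not decimals[:i].endswith("0")]  # DDC numbers do not end in 0 after the point
    chain += [integer, integer[:2] + "0", integer[0] + "00"]  # section, division, main class
    return [c for c in dict.fromkeys(chain) if c != number]


print(broader_classes("823.914"))  # ['823.91', '823.9', '823', '820', '800']
```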
14. Software
● Query programs were written in Perl, using the Koha API and the following CPAN libraries:
  – LWP for HTTP connections
  – ZOOM for Z39.50 connections
  – DBI for connections to the MySQL database
  – XML::XPath for XML data processing
  – WWW::Scraper for HTML data processing
  – MARC::Record for MARC record processing
15. A scientific article
● Published in JLIS.it at http://leo.cilea.it/index.php/jlis/article/view/8766
● JLIS.it, the Italian Journal of Library and Information Science, is a peer-reviewed, open access academic journal of international scope
● Written with my cataloguers
● Does not deal with the dynamic component
16. Version 2.0 - Single Record Mode
● New record:
  – enter the ISBN
  – retrieve Dewey numbers from major catalogs
  – choose the best one and import it into the new record
● Or upgrade an old record by adding or modifying its Dewey classification
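In single record mode, the cataloguer chooses among the numbers returned by several catalogs. A hypothetical helper could support that choice by grouping identical answers, ignoring / and ' segmentation marks, and listing first the numbers on which most sources agree. This ranking rule is an assumption, not the tool's documented behaviour.

```python
def rank_candidates(results: dict) -> list:
    """Group the Dewey numbers returned per source; most-agreed numbers first."""
    by_number = {}
    for source, number in results.items():
        if number:  # skip sources that returned nothing
            key = number.replace("/", "").replace("'", "").strip()
            by_number.setdefault(key, []).append(source)
    # sorted() is stable: ties keep the order in which the numbers first appeared
    return sorted(by_number.items(), key=lambda item: -len(item[1]))


print(rank_candidates({"Classify": "823.914", "LC": "823/.914", "BNB": "823.92", "DNB": None}))
# [('823.914', ['Classify', 'LC']), ('823.92', ['BNB'])]
```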
18. Conclusions
● Increase of bibliographic data available on the net
● Unique identifiers
  – ISBN, ISSN, ...
  – VIAF ID, ISNI, ...
● Catalog enrichment
  – bibliographic records
  – authority records
● Expose rich linked data
  – with coded information like Dewey
  – with standard IDs like ISBN, ISNI, ...