1. Networked Digital Library through
Harvesting: The Future of Digital
Archiving
Barnali Roy Choudhury and Dr. Parthasarathi Mukhopadhyay
Department of Library and Information Science
The University of Burdwan,
Burdwan – 713 104
2. DIGITAL LIBRARY
A digital library is a library in which collections are stored in digital
formats (as opposed to print, microform, or other media) and
accessible by computers.[1] The digital content may be stored
locally, or accessed remotely via computer networks.
(Wikipedia)
The DELOS Digital Library Reference Model[2] defines a digital library
as:
An organization, which might be virtual, that comprehensively
collects, manages and preserves for the long term rich digital
content, and offers to its user communities specialized functionality
on that content, of measurable quality and according to codified
policies.
4. Networked Digital Library
An entity that collects metadata in a
central place from selected DLs to
provide centralized searching
5. OBJECTIVES
To harvest metadata in a single window (centralized
search facility) from different OAI/PMH repositories
related to LIS;
To design a union catalogue of scholarly objects through
harvesting (by using OAI/PMH protocol, PKP open
source harvesting software on LAMP architecture);
and
To provide comprehensive search facilities to end
users in the LIS domain for accessing scholarly objects
(search metadata locally and access full-text
globally).
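The third objective (search metadata locally, access full text globally) can be illustrated with a minimal Python sketch. It is not the PKP harvester's implementation: the `Record` fields, the function names and the sample data are all assumptions made for illustration. The idea shown is only that harvested records from several repositories are merged into one local catalogue, searched locally, and each hit keeps a link back to the source repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One harvested metadata record (all field names are illustrative)."""
    repository: str   # repository the record was harvested from
    identifier: str   # OAI identifier of the record
    title: str
    url: str          # link back to the full text at the source repository


def build_union_catalogue(harvests):
    """Merge per-repository record lists, keeping the first copy of each identifier."""
    catalogue = {}
    for records in harvests:
        for record in records:
            catalogue.setdefault(record.identifier, record)
    return list(catalogue.values())


def search(catalogue, term):
    """Case-insensitive title search over the locally stored metadata only."""
    needle = term.lower()
    return [r for r in catalogue if needle in r.title.lower()]


# Made-up sample data standing in for two harvested repositories.
repo_a = [
    Record("RepoA", "oai:a.example.org:1", "Metadata harvesting basics",
           "http://a.example.org/1"),
]
repo_b = [
    Record("RepoB", "oai:b.example.org:7", "Digital library interoperability",
           "http://b.example.org/7"),
    # Same identifier as the RepoA record: treated as a duplicate.
    Record("RepoB", "oai:a.example.org:1", "Metadata harvesting basics",
           "http://a.example.org/1"),
]

catalogue = build_union_catalogue([repo_a, repo_b])
hits = search(catalogue, "harvesting")
```

The search touches only the local catalogue; the `url` field of each hit is what sends the user to the remote repository for the full text.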
6. CRITERIA for DL selection
Selection of a particular domain
Selection of most efficient and effective dataset
Whether the selected datasets are OAI/PMH compatible
7. Open Access Institutional
Digital Repository
Institutional Digital Repositories (IDRs) are digital collections that organize,
preserve, and make accessible the intellectual output of a single institution
or a group of related institutions (Crow, 2002).
A typical IDR has the following attributes:
Open-access repositories allow authors/rights holders to deposit their articles
May allow preprints (pre-published manuscripts)
Normally allow post-prints (peer-reviewed and published articles)
Most reputed academic publishers allow authors to deposit some version of
their articles in such repositories (http://romeo.eprints.org/stats)
10. IDRs in LIS domain
Directory of Open Access Repositories (www.opendoar.org) lists
around 51 open access repositories;
among them 43 are in English language;
24 are only LIS & IT related;
18 are OAI/PMH compatible.
Among the English-language repositories, ELIS holds the highest no. of records, i.e. 9565
Registry of Open Access Repositories (roar.eprints.org) lists
around 6 institutional repositories, of which 5 are OAI/PMH
compatible.
Both directories allow us to search and list open access
repositories by subject, country and content type.
11. Cross Collection
Interoperability
These repositories allow submission of scholarly materials
globally (i.e. cross-institutionally) through extensive use of two
interoperability standards:
Z39.50 is a protocol for distributed search services;
OAI/PMH deals with metadata harvesting
12. What is OAI/PMH
1. The OAI/PMH is a light-weight standard protocol for harvesting
metadata records from ‘data providers’ to ‘service providers’
2. It provides rules for harvesting the metadata of a repository, not
the full content.
3. The full content is still retrieved from the source repository; the
protocol only allows a ‘service provider’ to say ‘give me some or all
of your metadata records’
4. Based on HTTP and XML
5. Simply carries metadata
6. Mandates simple DC (Dublin Core) as the record format,
but is extensible to any XML format: IEEE LOM, ONIX, MARC,
METS, MPEG-21, etc.
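Since the protocol carries metadata as XML with simple DC as the mandatory format, a harvested record can be read with any XML parser. The following is a minimal Python sketch, not the PKP harvester's code. The sample response is made up (the `example.org` identifiers are placeholders), while the three namespace URIs are the ones defined by OAI-PMH 2.0 and Dublin Core; the function name `parse_records` and the output keys are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; identifiers and values are placeholders.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:123</identifier>
        <datestamp>2009-03-13</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:identifier>http://example.org/handle/123</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict of simple DC fields per <record> in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    parsed = []
    for record in root.iter("{%s}record" % NS["oai"]):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # a record without a metadata part is skipped
            continue
        parsed.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            "links": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return parsed


records = parse_records(SAMPLE)
```

Only the metadata travels in the response; the `links` value is what a service provider would use to point users to the full content at the source repository.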
13. HOW OAI WORKS?
OAI “VERBS”
Identify
ListMetadataFormats
ListSets
ListIdentifiers
ListRecords
GetRecord
[Diagram: the OAI HARVESTER sends an HTTP Request (OAI Verb) to the OAI REPOSITORY; the OAI REPOSITORY returns an HTTP Response (Valid XML) to the HARVESTER.]
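The request side of this exchange can be sketched in a few lines of Python. This is an illustration, not the harvester used in UniLIS: the base URL is a hypothetical placeholder and the helper names `build_request` and `fetch` are assumptions. The six verbs are those listed above; `metadataPrefix` and its value `oai_dc` come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# The six OAI verbs.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Build the HTTP GET URL for one OAI request; reject unknown verbs."""
    if verb not in OAI_VERBS:
        raise ValueError("not an OAI verb: %r" % verb)
    query = urlencode({"verb": verb, **arguments})
    return "%s?%s" % (base_url, query)


def fetch(url, timeout=30):
    """Send the request and return the XML response as text (not called here)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical base URL, used only to show the shape of the requests.
BASE_URL = "http://repository.example.org/oai/request"

identify_url = build_request(BASE_URL, "Identify")
list_url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

Each request is an ordinary HTTP URL with the verb as a query parameter, so passing `identify_url` to `fetch` against a real base URL would return the repository's self-description as XML.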
15. LAMP related activities
The prototype harvesting framework developed at the
Department of LIS, The University of Burdwan, named
UniLIS, is based on open source software and open
standards.
It uses the LAMP architecture as its base:
Linux (Ubuntu 9.10) as operating system,
Apache (2.2.8) as Web server,
MySQL (5.0.0) as RDBMS, and
PHP version 5.X as harvesting tool
Linking PHP with Apache & MySQL
16. Harvester related activities
The requirements of PKP harvester are as follows –
PHP >= 4.2.x (including PHP 5.x); Microsoft IIS
requires PHP 5.x
MySQL >= 3.23.23 (including MySQL 4.x/5.x)
Apache >= 1.3.2x or >= 2.0.4x or 2.0.5x /Microsoft
IIS 5.x or 6.x
Operating system: Any OS that supports the above
software, including Linux, BSD, Solaris, Mac OS X, Windows
(preferably NT based Windows flavors)
17. Harvester related activities
This group includes two major tasks:
i) Installation of PKP harvester, which requires
a) login name and password for system administrator
(root user)
b) database details (name of the MySQL database, user
of database and password of the database user)
18. Harvester related activities
ii) Configuration of PKP harvester
a) site management (configuration of site specific details,
language, crosswalk, plug-in and reading tools);
b) Archives (creation of archives, managing created
archives); and
c) other administrative functions (layout, customization
etc.).
22. IDRs related requirements
Name of open access repository: LDL (Librarians Digital Library)
Sponsoring institute: Documentation Research and Training Centre (DRTC), Indian Statistical Institute (ISI), Bangalore Centre, India
No. of records: 249 items (2009-03-13)
Software in use: DSpace
URL of the repository: https://drtc.isibang.ac.in
OAI/PMH base URL: http://drtc.isibang.ac.in/oai/request
Document type: Articles; Conferences; Theses; Multimedia
Language: English, Hindi, Kannada
32. UniLIS repository
Presently it includes 5 large-scale open access repositories
in LIS domain.
In future it is going to include LIS-specific open access
journals, ETDs and other open access repositories for the
purpose of developing a comprehensive local search service
for open access resources in the domain of LIS.