Networked digital library through harvesting

Networked Digital Library through
Harvesting: The Future of Digital
Archiving

Barnali Roy Choudhury and Dr. Parthasarathi Mukhopadhyay
Department of Library and Information Science
The University of Burdwan,
Burdwan – 713 104

DIGITAL LIBRARY
A digital library is a library in which collections are stored in digital
formats (as opposed to print, microform, or other media) and
accessible by computers.[1] The digital content may be stored
locally, or accessed remotely via computer networks.
(Wikipedia)
The DELOS Digital Library Reference Model[2] defines a digital library
as:
An organization, which might be virtual, that comprehensively
collects, manages and preserves for the long term rich digital
content, and offers to its user communities specialized functionality
on that content, of measurable quality and according to codified
policies.

No traditional library is self-sufficient;
No digital library is self-sufficient.

Networked Digital Library

An entity that collects metadata from selected
digital libraries (DLs) in a central place to
provide centralized searching

OBJECTIVES


To harvest metadata from different OAI/PMH repositories
related to LIS into a single window (a centralized
search facility);

To design a union catalogue of scholarly objects through
harvesting (using the OAI/PMH protocol and the PKP open
source harvesting software on LAMP architecture);
and

To provide comprehensive search facilities to end
users in the LIS domain for accessing scholarly objects
(search metadata locally and access full text
globally).

CRITERIA for DL selection

Selection of a particular domain
Selection of the most efficient and effective dataset
Whether the selected data are OAI/PMH compatible

Open Access Institutional
Digital Repository
Institutional Digital Repositories (IDRs) are digital collections that organize,
preserve, and make accessible the intellectual output of a single institution
or a group of related institutions (Crow, 2002).
A typical IDR has the following attributes:

Open-access repositories allow authors/rights holders to deposit their articles
May allow preprints (pre-publication manuscripts)
Normally allow post-prints (peer-reviewed and published articles)
Most reputed academic publishers allow authors to deposit some version of
their articles in such repositories (http://romeo.eprints.org/stats)

IDRs in LIS domain
The Directory of Open Access Repositories (www.opendoar.org) lists
around 51 open access repositories; among them,
43 are in English;
24 are only LIS & IT related;
18 are OAI/PMH compatible.
Among the English-language repositories, ELIS holds the highest
number of records, i.e. 9565.
The Registry of Open Access Repositories (roar.eprints.org) lists
around 6 institutional repositories, of which 5 are OAI/PMH
compatible.
These directories allow us to search and list open access
repositories by subject, country and content type.

Cross Collection
Interoperability
These repositories allow submission of scholarly materials
globally (i.e. cross-institutional) through extensive use of two
interoperability standards:

Z39.50 is a protocol for distributed search services;
OAI/PMH deals with metadata harvesting.

What is OAI/PMH
1. OAI/PMH is a lightweight standard protocol for harvesting
   metadata records from ‘data providers’ to ‘service providers’.
2. It provides rules for harvesting the metadata of a repository, not
   the full content.
3. The content itself should be retrieved from the source repository;
   the protocol allows a ‘service provider’ to say ‘give me some or
   all of your metadata records’.
4. It is based on HTTP and XML.
5. It simply carries metadata.
6. It mandates simple Dublin Core (DC) as a record format,
   but is extensible to any XML format, e.g. IEEE LOM, ONIX, MARC,
   METS, MPEG-21.
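To make the points about XML and simple DC concrete, here is a minimal sketch of how a service provider might read the Dublin Core fields out of a ListRecords response. The sample response, the repository host name and the function name parse_records are invented for illustration; only the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response (assumption): one record from a hypothetical repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.org/oai/request</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1849/1</identifier>
        <datestamp>2009-03-13</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented title for illustration</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:identifier>http://repository.example.org/handle/1849/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per <record>: OAI identifier, datestamp and DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        fields = {}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry a header but no metadata
            for element in dc:
                name = element.tag.split("}", 1)[-1]  # drop the namespace part
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({
            "oai_identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", default="", namespaces=NS),
            "dc": fields,
        })
    return records


if __name__ == "__main__":
    for item in parse_records(SAMPLE_RESPONSE):
        print(item["oai_identifier"], item["dc"].get("title"))
```

Note that the record carries only metadata; the dc:identifier field is what points the end user back to the full text held in the source repository.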

HOW OAI WORKS?
OAI “VERBS”
Identify
ListMetadataFormats
ListSets
ListIdentifiers
ListRecords
GetRecord

[Figure: the HARVESTER (OAI) sends an HTTP request (OAI verb) to the REPOSITORY (OAI); the repository returns an HTTP response (valid XML).]
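The request side of this exchange can be sketched as follows: each OAI verb travels as an ordinary HTTP query string appended to the repository's base URL. The base URL below is a hypothetical placeholder and build_request_url is an illustrative helper, not part of any harvester; the verb names and the metadataPrefix argument are those defined by the protocol.

```python
from urllib.parse import urlencode

# The six verbs defined by the protocol.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **arguments):
    """Build the URL a harvester would request for the given OAI verb."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI verb: %r" % (verb,))
    params = {"verb": verb}
    params.update(arguments)  # e.g. metadataPrefix="oai_dc"
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://repository.example.org/oai/request"  # hypothetical base URL
    print(build_request_url(base, "Identify"))
    print(build_request_url(base, "ListRecords", metadataPrefix="oai_dc"))
```

The repository answers each such request with an XML document, so the harvester needs nothing beyond an HTTP client and an XML parser.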

METHODOLOGY OF DESIGNING





LAMP related activities
Harvester related activities
Repository related activities
Development of repositories

LAMP related activities


The prototype harvesting framework developed at the
Department of LIS, The University of Burdwan, named
UniLIS, is based on open source software and open
standards.
It uses the LAMP architecture as its base:

Linux (Ubuntu 9.10) as operating system,
Apache (2.2.8) as Web server,
MySQL (5.0.0) as RDBMS, and
PHP version 5.x as the language of the harvesting tool
Linking PHP with Apache & MySQL

The requirements of the PKP harvester are as follows:

PHP >= 4.2.x (including PHP 5.x); Microsoft IIS requires PHP 5.x
MySQL >= 3.23.23 (including MySQL 4.x/5.x)
Apache >= 1.3.2x or >= 2.0.4x or 2.0.5x, or Microsoft IIS 5.x or 6.x
Operating system: any OS that supports the above software, including
Linux, BSD, Solaris, Mac OS X and Windows (preferably NT-based
Windows flavors)
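As a rough illustration, the stated minimums can be compared with the versions used for UniLIS by treating version strings as tuples of numbers. This is only a sketch: the helper names are invented, the Apache rule is left out because the slides give it as alternative version ranges, and a partial version such as "5.x" is reduced to its leading numeric part.

```python
def parse_version(text):
    """Turn '5.0.0' or '5.x' into a tuple of its leading numeric parts."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at wildcards such as 'x'
        parts.append(int(piece))
    return tuple(parts)


# Minimums as stated in the requirements (Apache omitted, see lead-in).
MINIMUMS = {"PHP": "4.2", "MySQL": "3.23.23"}

# Versions reported for the UniLIS server.
INSTALLED = {"PHP": "5.x", "MySQL": "5.0.0"}


def unmet(installed, minimums):
    """List the components whose installed version is below the minimum."""
    return [
        name
        for name, minimum in minimums.items()
        if parse_version(installed.get(name, "0")) < parse_version(minimum)
    ]


if __name__ == "__main__":
    print(unmet(INSTALLED, MINIMUMS))
```

Because "5.x" becomes the single-element tuple (5,), it compares as lower than any minimum of the form 5.n, so a partial version within the same major release is reported as unmet rather than assumed to pass.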

This group (harvester related activities) includes two major tasks:

i) Installation of PKP harvester, which requires
a) login name and password for the system administrator
(root user)
b) database details (name of the MySQL database, the database
user and the password of that user)

ii) Configuration of PKP harvester
a) site management (configuration of site specific details,
language, crosswalk, plug-in and reading tools);
b) archives (creation of archives, managing created
archives); and
c) other administrative functions (layout, customization
etc.).

[Screenshots: UniLIS, Department of LIS, The University of Burdwan]

IDRs related requirements
Name of open access repository: LDL (Librarians Digital Library)
Sponsoring institute: Documentation Research and Training Centre (DRTC),
Indian Statistical Institute (ISI), Bangalore centre, India
No. of records: 249 items (2009-03-13)
Software in use: DSpace
URL of the repository: https://drtc.isibang.ac.in
OAI/PMH base URL: http://drtc.isibang.ac.in/oai/request
Document type: Articles; Conferences; Theses; Multimedia
Language: English, Hindi, Kannada
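Given a base URL such as the one listed above, harvesting a whole repository can be sketched as a loop over ListRecords requests. Large result sets are returned in pages, and the protocol's resumptionToken element tells the harvester how to ask for the next page. This is a simplified illustration rather than the way PKP harvester is implemented: the function names are invented, and any protocol error (including the "no records match" case) is raised as an exception.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def http_fetch(url):
    """Default fetcher: return the response body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix="oai_dc", fetch=http_fetch):
    """Yield the OAI identifier of every record, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI + "error")
        if error is not None:
            raise RuntimeError("OAI error %s: %s" % (error.get("code"), error.text))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = (root.findtext(".//" + OAI + "resumptionToken") or "").strip()
        if not token:
            break  # absent or empty token: the list is complete
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

A call such as harvest_identifiers("http://drtc.isibang.ac.in/oai/request") would then iterate over the repository's records, assuming the endpoint still responds at that address. Passing a different fetch function makes it possible to try the loop against saved responses without touching the network.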

UniLIS repository


Presently it includes 5 large-scale open access repositories
in the LIS domain.

In future it is going to include LIS-specific open access
journals, ETDs and other open access repositories, for the
purpose of developing a comprehensive local search service
for open access resources in the domain of LIS.
