The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides a simple but effective mechanism for metadata harvesting. It allows service providers to aggregate content from data providers to build value-added services. The OAI-PMH uses HTTP and XML to share metadata in any agreed format, with Dublin Core as a baseline. It defines a set of verbs and standards for harvesting metadata from repositories in a consistent way. This interoperability has helped surface resources and build services across independently developed digital libraries.
Open Archives Initiatives for Metadata Harvesting
A Framework for Building Open Digital Libraries
Term Paper-1
Submitted by
NIKESH.N
International School of Information Management
University of Mysore
2010
1.0 Introduction

A digital library may be defined as a system that supports the collection, organization, storage, retrieval and dissemination of digital documents. It can be viewed as the intersection of library science, computer science and networked information systems. Open movements are gaining acceptance in the scholarly information arena, and many universities and research centres have started to provide public access to their repositories. With the growing number of digital repositories on the Web, it became difficult for users to visit individual sites in search of information, and many organizational repositories are not indexed by search engines. A mechanism is therefore required by which repositories can share resources and work in coordination, giving users a broader purview. The ability of information systems to work together in this way is termed interoperability. The Open Archives Initiative is one of the landmark efforts to make the metadata of digital resources from many repositories available at the user's end.

The essence of the open archives approach is to enable access to Web-accessible material through interoperable repositories for metadata sharing, publishing and archiving.

These interoperability requirements led to the development of standards such as the Dublin Core Metadata Element Set and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). These standards have achieved a degree of success in the DL community largely because of their generality and simplicity.
2.0 Need for a Harvester Protocol

There is a growing need to make resources, not only descriptive metadata, harvestable in an interoperable manner. Two major use cases motivate this need:

• Preservation: The need to periodically transfer digital content from a data repository to one or more trusted digital repositories charged with storing and preserving safety copies of the content. The trusted digital repositories need a mechanism to automatically synchronize with the originating data repository.

• Discovery: The need to use the content itself in the creation of services. Examples include search engines that make full text from multiple data repositories searchable, and citation indexing systems that extract references from the full-text content. Another scenario is the provision of thumbnail versions of high-quality images from cultural heritage collections to external services that build browsing interfaces incorporating those thumbnails.
3.0 OAI Protocol for Metadata Harvesting (OAI-PMH)

In October 1999 the Open Archives Initiative (OAI) was launched in an attempt to address interoperability issues among the many existing and independent digital libraries. The focus was on high-level communication among systems and simplicity of protocols. The OAI has since received much attention in the DL community and, primarily because of the simplicity of its standards, has attracted many early adopters. It defines a mechanism for harvesting records containing metadata from repositories.
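Concretely, every OAI-PMH request is a plain HTTP GET against the repository's base URL, with a `verb` argument naming one of the six operations the protocol defines. The sketch below builds such request URLs; the repository base URL is a made-up example, not a real endpoint:

```python
from urllib.parse import urlencode

# Hypothetical base URL of a repository's OAI-PMH endpoint.
BASE_URL = "http://repository.example.org/oai"

def oai_request_url(verb, **kwargs):
    """Build the GET URL for an OAI-PMH request: the base URL plus a
    'verb' argument and any verb-specific arguments such as
    metadataPrefix, set, or a date range."""
    params = {"verb": verb}
    params.update(kwargs)
    return BASE_URL + "?" + urlencode(params)

# The six verbs defined by OAI-PMH:
print(oai_request_url("Identify"))
print(oai_request_url("ListMetadataFormats"))
print(oai_request_url("ListSets"))
print(oai_request_url("ListIdentifiers", metadataPrefix="oai_dc"))
print(oai_request_url("ListRecords", metadataPrefix="oai_dc",
                      **{"from": "2010-01-01"}))  # 'from' is a Python keyword
print(oai_request_url("GetRecord", identifier="oai:example.org:1234",
                      metadataPrefix="oai_dc"))
```

A harvester will typically begin with Identify and ListMetadataFormats to learn what a repository supports before issuing ListRecords.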
3.1 Definitions of Key Terms

• Open Archives Initiative (OAI)
The OAI is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.

• Archive
The term "archive" in the name Open Archives Initiative reflects the origins of the OAI in the e-prints community, where "archive" is generally accepted as a synonym for a repository of scholarly papers. Members of the archiving profession have justifiably noted the strict definition of an "archive" within their domain, with connotations of preservation of long-term value, statutory authorization and institutional policy. The OAI uses the term "archive" in a broader sense: as a repository for stored information. Language and terms are never unambiguous and uncontroversial, and the OAI respectfully requests the indulgence of the professional archiving community with this broader use of "archive". (OAI definition quoted from the FAQ on the OAI Web site)
• OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.

• Protocol
A protocol is a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for communication between systems across the Internet.

• Harvesting
In the OAI context, harvesting refers specifically to the gathering of metadata from a number of distributed repositories into a combined data store.
3.2 Prerequisites to develop metadata harvesting protocol
To facilitate metadata harvesting there needs to be agreement on:
o Transport protocol - HTTP or FTP or other such protocol
o Metadata format - Dublin Core or MARC or other such format
o Metadata Quality Assurance - mandatory element set, naming and subject conventions, etc.
o Intellectual Property and Usage Rights - who can do what with what?
3.3 OAI: Key players
There are two groups of 'participants': Data Providers and Service Providers.
Data Providers
(open archives, repositories) provide free access to metadata, and may, but do not necessarily,
offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement,
low-barrier solution for Data Providers.
Service Providers
use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means
that there are no live search requests to the Data Providers; rather, services are based on the
harvested data via OAI-PMH. Service Providers may select certain subsets from Data Providers
(e.g., by set hierarchy or date stamp). Service Providers offer (value-added) services on the basis
of the metadata harvested, and they may enrich the harvested metadata in order to do so.
3.4 How it works
The OAI-PMH gives a simple technical option for data providers to make their metadata
available to services, based on the open standards HTTP (Hypertext Transport Protocol) and
XML (Extensible Markup Language). The metadata that is harvested may be in any format that
is agreed by a community (or by any discrete set of data and service providers), although
unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata
from many sources can be gathered together in one database, and services can be provided based
on this centrally harvested or "aggregated" data. The link between this metadata and the related
content is not defined by the OAI protocol. It is important to realize that OAI-PMH does not
provide a search across this data, it simply makes it possible to bring the data together in one
place. In order to provide services, the harvesting approach must be combined with other
mechanisms.
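As a sketch of the aggregation idea described above, the following (illustrative, abridged) ListRecords response is parsed and its Dublin Core metadata is gathered into one combined in-memory store; a real harvester would instead fetch such responses over HTTP from each data provider's OAI base URL, which is assumed here.

```python
# Sketch: aggregating harvested Dublin Core metadata into one store.
# The XML below is an illustrative, abridged ListRecords response.
import xml.etree.ElementTree as ET

SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2004-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

def aggregate(response_xml, store):
    """Add each harvested record's title to a combined store,
    keyed by its OAI identifier."""
    root = ET.fromstring(response_xml)
    for rec in root.findall(".//oai:record", NS):
        ident = rec.find("oai:header/oai:identifier", NS).text
        title = rec.find(".//dc:title", NS)
        store[ident] = title.text if title is not None else None
    return store

store = aggregate(SAMPLE_RESPONSE, {})
```

Searching or any other value-added service would then run against this aggregated store, not against the data providers themselves.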
3.5 Protocol details
Records
A record is the metadata of a resource in a specific format. A record has three parts: a header and
metadata, both of which are mandatory, and an optional about statement. Each of these is made
up of various components as set out below.
header (mandatory)
o identifier (mandatory: exactly 1)
o datestamp (mandatory: exactly 1)
o setSpec elements (optional: 0, 1 or more)
o status attribute for deleted items
metadata (mandatory)
o XML-encoded metadata with root tag and namespace
o repositories must support Dublin Core, may support other formats
about (optional)
o rights statements
o provenance statements
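The record parts above can be sketched by parsing an illustrative record whose header carries the deleted-status attribute (in which case no metadata part is returned); the identifier and setSpec values are hypothetical.

```python
# Sketch of the record parts. The record below is illustrative: its
# header has status="deleted", so the metadata part is absent.
import xml.etree.ElementTree as ET

RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:example.org:42</identifier>
    <datestamp>2005-01-15</datestamp>
    <setSpec>thesis</setSpec>
  </header>
</record>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
rec = ET.fromstring(RECORD)
header = rec.find("oai:header", NS)              # mandatory part
deleted = header.get("status") == "deleted"      # optional attribute
sets = [s.text for s in header.findall("oai:setSpec", NS)]
```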
Datestamps
A datestamp is the date of last modification of a metadata record. Datestamp is a mandatory
characteristic of every item. It has two possible levels of granularity:
YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ.
The function of the datestamp is to provide information on metadata that enables selective
harvesting using from and until arguments. Its applications are in incremental update
mechanisms. It gives either the date of creation, last modification, or deletion. Deletion is
covered with three support levels: no, persistent, transient.
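A selective-harvesting request using the from and until arguments (at day granularity) might be built as follows; the base URL reuses the document's earlier example, and the function name is an assumption.

```python
# Sketch: building a selective-harvesting request with the from/until
# arguments at YYYY-MM-DD granularity.
from urllib.parse import urlencode

def list_records_url(base, prefix, date_from=None, until=None):
    # 'from' is a Python keyword, hence the date_from parameter name.
    args = {"verb": "ListRecords", "metadataPrefix": prefix}
    if date_from:
        args["from"] = date_from
    if until:
        args["until"] = until
    return base + "?" + urlencode(args)

url = list_records_url("http://archive.org/oai", "oai_dc",
                       date_from="2004-01-01", until="2004-12-31")
```

Run periodically with a moving from date, this is exactly the incremental-update mechanism the datestamp exists to support.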
Metadata schema
OAI-PMH supports dissemination of multiple metadata formats from a repository. The
properties of metadata formats are:
– id string to specify the format (metadataPrefix)
– metadata schema URL (XML schema to test validity)
– XML namespace URI (global identifier for metadata format)
Repositories must be able to disseminate unqualified Dublin Core. Further arbitrary metadata
formats can be defined and transported via the OAI-PMH. Any returned metadata must comply
with an XML namespace specification. The Dublin Core Metadata Element Set contains 15
elements. All elements are optional, and all elements may be repeated.
3.6 The Dublin Core Metadata Element Set:
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier,
Source, Language, Relation, Coverage, Rights
Sets
Sets enable a logical partitioning of repositories. They are optional; archives do not have to
define Sets. There are no recommendations for the implementation of Sets. Sets are not
necessarily exhaustive of the content of a repository, and they are not necessarily strictly
hierarchical. It is important and necessary to have negotiated agreements within communities,
defining sets that are useful for those communities.
• function: selective harvesting (set parameter)
• applications: subject gateways, dissertation search engine, and others
• examples
o publication types (thesis, article, …)
o document types (text, audio, image, …)
o content sets, according to DNB (medicine, biology, …)
3.7 Request format
Requests must be submitted using the GET or POST methods of HTTP, and repositories must
support both methods. At least one key=value pair, verb=RequestType (where RequestType is
some type of request such as ListRecords), must be provided. Additional key=value pairs depend
on the request type.
Example GET request:
http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
The encoding of special characters must be supported; for example, ":" (host port separator)
becomes "%3A"
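The percent-encoding rule can be sketched with Python's standard URL utilities; the item identifier used here is hypothetical.

```python
# Sketch: percent-encoding special characters in request arguments,
# e.g. ":" becomes "%3A". The identifier is hypothetical.
from urllib.parse import quote, urlencode

identifier = "oai:example.org:42"
encoded = quote(identifier, safe="")   # encode every reserved character

# urlencode applies the same escaping when building a full request:
url = ("http://archive.org/oai?" +
       urlencode({"verb": "GetRecord",
                  "metadataPrefix": "oai_dc",
                  "identifier": identifier}))
```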
3.8 Response
Responses are formatted as HTTP responses. The content type must be text/xml. HTTP-based
status codes, as distinguished from OAI-PMH errors, such as 302 (redirect) and 503 (service
unavailable) may be returned. Compression encodings are optional in OAI-PMH; only the
identity encoding is mandatory. The response format must be well-formed XML with markup as
follows:
1. XML declaration
(<?xml version="1.0" encoding="UTF-8" ?>)
2. root element named OAI-PMH with three attributes
(xmlns, xmlns:xsi, xsi:schemaLocation)
3. three child elements
1. responseDate (UTC datetime)
2. request (the request that generated this response)
3. a) error (in case of an error or exception condition)
b) element with the name of the OAI-PMH request
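The response structure above can be sketched by parsing an illustrative error reply and checking for the error child element; the error code shown (idDoesNotExist) is one of the codes defined by the protocol, and the request URL is the document's earlier example.

```python
# Sketch: checking the required OAI-PMH response structure and
# detecting a protocol-level error. The reply below is illustrative.
import xml.etree.ElementTree as ET

RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-06-01T12:00:00Z</responseDate>
  <request verb="GetRecord">http://archive.org/oai</request>
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
root = ET.fromstring(RESPONSE)
response_date = root.find("oai:responseDate", NS).text
error = root.find("oai:error", NS)
code = error.get("code") if error is not None else None
```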
3.9 OAI-PMH Verbs
Here 'verb' means the request type which the service provider/harvester sends to get responses
from data providers. There is a standard set of six verbs:
o Identify
o ListMetadataFormats
o ListSets
o GetRecord
o ListIdentifiers
o ListRecords
Verb                  Function
Identify              Description of the repository
ListMetadataFormats   Metadata formats supported by the repository
ListSets              Sets defined by the repository
ListIdentifiers       Retrieves the unique identifiers of items
ListRecords           Used to harvest records from the repository
GetRecord             Retrieves an individual metadata record from the repository
A harvester is not required to use all verbs. However, a repository must implement all of them.
There are required and optional arguments, depending on request types.
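The required arguments per verb can be sketched as a small table with a helper that rejects incomplete requests; the required-argument sets below follow the protocol (optional arguments such as from, until and set are not checked), while the base URL and function name are assumptions.

```python
# Sketch: required arguments for each of the six OAI-PMH verbs, with a
# helper that refuses to build a request missing a required argument.
from urllib.parse import urlencode

REQUIRED = {
    "Identify": [],
    "ListMetadataFormats": [],
    "ListSets": [],
    "ListIdentifiers": ["metadataPrefix"],
    "ListRecords": ["metadataPrefix"],
    "GetRecord": ["identifier", "metadataPrefix"],
}

def build_request(base, verb, **args):
    missing = [a for a in REQUIRED[verb] if a not in args]
    if missing:
        raise ValueError(f"{verb} requires {missing}")
    return base + "?" + urlencode({"verb": verb, **args})

url = build_request("http://archive.org/oai", "GetRecord",
                    identifier="oai:example.org:1",
                    metadataPrefix="oai_dc")
```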
4.0 DSpace: OAI-compatible Digital Library Software
DSpace is open source software for building and managing digital repositories. Developed jointly by
MIT Libraries and Hewlett-Packard (HP), it is freely available to research institutions as an open
source system that can be customized and extended. DSpace is a digital institutional repository that
captures, stores, indexes, preserves, and redistributes content in digital formats. An institutional
repository is a set of services that a research institution/organization/university offers to the
members of its community for the management and dissemination of digital materials created by
the institution and its community members. Typically, DSpace has been deployed for institutional
repositories of publications, theses and dissertations. Several groups are working on extending its
capabilities, such as the implementation of ontologies in the search interface and the submission
module, customization for the management of electronic theses and dissertations, and localization
and internationalization of the package for the world's languages.
DSpace is compliant with OAI-PMH version 2.0, and metadata in DSpace digital libraries can be
harvested.
4.1 DSpace Search System
The end user can browse, search and access the collections using the hierarchies and also the
alphabetic bar menu. For searching the collections, DSpace uses the Lucene search engine, which
is part of the Apache Jakarta Project (1). Additionally, research projects such as the …(Portugal)…
provide ontologies that enable context-based querying. This works like subject-based directory
structures.
The Lucene search engine has very powerful search features that encompass many search
approaches of the end-user. It provides basic 'exact term' or keyword search. In addition, it allows
fielded search, akin to the field-level search of library databases; in DSpace, Dublin Core elements
are used for the field names. Lucene also supports Boolean search, range searches, term boosting
and proximity searches. An interesting facility of Lucene is its fuzzy search, based on the
Levenshtein algorithm (5), which can replace and match terms by similarity. This feature is
especially useful where we hear a term and have to guess its spelling, and more so in the case of
personal names.
4.2 Metadata in DSpace
DSpace users deal with metadata in the following modules:
o Administration modules: Dublin Core registry, administrative metadata (default values), mail
alerts to subscribers
o Submission module: descriptive metadata
o Harvesting: OAI-PMH using the (unqualified) DC elements
o Search result display: brief and full metadata
4.3 Metadata harvesting in DSpace
DSpace is compliant with the OAI-PMH for exposing metadata. OAI-PMH allows repositories to
expose a hierarchy of sets in which records may be placed. DSpace exposes collections as sets:
each collection has a corresponding OAI set, and harvesters use the ListSets verb to discover the
sets. Only the 15 basic Dublin Core elements are exposed at present.
5.0 OAI Harvester Software
o Arc (http://arc.cs.odu.edu/)
o Citebase (http://citebase.eprints.org/cgi-bin/search)
o CYCLADES (http://www.ercim.org/cyclades/)
o DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp)
o MeIND (http://www.meind.de/)
o METALIS (http://metalis.cilea.it/)
o my.OAI (http://www.myoai.com)
o NCSTRL (http://www.ncstrl.org/)
o Perseus (http://www.perseus.tufts.edu/cgi-bin/vor)
o Public Knowledge Project – Open Archives Harvester (http://pkp.ubc.ca/harvester/)
o OAICAT (http://www.oclc.org/research/software/oai/cat.htm)
o OAI Repository Explorer (http://re.cs.uct.ac.za/)
o OAIster (http://oaister.umdl.umich.edu/o/oaister/)
o OASIC (Open Archives en SIC) (http://oasic.ccsd.cnrs.fr/)
o OAIHarvester (http://www.oclc.org/research/software/oai/harvester.htm)
o DLESE OAI Software (http://dlese.org/oai/index.jsp)
6.0 Future Prospects
Some more work has to be done in order to make OAI-PMH a complete, globally accepted
metadata harvesting protocol:
o Tools and software have to be developed by which non-OAI-PMH-compliant repositories
can be made OAI-PMH compliant, so that they can act as data providers.
o Higher versions of the protocol should be made compatible with the lower ones.
At the metadata creation level some standardization is required, as a particular resource may be
described inconsistently at different repositories. Vocabulary control measures should also be
taken. Further improvements to the OAI-PMH are still awaited; only then can we ensure a
comprehensive view of the resources available on a particular subject for our end-users.
7.0 Conclusion
Much promise is seen for the use of the protocol within an open archives approach. Support for a
new pattern for scholarly communication is the most publicized potential benefit. Perhaps most
readily achievable are the goals of surfacing 'hidden resources' and low cost interoperability.
Although the OAI-PMH is technically very simple, building coherent services that meet user
requirements remains complex. The OAI-PMH protocol could become part of the infrastructure
of the Web, as taken-for-granted as the HTTP protocol now is, if a combination of its relative
simplicity and proven success by early implementers in a service context leads to widespread
uptake by research organizations, publishers and archives.
REFERENCES
1. http://www.openarchives.org/
2. Breeding, M. (2002, April). The Emergence of the Open Archives Initiative: This Protocol
could become a key part of the digital library infrastructure. Information Today.
from http://www.findarticles.com/cf_0/m3336/4_19/85251474/p1/article.jhtml
3. Breeding, M. (2002). Understanding the Protocol for Metadata Harvesting of the Open
Archives Initiative. Computers in Libraries, 22(8).
4. Lagoze, C., & Sompel, H. V. d. (2001, January). The Open Archives Initiative Protocol for
Metadata Harvesting. from http://www.openarchives.org/OAI/openarchivesprotocol.htm
5. Lynch, C. A. (2001, August). Metadata Harvesting and the Open Archives Initiative. ARL
Bimonthly Report 217. from http://www.arl.org/newsltr/217/mhp.html
6. Shearer, K. (2002, March). The Open Archives Initiative: Developing an Interoperability
Framework for Scholarly Publishing. CARL/ABRC Background Series, No. 5. from
http://www.carl-abrc.ca/projects/scholarly/open_archives.PDF
7. Suleman, H., & Fox, E. A. (2001, December). A Framework for Building Open Digital
Libraries. D-Lib Magazine, 7(12). from
http://www.dlib.org/dlib/december01/suleman/12suleman.html
8. Sompel, H. V. d., & Lagoze, C. (2000, February). The Santa Fe Convention of the Open
Archives Initiative. D-Lib Magazine, 6(2). from
http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
9. Warner, S. (2001, June). Exposing and Harvesting Metadata Using the OAI Metadata
Harvesting Protocol: A Tutorial. HEP Libraries Webzine Issue 4. from
http://library.cern.ch/HEPLW/4/papers/3/
11. http://www.ukoln.ac.uk/repositories/digirep/index/FAQs
12. Shepherd, M. (2003, December). Interoperability for Digital Libraries. DRTC Workshop on
Semantic Web, 8th–10th December 2003, DRTC, Bangalore.
13. http://www.openarchives.org/Register/BrowseSites
14. http://www.openarchives.org/service/listproviders.html