2. Structure
• Federation & Harvesting
• Operations of LORs
• Case Studies of LORs under development
– Policy and Technical Issues at the University of
Sydney Library
– JScholarship at The Johns Hopkins University
3. Federation & Harvesting
• Federated searches are conducted by search
engines that send the same query to many
different databases
• Harvesting, on the other hand, refers to the
gathering together of metadata from a
number of distributed repositories into one
portal website
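The contrast between the two approaches can be sketched in Python. This is a minimal illustration, not any particular product's implementation: the repository names, the records, and the functions `federated_search`, `harvest`, and `portal_search` are all hypothetical.

```python
# Hypothetical repositories: each maps an object identifier to its metadata.
REPOSITORIES = {
    "repo_a": {"a1": {"title": "Intro to Geology"}, "a2": {"title": "Cell Biology Lab"}},
    "repo_b": {"b1": {"title": "Geology Field Guide"}},
}


def matches(record, query):
    """Case-insensitive substring match on the title."""
    return query.lower() in record["title"].lower()


def federated_search(query):
    """Federation: send the same query to every repository at search time."""
    hits = []
    for name, records in REPOSITORIES.items():
        for identifier, record in records.items():
            if matches(record, query):
                hits.append((name, identifier))
    return sorted(hits)


def harvest():
    """Harvesting: copy metadata from every repository into one portal index."""
    portal = {}
    for name, records in REPOSITORIES.items():
        for identifier, record in records.items():
            portal[(name, identifier)] = dict(record)
    return portal


def portal_search(portal, query):
    """Search the harvested copy only; the source repositories are not contacted."""
    return sorted(key for key, record in portal.items() if matches(record, query))


portal = harvest()
print(federated_search("geology"))
print(portal_search(portal, "geology"))
```

Both calls find the same objects here. The difference is where the work happens: federation queries each source live, while harvesting searches a central copy that is only as fresh as the last harvest.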
4. Operations of LORs
• search/find – the ability to locate an
appropriate learning object. This can include
the ability to browse
• quality control – a system that ensures
learning objects meet technical, educational
and metadata requirements
• request – ask for a learning object that has
been located in the database
5. Operations of LORs
• maintain – apply appropriate version control
• retrieve – receive an object that has been
requested
• submit – provide an object to a repository for
storage
• store – place a submitted object into a data
store with unique, registered identifiers that
allow it to be located
6. Operations of LORs
• gather (push/pull) – obtain metadata about
objects in other repositories for wider
searches and information via a clearing house
function
• publish – provide metadata to other
repositories
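The operations listed on these slides can be sketched as a toy in-memory repository. This is an illustration only: the class `Repository`, its method names, and the quality-control rule (a required `title` and `format`) are assumptions, not part of any real LOR product.

```python
import uuid


class Repository:
    """Toy in-memory learning object repository (hypothetical, for illustration)."""

    REQUIRED_METADATA = ("title", "format")  # assumed quality-control rule

    def __init__(self):
        self.store = {}     # identifier -> list of versions (newest last)
        self.gathered = {}  # metadata pulled from other repositories

    def submit(self, content, metadata):
        """submit + quality control + store: validate, then file under a new identifier."""
        missing = [f for f in self.REQUIRED_METADATA if f not in metadata]
        if missing:
            raise ValueError(f"missing metadata: {missing}")
        identifier = str(uuid.uuid4())
        self.store[identifier] = [{"content": content, "metadata": dict(metadata)}]
        return identifier

    def maintain(self, identifier, content):
        """maintain: add a new version instead of overwriting the old one."""
        metadata = self.store[identifier][-1]["metadata"]
        self.store[identifier].append({"content": content, "metadata": dict(metadata)})

    def search(self, term):
        """search/find: locate objects whose title contains the term."""
        term = term.lower()
        return [i for i, versions in self.store.items()
                if term in versions[-1]["metadata"]["title"].lower()]

    def retrieve(self, identifier):
        """request + retrieve: return the latest version of a located object."""
        return self.store[identifier][-1]["content"]

    def publish(self):
        """publish: expose metadata (not content) to other repositories."""
        return {i: versions[-1]["metadata"] for i, versions in self.store.items()}

    def gather(self, other):
        """gather (pull): collect metadata that another repository publishes."""
        self.gathered.update(other.publish())


local, remote = Repository(), Repository()
oid = remote.submit("<html>v1</html>", {"title": "Plate Tectonics", "format": "text/html"})
remote.maintain(oid, "<html>v2</html>")
local.gather(remote)
print(remote.retrieve(oid))
print(local.gathered[oid]["title"])
```

Note that `gather` copies only metadata: the local repository can answer wider searches, but the object itself still has to be retrieved from the repository that stores it.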
9. University of Sydney Library
• The University of Sydney Library supports
research, learning, and teaching through a
variety of initiatives and collaborative
activities with academics
• The Library aims to develop guidelines to support a
consistent and sustainable approach to
dealing with requests to manage materials
within the repository
10. Collection Descriptions
• An individual academic, research project team, or
a group of academics working within a discipline
created the collections
• Academics are aware of the facility of descriptive
metadata for categorizing and interrogating
datasets
– They adopt or modify domain standards or create rich
and often highly granular tag sets to suit project
requirements
• The collections are not large, generally in the
range of tens or hundreds of gigabytes
11. Collection Descriptions
• Metadata is typically held in databases including
FileMaker and MySQL, or in spreadsheet
applications such as Microsoft Excel, with
associated data objects housed on personal
computers or departmental file systems
• Collections under discussion arise from:
– School of Geosciences
– Sydney College of the Arts
– Department of Archaeology
– School of Biological Sciences
12. Defining Metadata Management
Requirements
• Retain the granularity of the native record
• Enable export, including Open Archives
Initiative (OAI) harvesting, of records in DC
and native format
• Enable development of schema-specific
search interfaces, whether through repository
tools or integration with other services.
• Ensure service sustainability
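As an illustration of the export requirement, an OAI harvester asks a repository for records with a `ListRecords` request that names a metadata format. The sketch below only builds the request URLs. The base URL, the set name, and the `native_geo` prefix are hypothetical; `oai_dc` is the Dublin Core format that the OAI-PMH protocol requires every repository to support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real repository publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai/request"


def list_records_url(metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{BASE_URL}?{urlencode(params)}"


# DC export: supported by default.
print(list_records_url("oai_dc"))
# Native export: "native_geo" is a made-up prefix that would need its own crosswalk.
print(list_records_url("native_geo", set_spec="geosciences"))
```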
13. Considering Options for Metadata
Management
• Map native metadata to existing DC elements
– Native metadata records are mapped to DC and
transferred to the repository as standard DC
records
– All approaches have advantages and
disadvantages related to:
• The loss of information from existing metadata
• The ways in which the metadata can be used after
their transformation
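A minimal sketch of this first option, assuming a made-up geoscience record and crosswalk (none of the field names come from the actual collections). It shows why mapping to unqualified DC loses information: two native fields land on the same DC element and can no longer be told apart.

```python
# Hypothetical native record from a project database.
native = {
    "sample_name": "Core 17",
    "collector": "J. Smith",
    "min_depth_m": "120",
    "max_depth_m": "135",
    "rock_type": "basalt",
}

# Assumed crosswalk: several native fields collapse onto one DC element.
CROSSWALK = {
    "sample_name": "title",
    "collector": "creator",
    "min_depth_m": "coverage",
    "max_depth_m": "coverage",
    "rock_type": "subject",
}


def to_simple_dc(record):
    """Map a native record to unqualified DC; native field names are discarded."""
    dc = {}
    for field, value in record.items():
        dc.setdefault(CROSSWALK[field], []).append(value)
    return dc


dc_record = to_simple_dc(native)
# Both depths are now plain "coverage" values, with no hint of which is min or max.
print(dc_record["coverage"])
```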
14. Advantages
• Low submission cost and low ongoing
maintenance cost
• No configuration or maintenance of DSpace
index keys, customized metadata schemas, or
OAI crosswalks needed
• Records fully searchable through default DC
indexing and harvestable via default OAI
15. Disadvantages
• Loss of metadata granularity and inability to
recreate the original records
• Many items of metadata would not be
meaningful without contextual information
provided by their native tags
• Does not support provision of a traditional
field-based advanced search reflective of the
granularity of the original records
16. Considering Options for Metadata
Management
• Map native metadata to DC elements and
create new custom qualifiers for standard DC
tags
– Native metadata records are mapped to DC and
transferred to the repository as standard DC
records. The granularity of non-DC elements is
retained through mapping to customized qualifiers
of standard DC tags
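A sketch of this second option under the same kind of assumptions: the record, the crosswalk, and the qualifiers `minDepth`, `maxDepth`, and `rockType` are invented for the example and are not registered DC terms. Because each native field maps to a distinct element and qualifier pair, the original record can be rebuilt.

```python
# Hypothetical native record from a project database.
native = {
    "sample_name": "Core 17",
    "collector": "J. Smith",
    "min_depth_m": "120",
    "max_depth_m": "135",
    "rock_type": "basalt",
}

# Assumed crosswalk from native fields to (element, qualifier) pairs.
QUALIFIED_CROSSWALK = {
    "sample_name": ("title", None),
    "collector": ("creator", None),
    "min_depth_m": ("coverage", "minDepth"),
    "max_depth_m": ("coverage", "maxDepth"),
    "rock_type": ("subject", "rockType"),
}


def to_qualified_dc(record):
    """Map each native field to an (element, qualifier, value) triple."""
    return [QUALIFIED_CROSSWALK[field] + (value,) for field, value in record.items()]


def from_qualified_dc(triples):
    """Rebuild the native record; possible because each pair is unique."""
    reverse = {target: field for field, target in QUALIFIED_CROSSWALK.items()}
    return {reverse[(element, qualifier)]: value for element, qualifier, value in triples}


triples = to_qualified_dc(native)
print(from_qualified_dc(triples) == native)  # True
```

The cost is visible in `QUALIFIED_CROSSWALK` itself: every project adds entries to a shared list of qualifiers that someone has to record and maintain.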
17. Advantages
• Retains the granularity of the native records,
supporting recreation of the original metadata
records. Also retains contextual information
conveyed by the original tags
• Requires no configuration or maintenance of
DSpace index keys, customized metadata
schemas, or OAI crosswalks
• Records would be fully searchable via default
DC indexing and harvestable via default OAI
18. Disadvantages
• Higher submission and maintenance costs
than option 1, requiring additional and
ongoing recordkeeping and maintenance
procedures
• As DC qualifiers proliferate, management of
the central registry may pose challenges
19. Considering Options for Metadata
Management
• Create a custom schema identical to the
native metadata set
– A custom schema separate to DC is implemented
within the repository. Metadata records are
transferred to the repository in their native
formats
20. Advantages
• Avoids the DC registry management problems
of option 2, by enabling partitioning and
separate maintenance of each custom schema
• May enable future provision of a collection-,
community-, or schema-level traditional field-
based advanced search reflective of the
granularity of the original records
21. Disadvantages
• Requires configuration and ongoing maintenance
of DSpace index keys, customized metadata
schemas, and OAI crosswalks
• May result in a proliferation of project-specific
schemas requiring accompanying recordkeeping
and maintenance
• Will not assist in the management of hierarchical
metadata schemas, as DSpace does not support
these
22. Considering Options for Metadata
Management
• Generate DC records as abstractions of the
native metadata records and submit the
native metadata records as digital object
bitstreams
– DC records act as bibliographic descriptions of the
native metadata records. The original records are
submitted as accompanying bitstreams
23. Advantages
• Relatively low submission cost and low ongoing
maintenance cost
• Requires no configuration or maintenance of
DSpace index keys, customized metadata
schemas, or OAI crosswalks
• Depending on how much of the original metadata
is mapped to standard DC, records could be
keyword searchable via default DC indexing
24. Advantages
• DC versions of the records would be
harvestable via default OAI
• Avoids the DC registry management problems
of option 2 and the schema proliferation
issues of option 3
• Retains the original metadata records in their
native format
25. Disadvantages
• Would not support future provision of a
collection-, community-, or schema-level
traditional field-based advanced search
reflective of the granularity of the original
records
– Such a search would require indexing of the
accompanying native metadata file
• Would not readily enable harvesting of native
metadata records
26. Considering Options for Metadata
Management
• Generate DC records as abstractions of the
native metadata records and submit the
native metadata records as digital object
bitstreams
– DC records act as bibliographic descriptions of the
native metadata records. The original records are
submitted as accompanying bitstreams
• This option was selected
27. Metadata Mapping
• Metadata from the source databases was
mapped to DC to enable simple keyword
searching within DSpace and DC-based OAI
harvesting
28. Metadata Transfer
• Records were exported as CSV files, each record
comprising a row in the file.
• The author created a Python script, which wrote each
row to two files.
– One was a DC XML file and the other a native metadata file
– The script also packaged the metadata and associated data
files in a format suitable for submission to DSpace
• A selection of records was manually sampled and
compared, and additional scripting verified that all
records were correctly transferred
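The transfer steps above can be sketched as follows. This is not the author's script. It assumes the packaging target is the DSpace Simple Archive Format (one directory per item holding `dublin_core.xml`, a `contents` file, and the bitstreams), and the column names, crosswalk, and file name `native_metadata.csv` are hypothetical. Associated data files are omitted; in a fuller version they would be copied into each item directory and listed in `contents`.

```python
import csv
import io
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed crosswalk from CSV column names to DC elements (illustrative only).
CROSSWALK = {"sample_name": "title", "collector": "creator", "rock_type": "subject"}


def build_dc_xml(row):
    """Return a dublin_core.xml tree holding the mapped columns of one row."""
    root = ET.Element("dublin_core")
    for column, element in CROSSWALK.items():
        if row.get(column):
            value = ET.SubElement(root, "dcvalue", {"element": element, "qualifier": "none"})
            value.text = row[column]
    return ET.ElementTree(root)


def write_item(row, item_dir):
    """Write one item directory: DC record, native record, and contents list."""
    item_dir.mkdir(parents=True)
    build_dc_xml(row).write(str(item_dir / "dublin_core.xml"),
                            encoding="utf-8", xml_declaration=True)
    # Native record: every original column, kept as a one-row CSV bitstream.
    with open(item_dir / "native_metadata.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    (item_dir / "contents").write_text("native_metadata.csv\n", encoding="utf-8")


def package(csv_text, archive_dir):
    """Create one item directory per CSV row and return how many were written."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for index, row in enumerate(rows):
        write_item(row, Path(archive_dir) / f"item_{index:03d}")
    return len(rows)


SAMPLE = "sample_name,collector,rock_type,min_depth_m\nCore 17,J. Smith,basalt,120\n"
archive = tempfile.mkdtemp()
count = package(SAMPLE, archive)
# Transfer check: the number of item directories must equal the number of rows.
assert count == len(list(Path(archive).glob("item_*")))
```

The `min_depth_m` column has no DC mapping here, so it appears only in the native file. That is the trade-off of the selected option: the DC record supports keyword search and harvesting, while the full detail survives in the bitstream.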
30. JScholarship
• JScholarship (http://jscholarship.library.jhu.edu), the
Johns Hopkins institutional repository, is the
home for research materials created by faculty
& staff from the university, the medical
institutions, and other affiliates such as the
Applied Physics Lab
• Launched in 2008
31. Management Structure
• This DSpace-based repository is a service
developed and operated jointly by the
Sheridan Libraries and the Welch Medical
Library
– Directors of both libraries and several key staff
members serve as the Oversight Group for
JScholarship
• They establish high-level policies for the repository and
provide guidance to the IR manager in areas such as
content recruitment and assessment
32. Creating Metadata
• The Oversight Group decided to leave the
submission process and metadata creation to the
various research communities, with library staff
acting only in a training and advisory role
• Each community has created its metadata at the
time of submission, but the library is
experimenting with harvesting existing metadata
to use for batch ingestion of digitized library
collections
33. Policies
• Each research community establishes many of
the policies for its collections
– Including policies for both content & metadata
generation
– Allows for personalization in each community
• How has this affected metadata in two of the
communities in JScholarship?
34. Center for Africana Studies
• Created collections for center research, faculty
articles, and working papers
• Researchers contributing content are
decentralized – they belong to many departments
• An administrative assistant gathers research,
uploads files, and creates the metadata for
each of the Center’s collections
35. Center for Africana Studies
• The interdisciplinary nature of the collections
does not lend itself to using a specialized
controlled vocabulary for subject terms
• Although a wide-ranging thesaurus would
work with these materials, the Center has
opted to use keywords from the articles
themselves
36. Hopkins Population Center
• Faculty associates produce most of the
research in working papers, conference
proceedings, and journal articles
• Instead of having a single person perform the
submission, metadata creation, and approval,
they had students perform some of the
submission and basic metadata tasks
37. Hopkins Population Center
• The submissions were then checked and
enhanced by a liaison librarian from the Welch
Medical Library
• This is the only community to use a controlled
vocabulary for subject terms
– Because they already have their own thesaurus for
the POPLINE database, they decided to use those
terms in JScholarship