Scholars Portal, a program of the Ontario Council of University Libraries (OCUL), provides the technical infrastructure to store, preserve, and provide access to shared digital library collections in Ontario, including hosting a local instance of Dataverse since 2011. As part of a national project known as Portage (a project of the Canadian Association of Research Libraries), Scholars Portal is partnering with Artefactual Systems, Dataverse, the University of British Columbia, the University of Alberta, and others to integrate Dataverse with the preservation software Archivematica. When completed, this project will facilitate the long-term preservation of research data according to the Open Archival Information System (OAIS) Reference Model.
Preservation of Research Data: Dataverse / Archivematica Integration by Allan Bell and Leanne Trimble
1. Preservation of Research Data: Dataverse / Archivematica Integration
Allan Bell | Associate University Librarian, The University of British Columbia
Leanne Trimble | Data & Geospatial Librarian, OCUL Scholars Portal
3. University of British Columbia Digital Preservation Strategy
● Digital Preservation Program
○ cIRcle, DSpace-based repository
○ Digitized collections in CONTENTdm
○ New and legacy born digital archival material
○ Websites (Archive-It)
○ Soon: Abacus Dataverse (research data)
4. University of British Columbia Digital Preservation Strategy
● Use Archivematica as a tool to apply OAIS-compliant preservation processes
● Integrate Archivematica with existing systems used to manage digital objects
● Build internal technical and staff capacity
6. Archivematica
● “a free and open source digital preservation system that is designed to maintain standards-based, long term access to collections of digital objects” (http://www.archivematica.org)
● Micro-services provide an integrated suite of software tools in compliance with the ISO OAIS model
10. Digital Preservation Program TRAC Self-Audit
• Trustworthy Repositories Audit and Certification (evolved into ISO 16363)
• Widely accepted criteria for assessing trustworthiness of digital repositories
• TRAC checklist is an auditing tool to assess the reliability, commitment and readiness of institutions to assume long-term preservation responsibilities
11. What is TRAC?
• The TRAC metrics assess three areas:
a. Organizational Infrastructure - the repository's administrative, staffing, financial, and legal functions
b. Digital Object Management - the handling of digital objects from ingest to access
c. Technology, Technical Infrastructure and Security - the technology used to handle ingested objects
• These criteria represent best practices and current thinking about the organizational and technological needs of trustworthy digital repositories.
12. TRAC Compliant Repositories
The Center for Research Libraries has audited and certified five repositories:
• Chronopolis
• CLOCKSS
• HathiTrust
• Portico
• Scholars Portal
13. Digital Preservation Program Conclusions
• Greater comfort with and understanding of the challenges around archiving digitized and born digital material
• Establishing a comprehensive digital preservation program is complex!
• Having tools is important, but policies and procedures are also needed for certification (if desired)
14. Abacus Dataverse: Research Data Management
● UBC-hosted instance for four research universities in British Columbia since 2014
○ Abacus DSpace launched in 2009
● 1,700 studies (more than 28,000 files)
● Actively used by researchers
● Each school has full control over, and added discoverability for, its data
○ Licensed data but also growing institutional research data collections
○ Each institution has its own subnet with
■ OAI export to Summon (common Library Discovery Layer)
■ Separate Dataverses for institutional research data
16. OCUL & Scholars Portal
Who?
• 21 university libraries in Ontario
What?
• Collective purchasing
• Shared digital infrastructure
• Collaborative planning and
assessment
How?
*Scholars Portal*
• OCUL’s shared technology
infrastructure, housing shared
collections
More information:
http://www.ocul.on.ca/
17. OCUL/SP & Research Data Management
Dataverse (OCUL hosted instance)
– Hosted for OCUL since 2011
– 330 studies (about 4,000 files)
– Actively used by researchers from 7-8 institutions
– Many in social science disciplines but some in
sciences (agriculture, polar research, geophysics,
nursing…)
18. OCUL/SP & Research Data Management
• Services are evolving at each institution
• Still trying to get a handle on:
– RDM support services required by researchers
– RDM infrastructure requirements
– RDM costs
– Role of regional consortia in RDM services
19. OCUL/SP & Digital Preservation
• Trustworthy Digital Repository (TDR) certified
for electronic journal content (since 2013)
• Currently working on Ontario Library Research
Cloud (OLRC) project (2015 completion)
• Data Preservation: strong interest
21. ‘Portage’
A project led by the Canadian Association of Research Libraries, aimed at building a library-based research data management network
Two aspects:
• Network of expertise for research data management
• A national preservation and discovery network for research data
24. Dataverse/Archivematica Integration
Dataverse
• Data
• Metadata (DDI & other)
Archivematica
• Accept data and metadata
• Perform preservation functions
• Create Archival Information Packages (AIPs)
Archival storage (?)
Local Data Repository (e.g. at SP or UBC)
Preservation Infrastructure (Portage)
Integration Middleware
• Harvest content via Dataverse API (no SWORD client capability at the moment)
• Package and submit to Archivematica using SWORD
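The flow on this slide (harvest, package, submit) can be sketched as three steps wired together. This is a minimal Python sketch, not the project's implementation: `Study`, `run_middleware`, and the `harvest`, `package`, and `submit` callables are hypothetical names, and the in-memory stand-ins below replace the real Dataverse API and SWORD calls, which are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Study:
    """A published Dataverse study: data files plus a metadata record (hypothetical shape)."""
    study_id: str
    files: Dict[str, bytes]   # filename -> content
    metadata: str             # e.g. a DDI record, as text


def run_middleware(
    harvest: Callable[[], Iterable[Study]],
    package: Callable[[Study], bytes],
    submit: Callable[[str, bytes], None],
) -> List[str]:
    """Harvest studies, package each one, and submit it; return the IDs submitted."""
    submitted = []
    for study in harvest():
        payload = package(study)
        submit(study.study_id, payload)
        submitted.append(study.study_id)
    return submitted


# In-memory stand-ins for the real systems (assumptions for illustration only).
def fake_harvest() -> List[Study]:
    return [Study("study-1", {"data.csv": b"a,b\n1,2\n"}, "<ddi/>")]


def fake_package(study: Study) -> bytes:
    # A real packager would build an archive; here the parts are just concatenated.
    parts = [study.metadata.encode()] + [study.files[name] for name in sorted(study.files)]
    return b"\n".join(parts)


received: Dict[str, bytes] = {}


def fake_submit(study_id: str, payload: bytes) -> None:
    # Stands in for a SWORD deposit to Archivematica.
    received[study_id] = payload


submitted_ids = run_middleware(fake_harvest, fake_package, fake_submit)
```

Passing the three steps in as callables keeps the harvesting side (Dataverse API) and the deposit side (SWORD) replaceable independently, which matches the slide's split between the local data repository and the preservation infrastructure.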
25. Project Participants
• Artefactual – Evelyn McLellan, Justin Simpson
• Dataverse – Phil Durbin, Eleni Castro (& others)
• Scholars Portal – Leanne Trimble, Alan Darnell
• UBC – Allan Bell, Eugene Barsky
• University of Alberta – Geoff Harder, Chuck
Humphrey, Larry Laliberte, Peter Binkley
• Simon Fraser University – Alex Garnett
26. Functional Requirements
● Develop “middleware” that can transfer studies from Dataverse to Archivematica
- Detect newly published studies & “major” new versions
- Harvest released studies from Dataverse
- Utilize SWORD protocol
- Submit to Archivematica
- One Dataverse study = 1 SIP = 1 AIP
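The detection requirement above can be illustrated with a small selection function. This is a hedged sketch under stated assumptions: versions are modelled as `major.minor` pairs with major versions starting at 1, and `Release`, `select_for_transfer`, and `sip_name` are hypothetical names rather than part of Dataverse or Archivematica.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Release:
    """One released version of a study (hypothetical shape)."""
    study_id: str
    major: int
    minor: int


def select_for_transfer(releases: List[Release], preserved_major: Dict[str, int]) -> List[Release]:
    """Pick the latest release of each study whose major version is not yet preserved.

    preserved_major maps study_id -> highest major version already transferred.
    Studies missing from the map are treated as newly published.
    """
    latest: Dict[str, Release] = {}
    for rel in releases:
        current = latest.get(rel.study_id)
        if current is None or (rel.major, rel.minor) > (current.major, current.minor):
            latest[rel.study_id] = rel
    return [
        rel for rel in latest.values()
        if rel.major > preserved_major.get(rel.study_id, 0)
    ]


def sip_name(rel: Release) -> str:
    """One study maps to one SIP, so the name depends only on study and major version."""
    return f"{rel.study_id}-v{rel.major}"


releases = [
    Release("study-1", 1, 0),   # already preserved
    Release("study-1", 1, 1),   # minor update only: skipped
    Release("study-2", 1, 0),   # newly published: transferred
    Release("study-3", 2, 0),   # major new version: transferred
]
to_transfer = select_for_transfer(releases, {"study-1": 1, "study-3": 1})
names = sorted(sip_name(r) for r in to_transfer)
```

Here `names` contains one SIP name each for `study-2` and `study-3`; the minor update to `study-1` does not trigger a transfer.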
27. Functional Requirements (2)
● Investigate Archivematica pipeline decisions for data formats coming from Dataverse
- File format normalization?
- Connecting versions of the same dataset to one another?
- Handling DDI (and other) metadata records?
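The slide leaves version linking as an open question. One possible approach, shown purely as an illustration, is to record for each AIP the identifier of the AIP holding the previous version of the same study. The record layout, the `link_versions` name, and the identifiers below are all assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# (study_id, major_version, aip_id): all values are made up for illustration.
AipRecord = Tuple[str, int, str]


def link_versions(records: List[AipRecord]) -> Dict[str, Optional[str]]:
    """Map each AIP ID to the AIP ID of the previous version of the same study (or None)."""
    by_study: Dict[str, List[Tuple[int, str]]] = {}
    for study_id, major, aip_id in records:
        by_study.setdefault(study_id, []).append((major, aip_id))

    previous: Dict[str, Optional[str]] = {}
    for versions in by_study.values():
        versions.sort()  # ascending by major version
        prior: Optional[str] = None
        for _, aip_id in versions:
            previous[aip_id] = prior
            prior = aip_id
    return previous


links = link_versions([
    ("study-1", 2, "aip-b"),
    ("study-1", 1, "aip-a"),
    ("study-2", 1, "aip-c"),
])
```

With this input, `aip-b` points back to `aip-a`, while `aip-a` and `aip-c` have no predecessor.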
28. Possible features for future stages
• Dataverse as a SWORD client
• Mechanism within Dataverse for researchers to specify which datasets they want to target for preservation
• Returning information from Archivematica back to Dataverse (indication of preservation status within Dataverse)
29. Next Steps
• University of Toronto procurement process underway to contract the development work to Artefactual
• Develop the middleware (2015)
• Recruit researchers to contribute data to ingest (concurrent with development work)