A presentation given by Jenny Mitcham at the iPRES conference on 6th November 2015 in Chapel Hill, North Carolina. It describes work underway in the "Filling the Digital Preservation Gap" project, which is using Archivematica to preserve research data.
"Filling the digital preservation gap" with Archivematica
1. “Filling the digital preservation gap” with Archivematica
Jenny Mitcham
Digital Archivist
Borthwick Institute for Archives
University of York
6 November 2015
2. Filling the digital preservation gap: project aim
“…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”
3. What gap?
• This project is looking specifically at the research data being created in universities
– Many UK universities have a repository
– Many UK universities have a digital storage facility
– Many UK universities collect metadata about the research being carried out
– …but very few are actively preserving the data
4. This is a collaboration
University of Hull:
• Chris Awre – Head of Information Services, Library and Learning Innovation
• Richard Green – Independent Consultant
• Simon Wilson – University Archivist
University of York:
• Julie Allinson – Manager, Digital York
• Jen Mitcham – Digital Archivist
Artefactual Systems
5. Project structure
• Phase 1 – explore: testing, research and thinking; produce a report (3 months)
• Phase 2 – develop: make Archivematica better for RDM; plan implementation (4 months)
• Phase 3 – implement: set up proofs of concept at York and Hull (6 months)
6. Why Archivematica?
• Standards-based
• Open Source
• Flexible and customisable
• Compatible with hundreds of file formats
• Advanced search and storage management
• Integrated with third-party systems
From https://www.archivematica.org
8. Why would we recommend Archivematica for RDM?
• It is flexible and can be configured in different ways to suit different institutional needs and workflows
• It allows many digital preservation tasks to be carried out automatically
• It can be used alongside existing systems as part of a wider workflow for research data
• It is a good digital preservation solution for those with limited resources
• It gives institutions greater confidence that they will be able to continue providing access to usable copies of research data over time
9. …and don’t forget the community
• It is an evolving solution, continually driven and enhanced by and for the digital preservation community
– A moving target…but moving in the right direction
– Some really interesting developments underway
– Engaged communities
• International community
• UK user group (includes the National Library of Wales, Tate Britain, Museum of London, Arkivum, several HEIs and some European institutions)
10. What are the downsides?
• It isn’t a magic bullet
• There is no guarantee your data will be readable in the future
• It can only be as good as current digital preservation practice
• It can be fiddly to install correctly
• The GUI isn’t that intuitive
• You need staff who understand it
11. How are we enhancing Archivematica?
1. Enabling better workflows for RDM (producing a DIP on request)
2. Allowing the DIP (access copy of the data) to be used by different repository systems
3. Helping to reduce bottlenecks for big data
4. Developing workflows for unidentified files
5. Enabling easier querying of data within Archivematica by third-party applications
6. Improving documentation
12. What next?
• Finish off phase 2: make the enhancements to Archivematica available to the whole community
• Bid for funding for phase 3, leading to proof-of-concept implementations
13. Read all about it!
http://digital-archiving.blogspot.co.uk/