Presentation given by Stuart Kenny and Kathryn Cassidy, Software Engineers with the Digital Repository of Ireland, at Open Repositories 2016 in Dublin.
2. Stuart Kenny
Research IT
Trinity College Dublin
The Fairy Tales of Charles Perrault. Illustrated by Harry Clarke.
Intro. Thomas Bodkin. London: George G. Harrap, [1922].
Internet Archive version of a copy in the New York Public Library.
Web. 25 December 2012.
My, what a big collection you have!
3. About DRI (https://repository.dri.ie/)
● DRI is an interactive trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions
● RIA (lead), NUIM, TCD, DIT, NUIG, NCAD
● Partners: academic, cultural, social, government
4. Outline
• What’s our problem?
• Example collections
• Ingest solutions
• Current ingest process
• Possible future process
5. Ingesting Objects
• Ingest form
o Suitable for single objects/small collections
o Flat hierarchies
o Simple metadata standards
• Multiple standards
o e.g., MARC, EAD
o XML upload
• How to handle complex standards and many objects?
7. Example Collection: Clarke Stained Glass
• MODS metadata
• 10,025 objects
• 42 sub-collections
• 20,047 files, 2.82 TB
• Problems:
o Large number of objects
o Data transfer
8. Example Collection: TCD Children’s Books
• MARC metadata
• 207,889 objects
• 16 sub-collections
• Problems:
o Large number of objects
o Very slow to ingest
o Timeouts and errors
9. Example Collection: Kilkenny Design Workshop
• EAD metadata
• 2,040 objects
• 2,734 series/files
• 2,231 files, 1.2 GB
• Problems:
o Very complex metadata standard
o Hierarchical structure
10. EAD, and why I don’t quite hate it as much as I did...
• Single XML file upload
• Structure encoded in metadata
• URLs to files
• But
o One-shot ingest
o How to edit/update?
o Slow to ingest
o Requires a lot of resources
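
To illustrate how a single EAD file can carry both the hierarchy and the links to files, here is a minimal Python sketch. The sample XML is a hypothetical, heavily simplified fragment (un-namespaced `c` components, `dao` elements with a plain `href` attribute, made-up titles and URL); real EAD files are far richer, and this is not DRI's ingest code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment (illustration only).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Collection</unittitle></did>
    <dsc>
      <c level="series">
        <did><unittitle>Series 1</unittitle></did>
        <c level="item">
          <did>
            <unittitle>Item 1</unittitle>
            <dao href="http://example.org/files/item1.jpg"/>
          </did>
        </c>
      </c>
    </dsc>
  </archdesc>
</ead>
"""

def walk_components(element, depth=0):
    """Yield (depth, level, title, file_urls) for each nested <c> component."""
    for comp in element.findall("c"):
        did = comp.find("did")
        title = did.findtext("unittitle", default="") if did is not None else ""
        urls = [d.get("href") for d in did.findall("dao")] if did is not None else []
        yield depth, comp.get("level"), title, urls
        # Recurse so that sub-series and items keep their place in the hierarchy.
        yield from walk_components(comp, depth + 1)

root = ET.fromstring(SAMPLE_EAD)
components = list(walk_components(root.find("archdesc/dsc")))
for depth, level, title, urls in components:
    print("  " * depth + f"{level}: {title} {urls}")
```

Because the whole tree lives in one document, every component has to be parsed and created in a single pass, which is one way to read the "one-shot" and "slow to ingest" concerns above.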
11. Sufia Batch Upload
• Add multiple files
• New work for each
• Metadata for each work
• How to handle multiple standards?
• Different metadata for each work?
12. Avalon Batch Ingest
• Ingest package
o Manifest file
o Plus content files
• Manifest file is spreadsheet
o Metadata for items
o Names of content files
• Ingest package uploaded to Avalon DropBox
13. Approach up to now
• Command line client
o Enter text commands at ‘command prompt’
• Written in Ruby
• Run locally by user
• Metadata and asset files arranged in fixed directory structure
• Client iterates over the directory, creating each object as a single ingest
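
The directory-walking approach can be sketched roughly as follows. The original client was written in Ruby; this Python version is only an illustration. The layout (`metadata/<name>.xml` plus an optional `assets/<name>/` folder) and the `ingest_one` callback are assumptions, since the slides do not specify the actual structure or API.

```python
from pathlib import Path

def find_objects(collection_dir):
    """Pair each metadata file with its asset files.

    Assumed (hypothetical) layout:
        collection_dir/metadata/<name>.xml
        collection_dir/assets/<name>/...   (optional)
    """
    root = Path(collection_dir)
    objects = []
    for metadata in sorted((root / "metadata").glob("*.xml")):
        asset_dir = root / "assets" / metadata.stem
        if asset_dir.is_dir():
            assets = sorted(p for p in asset_dir.iterdir() if p.is_file())
        else:
            assets = []
        objects.append((metadata, assets))
    return objects

def ingest_all(collection_dir, ingest_one):
    """Call ingest_one(metadata, assets) once per object; collect failures."""
    failures = []
    for metadata, assets in find_objects(collection_dir):
        try:
            ingest_one(metadata, assets)  # hypothetical: one request per object
        except Exception as exc:
            # One failed object should not stop the remaining ingests.
            failures.append((metadata.name, str(exc)))
    return failures
```

Each object becomes a separate ingest call, so a collection of many thousands of objects means many thousands of round trips.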
15. Problems
• Lack of user familiarity with command line
• Multiple platform support
o i.e., Windows
• Difficulty of installing
• Multiple single ingests
o Slow
o Error prone
• Required lots of user support
• In the end, most ingests were performed by the dev team
16. Current Attempt
• Web-based UI
• Borrow heavily from Avalon approach
• Upload metadata XML plus assets to online storage
• Add manifest spreadsheet
o Each row contains path to metadata
o Paths to zero or more asset files
o Paths relative to online storage directory
• Backend processes manifest and ingests as background task
• UI updates status
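
A rough sketch of the manifest step, in Python. It assumes the manifest is exported as CSV with the metadata path in the first column and any asset paths in the following columns; the column order, the `/storage/user1` root, and the `ingest_one` callback are all hypothetical. The real backend runs this as a background task, whereas the sketch runs synchronously for simplicity.

```python
import csv
import io
from pathlib import Path

def read_manifest(manifest_file, storage_root):
    """Return a list of (metadata_path, [asset_paths]) resolved against storage_root."""
    root = Path(storage_root)
    entries = []
    for row in csv.reader(manifest_file):
        cells = [c.strip() for c in row if c.strip()]
        if not cells:
            continue  # skip blank rows
        metadata, *assets = cells  # zero or more asset paths per row
        entries.append((root / metadata, [root / a for a in assets]))
    return entries

def process_manifest(entries, ingest_one):
    """Ingest each entry and record a per-row status that a UI could display."""
    statuses = []
    for metadata, assets in entries:
        try:
            ingest_one(metadata, assets)
            statuses.append((str(metadata), "completed"))
        except Exception as exc:
            statuses.append((str(metadata), f"failed: {exc}"))
    return statuses

# Example manifest: two objects, the second with no asset files.
manifest = io.StringIO(
    "metadata/obj1.xml,assets/obj1/a.tif,assets/obj1/b.tif\n"
    "metadata/obj2.xml\n"
)
entries = read_manifest(manifest, "/storage/user1")
statuses = process_manifest(entries, lambda metadata, assets: None)
```

Recording a status per row, rather than one status for the whole batch, lets a single bad row be reported without hiding the rows that succeeded.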
18. Hydra BrowseEverything
• Gem to access cloud storage
o Dropbox, Google Drive…
• User uploads files
• In the UI, user selects the collection and the manifest to ingest
• Everything is handled server side in the background
• Status can be viewed in the UI
21. Outstanding Issues
• Online storage
o Dropbox-type storage has size limits
• Creating a spreadsheet is harder than arranging a directory structure
• Possible solutions
o Provide online storage
o Has to be per user
o Generate required manifest from uploaded directory structure
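
One way the last idea might look, as a Python sketch: walk an uploaded directory and emit one manifest row per object. The layout (`metadata/<name>.xml` with an optional `assets/<name>/` folder) and the CSV output format are assumptions made for illustration, not a description of anything DRI has built.

```python
import csv
from pathlib import Path

def build_manifest_rows(upload_dir):
    """Return one row per object: metadata path first, then its asset paths.

    All paths are relative to upload_dir and use forward slashes.
    Assumed (hypothetical) layout: metadata/<name>.xml and assets/<name>/...
    """
    root = Path(upload_dir)
    rows = []
    for metadata in sorted((root / "metadata").glob("*.xml")):
        asset_dir = root / "assets" / metadata.stem
        if asset_dir.is_dir():
            assets = sorted(p for p in asset_dir.rglob("*") if p.is_file())
        else:
            assets = []
        rows.append([p.relative_to(root).as_posix() for p in [metadata, *assets]])
    return rows

def write_manifest(rows, out):
    """Write the rows as CSV to an open text file or buffer."""
    csv.writer(out, lineterminator="\n").writerows(rows)
```

The user would keep the familiar directory-based workflow, and the spreadsheet becomes a generated artifact rather than something assembled by hand.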