1. Goobi in the Wellcome Library
Digitisation Roadshow, Linz, Feb 2013
Dave Thompson
Digital Curator, Wellcome Library
2. Goobi in the Wellcome Library
• In production since March 2012.
• 6 servers running Goobi – test & production.
• 11 staff users, some part time.
• 1.2 million images processed & available via Library website.
• Can upload fewer than 1,000 objects into SDB per 24 hrs.
• Total space allocated to Goobi is 40 TB.
6. A strategic approach
• Library transformation strategy, physical to digital.
• From ‘project’ to ‘production’.
• Digitisation as a sustainable end-to-end process.
• 18-month pilot/implementation project.
• Just taken into production.
7. Diverse sources of content
• In-house digitisation.
• External contractors.
• Contractors working in-house.
• External organisations digitising their content for
us.
8. Where did Goobi come from?
• Late 2010/early 2011: as plans for developing SDB
grew, we realised that we needed a means of mass
import of digital content.
• Began to think about high volume production &
the management of that.
• Early modelling of our systems suggested that we
needed a tool to manage production of content.
• Began looking at workflow tracking systems.
10. Perceived benefits of Goobi
• Web based distributed access to concurrent
users.
• Flexible workflow based processing, managed
through ‘Projects’.
• Workflow process enforced, ensures accuracy &
efficiency.
• Adaptable to different types of content.
• Initiates & manages external processes via
Intranda task manager (ITM).
• METS as basis of access & access control.
11. Rapid evolution of Goobi
• Goobi we have now quite different to what we
bought.
• Initial configuration to import MARC XML DMD &
to automate ingest into SDB.
• Initially Goobi didn’t scale to meet our ambition.
• Initial install monolithic, now running Goobi as
distributed services.
• Developed new features with Intranda, e.g.
Jpylyzer validation.
12. Working with DMD
• Upload MARC XML DMD exported from Sierra
using standard Goobi features.
• MARC fields edited to provide a consistent Goobi
process title, e.g. using shelf mark.
• MARC Leader 6 field identifies content type, e.g.
‘Archive’ or ‘Monograph’.
• Content ‘type’ used by Goobi to set default METS
access conditions.
• DMD not delivered to end user, that comes from
live catalogue.
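As a rough illustration of the Leader-based typing described on this slide, the sketch below reads character position 6 of the leader from a MARC XML record and maps it to a content type. The mapping table, the function name and the sample record are assumptions made for illustration; they are not the Library's actual Goobi configuration.

```python
import xml.etree.ElementTree as ET

# Standard MARC XML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical mapping from Leader position 6 to a content type.
# The real mapping used in production is not given in the slides.
LEADER6_TO_TYPE = {"a": "Monograph", "p": "Archive"}


def content_type_from_marcxml(xml_text, default="Unknown"):
    """Return a content type derived from Leader position 6 of the first record."""
    root = ET.fromstring(xml_text)
    # The record may be the root element or wrapped in a <collection>.
    leader = root.find("marc:leader", MARC_NS)
    if leader is None:
        leader = root.find(".//marc:leader", MARC_NS)
    if leader is None or leader.text is None or len(leader.text) < 7:
        return default
    return LEADER6_TO_TYPE.get(leader.text[6], default)


# Minimal made-up record: position 6 of the leader is "a".
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
</record>"""

print(content_type_from_marcxml(SAMPLE))
```

Codes that are not in the table fall back to a default, so an unexpected record type is flagged rather than silently given the wrong access conditions.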
13. Uploading content
• Content upload using the Sync2Goobi Tool for
bulk import.
• Drag ‘n drop interface.
• Can be either TIFF or JP2.
• Project based workflow templates manage either
format.
• Use Goobi Mount Tool (GMT) to access/manage
content already uploaded.
14. Using METS Editor
• Main point of human interaction with Goobi. Goobi
automates METS creation.
• METS basis for access control & usage conditions
for material.
• Basis for retrieval of content from SDB by using
SDB PUIDs.
• Goobi automates ingest of content into SDB &
receives AMD in return.
15. How we use METS
• Setting material type & default values for access
based on DMD.
• Access restrictions can be at the item level.
• DMD in METS not delivered to end user, serves
only to help a human identify content when
snagging.
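The combination of type-based defaults and item-level restrictions can be sketched as a simple lookup with an override. The access values, the `Item` class and the function name below are hypothetical; the slides do not say which values or structures Goobi uses in METS.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default access condition per material type.
DEFAULT_ACCESS = {"Monograph": "open", "Archive": "restricted"}


@dataclass
class Item:
    identifier: str
    # Set only when this item needs a condition different from the default.
    access_override: Optional[str] = None


def effective_access(material_type, item, fallback="closed"):
    """Item-level override wins; otherwise use the default for the material type."""
    if item.access_override is not None:
        return item.access_override
    return DEFAULT_ACCESS.get(material_type, fallback)


print(effective_access("Archive", Item("item-1")))
print(effective_access("Archive", Item("item-2", access_override="closed")))
```

Falling back to the most restrictive value for unknown material types is a design choice made for this sketch, not something stated in the slides.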
17. Shared development
• Wellcome Trust is not a development house. Rely
on Intranda to provide development support.
• Developed specific requirements for extensions
to Goobi, e.g. Jpylyzer for JPEG2000 validation.
• Development proposals from both sides. We have
an idea, Intranda helps us make that idea a reality.
• Benefit from community developments
commissioned by others.
18. Additional Tools
• Lurawave for converting TIFF to JPEG2000.
• Jpylyzer for validating JPEG2000 files.
• Sync2Goobi Tool for bulk upload of content.
• Goobi Mount Tool/MS Windows File Explorer for
access to ‘Home’ folders.
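To show how a validation step might feed a workflow decision, the sketch below reads the validity flag from a Jpylyzer XML report. It assumes the report carries an `isValid` element (or `isValidJP2` in older releases); those element names should be checked against the Jpylyzer version in use. The sample report is a heavily simplified, made-up fragment, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def is_valid_jp2(report_xml):
    """Return True if the first validity element in the report says 'True'."""
    root = ET.fromstring(report_xml)
    for elem in root.iter():
        # Assumed element names; compare on local name so namespaces do not matter.
        if local_name(elem.tag) in ("isValid", "isValidJP2"):
            return (elem.text or "").strip().lower() == "true"
    # No validity element found: treat the file as not validated.
    return False


# Simplified, made-up report fragment for illustration only.
SAMPLE_REPORT = (
    '<jpylyzer><file><isValid format="jp2">True</isValid></file></jpylyzer>'
)

print(is_valid_jp2(SAMPLE_REPORT))
```

A report with no validity element is treated as a failure, so a malformed or truncated report cannot let an unchecked file through.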
19. Goobi – the future
• Built in OCR & creation of ALTO files.
• Further refinement of Sync2Goobi Tool.
• Further development/integration of validation
tools.
• Integration of FTP with Goobi for third-party direct
upload of content.
• Establishment of separate database server for
Goobi.
20. Lessons learned - systems
• We were ambitious but underestimated what
capacity we would require.
• Underestimated storage requirements.
• Underestimated the desirability of high levels of
automation.
• Focus human interaction at as few points as
possible.
21. Lessons learned - Intranda
• Have relied heavily on input & support from
Intranda.
• Share information with Intranda & trust them to
provide answers.
• Be prepared to share development. But be
prepared to accept some pain.
22. Lessons learned - Goobi
• In less than a year Goobi has become key to
delivering the Library’s content.
• Centralised user activities in one system – Goobi
– less to learn, more efficient.
• Streamline & automate. High volume efficient
production essential.
• Streamline other digitisation & access processes
to match Goobi.
• METS an efficient single place for access related
metadata.
23. Thank you
Questions now, questions later…?
Dave Thompson, Digital Curator
Wellcome Library
d.thompson@wellcome.ac.uk
http://wellcomelibrary.org/