Applying Repository Systems to Audiovisual Preservation
1. Applying Repository Systems to Audiovisual Preservation
Jon W. Dunn, Indiana University Libraries
Karen Cariani, WGBH Media Library and Archives
#OR2017
3. Who we are: Indiana University
• 3 million+ special collections items
• Large focus on AV:
• Music and other performing arts
• Ethnomusicology, anthropology
• Public broadcasting stations
• Archival film collections
• Athletics
• Media Digitization and Preservation Initiative:
• 300,000 AV items
• 25,000 reels of film
• 80 campus units + other IU campuses
• 27 PB by the year 2020
• 30+ TB per day peak
• http://mdpi.iu.edu/
4. Challenges of AV
• Large files
• Individually and in aggregate
• Multiple related files
• Much metadata
• Esoteric and ephemeral formats
• Physical and digital
• Lack of clear standards
• Especially for video and film
5. Storage Strategy: WGBH
• Difficult history with commercial DAM and HSM system
• Issues of cost, capacity, performance, network issues, vendor lock-in
• Using LTFS-formatted LTO-6 tape
• HP LTO-6 Ultrium 6250 drives
6. Storage Strategy: Indiana University
• Nearline storage in university-supported HSM environment
• IBM HPSS software
• Enterprise tape (IBM TS1140)
• Typically accessed via hsi tool
• Mirrored between Bloomington
and Indianapolis
• Centrally-funded
• Very fast research network (10, 20, 40 Gbit connections)
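A rough calculation helps relate the Indiana University volumes quoted earlier (30+ TB per day at peak, 27 PB by 2020) to these link speeds. The sketch below assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead, tape throughput, and contention, so the results are lower bounds rather than measured figures; the function names are illustrative only.

```python
# Back-of-the-envelope figures only. Decimal units are an assumption:
# 1 TB = 10**12 bytes, 1 PB = 10**15 bytes, 1 Gbit = 10**9 bits.
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 86_400


def sustained_gbit_per_s(tb_per_day: float) -> float:
    """Average line rate needed to move tb_per_day terabytes in 24 hours."""
    bits = tb_per_day * 10**12 * BITS_PER_BYTE
    return bits / SECONDS_PER_DAY / 10**9


def days_to_transfer(petabytes: float, link_gbit_per_s: float) -> float:
    """Days needed to move a collection over a fully utilized link."""
    bits = petabytes * 10**15 * BITS_PER_BYTE
    return bits / (link_gbit_per_s * 10**9) / SECONDS_PER_DAY


peak_rate = sustained_gbit_per_s(30)      # 30 TB per day at peak
full_copy_days = days_to_transfer(27, 10)  # 27 PB over one 10 Gbit link

print(f"30 TB/day needs about {peak_rate:.2f} Gbit/s sustained")
print(f"27 PB over 10 Gbit/s takes about {full_copy_days:.0f} days")
```

Under these assumptions the peak daily volume fits within a single 10 Gbit connection, while a full copy of the projected collection over that same connection would take most of a year.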
7. Need for a preservation repository
• Track preservation master files in local and external storage
• Connect metadata
• Descriptive, technical, process history, preservation
• Ensure fixity
• Regular fixity checking, logging
• Support retrieval/delivery of master files to authorized users
• Future: support file format migration
• We are separating concerns of preservation and access
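As an illustration of the "regular fixity checking, logging" requirement, here is a minimal Python sketch. It is not the HydraDAM code: the SHA-256 algorithm, the event field names, and the file name are all assumptions chosen for the example.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large AV masters never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_event(path: Path, expected: str) -> dict:
    """Compare the current checksum with the recorded one and return a log entry.

    The field names are hypothetical, not a PREMIS or HydraDAM schema.
    """
    actual = sha256_of(path)
    return {
        "file": path.name,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "expected": expected,
        "actual": actual,
        "outcome": "pass" if actual == expected else "fail",
    }


with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "example_master.mkv"  # stand-in file name
    master.write_bytes(b"stand-in for a preservation master")
    recorded = sha256_of(master)               # checksum stored at ingest

    first_check = fixity_event(master, recorded)
    master.write_bytes(b"stand-in for a silently corrupted master")
    second_check = fixity_event(master, recorded)

print(json.dumps(first_check, indent=2))
print(json.dumps(second_check, indent=2))
```

Keeping each check as a dated event, rather than overwriting a single "last verified" flag, is one way to preserve the process history that the metadata bullet above calls for.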
8. HydraDAM1
• Developed by WGBH with previous support from NEH
• Based on Sufia and Fedora 3.x
• Focused on user self-deposit
• Adapted to add bulk ingest, bulk edit, characterization of files, transcoding of proxies
• Limitations:
• Assumed full workflow pipeline for ingestion of A/V materials
• Processing performance problems
9. HydraDAM2: Goals
• Move to Fedora 4
• Develop Fedora 4 / Hydra content models for AV preservation
• Support multiple storage strategies: offline, online, nearline
• Integrate with access systems: Avalon, OpenVault
• January 2015 – December 2017
17. Pre-Ingest Steps (IU)
• Master file and metadata uploaded by Memnon or IU facility
• Manifest contents verified
• Files pushed to tape storage
• Checksums verified
• File characterization / technical metadata extraction
• Transcoding of derivatives for Avalon
• Files and metadata pushed to Avalon via Switchyard for access
• SIP created for ingest into Phydo
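The "manifest contents verified" and "checksums verified" steps can be sketched as follows. The slides do not describe the manifest format used by Memnon or IU, so the `checksum filename` layout, the MD5 algorithm, and the file names here are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse lines of '<checksum> <filename>' (a hypothetical manifest layout)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        checksum, name = line.split(maxsplit=1)
        entries[name] = checksum.lower()
    return entries


def md5_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Chunked MD5 so that large files are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_delivery(directory: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or fail their checksum."""
    present = {p.name for p in directory.iterdir() if p.is_file()}
    listed = set(manifest)
    return {
        "missing": sorted(listed - present),
        "unexpected": sorted(present - listed),
        "mismatched": sorted(
            name for name in present & listed
            if md5_of(directory / name) != manifest[name]
        ),
    }


with tempfile.TemporaryDirectory() as tmp:
    delivery = Path(tmp)
    (delivery / "a.wav").write_bytes(b"audio a")
    (delivery / "b.wav").write_bytes(b"audio b, altered in transit")
    (delivery / "extra.txt").write_bytes(b"not listed")

    manifest_text = "\n".join([
        f"{hashlib.md5(b'audio a').hexdigest()} a.wav",
        f"{hashlib.md5(b'audio b').hexdigest()} b.wav",
        f"{hashlib.md5(b'audio c').hexdigest()} c.wav",
    ])
    report = verify_delivery(delivery, parse_manifest(manifest_text))

print(report)
```

A delivery would presumably move on to tape storage only when all three lists in the report are empty.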
19. Fedora 4 Asynchronous Storage: Proof of Concept
[Architecture diagram. Component labels:]
• Rails application with AS UI gem
• Asynchronous Storage Proxy
• Apache Camel Routes
• Service translation blueprint (one per storage service)
• Local Tape Storage Services
• Large files on Disk
• Cloud Storage Services
• Notify
[Callouts:]
• Asynchronous-aware user interface provides interactions
• Proxy provides API with common endpoints and responses
• Translations map from common API to specific storage APIs
• Should be able to be an API-X sharable service
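The idea of a proxy with common endpoints and responses, backed by per-service translations, can be sketched in plain Python. This is not the Apache Camel implementation from the proof of concept: the class names, the three status values, and the `recall_finished` hook (a stand-in for the notification step) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Protocol


class Status(Enum):
    AVAILABLE = "available"  # bytes can be read right now
    STAGING = "staging"      # a recall has been requested and is in progress
    OFFLINE = "offline"      # held on slow storage, no recall requested


@dataclass(frozen=True)
class StorageResponse:
    """The common response shape every translation must return."""
    identifier: str
    status: Status
    location: Optional[str] = None


class Translation(Protocol):
    """Maps the common API onto one specific storage service."""
    def status(self, identifier: str) -> StorageResponse: ...
    def stage(self, identifier: str) -> StorageResponse: ...


class DiskTranslation:
    """Files on disk are always immediately available."""
    def status(self, identifier: str) -> StorageResponse:
        return StorageResponse(identifier, Status.AVAILABLE, f"/disk/{identifier}")

    def stage(self, identifier: str) -> StorageResponse:
        return self.status(identifier)


class TapeTranslation:
    """Simulated tape service: staging is requested now and completes later."""
    def __init__(self) -> None:
        self._state: Dict[str, Status] = {}

    def status(self, identifier: str) -> StorageResponse:
        state = self._state.get(identifier, Status.OFFLINE)
        location = f"/cache/{identifier}" if state is Status.AVAILABLE else None
        return StorageResponse(identifier, state, location)

    def stage(self, identifier: str) -> StorageResponse:
        if self._state.get(identifier) is not Status.AVAILABLE:
            self._state[identifier] = Status.STAGING
        return self.status(identifier)

    def recall_finished(self, identifier: str) -> None:
        """Stand-in for the storage service notifying that a recall is done."""
        self._state[identifier] = Status.AVAILABLE


class AsyncStorageProxy:
    """Single entry point; callers never see service-specific APIs."""
    def __init__(self, translations: Dict[str, Translation]) -> None:
        self._translations = translations

    def status(self, service: str, identifier: str) -> StorageResponse:
        return self._translations[service].status(identifier)

    def stage(self, service: str, identifier: str) -> StorageResponse:
        return self._translations[service].stage(identifier)


tape = TapeTranslation()
proxy = AsyncStorageProxy({"tape": tape, "disk": DiskTranslation()})

before = proxy.status("tape", "master-001")
requested = proxy.stage("tape", "master-001")
tape.recall_finished("master-001")
after = proxy.status("tape", "master-001")
on_disk = proxy.status("disk", "master-002")

print(before.status, requested.status, after.status, on_disk.status)
```

Because every translation returns the same response type, a user interface can poll or wait on a staging request without knowing whether the file lives on tape, disk, or in a cloud service.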
20. Invoking asynchronous interactions from the Fedora 4 API
[Architecture diagram. Component labels:]
• Fedora 4
• RDF resource container node
• Non-RDF resource node
• URL redirect
• Asynchronous Interactions UI
• Apache Camel Routes
• Asynchronous Storage Proxy
• Slow storage service
[Callouts:]
• Redirecting node via external-body MIME type; can be set using Fedora 4 API and also via Hydra Works file behaviors
• The URL to redirect to would be wherever the Asynchronous Interactions UI is deployed, immediately invoking interactions for a unique identifier (preferably using persistent URLs)
• Access to redirecting nodes via Fedora 4 API invokes immediate redirect to stored URL
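For readers unfamiliar with the external-body mechanism, the sketch below builds and parses the `Content-Type` value that Fedora 4 uses for redirecting non-RDF resources (`message/external-body; access-type=URL; URL="..."`). It only handles the header string and does not contact a repository. The host name and URL pattern for the Asynchronous Interactions UI are invented for the example.

```python
from email.message import Message
from typing import Optional

# Hypothetical deployment location of the Asynchronous Interactions UI.
INTERACTIONS_UI = "https://async.example.org/interactions"


def external_body_content_type(redirect_url: str) -> str:
    """Build the Content-Type value for a redirecting non-RDF resource."""
    if '"' in redirect_url:
        raise ValueError("redirect URL must not contain double quotes")
    return f'message/external-body; access-type=URL; URL="{redirect_url}"'


def redirect_target(content_type: str) -> Optional[str]:
    """Return the stored URL, or None if this is not an external-body type."""
    message = Message()
    message["Content-Type"] = content_type
    if message.get_content_type() != "message/external-body":
        return None
    return message.get_param("URL")


identifier = "master-001"  # illustrative unique identifier
header_value = external_body_content_type(f"{INTERACTIONS_UI}/{identifier}")
target = redirect_target(header_value)

print(header_value)
print(target)
```

A client would send this value as the `Content-Type` header when creating the non-RDF resource; later requests for that resource are then redirected to the stored URL, as the slide describes.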
32. Where We’re Going
• Continue development
• Rebuilding on Hyrax
• Build out WGBH storage implementation
• Additional user functionality
• Build out descriptive metadata / PBCore support
• Batch ingest
• Feed to/from Avalon Media System
• Pilot implementation
• Production implementation
Who are we? WGBH is Boston’s public television station. We produce fully one third of the content broadcast on PBS, including the series you see here, as well as Downton Abbey and Sherlock. In addition to television, we have two radio stations and a large, award-winning Interactive department that is the number one producer of the sites you’ll find on PBS.org. As you can see, we produce a wide variety of programming, from public affairs to history and science to children’s programming, arts, culture, drama, and how-tos. We have been on the air since 1951 with radio and since 1955 with television.
At heart, and through our mission, we are an educational and cultural institution. We originated from a consortium of universities in the Boston area. Because we have produced so much, we have a large archive of educational programming that is of interest to scholars and researchers as well as the public.