The Adventures of Digi: Ideas, Requirements and Reality
1. The Adventures of Digi:
Ideas, Requirements
and Reality
David Pearson
National Library of
Australia
Future Perfect 2012
Digi
By Imogene Pearson (7 years)
(March 2012)
3. From a preservation point of view, the Library’s digital collections present:
• A mix of materials needing to be kept in perpetuity, along with materials that can be
discarded after specified periods or events;
• Mixed levels of complexity in terms of object structure, relationships and dependencies;
• Mixed levels of intellectual control;
• A wide range of file formats (and carrier formats);
• Different levels of complexity in preservation planning and processing;
• Different timetables for preservation action;
• A need for different preservation approaches, often at different scales; and
• A need for recurring – and possibly changing – preservation action cycles over time, using a
changing suite of tools.
6. Ecology
Ecology or Layers of consciousness for the need for digital preservation intervention
(Given some need to access content over time)
Unaware:
• I am unaware if I have any digital content; or
• I am unaware if I may have a problem accessing any of my digital content.
Aware ‐ no response:
• I don’t think that I have a problem accessing any of my digital content;
• I recognise that I have a problem accessing some of my digital content;
• I recognise that I have a problem accessing some of my digital content. However, the problem
is not my problem; or
• I recognise that I have a problem, but have no response in place – not even a limited one.
Aware – taking some action:
• I accept that I may have a problem accessing some of my digital content. I am taking limited
actions to manage this problem; or
• I accept that I may have a problem accessing some of my digital content. The preservation
mandate is a part of my enterprise or system ecology.
9. Preservation responsibilities:
Preservation of the Library's digital collections involves three main goals:
• Maintaining access to reliable data at bit‐stream level;
• Maintaining access to content encoded in the bit streams; and
• Maintaining access to the intended and available meaning of the content.
While specific preservation activities may focus on one or more of these goals, the Library’s
preservation responsibility is only fulfilled when all three goals have been adequately addressed.
This responsibility applies across all digital collections, subject to curatorial and policy decisions
for specific groups of digital objects.
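The first goal, reliable data at bit-stream level, is commonly supported by fixity checking. The following is a minimal sketch only: the choice of SHA-256 and the function names `sha256_of_chunks`, `read_chunks` and `fixity_ok` are assumptions made for illustration, not something specified in these slides.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA-256 digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's content in fixed-size chunks to avoid loading it whole."""
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def fixity_ok(current_digest: str, recorded_digest: str) -> bool:
    """True if the digest computed now matches the one recorded earlier."""
    return current_digest.lower() == recorded_digest.lower()
```

A check of this kind addresses only the first of the three goals. A file can pass it and still be unreadable or unintelligible.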
10. Mission: The primary objective of preservation activities within the NLA is to maintain the
ability to meaningfully access digital collection content over time.
[Diagram: ‘Stuffed?’, David Pearson 2012. Labels: ‘Logical on Physical Stuff’ (A and B); Contextual Information – About Content; Dependency Information – About Formats etc.; Systems to Ingest, Manage, Report and take Actions; Systems to Access – Master or Derivative; time. Image source: Google Images.]
11. Required preservation processes
The Library must be able to:
• Understand what it holds in its collections;
• Understand what its preservation intentions are for every digital object and what it is entitled
to do to realise its intentions;
• Understand what is required to provide access, existing inhibitors to access, and the current
level of support the Library is able to provide;
• Evaluate and monitor the degree of risk arising from collection composition, preservation
intentions and available level of support within the Library for digital collection content, and
monitor for risk conditions arising during general Digital Library operations;
• Anticipate the effects of changes in support;
• Recognise planning triggers, and plan and take appropriate action on a scale appropriate to
the size of the target; and
• Audit the effectiveness of its preservation arrangements and modify the arrangements if
necessary.
12. Risk or ‘Risk‐on’ (are you a splitter or a lumper?)
• ‘parameter‐based’ risks: a match against a criterion defined by Library staff to indicate a
preservation risk – for example, video encoded with a codec considered to be problematic;
• ‘exception’ risks: the value of a monitored parameter is outside a set of acceptable values;
• ‘change’ risks: there has been a change in status for a monitored parameter for content – for
example, the confidence in format identification for a particular file has changed;
• ‘conflict’ risks: conflicting values for the parameter are reported by one or more tools – for
example, file format identification returns conflicting values;
• ‘unknown value’ risks: undetermined values for defined parameters – for example,
undetermined values for file format and version; and
• ‘access support’ risks: changes in level of support which affect the Library’s ability to access
content in accordance with preservation intent and significance – for example, reduction
below an acceptable threshold in the availability of supporting software for a particular file
format.
• ‘content‐based’ risks: characteristics of content that may not be identifiable from metadata –
for example, presence of deprecated HTML tags.
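Some of these risk types could be detected mechanically from format identification results. The sketch below covers four of them (‘unknown value’, ‘conflict’, ‘parameter-based’ and ‘change’). The record layout, the names `FormatReport`, `PROBLEM_FORMATS` and `classify_risks`, and the placeholder format identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormatReport:
    """Hypothetical record of what identification tools reported for one file."""
    file_id: str
    tool_results: dict[str, Optional[str]]  # tool name -> format id, or None
    previous_format: Optional[str] = None   # value recorded at the last check


# Placeholder for a staff-defined criterion list (assumption, not real data).
PROBLEM_FORMATS = {"example/problem-codec"}


def classify_risks(report: FormatReport) -> set[str]:
    """Return the risk types from the list above that apply to this report."""
    risks: set[str] = set()
    values = {v for v in report.tool_results.values() if v is not None}
    if not values:
        risks.add("unknown value")    # no tool could determine a format
    if len(values) > 1:
        risks.add("conflict")         # tools disagree with each other
    if values & PROBLEM_FORMATS:
        risks.add("parameter-based")  # matches a staff-defined criterion
    if (report.previous_format is not None and len(values) == 1
            and report.previous_format not in values):
        risks.add("change")           # identification differs from last time
    return risks
```

A single file can carry several risk types at once, which is why the function returns a set rather than one label.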
13. Likely preservation treatment actions
Broad preservation action approaches that are likely to be required will include:
• Format migration at the point of collecting;
• Format migration on recognition of risks;
• Format migration at the point of delivery;
• Emulation of various levels of software and hardware environments;
• Maintenance or supply of appropriate software or hardware;
• Documenting known problems for which no other action can be taken; and
• Deaccessioning or deletion.
15. Preservation intent – indicates the expectations for preservation for content:
• whether content is to be preserved;
• who is responsible for preservation of the content;
• the period over which content must be preserved;
• the required level of support for access to the content over time; for example, that the
Library intends to actively maintain the ability to both present and modify content, or only to
present content, or does not intend to actively maintain access to content beyond its
expected useful life.
• Preservation intent may also extend to include more specific characteristics to be supported,
based on curatorial input or constraints imposed by rights policies or agreements with rights
holders.
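One way to picture preservation intent is as a small structured record attached to a set of content. This is an illustrative sketch only: the field names, the `SupportLevel` values and the helper `needs_active_access_support` are assumptions derived from the bullets above, not an actual Library schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SupportLevel(Enum):
    """Required level of support for access over time (hypothetical labels)."""
    PRESENT_AND_MODIFY = "present and modify"
    PRESENT_ONLY = "present only"
    NO_ACTIVE_SUPPORT = "no active support beyond expected useful life"


@dataclass(frozen=True)
class PreservationIntent:
    preserve: bool                        # whether content is to be preserved
    responsible_party: str                # who is responsible for preservation
    retain_until_year: Optional[int]      # None is used here for 'in perpetuity'
    support_level: SupportLevel
    extra_characteristics: tuple[str, ...] = ()  # curatorial or rights-based


def needs_active_access_support(intent: PreservationIntent) -> bool:
    """True if access must be actively maintained for this content."""
    return (intent.preserve
            and intent.support_level is not SupportLevel.NO_ACTIVE_SUPPORT)
```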
20. Reference knowledgebases (General)
Enable staff to create, update and maintain reference information
knowledgebases on:
• File formats and versions
• Software and hardware components that support access to
file formats and versions, for maintaining access to managed
content; and
• The level of support available for particular file formats and
versions:
– i. sets of software or hardware components available to
support access to formats;
– ii. functions supported, both for providing access to
content and for use in preservation action – for example,
presentation, modification, batch processing;
– iii. fidelity of support – how well functions are
supported; and
– iv. known risks, including potential inhibitors to
preservation, associated with formats or supporting
software or hardware.
• Preservation intent descriptions and parameters for sets of
content.
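The level-of-support information described above (components, functions, fidelity, known risks) could be held as simple records and queried per format. The sketch below is a guess at one possible shape. `SupportEntry`, `supported_functions`, `below_support_threshold` and the `minimum` parameter are hypothetical names, not part of any existing knowledgebase.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SupportEntry:
    """One component's support for one format (hypothetical KB record)."""
    format_id: str
    component: str                 # software or hardware component
    functions: frozenset[str]      # e.g. presentation, modification
    fidelity: str                  # how well the functions are supported
    known_risks: tuple[str, ...] = ()
    available: bool = True         # whether the component is still obtainable


def supported_functions(entries: Iterable[SupportEntry], format_id: str) -> set[str]:
    """Union of functions offered by available components for a format."""
    found: set[str] = set()
    for entry in entries:
        if entry.format_id == format_id and entry.available:
            found |= entry.functions
    return found


def below_support_threshold(entries: Iterable[SupportEntry], format_id: str,
                            function: str, minimum: int = 1) -> bool:
    """True if fewer than `minimum` available components offer the function."""
    count = sum(1 for entry in entries
                if entry.format_id == format_id and entry.available
                and function in entry.functions)
    return count < minimum
```

A query like `below_support_threshold` is one way the ‘access support’ risk described earlier might be raised when supporting software becomes unavailable.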
39. Prioritising preservation treatment based on level of support
In evaluations of risk and prioritisation for preservation planning and action, we must take into
account the Level of Support/Access Risks and:
• Any constraints imposed by rights policies or agreements; and
• The amount of resources available.
Based on these factors, the Library (Management, Collections and Digi Pres) should be able to
prioritise material to be preserved.
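The three factors above (access risk, rights constraints, available resources) can be combined in many ways. The sketch below shows one simple option, a greedy selection by risk within a resource budget. It is not the Library's method. The `Candidate` fields, the numeric scales and the `prioritise` function are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    name: str
    access_risk: int            # hypothetical scale: higher means weaker support
    rights_permit_action: bool  # do rights policies or agreements allow treatment?
    effort: int                 # hypothetical cost in resource units


def prioritise(candidates: Iterable[Candidate], budget: int) -> list[Candidate]:
    """Pick treatable candidates, highest risk first, within the budget."""
    eligible = [c for c in candidates if c.rights_permit_action]
    # Highest risk first; ties broken by lower effort, then by name.
    ranked = sorted(eligible, key=lambda c: (-c.access_risk, c.effort, c.name))
    selected: list[Candidate] = []
    remaining = budget
    for candidate in ranked:
        if candidate.effort <= remaining:
            selected.append(candidate)
            remaining -= candidate.effort
    return selected
```

In practice the final ordering would still be a decision for Management, Collections and Digi Pres. A ranking like this only provides a starting point for that discussion.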
41. Options for preservation actions
We would like to be able to enable staff to:
• define types of preservation actions for use within preservation planning and evaluation.
• update and delete reference information on options for preservation action, both in general
and for particular formats or format types.
• link to information available from the software KB which provides information on what actions
specific software might be useful for and the proximity of the software to the format.
• link to other linked data sources.
44. Preservation options evaluation
• support import and integration of preservation‐treated content and metadata, from either
internal or external processes, including:
– a. Verifying that preservation‐treated digital content conforms to acceptance criteria for
preservation outcomes for designated sets of digital content;
– b. Enabling staff to quality assure and approve preservation‐treated digital content for
incorporation into the collection; and
– c. After approval, send to preservation action scheduler for treatment of file/s, metadata
and associated relationships.
• support ‘rollback’ of updated versions of content, metadata and associated relationships to
restore previous versions, if necessary.
• enable staff to define and approve acceptance criteria for preservation action outcomes for
sets of managed content.
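The acceptance, approval and rollback requirements above can be sketched as a versioned object that only accepts treated content when staff-defined criteria pass. This is a simplified illustration: `ManagedObject`, `incorporate`, `rollback` and the dictionary-based version records are hypothetical, and the hand-off to a preservation action scheduler is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable

# An acceptance criterion is any check over a version record (assumption).
Criterion = Callable[[dict], bool]


@dataclass
class ManagedObject:
    object_id: str
    versions: list[dict] = field(default_factory=list)  # oldest first

    @property
    def current(self) -> dict:
        return self.versions[-1]

    def incorporate(self, treated: dict, criteria: list[Criterion],
                    approved_by_staff: bool) -> bool:
        """Add a treated version only if approved and all criteria pass."""
        if not approved_by_staff:
            return False
        if not all(check(treated) for check in criteria):
            return False
        self.versions.append(treated)
        return True

    def rollback(self) -> dict:
        """Restore the previous version, discarding the latest one."""
        if len(self.versions) < 2:
            raise ValueError("no earlier version to restore")
        self.versions.pop()
        return self.current
```

Keeping earlier versions in the history is what makes rollback possible. A production system would probably retain the rolled-back version as well, for audit purposes.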
45. 10.) So what!
Currently, these ideas and requirements
have become ‘partially real’. They still need
to be implemented.
They formed the basis for the preservation
requirements in a subsequent:
• RFP (Request for Proposal) process; and
• RFT (Request for Tender) process.
http://www.wildsound-filmmaking-feedback-events.com/images/austin_powers_dr_evil.jpg
46. RFP
So all of these ideas were consolidated as
requirements for a Request for Proposal which
went to the market in July 2011.
A number of responses were received for:
• Core systems
• Preservation
• Digitisation
• Other Workflows
http://www.melbournesumos.com.au/pics/twister/Twister078.jpg
These were evaluated and some of the vendors
were invited to participate in the next stage.
48. What version of reality
have we decided upon?
Everything, for Everyone
Forever
Digi
By Imogene Pearson (7 years)
(March 2012)
http://www.flickr.com/photos/ricksmit/15671245/