1) The document discusses a practical course on digital preservation tools for repository managers presented by the KeepIt project.
2) The course covers organizational issues, costs, description standards, and preservation workflow tools like EPrints and Plato.
3) Module 4 focuses on format management, risk assessment, storage, and linking preservation planning with tools like EPrints and Plato.
Digital Preservation Tools for Repository Managers
1. Digital Preservation Tools for Repository Managers
A practical course in five parts
presented by the KeepIt project
in association with
Module 4, Putting storage, format management and preservation planning in the repository
University of Southampton, 18-19 March 2010
Twitter hashtag #dprc (digital preservation repository course)
2. Course structure
• Module 1. Organisational issues. Scoping, selection, assessment, institutional parameters (19 January)
• Module 2. Costs. Lifecycle costs for managing digital objects, based on the LIFE approach, and institutional costs (5 February)
• Module 3. Description. Describing content for preservation: provenance, significant properties and preservation metadata (2 March)
• Module 4. Preservation workflow tools available in EPrints for format management, risk assessment and storage, and linked to the Plato planning tool from Planets (TODAY)
• Module 5. Trust (by others) of the repository’s approach to preservation; trust (by the repository) of the tools and services it chooses (30 March)
3. Tools this module
• EPrints preservation apps, including the storage controller, Dave Tarrant and Adam Field, University of Southampton
• Plato, preservation planning tool from the Planets project, Andreas Rauber and Hannes Kulovits, TU Wien
4. Steve Jobs launches Apple iPad
Picture by curiouslee http://www.flickr.com/photos/curiouslee/4320074421/
5. Steve Jobs launches Apple iPad
“75 million people already own iPod Touches and iPhones. That's all people who already know how to use the iPad.”
Picture by curiouslee http://www.flickr.com/photos/curiouslee/4320074421/
7. Preservation workflow
Check
• Format identification, versioning
• File validation
• Virus check
• Bit checking and checksum calculation
Tools, e.g. DROID, JHOVE, FITS
Analyse
• Preservation planning
• Characterisation: significant properties and technical characteristics, provenance, format, risk factors
• Risk analysis
Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois)
Action
• Migration
• Emulation
• Storage selection
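As a rough illustration of the Check stage, the sketch below computes a checksum for fixity ("bit checking") and makes a very simple format guess from a file's leading bytes. It is not how DROID, JHOVE or FITS are implemented; the function names and the three-entry signature table are assumptions made for this example, whereas real identification tools draw on the much larger PRONOM signature registry.

```python
import hashlib

# Illustrative signature table (an assumption for this sketch).
# These three leading-byte patterns are widely documented magic numbers.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
}


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_format(path):
    """Guess a format name from the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def check(path, recorded_checksum=None):
    """Run the two checks and compare against a previously recorded checksum."""
    checksum = sha256_of(path)
    return {
        "file": str(path),
        "format": identify_format(path),
        "sha256": checksum,
        # With no recorded value there is nothing to compare, so fixity passes.
        "fixity_ok": recorded_checksum is None or checksum == recorded_checksum,
    }
```

Calling `check(path)` when a file is deposited and storing the returned `sha256` gives a baseline; calling `check(path, stored_value)` later reports whether the bits have changed.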
8. Format risks
1000 Ubiquity: degree of adoption of the format
1001 Support: number of tools available which can access the format
1002 Disclosure: extent to which the format documentation is publicly disclosed
1003 Document Quality: completeness of the available documentation
1004 Stability: speed and backwards-compatibility of version change
1005 Ease of identification: ease with which the format can be identified
1006 Ease of validation: ease with which the format can be validated
1007 Lossiness: does the format use lossy compression
1008 Intellectual property rights: whether or not the format is encumbered by IPR
1009 Complexity: degree of content or behavioural complexity supported
From PRONOM documentation (The National Archives), July 2008
9. Format risks
Word vs PDF TIFF vs JPEG XML vs PDF
1000 Ubiquity 1 1 1
1001 Support 1 1
1002 Disclosure
1003 Document Quality
1004 Stability 1 1
1005 Ease of identification
1006 Ease of validation 1 1
1007 Lossiness 1 1
1008 Intellectual property rights 1
1009 Complexity 1 1 1
The WINNER is: PDF (Word vs PDF), TIFF (TIFF vs JPEG), XML (XML vs PDF)
10. A group task on format risks
1. Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)
2. By working through the (surviving) list of format risks, select a winner (or a draw) between your chosen formats for each risk category (1 point for a win)
3. Total the scores to find an overall winning format
4. Suggest one reason why the winning format using this method may not be the one you would choose for your repository
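The tallying in steps 2 and 3 can be sketched in a few lines. This is only an illustration of the scoring arithmetic: the function name and the placeholder format names "Format A" and "Format B" are assumptions, and the per-category judgements are for the group to supply, not taken from the slides.

```python
from collections import Counter


def overall_winner(judgements):
    """Tally one point per category win.

    `judgements` maps a risk category to the winning format name,
    or to None when the category is a draw.
    Returns (scores, winner); winner is None if the totals are tied.
    """
    scores = Counter(w for w in judgements.values() if w is not None)
    ranked = scores.most_common()
    if not ranked:
        return scores, None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return scores, None  # overall draw
    return scores, ranked[0][0]


# Hypothetical judgements, for illustration only.
example = {
    "Ubiquity": "Format A",
    "Support": "Format A",
    "Stability": "Format B",
    "Complexity": None,  # a draw in this category scores no points
}
scores, winner = overall_winner(example)
print(dict(scores), winner)
```

Because every category counts for exactly one point, the method ignores how much each risk matters to a particular repository, which is one starting point for step 4.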
11. Some revision from KeepIt Module 3
• Preservation workflow
– We recognised that digital objects have formats and other characteristics that we need to identify and record. These can change over time, or may need to be changed pre-emptively by a preservation action, depending on a risk assessment. Risk is subjective.
12. Some revision from KeepIt Module 3
• Significant properties
13. InSPECT SP Assessment Framework
• Builds on Gero’s Function-Behaviour-Structure (FBS) framework
• FBS was developed to assist engineers and designers to create and redesign artefacts
Three categories:
• Function: The design intention or purpose that is performed.
• Behaviour: The epistemological outcome derived from the function and structure, as obtained by the stakeholder.
• Structure: The structural elements of the object that enable the stakeholder to perform the behaviour.
• Artefact construction is a product of the designated function.
• Behaviour is the result of interaction between function and structure.
14. Exercise overview
• Analyse the content of an email
• Analyse structure of email message
• Determine purpose that each technical property performs
• Consider how email will be used by stakeholders
• Identify set of expected behaviours
• Classify set of behaviours into functions for recording
15.
1. Select object type for analysis
2. Analyse structure
3. Identify purpose of technical properties
4. Determine expected behaviours
5. Classify behaviours into functions
6. Associate structure with each function
7. Review & finalise
Behaviour
Determine expected behaviours
• What activities would a user – any type of stakeholder – perform when using an email?
• Draw upon list of property descriptions performed in the previous step, formal standards and specifications, or other information sources.
Task 2: Identify the type of actions that a user would be able to perform using the email (Groups. 15 mins).
• E.g. Establish name of person who sent email
• E.g. May want to confirm that email originated from stated source.
Structure
subject; Message text; Line break; Paragraph; underline; strikethrough; Body background; Body text colour; In-reply-to; references; Message-id; Trace-route; Sender display-name; Sender local-part; Sender domain-part; Recipient display-name; Recipient local-part; Recipient domain-part
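To make the structural elements concrete, the sketch below pulls several of them out of a message using Python's standard `email` package. The sample message, the addresses and the helper names are invented for this illustration; it covers only a subset of the structure list and is not part of the InSPECT framework itself.

```python
from email import message_from_string
from email.utils import parseaddr

# A made-up message used only for this example.
RAW = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.com>
Subject: Draft report
Message-ID: <msg-2@example.org>
In-Reply-To: <msg-1@example.com>
References: <msg-1@example.com>

Please see the attached draft.
"""


def split_address(header_value):
    """Split an address header into display-name, local-part and domain-part."""
    display_name, address = parseaddr(header_value or "")
    local_part, _, domain_part = address.partition("@")
    return {
        "display-name": display_name,
        "local-part": local_part,
        "domain-part": domain_part,
    }


def structure_of(raw_message):
    """Collect a few structural elements of a single-part email."""
    msg = message_from_string(raw_message)
    return {
        "subject": msg["Subject"],
        "message-id": msg["Message-ID"],
        "in-reply-to": msg["In-Reply-To"],
        "references": msg["References"],
        "sender": split_address(msg["From"]),
        "recipient": split_address(msg["To"]),
        "message text": msg.get_payload(),
    }


for name, value in structure_of(RAW).items():
    print(f"{name}: {value!r}")
```

A behaviour such as "establish name of person who sent email" then depends on the sender display-name element, while threading a conversation depends on Message-id, In-reply-to and references.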
16. 1.3 cont. Categories of properties
Five high-level categories
• Content e.g. character count
• Context e.g. date of creation
• Rendering e.g. bit depth
• Structure e.g. e-mail attachments
• Behaviour e.g. hyperlinks
17.
1. Select object type(s) for analysis
2. Identify stakeholder
3. Determine actual behaviours
4. Classify behaviours into set of functions
5. Cross-match functions
6. Assign acceptable value boundaries
7. Review & finalise
Identify Stakeholders
• Creator – view, annotate
– Researcher corresponds during research with colleagues, peers, administrators etc.
• Recipient – reuses content
– Student wants to understand research lifecycles by studying real-world practice
• Custodian – evidential chain
– Maintains permanent email record for externally-funded projects, alongside data and eprint outputs
18. Some revision from KeepIt Module 3
• Significant properties
– We considered which characteristics might be significant using the function-
behaviour-structure (FBS) framework, and classifying the functions of
formatted emails
– We recognised that assessment of behaviour, and so of significance, can vary
according to the viewpoint of the stakeholder – e.g. creator, user, archivist
19. Some revision from KeepIt Module 3
• Documentation
– We looked at two means to document these characteristics, and the changes
over time
1. Broad and established (PREMIS)
2. Focussed, and work-in-progress (Open Provenance Model)
20. Some revision from KeepIt Module 3
• Provenance in action: transmission and recording
21. Provenance: a numbers game
• Transmission: recording vs word-of-mouth
• Identifying what is significant about the information to be transmitted
• Can be self-correcting!
22. Some revision from KeepIt Module 3
• Provenance in action: transmission and recording
– Through a simple game we learned that if we don’t recognise the necessary properties at the outset, and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as the information we started with
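The point about keeping a record at every stage can be sketched as a small simulation. This is a simplification under stated assumptions: the whole message is treated as significant and fingerprinted with SHA-256, whereas in practice only the chosen significant properties would be recorded, and the stage names and helper functions are invented for the example.

```python
import hashlib


def fingerprint(text):
    """Return a SHA-256 hex digest of the message text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def transmit(message, stages):
    """Pass a message through each stage, logging a fingerprint after every hop.

    `stages` is a list of (name, function) pairs; each function takes the
    message and returns the version it passes on.
    """
    log = [{"stage": "origin", "sha256": fingerprint(message)}]
    for name, stage in stages:
        message = stage(message)
        log.append({"stage": name, "sha256": fingerprint(message)})
    return message, log


def first_change(log):
    """Return the name of the first stage whose output differs from its input."""
    for previous, current in zip(log, log[1:]):
        if previous["sha256"] != current["sha256"]:
            return current["stage"]
    return None


# Hypothetical chain: the second stage alters a date while passing it on.
stages = [
    ("copy", lambda m: m),
    ("retelling", lambda m: m.replace("1864", "1846")),
    ("archive", lambda m: m),
]
final, log = transmit("Founded in 1864.", stages)
print(final, "| first change at:", first_change(log))
```

Without the per-stage log, only the difference between the first and last versions would be visible; with it, the stage that introduced the change can be located.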