Preservation Planning:
 Choosing a suitable preservation
            approach
          Long-term Archiving Perspectives of
         European Union Publications meeting
   Office for Official Publications of the European Communities
               Luxembourg, November 10-11, 2011



Gareth Knight
Centre for e-Research
Preservation Objectives
Authentic - it is what it purports to be
Understandability – what does this information mean?


                                                        Content
                                                        preservation



                                                          Bitstream
                                                         preservation




              Priscilla Caplan's revised Preservation Pyramid
Identity
   • The exact sameness of things.
   • Leibniz's law indicates that 2 items that share
     all their attributes are not only similar, but are the
     same thing
   • Can two things be the same? “ultimately nothing is
     the same as something else” (Paskin, 2003)

                                                           A painting of Leibniz

Questions:
   • Both images are a pictorial representation of Leibniz
       • Image A is constructed using paint on a canvas
       • Image B is constructed as 0s and 1s
   • Do they share the same identity?
   • Is it necessary for all object attributes to be the same, or is
     it acceptable to have some degree of granularity?
   • How much is identity based upon ability to measure
     attributes?

                                                           Scanned copy of painting
Integrity
Is integrity maintained = Yes/No
• Linked to notions of consistency, wholeness and truth
• There has not been deliberate or accidental damage/change
  that has caused meaning to be altered or lost, in part or
  entirety.
• Checksum algorithm applied to a file generates a distinct
  (possibly unique) alphanumeric value




• Commonly used to check for accidental/deliberate data
  change/corruption
   • Generate checksum on October 1st
   • Generate checksum on October 14th & compare to Oct 1st value –
     are they the same? YES/NO
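
The two-date comparison above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 from Python's standard hashlib module; the byte strings stand in for a file read on two dates and are hypothetical, as are the function names.

```python
import hashlib

def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

def integrity_maintained(stored_digest: str, data: bytes,
                         algorithm: str = "sha256") -> bool:
    """Yes/No answer: does the current digest equal the one recorded earlier?"""
    return checksum(data, algorithm) == stored_digest

# Hypothetical content standing in for a file read on two dates.
october_1 = b"Official Journal, volume 1"
october_14_ok = b"Official Journal, volume 1"
october_14_bad = b"Official Journal, volume 2"

recorded = checksum(october_1)
print(integrity_maintained(recorded, october_14_ok))   # unchanged content
print(integrity_maintained(recorded, october_14_bad))  # altered content
```

The answer is strictly Yes/No: a single changed byte and a wholly replaced file both yield a mismatch.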
Is integrity maintained = 0-100%
If one chunk became corrupted, the hashes for other chunks,
which hadn't changed, could be used to prove its integrity.

Piecewise hashing:
• Divides an input file into sections and checksums each chunk
  separately.
• Intended to measure integrity of disk images (dcfldd).
• However, an insert or delete changes all subsequent hashes

Rolling hash:
• Produces a pseudo-random value at each position in the file
• Depends only on the last few bytes
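
A rolling hash can be illustrated with a deliberately simplified function: a running sum over a fixed window of bytes. This is only a sketch of the "depends only on the last few bytes" property; the window size and function name are assumptions, and real tools use more elaborate functions.

```python
from collections import deque

def rolling_sums(data: bytes, window: int = 4) -> list[int]:
    """Return, for each position, the sum of the last `window` bytes.

    The running total is updated by adding the incoming byte and
    subtracting the byte that drops out of the window, so the value
    at any position depends only on the last few bytes.
    """
    recent = deque()
    total = 0
    values = []
    for byte in data:
        recent.append(byte)
        total += byte
        if len(recent) > window:
            total -= recent.popleft()
        values.append(total)
    return values

# Two different inputs that end with the same four bytes.
a = rolling_sums(b"xxxxxxxxABCD")
b = rolling_sums(b"yyABCD")
print(a[-1] == b[-1])
```

Because earlier bytes drop out of the window, content that follows an insertion or deletion produces the same values again once the window has moved past the change.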
Example of Piecewise hashing (1)
                   19e33h213a7865b2b664348b




                   ea3fe191227a4eg933bc41ge




                  2d839db2996b412e84h77a33



                   872e73ab867c883e7391ae65
Example of Piecewise hashing (2)
                   19e33h213a7865b2b664348b
                            SAME!


                   ea3fe191227a4eg933bc41ge
                            SAME!


                   a73921e173c94e8232fa91bb
                      DIFFERENT TEXT


                   7894af8211c12bb123ah9912
                       INCOMPLETE
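
The chunk-by-chunk comparison in the two example slides can be sketched as follows. This is a minimal illustration, assuming fixed-size chunks and SHA-256; the chunk size, sample data and function names are hypothetical and do not reproduce the behaviour of dcfldd.

```python
import hashlib

def piecewise_hashes(data: bytes, chunk_size: int) -> list[str]:
    """Split data into fixed-size chunks and hash each chunk separately."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def matching_fraction(original: list[str], current: list[str]) -> float:
    """Fraction of the original chunk hashes that still match, by position."""
    if not original:
        return 1.0
    same = sum(1 for a, b in zip(original, current) if a == b)
    return same / len(original)

original = b"AAAA" + b"BBBB" + b"CCCC" + b"DDDD"
altered = b"AAAA" + b"BBBB" + b"CxCC" + b"DD"   # one chunk changed, one truncated
shifted = b"!" + original                        # single byte inserted at the start

before = piecewise_hashes(original, 4)
print(matching_fraction(before, piecewise_hashes(altered, 4)))  # 0.5
print(matching_fraction(before, piecewise_hashes(shifted, 4)))  # 0.0
```

The second result shows the weakness noted earlier: one inserted byte shifts every chunk boundary, so no hash matches even though almost all of the content survives.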
Renderability
Data Interpretation in practice
OAIS Reference Model




NAA Performance Model




   data + computer + OS + application = information content
Information Object
                      Information Properties
Some definitions:
  • Information Property/Description:
                          IP
     • A description of part of the information
       content (OAIS RM v2, 2009)
  • Property:
     • An abstract attribute, trait or peculiarity
       suitable for describing preservation
       objects, actions or environments
       (Dappert, 2009)

Observations:
  • No interpretation of significance –
    merely exists
  • May be held in different locations and
    different levels of detail
Information Property categories (1)

Rothenberg & Bikson (1999) identify five types of
Information Property:
  • Content: the author’s intellectual work, e.g. text, still image,
    audio waveform, etc.
  • Context: Information that affects the content’s intended
    meaning and establishes its provenance
  • Appearance: Information that contributes to the recreation of
    the performance, e.g. font type/colour/size, bit depth
  • Structure: Relationship between 2+ types of content, e.g.
    e-mail attachments, internal hyperlinks
  • Behaviour: information that establishes how content interacts
    with the user, or other objects or components, e.g. hyperlink
    handling

                                    http://www.panix.com/~jeffr/Prof/digilong.html
[Figure: annotated example with the labels Context, Content, “Image & Text link”, “Content and Context?”, Structure, Appearance, Behaviour]
Information Property categories (2)
PLANETS Digital Object Properties WP uses a different
classification based upon ability to identify:
• Extractable properties:
   • Properties that can be extracted from or calculated
     on the fly, e.g. file size, image dimensions, MD
• Observational properties:
   • Can only be determined by human observation, e.g.
     licence restriction(?)
• Performance Properties:
   • Properties that emerge through combination of HW,
     SW & Data Object
Source: PLANETS Digital Object Properties WG
[Figure labels: Extractable information, Observational Property, Performance Property]
Preservation Metadata: Documenting
    the technical encoding and
         intellectual content
PREMIS

PREMIS DD 1.0 (May 2005)
PREMIS DD 2.0 (March 2008)

• “things that most working repositories are
  likely to need to know in order to support
  digital preservation”
• Core metadata that defines “viability,
  renderability, understandability,
  authenticity, and identity in a preservation
  context”
What metadata assists with rendering?
•   Format
•   Size
•   Fixity
•   Creating Application: Name, version, date
    data was created
•   Inhibitors: Features intended to inhibit
    access, use, or migration.
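
As a rough illustration, the rendering-related items listed above could be held in a simple record. The class and field names below are assumptions made for this sketch and are not PREMIS element names; only size and fixity are computed, the other values are supplied by hand.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class RenderingMetadata:
    """Illustrative record; field names are assumptions, not PREMIS element names."""
    format_name: str
    size_bytes: int
    fixity_algorithm: str
    fixity_digest: str
    creating_application: str
    creating_application_version: str
    date_created: str
    inhibitors: list[str] = field(default_factory=list)

def describe(data: bytes, format_name: str, app: str,
             version: str, created: str) -> RenderingMetadata:
    """Compute size and fixity from the bytes; other values come from the caller."""
    return RenderingMetadata(
        format_name=format_name,
        size_bytes=len(data),
        fixity_algorithm="sha256",
        fixity_digest=hashlib.sha256(data).hexdigest(),
        creating_application=app,
        creating_application_version=version,
        date_created=created,
    )

# Hypothetical values for illustration only.
record = describe(b"example bytes", "text/plain", "ExampleEditor", "1.0", "2011-11-10")
print(record.size_bytes, record.fixity_algorithm, record.inhibitors)
```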
Technical Metadata for still images




                            http://www.flickr.com/photos/k4chii/200303113/

                      Standards: Z39.87, MIX
                      and others
                      Information on
                         •Image characteristics
                         •Encoding scheme
                         •Metadata
Document MD

    Applicable to formats that are primarily text, allow choice of font,
    support embedded multimedia & page layouts

    Example elements
       • Page Count
       • Word Count
       • Character Count
       • Paragraph Count
       • Line count
       • Table Count
       • Graphics Count
       • Language
       • Fonts (list of each font in document)
       • Features (additional document features, e.g. hasTransparency,
         hasOutline, hasAnnotation)
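
Several of these elements can be calculated directly for a plain-text document. The sketch below covers only four of the counts and assumes that paragraphs are separated by blank lines; the key names are hypothetical, and formats with page layout or embedded media would need format-specific tools.

```python
import re

def document_counts(text: str) -> dict[str, int]:
    """Count a few of the example elements for a plain-text document."""
    # Assumption: a paragraph is a block of text separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "character_count": len(text),
        "word_count": len(text.split()),
        "line_count": len(text.splitlines()),
        "paragraph_count": len(paragraphs),
    }

sample = "First paragraph, line one.\nLine two.\n\nSecond paragraph."
print(document_counts(sample))
```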
Third party services: Representation
             Information Registries
•Require trusted third party
services capable of identifying
formats
  • PRONOM, UDFR


•Providing information on
rendering data
  • OpenWith, various RI services
Preserving your object across
   changing technologies
Change in process over time
SOURCE + PROCESS (hardware + operating system + software application) = PERFORMANCE (information content)
   • Intel PC, 2000
   • Mac laptop, 2006
   • X64 Ubuntu laptop, 2010

          Potential for change to ‘Performance’ over time
Change is a necessity… and a risk
“traditionally, preserving things meant keeping them unchanged; however
… if we hold on to digital information without modifications, accessing the
information will become increasingly more difficult, if not impossible.”
(Su-Shing Chen, 2001)

“The fundamental challenge of digital preservation is to preserve the
accessibility and authenticity of digital objects over time and domains, and
across changing technical environments” (Wilson, 2008)
Authenticity
Authenticity
“the degree to which a person
(or system) may regard an
object as what it is purported to
be”
(OAIS RM v2)


Questions:
•How do you distinguish the
authentic original from the
imitators?
•What is authenticity in the digital
realm?

Which is the real Elvis?
Img src: http://www.flickr.com/photos/mymollypop/2904798835/
http://www.flickr.com/photos/blahflowers/3827096787/
© 1973, Elvis Presley Enterprises, Inc. and RCA Records
http://en.wikipedia.org/wiki/File:ElvisPresleyAlohafromHawaii.jpg
What do we need to keep for information
              Object to be authentic?
“Understanding, defining and assessing the individual
properties… important.. for informing decisions about which
characteristics of that object should be preserved over time,
in circumstances where it is not possible, for reasons such as
cost, practicality or technical constraints, to preserve all the
elements of that object”
(Montague et al. The Concept of Significant Properties. 2010)

“Unless such properties can be defined in a rigorous and
measurable manner, cultural memory institutions have no
objective framework for identifying, implementing, and
validating appropriate preservation strategies, nor for
asserting the continued authenticity of their digital collections”
(Dappert, 2009)
Acceptable Vs Unacceptable change

•Easy to identify when preservation has gone wrong, but how do you
decide when it has gone right?
   • Interpretation is a value judgement – often influenced by different
     criteria
   • Uncertainty over the level at which evaluation should be performed – technical
     encoding, object type (e.g. still image), object sub-type (e.g. business
     document, research paper)
   • How do you measure attributes that are considered significant?
       • Technical properties may vary between formats
       • Observational properties require manual identification
Planning your strategy; strategising your plan

  • Preservation Plan:
    “defines a series of preservation actions to be taken
    by a responsible institution due to an identified risk
    for a given set of digital objects or records”
   http://www.dlib.org/dlib/november09/kulovits/11kulovits.html



  • Preservation strategy:
    indicates commitment to preservation and high-level
    approach adopted – organisational mission, applied
    principles (e.g. use lifecycle approach), sequence of
    actions (immediate, medium term, long-term), risk
    management
Why develop a preservation plan?
Assists decision-making process
            •   Evaluate different strategies
            •   Evaluate different tools
Determine which is the most effective approach for your needs
• Transparency of operation – enable others to view and
  understand approach adopted – inspire confidence and trust
• Provide evidence of decision-making – decisions may be
  questioned. How do you prove that approach taken was
  appropriate for circumstances?
Evaluation frameworks
Various approaches may be adopted to develop preservation plan:
•Produce internal decision tree
   • Fit intrinsic needs of organisation, but requires staff time to develop &
     may be limiting when considering new approaches
•Perform informal “bottom-up” object analysis & develop bespoke
plan
   • Fit requirements of object type, but may be time intensive to produce
     & may be incompatible with broader policies
•Adopt 3rd party standardised plan (aka copy and paste)
   • Adopting existing plan saves time, but may be inappropriate for
     context
•Use analysis frameworks and toolkits
   • Structured process by which organisation can identify objectives &
     develop plan to address them
      • DRAMBORA/DIRKS – analyse environment & practices, identify risks and
        brainstorm methods of mitigating or avoiding them
      • Data Asset Framework – identify data held, assess management practices & make
        recommendations for improvement
      • PLANETS Preservation Planning –define requirements, evaluate alternative
        approaches, analyse and compare results, recommend preferred approach, and
        develop plan
Preservation Planning workflow

•Developed as part of DELOS
project & adopted by PLANETS
Consortium
•Conforms to the ‘General COTS
(Commercial-Off-The-Shelf)
selection process’ (GCS)
•Abstract steps: Define criteria,
Search for products, Create
shortlist, Evaluate candidates,
Analyze data & Select product
•Uses utility analysis approach
PLANETS Planning workflow




        http://olymp.ifs.tuwien.ac.at:8080/plato/
Define Requirements:
              Factors to consider
•Identify & analyse environment in which
decisions are made (e.g. assumptions &
constraints) to determine context:
  • Organisational/dept objectives (e.g. mission
    statement, mandate)
  • National/local policy framework (e.g. acquisition,
    legal framework)
  • Codes of practice
  • Financial limitations – what can you afford?
  • Object types to be maintained
  • Expertise & needs of key stakeholders, e.g.
    Designated Community
Whose views do you need to take into
              account?
Digital archive perspective
  • General trend to simplify object to make it (speculatively) easier to
    manage in future:
     • Reduce cost of preservation process
     • Limit risk that accessibility/preservation issues will emerge
     • Increase number of preservation options available
Creator perspective
  • Author intent difficult to establish
  • Differs for each object – do you seek to treat each object individually
    or identify broad classes?
  • When do you ask them? On creation, after 5 years? May have
    different views on value.
User perspective
  • How do you analyse interpretation of current user community?
  • How do you predict needs of future users?
InSPECT Requirements Analysis
            Framework (2008)
• Adopted a design method used to assist engineers &
  designers to create & re-design artefacts
• Based upon theory that artefact construction is a product
  of designated function(s)
• Assessment upon two philosophical approaches:
   1. Teleology: study of design and purpose of object – why was
      it created?
   2. Epistemology: Understand meaning and process by which
      knowledge is acquired
• In combination, these encourage evaluation of context of
  creation and information needed to communicate intrinsic
  knowledge to a new audience (designated community)
Requirements Analysis activities
Step 1: Object Analysis
Interpret context of creation:
1. Analyse object to find out what it contains
2. Identify original audience and functions that object was created to
      perform
3. Determine info. properties necessary to achieve each function

Step 2: Stakeholder Analysis
Determine future requirements of digital object
1. Identify Stakeholders that will use object
2. Determine function set they may perform when using object
3. Identify quality thresholds for each information property that must be
     met to allow each function to be achieved – what is acceptable loss?
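
The threshold check in the final step can be sketched as a simple comparison. This is an illustration only, assuming numeric properties where higher values are better; the property names, threshold values and function name are hypothetical and are not part of the InSPECT framework.

```python
def unmet_thresholds(measured: dict[str, float],
                     thresholds: dict[str, float]) -> list[str]:
    """Return the properties that are missing or fall below the stated minimum."""
    return sorted(
        name for name, minimum in thresholds.items()
        if name not in measured or measured[name] < minimum
    )

# Hypothetical thresholds for a 'read on screen' function of a scanned image.
thresholds = {"resolution_dpi": 150, "bit_depth": 8}
candidate = {"resolution_dpi": 300, "bit_depth": 1}

print(unmet_thresholds(candidate, thresholds))  # ['bit_depth']
```

An empty list means that the loss, if any, stays within what the stakeholder considers acceptable for that function.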
Define Requirements:
       PLANETS Requirement Categories
•   Produce list of criteria that will be used to evaluate different preservation
    strategies in a specific domain
•   May take top-down (organisational) or bottom-up (object) approach
•   PLANETS identify four groups of characteristics to be evaluated:

    1. Object: Attributes of information content itself, e.g. behaviour, context
    2. Record: Attributes of record including context, relationships & MD -
       potential overlap with Object in some cases
    3. Process: Attributes of preservation process, e.g. processing speed,
       usability of tool, ability to batch process, etc.
    4. Cost: Set-up of process, cost per object, H/W & S/W, personnel

•   Non-prescriptive - evaluator may identify further top-level & sub-
    categories or ignore existing criteria (e.g. technical characteristics for
    format evaluation)
•   May be expressed as spreadsheet, list, mind-map, post-it notes & other
    forms
Record requirements as Evaluation Tree

•Set of requirements may be
expressed as mind map,
spreadsheet, or other form
•Define structure of
evaluation process, grouping
similar items together
•Assign a measurement
value to each ‘leaf’
  • Objective measure: E.g.
    colour depth, duration
  • Subjective measure:
    Acceptable variance,
Define Requirements:
              Measure each criterion
•Assign a measurement value to each ‘leaf’
•Objective measures:
  • Unambiguous, automated (possibly), E.g. seconds to process
    object, colour depth, cost value
•Subjective measures:
  • Acceptable, but often require manual evaluation, e.g. degree
    of format support
•Type of scale
  • Numeric measure (e.g. 15 bit)
  • Boolean (Yes/No)
  • Controlled vocab
  (e.g. Yes/Acceptable/No)
  • Ordinal numbers (controlled list)
  • Subjective criteria (0-5)
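
One way to picture leaves and their scales is a small data structure in which each leaf records its scale and can check whether a recorded measurement fits it. This is a sketch under stated assumptions: the class, the scale labels and the example criteria are hypothetical and do not reflect the data model of any particular planning tool.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    """One measurable criterion at the edge of the requirements tree."""
    name: str
    scale: str                       # "numeric", "boolean" or "ordinal"
    allowed: tuple[str, ...] = ()    # permitted values for an ordinal scale

    def accepts(self, value) -> bool:
        """Check that a recorded measurement fits this leaf's scale."""
        if self.scale == "boolean":
            return isinstance(value, bool)
        if self.scale == "numeric":
            # bool is a subclass of int in Python, so exclude it explicitly.
            return isinstance(value, (int, float)) and not isinstance(value, bool)
        if self.scale == "ordinal":
            return value in self.allowed
        return False

# Hypothetical tree: top-level groups mapped to their leaves.
tree = {
    "Object": [Leaf("colour depth (bits)", "numeric")],
    "Process": [
        Leaf("batch processing supported", "boolean"),
        Leaf("format support", "ordinal", ("Yes", "Acceptable", "No")),
    ],
}

print(tree["Process"][1].accepts("Acceptable"))
```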
Objective tree for web sites
Define Alternatives

• On basis of object type and expressed
  requirements, what strategies are feasible?
• Many different approaches available, e.g. TIFF
  images could undergo following actions:
   •   Format conversion to JPG2k
   •   Format conversion to PNG (to save space)
   •   Format conversion to PDF (though would not recommend)
   •   Emulation/virtual machine
   •   Do nothing!
• For each alternative strategy, may wish to define:
   • Tool to be tested (e.g. name, version, OS)
   • Configuration parameters
   • Function to be tested
Trial the preservation approaches

Develop a set of experiments to trial the
 preservation approach
     • Define workflow
     • Select representative test files
     • Perform evaluation
     • Evaluate the outcome according to
       your objective tree
         • Were there undesired/unexpected
           results?
PLATO conversion tool/format comparison




 Definition of alternative approaches to preserve GIF image (conversion to alt.
     formats) and identification of tool services available to perform action
Compare results
Require common basis for comparing different strategies
Normalise disparate results
    • Each evaluation factor is measured differently (Y/N, cost, speed
      of conversion)
    • Can make them comparable by converting them to a uniform
      scale
Set Important Factors
    • Not all assessment criteria are equal – do you wish to prioritise
      specific reqs. (e.g. scalability, cost)
Compare outcomes & select most appropriate
   preservation strategy
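
The three steps above can be sketched as a small weighted-scoring calculation. This is a minimal illustration, assuming a linear mapping onto a 0-5 scale and a weighted average as the aggregation; the alternatives, measurements, weights and function names are invented for the example, and a planning tool may aggregate differently.

```python
def normalise(value: float, worst: float, best: float) -> float:
    """Map a measurement onto a 0-5 scale, where `best` scores 5 and `worst` scores 0."""
    if best == worst:
        raise ValueError("best and worst must differ")
    score = 5 * (value - worst) / (best - worst)
    return max(0.0, min(5.0, score))

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of normalised scores; weights need not sum to 1."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total

# Hypothetical priorities: preserving content counts double.
weights = {"cost": 1, "speed": 1, "content preserved": 2}

alternatives = {
    "convert to format A": {
        "cost": normalise(2, worst=10, best=0),      # cost per object, lower is better
        "speed": normalise(30, worst=60, best=0),    # seconds per object
        "content preserved": 5.0,                    # Yes mapped to 5
    },
    "do nothing": {
        "cost": normalise(0, worst=10, best=0),
        "speed": normalise(0, worst=60, best=0),
        "content preserved": 0.0,                    # No mapped to 0
    },
}

ranking = sorted(
    alternatives,
    key=lambda name: weighted_score(alternatives[name], weights),
    reverse=True,
)
print(ranking)
```

Changing the weights changes the ranking, which is why the chosen priorities should be recorded alongside the result.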
Conclusions

Preservation is an iterative process – must climb many
  steps to reach the top of the pyramid
Preservation Planning enables organisation to
  understand and document their requirements
Demonstrate decision making – inspires confidence &
  trust
Not a perform-once-and-forget process. Must be repeated
Discussion points
• Are traditional checksum techniques acceptable
  for measuring integrity, or do we need a more
  granular approach?

• How should we utilise & build upon third party
  services, such as RI Registries & preservation
  plan tools, to achieve our preservation
  objectives?

• What would a preservation plan for our scanned
  images, documents, metadata look like?
Thank You for your attention




          QUESTIONS?

            Gareth Knight
       gareth.knight@kcl.ac.uk

Contenu connexe

En vedette

Conference Engineering mechanics 2007
Conference Engineering mechanics 2007Conference Engineering mechanics 2007
Conference Engineering mechanics 2007Jaroslav Broz
 
Seminary of numerical analysis 2010
Seminary of numerical analysis 2010Seminary of numerical analysis 2010
Seminary of numerical analysis 2010Jaroslav Broz
 
Workshop 4 audiovisual digital preservation strategy
Workshop 4 audiovisual digital preservation strategyWorkshop 4 audiovisual digital preservation strategy
Workshop 4 audiovisual digital preservation strategyRichard Wright
 
Basic Principles of Digitisation
Basic Principles of DigitisationBasic Principles of Digitisation
Basic Principles of DigitisationRichard Wright
 
Digitisation
Digitisation Digitisation
Digitisation L-Monk
 
20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brighton20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brightonNeil Beagrie
 
Brief Introduction to Digital Preservation
Brief Introduction to Digital PreservationBrief Introduction to Digital Preservation
Brief Introduction to Digital PreservationMichael Day
 

En vedette (8)

Conference Engineering mechanics 2007
Conference Engineering mechanics 2007Conference Engineering mechanics 2007
Conference Engineering mechanics 2007
 
Seminary of numerical analysis 2010
Seminary of numerical analysis 2010Seminary of numerical analysis 2010
Seminary of numerical analysis 2010
 
PhD defence
PhD defencePhD defence
PhD defence
 
Workshop 4 audiovisual digital preservation strategy
Workshop 4 audiovisual digital preservation strategyWorkshop 4 audiovisual digital preservation strategy
Workshop 4 audiovisual digital preservation strategy
 
Basic Principles of Digitisation
Basic Principles of DigitisationBasic Principles of Digitisation
Basic Principles of Digitisation
 
Digitisation
Digitisation Digitisation
Digitisation
 
20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brighton20yrs: 2004 jisc cni-brighton
20yrs: 2004 jisc cni-brighton
 
Brief Introduction to Digital Preservation
Brief Introduction to Digital PreservationBrief Introduction to Digital Preservation
Brief Introduction to Digital Preservation
 

Similaire à Preservation Planning: Choosing a suitable digital preservation strategy

Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Recordspbajcsy
 
Establishing the significant properties of digital research
Establishing the significant properties of digital researchEstablishing the significant properties of digital research
Establishing the significant properties of digital researchGarethKnight
 
Spectrum Scale - Cognitive
Spectrum Scale - CognitiveSpectrum Scale - Cognitive
Spectrum Scale - CognitiveSmita Raut
 
High Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for SupercomputingHigh Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for Supercomputinginside-BigData.com
 
Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...
Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...
Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...JISC KeepIt project
 
ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides DuraSpace
 
DACS - The Internet of Things (IoT)
DACS - The Internet of Things (IoT)DACS - The Internet of Things (IoT)
DACS - The Internet of Things (IoT)Steve Posick
 
Multimedia Database
Multimedia Database Multimedia Database
Multimedia Database Avnish Patel
 
Agile Data Rationalization for Operational Intelligence
Agile Data Rationalization for Operational IntelligenceAgile Data Rationalization for Operational Intelligence
Agile Data Rationalization for Operational IntelligenceInside Analysis
 
Relational
RelationalRelational
Relationaldieover
 
Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...
Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...
Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...Future Cities Project
 
Entity framework introduction sesion-1
Entity framework introduction   sesion-1Entity framework introduction   sesion-1
Entity framework introduction sesion-1Usama Nada
 

Similaire à Preservation Planning: Choosing a suitable digital preservation strategy (20)

Technologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic RecordsTechnologies For Appraising and Managing Electronic Records
Technologies For Appraising and Managing Electronic Records
 
Establishing the significant properties of digital research
Establishing the significant properties of digital researchEstablishing the significant properties of digital research
Establishing the significant properties of digital research
 
I say emulate
I say emulateI say emulate
I say emulate
 
Infos4
Infos4Infos4
Infos4
 
Spectrum Scale - Cognitive
Spectrum Scale - CognitiveSpectrum Scale - Cognitive
Spectrum Scale - Cognitive
 
NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend
NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend
NISO Webinar: Metadata for Preservation: A Digital Object's Best Friend
 
Metadata For Preservation Delos
Metadata For Preservation DelosMetadata For Preservation Delos
Metadata For Preservation Delos
 
Database Management
Database ManagementDatabase Management
Database Management
 
High Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for SupercomputingHigh Availability HPC ~ Microservice Architectures for Supercomputing
High Availability HPC ~ Microservice Architectures for Supercomputing
 
Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...
Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...
Introducing Significant Properties (SPs part 1), by Stephen Grace and Gareth ...
 
ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides
 
DACS - The Internet of Things (IoT)
DACS - The Internet of Things (IoT)DACS - The Internet of Things (IoT)
DACS - The Internet of Things (IoT)
 
Multimedia Database
Multimedia Database Multimedia Database
Multimedia Database
 
Electronic Records
Electronic RecordsElectronic Records
Electronic Records
 
Agile Data Rationalization for Operational Intelligence
Agile Data Rationalization for Operational IntelligenceAgile Data Rationalization for Operational Intelligence
Agile Data Rationalization for Operational Intelligence
 
Relational
RelationalRelational
Relational
 
報告
報告報告
報告
 
Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...
Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...
Future Cities Conference´13 / Peter Steenkiste - "The eXpressive Internet Arc...
 
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
 
Entity framework introduction sesion-1
Entity framework introduction   sesion-1Entity framework introduction   sesion-1
Entity framework introduction sesion-1
 

Plus de GarethKnight

Supporting Open Science in Research
Supporting Open Science in ResearchSupporting Open Science in Research
Supporting Open Science in ResearchGarethKnight
 
Making Sense of a Digital Collection
Making Sense of a Digital CollectionMaking Sense of a Digital Collection
Making Sense of a Digital CollectionGarethKnight
 
Building Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bankBuilding Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bankGarethKnight
 
GIS: A project by project prospective
GIS: A project by project prospectiveGIS: A project by project prospective
GIS: A project by project prospectiveGarethKnight
 
Complying with EPSRC policy: An LSHTM case study
Preservation Planning: Choosing a suitable digital preservation strategy

  • 1. Preservation Planning: Choosing a suitable preservation approach. Long-term Archiving Perspectives of European Union Publications meeting, Office for Official Publications of the European Communities, Luxembourg, November 10-11, 2011. Gareth Knight, Centre for e-Research
  • 2. Preservation Objectives
      • Authentic: it is what it purports to be
      • Understandability: what does this information mean?
      • Content preservation
      • Bitstream preservation
      • Figure: Priscilla Caplan's revised Preservation Pyramid
  • 3. Identity
      • The exact sameness of things.
      • Leibniz's law indicates that two items that share all of their attributes are not only similar, but are the same thing
      • Can two things be the same? “ultimately nothing is the same as something else” (Paskin, 2003)
      • Figure: Image A, a painting of Leibniz; Image B, a scanned copy of the painting
      • Questions:
          • Both images are a pictorial representation of Leibniz
              • Image A is constructed using paint on a canvas
              • Image B is constructed as 0s and 1s
          • Do they share the same identity?
          • Is it necessary for all object attributes to be the same, or is it acceptable to have some degree of granularity?
          • How much is identity based upon the ability to measure attributes?
  • 5. Is integrity maintained = Yes/No
      • Linked to notions of consistency, wholeness and truth
      • There has not been deliberate or accidental damage/change that has caused meaning to be altered or lost, in part or entirety.
      • A checksum algorithm applied to a file generates a distinct (possibly unique) alphanumeric value
      • Commonly used to check for accidental/deliberate data change/corruption
          • Generate checksum on October 1st
          • Generate checksum on October 14th & compare to Oct 1st value: are they the same? YES/NO
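The Yes/No check described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used by the author: the function names `file_checksum` and `integrity_maintained` are hypothetical, and SHA-256 is an assumed choice of algorithm (the slide does not name one).

```python
import hashlib


def file_checksum(path, algorithm="sha256", block_size=65536):
    """Return the hex digest of a file, read in blocks to limit memory use."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def integrity_maintained(path, recorded_checksum, algorithm="sha256"):
    """Yes/No answer: does the file still match the checksum recorded earlier?"""
    return file_checksum(path, algorithm) == recorded_checksum
```

In use, the value returned by `file_checksum` on the first date would be stored alongside the object, and `integrity_maintained` would be called with that stored value on the later date. A single changed byte anywhere in the file yields `False`, with no indication of where or how much changed.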
  • 6. Is Integrity maintained = 0-100%
      • If one chunk becomes corrupted, the hashes for the other chunks, which have not changed, can still be used to prove the integrity of those chunks.
      • Piecewise hashing:
          • Divides an input file into sections and checksums each chunk separately.
          • Intended to measure integrity of disk images (dcfldd).
          • However, an insert or delete changes all subsequent hashes
      • Rolling hash:
          • Looks at each point of file in semi-random order
          • Depends only on last few bytes
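The property that a rolling hash "depends only on last few bytes" can be illustrated with a simple polynomial rolling hash. This is a sketch of one common construction, not necessarily the one the slide has in mind; the function name `rolling_hashes` and the `window`, `base` and `modulus` values are arbitrary assumptions.

```python
def rolling_hashes(data, window=7, base=257, modulus=1_000_000_007):
    """Yield (position, hash) for every full window of `window` bytes.

    Each hash is computed from the previous one in constant time and
    depends only on the `window` bytes ending at `position`.
    """
    if len(data) < window:
        return
    high = pow(base, window - 1, modulus)  # weight of the oldest byte
    value = 0
    for byte in data[:window]:
        value = (value * base + byte) % modulus
    yield window - 1, value
    for i in range(window, len(data)):
        value = (value - data[i - window] * high) % modulus  # drop oldest byte
        value = (value * base + data[i]) % modulus           # add newest byte
        yield i, value
```

Because each value depends only on its own window, inserting bytes at the start of a file shifts the positions of the hashes but does not change their values, which is the weakness of fixed-offset piecewise hashing that this approach avoids.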
  • 7. Example of Piecewise hashing (1)
      • 19e33h213a7865b2b664348b
      • ea3fe191227a4eg933bc41ge
      • 2d839db2996b412e84h77a33
      • 872e73ab867c883e7391ae65
  • 8. Example of Piecewise hashing (2)
      • 19e33h213a7865b2b664348b: SAME!
      • ea3fe191227a4eg933bc41ge: SAME!
      • a73921e173c94e8232fa91bb: DIFFERENT TEXT
      • 7894af8211c12bb123ah9912: INCOMPLETE
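The piecewise comparison in these two examples can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `piecewise_hashes` and `fraction_intact`, the fixed `chunk_size` and the use of SHA-256 are hypothetical choices, not a description of how dcfldd works internally.

```python
import hashlib


def piecewise_hashes(data, chunk_size=4096):
    """Hash each fixed-size chunk of `data` separately (SHA-256 is an assumed choice)."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def fraction_intact(original_hashes, current_hashes):
    """Share of original chunks whose hash still matches, from 0.0 to 1.0."""
    if not original_hashes:
        return 1.0
    matches = sum(
        1 for old, new in zip(original_hashes, current_hashes) if old == new
    )
    return matches / len(original_hashes)
```

For a file whose first two of four chunks are unchanged, `fraction_intact` reports 0.5 rather than a flat "No". If a single byte is inserted at the start of the file, every chunk boundary shifts and the result drops to 0.0, which is the limitation noted for piecewise hashing.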
  • 10. Data Interpretation in practice: OAIS Reference Model; NAA Performance Model. Figure: data + computer + OS + application = information content
  • 11. Information Object: Information Properties
      • Some definitions:
          • Information Property/Description (IP): a description of part of the information content (OAIS RM v2, 2009)
          • Property: an abstract attribute, trait or peculiarity suitable for describing preservation objects, actions or environments (Dappert, 2009)
      • Observations:
          • No interpretation of significance; it merely exists
          • May be held in different locations and at different levels of detail
  • 12. Information Property categories (1)
      • Rothenberg & Bikson (1999) identify five types of Information Property:
          • Content: the author’s intellectual work, e.g. text, still image, audio waveform, etc.
          • Context: information that affects the content’s intended meaning and establishes its provenance
          • Appearance: information that contributes to the recreation of the performance, e.g. font type/colour/size, bit depth
          • Structure: relationship between 2+ types of content, e.g. e-mail attachments, internal hyperlinks
          • Behaviour: information that establishes how content interacts with the user, or other objects or components, e.g. hyperlink handling
      • http://www.panix.com/~jeffr/Prof/digilong.html
  • 13. Figure labels: Context; Content; Image & Text link (Content and Context?); Structure; Appearance; Behaviour
  • 14. Information Property categories (2)
      • PLANETS Digital Object Properties WP uses a different classification, based upon the ability to identify each property:
          • Extractable properties: properties that can be extracted from the object or calculated on the fly, e.g. file size, image dimensions, MD
          • Observational properties: can only be determined by human observation, e.g. licence restriction(?)
          • Performance properties: properties that emerge through the combination of HW, SW & Data Object
      • Source: PLANETS Digital Object Properties WG
  • 15. Figure labels: Performance Property; Observational Property; Extractable information
  • 16. Preservation Metadata: Documenting the technical encoding and intellectual content
  • 17. PREMIS (PREMIS DD 1.0, May 2005; PREMIS DD 2.0, March 2008)
      • “things that most working repositories are likely to need to know in order to support digital preservation”
      • Core metadata that defines “viability, renderability, understandability, authenticity, and identity in a preservation context”
      • What metadata assists with rendering?
          • Format
          • Size
          • Fixity
          • Creating Application: name, version, date data was created
          • Inhibitors: features intended to inhibit access, use, or migration.
  • 18. Technical Metadata for still images
      • Standards: Z39.87, MIX and others
      • Information on:
          • Image characteristics
          • Encoding scheme
          • Metadata
      • Img src: http://www.flickr.com/photos/k4chii/200303113/
  • 19. Document MD
      • Applicable to formats that are primarily text, allow choice of font, support embedded multimedia & page layouts
      • Example elements:
          • Page Count
          • Word Count
          • Character Count
          • Paragraph Count
          • Line Count
          • Table Count
          • Graphics Count
          • Language
          • Fonts (list of each font in document)
          • Features (additional document features, e.g. hasTransparency, hasOutline, hasAnnotation)
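Several of the example elements above are "extractable" in the sense used earlier in the deck: they can be calculated directly from the content. A minimal sketch for plain text is given below. The function name and dictionary keys are hypothetical (they are not the element names of any metadata schema), and the definitions of "word" (whitespace-separated token) and "paragraph" (text separated by a blank line) are simplifying assumptions.

```python
def extract_text_properties(text):
    """Calculate simple countable properties of a plain-text document."""
    # Assumption: paragraphs are separated by one blank line.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return {
        "characterCount": len(text),
        "wordCount": len(text.split()),       # whitespace-separated tokens
        "lineCount": len(text.splitlines()),  # blank lines are counted
        "paragraphCount": len(paragraphs),
    }
```

Properties such as page count, fonts or embedded graphics depend on the file format and on how it is rendered, so they would need a format-aware tool rather than simple counting.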
  • 20. Third party services: Representation Information Registries
      • Require trusted third party services capable of identifying formats
          • PRONOM, UDFR
      • Providing information on rendering data
          • OpenWith, various RI services
  • 21. Preserving your object across changing technologies
  • 22. Change in process over time
      • Figure: SOURCE, PROCESS, PERFORMANCE. For each of Intel PC (2000), Mac laptop (2006) and x64 Ubuntu laptop (2010): hardware + operating system + software application = information content
      • Potential for change to ‘Performance’ over time
  • 23. Change is a necessity… and a risk
      • “traditionally, preserving things meant keeping them unchanged; however … if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible.” (Su-Shing Chen, 2001)
      • “The fundamental challenge of digital preservation is to preserve the accessibility and authenticity of digital objects over time and domains, and across changing technical environments” (Wilson, 2008)
  • 25. Authenticity
      • “the degree to which a person (or system) may regard an object as what it is purported to be” (OAIS RM v2)
      • Questions:
          • How do you distinguish the authentic original from the imitators?
          • What is authenticity in the digital realm?
      • Figure: Which is the real Elvis?
      • Img src:
          • http://www.flickr.com/photos/mymollypop/2904798835/
          • http://www.flickr.com/photos/blahflowers/3827096787/
          • © 1973, Elvis Presley Enterprises, Inc. and RCA Records, http://en.wikipedia.org/wiki/File:ElvisPresleyAlohafromHawaii.jpg
  • 26. What do we need to keep for an Information Object to be authentic?
      • “Understanding, defining and assessing the individual properties… important.. for informing decisions about which characteristics of that object should be preserved over time, in circumstances where it is not possible, for reasons such as cost, practicality or technical constraints, to preserve all the elements of that object” (Montague et al. The Concept of Significant Properties. 2010)
      • “Unless such properties can be defined in a rigorous and measurable manner, cultural memory institutions have no objective framework for identifying, implementing, and validating appropriate preservation strategies, nor for asserting the continued authenticity of their digital collections” (Dappert, 2009)
  • 27. Acceptable vs Unacceptable change
      • Easy to identify when preservation has gone wrong, but how do you decide when it has gone right?
          • Interpretation is a value judgement, often influenced by different criteria
          • Uncertainty about the level at which evaluation should be performed: technical encoding, object type (e.g. still image), object sub-type (e.g. business document, research paper)
      • How do you measure attributes that are considered significant?
          • Technical properties may vary between formats
          • Observational properties require manual identification
  • 28. Planning your strategy; strategising your plan
      • Preservation Plan: “defines a series of preservation actions to be taken by a responsible institution due to an identified risk for a given set of digital objects or records” http://www.dlib.org/dlib/november09/kulovits/11kulovits.html
      • Preservation strategy: indicates commitment to preservation and the high-level approach adopted, covering organisational mission, applied principles (e.g. use of a lifecycle approach), sequence of actions (immediate, medium term, long-term) and risk management
  • 29. Why develop a preservation plan?
      • Assists decision-making process
          • Evaluate different strategies
          • Evaluate different tools
          • Determine which is the most effective approach for your needs
      • Transparency of operation: enable others to view and understand the approach adopted; inspire confidence and trust
      • Provide evidence of decision-making: decisions may be questioned. How do you prove that the approach taken was appropriate for the circumstances?
  • 30. Evaluation frameworks
      • Various approaches may be adopted to develop a preservation plan:
          • Produce internal decision tree
              • Fits intrinsic needs of organisation, but requires staff time to develop & may be limiting when considering new approaches
          • Perform informal “bottom-up” object analysis & develop bespoke plan
              • Fits requirements of object type, but may be time intensive to produce & may be incompatible with broader policies
          • Adopt 3rd party standardised plan (aka copy and paste)
              • Adopting an existing plan saves time, but may be inappropriate for context
          • Use analysis frameworks and toolkits
              • Structured process by which an organisation can identify objectives & develop a plan to address them
              • DRAMBORA/DIRKS: analyse environment & practices, identify risks and brainstorm methods of mitigating or avoiding them
              • Data Asset Framework: identify data held, assess management practices & make recommendations for improvement
              • PLANETS Preservation Planning: define requirements, evaluate alternative approaches, analyse and compare results, recommend preferred approach, and develop plan
  • 31. Preservation Planning workflow
      • Developed as part of DELOS project & adopted by PLANETS Consortium
      • Conforms to the ‘General COTS (Commercial-Off-The-Shelf) selection process’ (GCS)
      • Abstract steps: Define criteria, Search for products, Create shortlist, Evaluate candidates, Analyze data & Select product
      • Uses utility analysis approach
  • 32. PLANETS Planning workflow http://olymp.ifs.tuwien.ac.at:8080/plato/
  • 33. Define Requirements: Factors to consider
      • Identify & analyse the environment in which decisions are made (e.g. assumptions & constraints) to determine context:
          • Organisational/dept objectives (e.g. mission statement, mandate)
          • National/local policy framework (e.g. acquisition, legal framework)
          • Codes of practice
          • Financial limitations: what can you afford?
          • Object types to be maintained
          • Expertise & needs of key stakeholders, e.g. Designated Community
  • 34. Whose views do you need to take into account?
      • Digital archive perspective
          • General trend to simplify object to make it (speculatively) easier to manage in future:
              • Reduce cost of preservation process
              • Limit risk that accessibility/preservation issues will emerge
              • Increase number of preservation options available
      • Creator perspective
          • Author intent difficult to establish
          • Differs for each object: do you seek to treat each object individually or identify broad classes?
          • When do you ask them? On creation, after 5 years? May have different views on value.
      • User perspective
          • How do you analyse interpretation of current user community?
          • How do you predict needs of future users?
  • 35. InSPECT Requirements Analysis Framework (2008)
      • Adopted a design method used to assist engineers & designers to create & re-design artefacts
      • Based upon the theory that artefact construction is a product of designated function(s)
      • Assessment based upon two philosophical approaches:
          1. Teleology: study of design and purpose of object; why was it created?
          2. Epistemology: understand meaning and the process by which knowledge is acquired
      • In combination, these encourage evaluation of the context of creation and the information needed to communicate intrinsic knowledge to a new audience (designated community)
  • 36. Requirements Analysis activities
      • Step 1: Object Analysis. Interpret context of creation:
          1. Analyse object to find out what it contains
          2. Identify original audience and functions that object was created to perform
          3. Determine info. properties necessary to achieve each function
      • Step 2: Stakeholder Analysis. Determine future requirements of digital object:
          1. Identify stakeholders that will use object
          2. Determine function set they may perform when using object
          3. Identify quality thresholds for each information property that must be met to allow each function to be achieved: what is acceptable loss?
  • 37. Define Requirements: PLANETS Requirement Categories
      • Produce list of criteria that will be used to evaluate different preservation strategies in a specific domain
      • May take a top-down (organisational) or bottom-up (object) approach
      • PLANETS identify four groups of characteristic to be evaluated:
          1. Object: attributes of information content itself, e.g. behaviour, context
          2. Record: attributes of record including context, relationships & MD; potential overlap with Object in some cases
          3. Process: attributes of preservation process, e.g. processing speed, usability of tool, ability to batch process, etc.
          4. Cost: set-up of process, cost per object, H/W & S/W, personnel
      • Non-prescriptive: evaluator may identify further top-level & sub-categories or ignore existing criteria (e.g. technical characteristics for format evaluation)
      • May be expressed as spreadsheet, list, mind-map, post-it notes & other forms
  • 38. Record requirements as Evaluation Tree
      • Set of requirements may be expressed as mind map, spreadsheet, or other form
      • Define structure of evaluation process, grouping similar items together
      • Assign a measurement value to each ‘leaf’
          • Objective measure: e.g. colour depth, duration
          • Subjective measure: Acceptable variance,
  • 39. Define Requirements: Measure each criterion
      • Assign a measurement value to each ‘leaf’
      • Objective measures:
          • Unambiguous, automated (possibly), e.g. seconds to process object, colour depth, cost value
      • Subjective measures:
          • Acceptable, but often require manual evaluation, e.g. degree of format support
      • Type of scale:
          • Numeric measure (e.g. 15 bit)
          • Boolean (Yes/No)
          • Controlled vocab (e.g. Yes/Acceptable/No)
          • Ordinal numbers (controlled list)
          • Subjective criteria (0-5)
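One way to picture a ‘leaf’ and its measurement scale is as a small record that knows which values it will accept. The sketch below is illustrative only: the `Leaf` class, its field names and the scale labels are hypothetical and are not taken from the PLANETS tooling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Leaf:
    """A single criterion in an evaluation tree (hypothetical structure)."""
    name: str
    scale: str  # "numeric", "boolean", "ordinal" or "subjective"
    allowed: Optional[Tuple[str, ...]] = None  # ordered values for "ordinal"

    def accepts(self, value):
        """Return True if `value` is valid for this leaf's scale."""
        is_bool = isinstance(value, bool)
        if self.scale == "numeric":
            return isinstance(value, (int, float)) and not is_bool
        if self.scale == "boolean":
            return is_bool
        if self.scale == "ordinal":
            return self.allowed is not None and value in self.allowed
        if self.scale == "subjective":  # whole-number score from 0 to 5
            return isinstance(value, int) and not is_bool and 0 <= value <= 5
        return False
```

For example, `Leaf("format support", "ordinal", ("No", "Acceptable", "Yes"))` accepts `"Acceptable"` but rejects `"Maybe"`, which keeps manually recorded evaluations within the controlled vocabulary.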
  • 40. Objective tree for web sites
  • 41. Define Alternatives
      • On the basis of object type and expressed requirements, what strategies are feasible?
      • Many different approaches are available, e.g. TIFF images could undergo the following actions:
          • Format conversion to JPG2k
          • Format conversion to PNG (to save space)
          • Format conversion to PDF (though would not recommend)
          • Emulation/virtual machine
          • Do nothing!
      • For each alternative strategy, may wish to define:
          • Tool to be tested (e.g. name, version, OS)
          • Configuration parameters
          • Function to be tested
  • 42. Trial the preservation approaches
      • Develop a set of experiments to trial the preservation approach
          • Define workflow
          • Select representative test files
          • Perform evaluation
          • Evaluate the outcome according to your objective tree
          • Were there undesired/unexpected results?
  • 43. PLATO conversion tool/format comparison Definition of alternative approaches to preserve GIF image (conversion to alt. formats) and identification of tool services available to perform action
  • 44. Compare results
      • Require a common basis for comparing different strategies
      • Normalise disparate results
          • Each evaluation factor is measured differently (Y/N, cost, speed of conversion)
          • Can make them comparable by converting them to a uniform scale
      • Set Important Factors
          • Not all assessment criteria are equal: do you wish to prioritise specific reqs. (e.g. scalability, cost)?
      • Compare outcomes & select most appropriate preservation strategy
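The three steps on this slide (normalise, weight, compare) can be sketched as a simple weighted-sum utility analysis. This is an illustration under stated assumptions, not the algorithm of any particular planning tool: the function names are hypothetical, the 0-5 target scale is borrowed from the subjective scale mentioned earlier in the deck, and linear mapping with a weighted average is only one possible way to aggregate.

```python
def normalise(value, worst, best):
    """Map a raw measurement linearly onto 0-5, where `best` scores 5.

    Works for "lower is better" criteria too: pass the larger number as `worst`.
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    score = 5 * (value - worst) / (best - worst)
    return max(0.0, min(5.0, score))  # clamp values outside the range


def weighted_utility(scores, weights):
    """Weighted average of normalised scores; weights need not sum to 1."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total


def rank_alternatives(alternatives, weights):
    """Return alternative names ordered from highest to lowest utility."""
    return sorted(
        alternatives,
        key=lambda name: weighted_utility(alternatives[name], weights),
        reverse=True,
    )
```

A Yes/No result can be normalised by treating Yes as 5 and No as 0 before weighting. The ranking is only as good as the weights, so the chosen weights and the reasoning behind them belong in the documented plan alongside the result.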
  • 45. Conclusions
      • Preservation is an iterative process: must climb many steps to reach the top of the pyramid
      • Preservation Planning enables an organisation to understand and document its requirements
      • Demonstrating decision making inspires confidence & trust
      • Not a "perform once and forget" process. Must be repeated
  • 46. Discussion points
      • Are traditional checksum techniques acceptable for measuring integrity, or do we need a more granular approach?
      • How should we utilise & build upon third party services, such as RI Registries & preservation plan tools, to achieve our preservation objectives?
      • What would a preservation plan for our scanned images, documents, metadata look like?
  • 47. Thank you for your attention. QUESTIONS? Gareth Knight, gareth.knight@kcl.ac.uk